Calling Football Stats People - Download Box Score Data

Submitted by stbowie on September 17th, 2014 at 9:52 PM

Does anyone know a good free source for downloading box score data? Primarily interested in this year's games, but somewhat interested in previous years as well. I'm looking to put together some statistical summaries of B1G games and would love to not have to enter data by hand if it's already out there (and free).

Thanks and Go Blue!

Comments

LSAClassOf2000

September 17th, 2014 at 10:07 PM

It depends on the scope, really. If it is just Michigan games, then the stats archive goes back to 1949, I believe, but ESPN has games going back to the early 2000s if you search for them. Most of the team pages are still active in their original format, I believe, so if you need stats, you can copy and paste from their site. I think the team pages go back several years as well.

stbowie

September 18th, 2014 at 7:49 AM

Primarily looking for current season B1G games, and as many UM games as I can get. Looks like stats archive will work great for older UM games and ESPN will work just fine for current season (and the last 10 years as well).

MCalibur

September 18th, 2014 at 12:10 AM

Unfortunately cfbstats is no longer free. Packages start at $750/year. Historic data costs $1500. I'd be willing to pay for what my man was doing, but that is just too steep. I has sad.

Seth was saying that mgoblog had a project going where the free service cfbstats was providing would reside here. Not sure where that stands though.

[EDIT: here's a USAToday story on what happened to cfbstats. Basically these guys were doing similar things and BIG TIMERS (coaching staffs, etc.) were subscribing to their stuff. That data is being provided to the CFB Playoff Selection Committee. So they gobbled up cfbstats and now they're charging for it. I'd love to be part of an open source project to develop a free parser in Python or something. Whatever man, this sucks.]

Seth

September 18th, 2014 at 2:49 PM

They made Marty their resident nerd; it's just two brothers. They wanted us to get together with some other blogs to pay $7500 for stats and I was like: Uh no, we will build our own.

SO WE ARE BUILDING OUR OWN

We're now in the parsing stage, where there are all sorts of hellish things to fix, like drives that start over because of fumbles and whatnot. We'll have the data first, and eventually pages for every team with sensical stats (SACKS ARE PASS PLAYS!!!). We'll also be doing a lot of things Marty (CFBStats) never did, like linking recruiting profiles to performance. Eventually. First is getting the PBP data to not suck.
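
To make the sack point concrete, here is a minimal Ruby sketch that counts sacks with the passing game instead of the running game. The `Play` struct, its field names, and the play-type symbols are all hypothetical, made up for illustration; they are not the project's actual schema.

```ruby
# Hypothetical play record; the field names are illustrative assumptions.
Play = Struct.new(:type, :yards, keyword_init: true)

# Play types counted as pass plays. Sacks are included on purpose:
# the quarterback dropped back to pass, so the lost yardage belongs
# with the passing game rather than with rushing.
PASS_TYPES = %i[pass_complete pass_incomplete interception sack].freeze

# Split a list of plays into pass and rush totals.
def split_rush_pass(plays)
  pass, rush = plays.partition { |play| PASS_TYPES.include?(play.type) }
  {
    pass_plays: pass.size, pass_yards: pass.sum(&:yards),
    rush_plays: rush.size, rush_yards: rush.sum(&:yards)
  }
end

plays = [
  Play.new(type: :rush, yards: 4),
  Play.new(type: :sack, yards: -7),
  Play.new(type: :pass_complete, yards: 12),
  Play.new(type: :rush, yards: 2)
]

p split_rush_pass(plays)
```

With the sack filed under passing, the two rushes keep their 6 yards and the passing total absorbs the 7-yard loss.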

Nate, the guy who jumped on this for us, should be posting soon.

ShoelaceToJunior46

September 18th, 2014 at 10:32 PM

Hey all, Nate here.

Sorry for the slow response, my day job as a security consultant here in Chicago was pretty crazy today.

So yes, we're building a scraper in Ruby that reads the raw PBP data from the HTML, JSON, and other files from stats.ncaa.org. That data gets parsed into CSV files that I will eventually be building into a relational database, front-ended by a Ruby on Rails application (maybe with some new hotness JavaScript framework that makes it easy to browse around).
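
As a rough idea of what the raw-text-to-CSV step looks like, here is a small sketch using Ruby's standard `csv` library. The line format it parses ("down-distance description for N yards") is invented for this example and is much cleaner than the real stats.ncaa.org pages; the column names are assumptions too.

```ruby
require "csv"

# Hypothetical raw line format, invented for illustration:
#   "<down>-<distance> <description> for <n> yards"
LINE = /\A(?<down>[1-4])-(?<dist>\d+)\s+(?<desc>.+?)\s+for\s+(?<yards>-?\d+)\s+yards?\z/

# Turn raw play-by-play lines into CSV text with a header row.
def plays_to_csv(lines)
  CSV.generate do |csv|
    csv << %w[down distance description yards]
    lines.each do |line|
      match = LINE.match(line.strip)
      next unless match # skip lines the pattern does not understand

      csv << [match[:down].to_i, match[:dist].to_i, match[:desc], match[:yards].to_i]
    end
  end
end

raw = [
  "1-10 QB rush for 5 yards",
  "2-5 QB pass complete to WR for 12 yards",
  "TIMEOUT",
  "1-10 QB sacked for -7 yards"
]

puts plays_to_csv(raw)
```

Lines that do not match (timeouts, quarter breaks, and so on) are silently skipped here; a real parser would probably want to log them so nothing gets lost without anyone noticing.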

Initial goals are just to recreate useful stats in a way that people can consume them without having to go through the hell of parsing the data like I have. Longer term goals are to come up with interesting queries that can be run against the data set, maybe create an API, etc. For those familiar with Rails, ActiveRecord relationships make this really nice and straightforward to do interesting things.
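
For a flavor of the kind of query this enables, here is a plain-Ruby sketch (no Rails, no database) that computes a third-down conversion rate per team from already-parsed rows. The hash keys and the rule "yards gained >= distance means converted" are simplifying assumptions; they ignore penalties and touchdowns from inside the distance to go.

```ruby
# Hypothetical parsed rows; the keys are assumptions, not the real columns.
rows = [
  { team: "A", down: 3, distance: 4,  yards: 6 },
  { team: "A", down: 3, distance: 8,  yards: 2 },
  { team: "B", down: 3, distance: 1,  yards: 1 },
  { team: "A", down: 1, distance: 10, yards: 3 }
]

# Third-down conversion rate per team, as a fraction between 0 and 1.
def third_down_rate(rows)
  third_downs = rows.select { |row| row[:down] == 3 }
  third_downs.group_by { |row| row[:team] }.transform_values do |plays|
    converted = plays.count { |row| row[:yards] >= row[:distance] }
    converted.fdiv(plays.size).round(3)
  end
end

p third_down_rate(rows)
```

Once the data lives in a relational database, the same select/group/aggregate shape would be expressed as a query against the tables instead of against an in-memory array.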

HOW CAN YOU HELP ME

A lot of you are already interested in this, asking how you can help, etc. First off, I've created an email that you can get me at for this project: [email protected]. Second, here's the list of things we need help with:

  1. The biggest thing we need right now is some QA testing... essentially, people who are willing to take a set of CSV files, go to the corresponding game page, and make sure our data looks reasonably accurate. The more people help out here, the more accurate our data, the more things I can fix in the parser for future data, and the quicker I can get to building features that actually are interesting.
  2. If you have interesting data sets in a not sucky format that I can consume to add to what we hope will be a pretty sweet setup, that's wonderful.
  3. If you have Ruby/Rails experience and want to contribute, that's wonderful (frankly, I'm not a full-time programmer, I'm a computer security guy... I consult for companies on the security of their applications... read: I break applications, I don't build them).
  4. If you have graphic or front-end experience, especially if you've done some of this in something like Ruby/Rails, or in one of those fancy new JavaScript frameworks that can do really nice single-page applications to make load times fast and whatnot, that could be interesting down the road.
  5. If you have ideas on what would make this super awesome, well, we already have a lot of features to get done, but shoot them my way.
  6. If you just think this is awesome and want to provide words of encouragement, hey sure, I'll take it... if you see this data raw, you will come to learn pain when you try to parse it :).
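
For the QA testing in item 1, part of the checking could be automated. Below is a minimal sketch, assuming the parsed CSV has `team` and `yards` columns (an assumption, not the real layout) and that a tester has typed in the team totals from the official game page by hand. It reports only the teams whose parsed total disagrees with the box score.

```ruby
require "csv"

# Sum yards per team from parsed play rows and return the teams whose
# total differs from the hand-entered box score figure.
def yardage_mismatches(csv_text, box_score_totals)
  parsed = Hash.new(0)
  CSV.parse(csv_text, headers: true) do |row|
    parsed[row["team"]] += row["yards"].to_i
  end

  box_score_totals.each_with_object({}) do |(team, expected), diffs|
    next if parsed[team] == expected

    diffs[team] = { parsed: parsed[team], expected: expected }
  end
end

# Made-up sample data standing in for one game's parsed CSV.
csv_text = <<~ROWS
  team,yards
  A,5
  A,12
  B,3
  A,-7
ROWS

# Totals a tester would copy from the game page (made up here).
p yardage_mismatches(csv_text, { "A" => 10, "B" => 4 })
```

An empty result means the totals agree; anything else points at a game where the parser probably dropped or misread a play.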

Side note: I don't really post on MGoBlog often, but I read it religiously. I took this project on because I thought it would be a nice way to give something back... so I'm really hopeful that I can make that happen for everyone and that it will turn into more awesome content for the blog.