"I love it that Ivy League coaches are coming to our camp and Big Ten coaches are coming to our camp. South Florida is coming. We've got about 70 schools that are coming to our camp."
Ontario Sneed Haunts My Dreams
Ok, ok. You may have noticed a distinct slowdown in content of late. There is a very good reason for this that will hopefully have a very cool end result. (Granted, this is only for a certain uncool value of "cool," one that does not involve helicopters or babes or luxurious mustaches.) This involves grabbing the data available on the NCAA statistics page, cramming it into my own database, and then, um, unleashing it and stuff. With authorita.
No doubt this would be a simple process, right? Take some regularly formatted text, extract the appropriate data, and repeat for thousands of plays over hundreds of games: voilà, totally awesome weapon of mass statistics at my disposal. This, eh... has not so much happened. If I had only one word to describe the state of this data it would be "I hate you, NCAA person who either designed or did not design this thing, more likely the latter, you useless git, though you probably don't exist."
What the hell am I talking about?
Okay, this is a fairly normal play sequence from the Central Michigan-Akron contest:
- (1st and 10) SNEED, Ontario rush for 10 yards to the CMU41, 1ST DOWN CMU (PACE, Chevin).
- (1st and 10) PENALTY CMU false start 5 yards to the CMU36.
- (1st and 15) SMITH, Kent pass incomplete to HARPER, Justin.
After much blithering about, my parser handles a sequence like this one. Here's a fumble. Notice that the yardage is totally omitted?
- (1st and 10) BIGGS, Brett rush to the AKRON41, fumble forced by WILLIAMS, T., fumble by BIGGS, Brett recovered by AKRON SCHEPP, Mike at AKRON43.
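The only positional information in a line like that is the yard-line tokens (AKRON41, AKRON43), so any yardage has to come from arithmetic on spots. A rough sketch of that arithmetic follows. The function names are made up, and it assumes every spot looks like a team code followed by a one- or two-digit yard line, and that you already know which team has the ball and where the play started. Neither of those is in the line itself.

```python
import re

# Assumed token shape: uppercase team code, then a 1-2 digit yard line.
SPOT = re.compile(r"([A-Z]+)(\d{1,2})")


def field_position(spot, offense):
    """Yards from the offense's own goal line for a token like 'AKRON41'."""
    m = SPOT.fullmatch(spot)
    if m is None:
        raise ValueError(f"unrecognized spot: {spot!r}")
    side, yard_line = m.group(1), int(m.group(2))
    # On your own side of the field the number is the distance already;
    # on the opponent's side it counts down toward their goal line.
    return yard_line if side == offense else 100 - yard_line


def yards_gained(start_spot, end_spot, offense):
    """Net yards between two spots, from the offense's point of view."""
    return field_position(end_spot, offense) - field_position(start_spot, offense)


# Hypothetical starting spot, purely for illustration: the sample line
# does not say where the play began.
print(yards_gained("AKRON38", "AKRON41", "AKRON"))
```

The catch is that the starting spot has to be carried over from the previous play, so one malformed line can poison everything after it in the drive.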
Now I have to tease out that piece of data from the yard lines provided. This is what's colloquially referred to in the coding business as a "giant pain in the ass." With sprinkles. Here's a play from the Akron-Kent State tussle:
- (1st and 10) MACHEN pass complete to PRUDEN for 4 yards to the KSU5 (CORNER, Reggie).
- (2nd and 6) JENKINS rush for loss of 4 yards to the KSU1 (REID, Jermaine).
- (3rd and 10) MACHEN pass incomplete to HILL.
That's right, everyone on Kent State has decided to go with just the one name. Maybe they're all Brazilian. Or models. I dunno. What I do know is that my hacked-up parsing code vomited on this, causing me to revisit it. More hours down the drain. Oh, and then there's this spectacular sequence:
|2||1||Akron||FUMB||13:36||opp 16||TD||13:36||opp 16||0||0||0:00|
Maybe this is just an indication that the MAC blows something fierce and can't be bothered to submit correct reports. They don't even list punts! There's not a goddamned punt in any of the MAC games! I mean, I saw that UW-BGSU game but that's an aberration, right? Meanwhile, the overzealous SEC does this...
- (2nd and 6) [SHOT], Rafael Little rush for 3 yards to the UT42 (Ryan Karl).
- (3rd and 3) Timeout Kentucky, clock 01:04.
- (3rd and 3) [SHOT], Curtis Pulley rush for 1 yard to the UT41 (Roshaun Fellows;Justin Harrell).
Medic! And then there's this, from extra-overzealous Alabama, of course:
- (2nd and 8) Brodie Croyle sideline pass complete to Keith Brown for 26 yards to the LSU30, 1ST DOWN ALABAMA, out-of-bounds (Chevis Jackson).
- (1st and 10) Kenneth Darby rush over left end for 1 yard to the LSU29 (Ronnie Prude;Kyle Williams).
- (2nd and 9) Brodie Croyle RF pass incomplete to Le'Ron McClain, QB hurry by Kyle Williams.
- (3rd and 9) Brodie Croyle crossing pass incomplete to Ezekial Knight, dropped pass.
... which is potentially cool--it would be fascinating to see the statistical breakdowns of the different routes--but mostly just makes my mind bleed, since it is futilely attempting to reconcile all these different data presentations in one hunk of code that does not make any goddamn sense to anyone, including me. So, yeah, the NCAA spent all that time spanking naughty mascots when they could have been normalizing their play-by-play submission process. Before, I didn't care. Now I'm wicked pissed.
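To give a flavor of the reconciling, here is a small sketch that squeezes the player-name styles seen in the samples above ("SNEED, Ontario", "MACHEN", "Rafael Little", "[SHOT], Rafael Little") into one shape. The `normalize_name` function and its (first, last) return value are my own invention for illustration, and it simply throws away a leading bracketed tag like "[SHOT]" without trying to interpret it.

```python
import re

# Leading bracketed tag such as "[SHOT]," is dropped, not interpreted.
TAG = re.compile(r"^\[[^\]]*\],?\s*")


def normalize_name(raw):
    """Return (first, last); first is None when only one name is given."""
    name = TAG.sub("", raw.strip())
    if "," in name:
        # "LAST, First" style, as in the MAC samples.
        last, first = (part.strip() for part in name.split(",", 1))
        return (first or None, last)
    parts = name.split()
    if not parts:
        return (None, None)
    if len(parts) == 1:
        # Single-name style, as in the Kent State samples.
        return (None, parts[0])
    # "First Last" style, as in the SEC samples.
    return (" ".join(parts[:-1]), parts[-1])
```

This still leaves the same player showing up as "WILLIAMS, T." in one game and under a full first name in another, which no amount of string slicing fixes without a roster to match against.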
I'm pretty sure this post--being a discussion of how hard it is to parse football play-by-play data--is the most boring in the history of this blog, but this is by way of explaining any (extra) crotchety-old-man-ness you may see in this space over the next few days. Don't blame me. Blame the NCAA.