UFR Database
This has been an extremely long offseason for me, and I've been struggling to find ways to get my Wolverine Football fix, as I'm sure just about everyone who frequents MGoBlog has. My internship has also been very slow, so I've had a lot of desk time to sit and do nothing.
While perusing the site I began to wonder if there was some sort of UFR database that could be queried in order to research different trends, etc. I'm a computer engineering major with a math minor and statistics are not exactly my cup of tea, but I would at least like to poke through that information if it was all in one place.
After emailing Brian and finding that there was no such database, I found myself pondering how one might go about making such a database. I started writing an application in C++ that would take the HTML table and convert it into XML that could be imported into Access. I was relatively close to getting that to work, but it was a bit of a hassle, so I decided to try something else.
I decided to write a Perl script that would parse the HTML, clean out all the tags (<table>, <tr>, <td>, etc), and put in their place more useful XML tags (<line>, <player>, <analysis>, etc). The result was a pretty nifty script that does almost all the work for me. After converting each UFR table into XML, I needed to add some extra information like drive IDs, play IDs, and things of that nature so that the database would be easily queryable. I did some of this by hand (lots of search and replace), and wrote another script to take care of a bunch of the nit-picky stuff that would have taken hours. I didn't really want to spend the rest of the day adding <PLAYID>some number</PLAYID> to over a thousand plays... I decided to be an engineer because I'm lazy and there's ALWAYS an easier way to do something.
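For a rough idea of what the script does (sketched here in Python rather than Perl, and with the simplifying assumption that every table row has exactly three cells in a fixed order, which real UFR tables don't always follow):

```python
import re

def html_rows_to_xml(html):
    """Sketch of the tag-swapping idea: pull each <tr> out of the
    UFR table, wrap its cells in the XML tags mentioned in the post
    (<line>, <player>, <analysis>), and stamp a sequential <PLAYID>
    on each play so the database ends up easily queryable."""
    fields = ["line", "player", "analysis"]
    plays = []
    for play_id, row in enumerate(re.findall(r"<tr>(.*?)</tr>", html, re.S), 1):
        cells = re.findall(r"<td>(.*?)</td>", row, re.S)
        tagged = "".join(
            f"<{f}>{c.strip()}</{f}>" for f, c in zip(fields, cells)
        )
        plays.append(f"<play><PLAYID>{play_id}</PLAYID>{tagged}</play>")
    return "\n".join(plays)
```

A real version would also need to handle nested tags, entities, and rows that span multiple plays, which is where the hand editing came in.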
If you're interested in seeing the Perl scripts, especially if you're proficient in Perl, let me know and I'll send them to you. This was my first useful Perl script (I taught myself, and hadn't ever done anything more significant than Hello World!) and a lot of my techniques came straight off a Google search and probably aren't following best practices. Suggestions/critiques would be greatly appreciated.
After some trial and error, I ended up with what I think is a pretty nifty database that includes information about every game, drive, and play that was charted over the course of last season. The OSU game is missing because Brian didn't do a UFR, along with halves of some games (a couple were only charted for offense or defense, not both) and some drives (I believe a couple games were so painful that the final drives were non-UFR-able).
I've hosted the database file on FileFactory, along with the final XML file for the complete season. Hopefully that works...
If I get really bored in the future, I'd ideally like to find a way to parse the text to record all the +/- information for each player, but I haven't thought about that much... seems like it would be quite a bit of work, and I might not have time for it now that my boss and my manager are back from vacation.
So anyway, feel free to download and look through that. Do whatever you'd like with it. I didn't really do this for any reason other than I wanted something to do, and it seemed more useful than Facebooking all of our potential recruits for the next five years. Sorry about all the parentheses... no idea how they've become so prevalent in my writing over the years, but everything sounds choppy when I write without them.
Go Blue!
EDIT: As was mentioned in the comments, the Miami game, the Northwestern game, and the other half of the PSU and Illinois games were actually available. These have been added to the database and the final file is located HERE. (The corresponding XML has been updated as well.)
"I might not have time for it now that my boss and my manager are back from vacation."
Nonsense. Look at how well that worked out for me.
I have not looked at the database and likely will never look at it, but what an awesome resource. You are to be commended for the effort and I look forward to guys bashing each other with arguments they pulled out of your database. Thank you.
That's what I'm hoping for as well. Should be interesting to see if anything comes of it.
I would have to imagine that the amount of lost productivity due to visiting this blog during working hours amounts to several thousand dollars daily, in the aggregate.
probably a dumb question around here, but what application opens .mdb and .xml files? I don't know how to utilize your database without it looking like an engineering script in textedit.
The XML is readable with Notepad. I had to remove all the line breaks because they were causing some problems with queries in the database. I didn't need to remove all of them, but it was quicker than going through 20,000 lines of text and only removing what I had to.
You can also open the XML file in Internet Explorer or Firefox, and that will show the data structure quite nicely.
The .mdb file is readable with Microsoft Access. You can also open it with OpenOffice, which is free at www.OpenOffice.org. Instructions for doing that are at http://salahuddin66.blogspot.com/2007/09/mdb-file-in-openofficeorg.html, and there are some extra steps described in the comments at the bottom of that blog post. I just did a quick Google search, so there may be better resources for figuring out how to do that.
I was hoping you would at least say VIM. But yeah, notepad is alright. I guess... :P
But thanks for the amazing work! Stuff like this is what keeps me coming back to this site (apart from Brian's posts of course).
It opens in VIM (I had never heard of VIM before, but I just downloaded it and tried it... it seems to work, which makes sense since it's just a text file), and I actually did any hand editing in Notepad++.
vim is this famous (I thought :D) text editor that a lot of the *NIX community swears by. I just like using it because I do a lot of code editing, and it has this really powerful regular expressions engine that you can do a lot of clever things with (things like search, search/replace etc are a breeze). It has a bit of a learning curve, but once you get used to it, editing with anything else becomes next to impossible.
Having said that, I like Notepad++ too, and have used it quite a bit. But I don't want to hijack this thread; there are plenty of other places on the web to discuss the merits/demerits of various code editors :)
Once again, great post! I like the idea of turning this into an HTML page as someone had suggested. Will definitely make it a lot more accessible for people not used to working with raw DBs/Access... You could maybe even get Brian to host it on the site. I am sure he will find it useful as well (probably more so than most of us).
Gotcha. As an Electrical/Computer Engineering student I don't get as deep into the CS side of things as I'd like. I'll play around with it at some point just so I can have it on my list of "things I've at least tried to use."
I am an EE/CE too. I just happen to work on the firmware level. Sort of straddling that boundary between EE, CE and CS. But yeah, perl is a great skill to have, and knowing perl, you might appreciate vim even more :)
Thanks for posting, and the follow up. This is really cool.
If there is any interest I could convert this to a SQL database and create a simple HTML user interface to make it easier for people to search all this information from their favorite browser.
That would be amazing. Just out of curiosity... where would you host something like that?
I have a business site that I could put it on to avoid hosting fees. I was also wondering how complicated it would be to add it on mgoblog somewhere (and keep it updated after each UFR). That may not be feasible or even wanted by Brian. Either way, I'm sure something could be put up in the next few weeks.
Keeping it updated after each game would be fairly easy... I'd have to polish up the Perl stuff to make it one automated step, but then it's just a matter of copy/pasting the table HTML into a file, converting it, and importing it into Access (or whatever type of database it ends up in).
Like you said though, if it's housed on MGoBlog, it's up to Brian.
That was exactly what I was thinking. I would be happy to help with this project if you need it.
See what a little boredom can do?
Unfortunately, I have a bunch of "real" work to do today, but I will definitely check this out tonight and over the weekend. Thanks!
and the interaction with the programming community that makes me sad that the furthest I got with programming was 101 before I fell into ChemE.
Cocktails on an epic scale.
I was thinking about doing something like this, but definitely did not have the drive to do it. I have done similar things in the past and the way I have done it is to use Regular Expressions and create objects from the regex results. Then, in Access (I use SQL Server) create an identity on the tables which will automatically add the id's as you insert records.
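For anyone unfamiliar with the identity-column trick, here's a minimal sketch using SQLite (SQL Server IDENTITY syntax differs slightly, and the table/column names here are made up):

```python
import sqlite3

# In SQLite, an INTEGER PRIMARY KEY column auto-assigns an increasing
# value on insert, playing the same role as an IDENTITY column in
# SQL Server or an AutoNumber field in Access: the database numbers
# the plays itself, so nobody has to hand-stamp a thousand IDs.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE plays (
        play_id INTEGER PRIMARY KEY,
        line    TEXT,
        player  TEXT
    )
""")
conn.executemany(
    "INSERT INTO plays (line, player) VALUES (?, ?)",
    [("M28 1 10", "Forcier"), ("O40 2 7", "Minor")],
)
rows = conn.execute(
    "SELECT play_id, player FROM plays ORDER BY play_id"
).fetchall()
```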
Very cool. Thank you for putting the effort into this so that I may reap the benefits.
I was just importing the file into my database and noticed which games you had missing. I did a quick search, and it looks like there are UFRs for everything except the OSU game. Would you mind either running the script again with those included, or posting the script so that I can do it myself? I want to make use of your work, but I think it would be more useful with everything included.
Thanks
Sure. I'll get them up tomorrow at some point. I'd send you the script, but there's some tedious manual stuff that needs to be done at this point and it'll be easier this way.
Thanks for the heads up!
EDIT: Or tonight... gosh I'm bored lately. Enjoy!
Apparently WLA did the OSU UFR; I would check over there.
I don't think we want this in the database.
http://mgoblog.com/content/upon-further-review-michigan-vs-ohio-state
Yeah, it's a UFR of the UFR... I wouldn't recommend incorporating it into the database.
Overall a good post though.
I will be more than happy to help with this as well. I know stats, SQL, XML, HTML, other initials. This would be hella fun and useful.
Edit: Please use this not Access!
I can play around a bit with the SQL Express and see what I can do, Access was just what I had at the time and I know that's not ideal for most situations. With all of the available data in there now, feel free to do any analysis/tweaking you'd like.
Hmm, once I'm at work tomorrow I'll import to SQL 2008 and I'll post the MDF/LDF files for loading into SQL Express.
Edit: Thanks again for all the work you put in.
I already have a SQL script to import the raw data. Right now, I am working on a script to normalize the data and convert it into proper values.
The problem I have is that, for example, the down is not always 1-4; sometimes it's "2pt". The TOGO is sometimes "G" for goal, and the line of scrimmage is given as "M28" or "O36".
Basically, getting the data in is not that difficult, but making it usable is a little more challenging.
Yeah, not sure exactly how to go about fixing that...
Also, FYI, I realized that I had missed the Northwestern game in the newer updated version of the database. I'm changing the link in the diary again, but here it is as well:
http://www.filefactory.com/file/ahd2a8b/n/08_UFR_mdb
I have a few thoughts on this.
First, tag 2pt plays with "Special Teams" and leave it at that. It's an anomaly that is easier excluded. For TOGO with G, just change 1st and G at the 5 to 1st and 5. The line could range from 0 (our goal line) to 100 (their goal line).
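Those rules are easy to express in code; a quick sketch for illustration (function names are mine, not from either script):

```python
def normalize_line(line):
    """Map 'M28' / 'O36' onto a single 0-100 scale:
    0 = our goal line, 100 = theirs."""
    side, yard = line[0], int(line[1:])
    return yard if side == "M" else 100 - yard

def normalize_down(down):
    """Downs are normally 1-4; '2pt' conversions get flagged
    separately so they can be excluded from down-based queries."""
    return None if down == "2pt" else int(down)

def normalize_togo(togo, line):
    """'G' (goal to go) becomes the actual distance to the goal line."""
    return 100 - normalize_line(line) if togo == "G" else int(togo)
```

So "1st and G at the O5" comes out as 1st and 5 from field position 95, and every play ends up on one consistent scale.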
Once I get a chance to look at the database, I'll see if I can come up with schema for efficient querying. I just need a couple free hours at work with a whiteboard.
Those were my thoughts as well. I have already made my way through most of it. I will try to finish the script tonight and get it up for you to see.
gChat me at [email protected] and we can work through it on a shared google doc if you want. I'll be online from 830pm EST until midnight.
Also, I have a formatted/indented version of the above XML and also have the Access DB converted into SQL 2008 if you'd like either.