this may be of some local interest
so, despite the fact that the NFL rule change is so much hot air, the one thing it does accomplish is that it reopens the debate on how overtime should be handled. there seems to be general consensus that pure sudden death is stupid and broken. the college OT system—equal possessions from the 25—is better, but has never seemed perfect to me. here are my primary gripes with it:
- the 25 is too close. starting every possession in field goal range encourages conservative play. the only way to not have a legitimate shot at 3 points is to take a long sack or two short sacks/TFLs (out of 3 plays!), or to give up a turnover. lots of overtime games turn into field goal penalty shootouts.
- no special teams. overtime strictly pits offense versus defense. got a great punter? return man? too bad, they're sitting on the bench.
- no game clock. college overtime is nearly 15 years old, and every time i see a score bug sans game clock, it still weirds me out. this makes overtime play slow and deliberate. the NFL's sudden death OT suffers from the same problem, with the philosophy "pretend it's the 1st quarter again".
anyhow, those are just some ideas that i've been kicking around for a while, and think could work well and make for pretty compelling OT football. would you want to see them implemented in the NFL? the NCAA? i'm interested to hear comments.
A home game is now in the books as Michigan downed the IPFW Mastodons for a three game series sweep. Michigan brought the offense, collecting 47 hits for 30 runs. IPFW was the bad team that was expected, and our players didn't disappoint, at least on offense. Pitching had some less than stellar moments, but in the end, the guys that got into trouble kept the losses to a minimum.
Game by game review, series thoughts, and a look at Tuesday's Eastern Michigan game follow:
I am MGoData, a senior undergraduate at Michigan majoring in various technical fields. I am very interested in data and its ability to shed light on human behavior. Before embarking on a 5-6 year tour of PhD studies, I plan to take some time this summer to relax and do some "fun" projects. Given the emphasis this community places on actual facts and numbers over other more subjective metrics, I thought this would be a good place to look for a project or two.
Jumping right into it then...
Given the success (luck?) Michigan State has had over the last few seasons, there has been a reasonable amount of discussion over Little Brother's "obsession" with everything Michigan. So I ask the question:
Is Michigan State obsessed with Michigan and if so, how can we quantify it?
To answer this question we need some data. We want this data to reasonably capture behaviors associated with constantly thinking about or wanting information on Michigan, and we want there to be some way to identify this behavior as being directed from Michigan State fans/students towards Michigan fans/students.
Google is a great place to start. People search Google for millions of things, millions of times per day, and they search for things they are interested in. Google has had recent success predicting flu outbreaks 2 weeks ahead of the CDC, just by looking for places where people are searching for things like "flu symptoms".
Conveniently, Google has built a tool called Google Trends with which you can compare the popularity of search terms, including where most searches are coming from. So what happens when we compare people searching for "University of Michigan" to people searching for "Michigan State University"? Let's look at the data.
The most immediate thing to notice is the obvious fact that more people are searching for The University of Michigan than Michigan State University. Sadly, it appears that "University of Michigan" has become less and less popular over the last few years.
This isn't surprising given the national (and world) recognition that Michigan receives. Google scales this data as follows:
In relative mode, the data is scaled to the average search traffic for your term (represented as 1.0) during the time period you’ve selected. For example, if you entered the term dogs, the graph you’d see would be scaled to the average of all search traffic for dogs from January 2004 to present. But if you chose a specific time frame – say 2006 – the data would then appear relative to the average of all search traffic for dogs in 2006. Then, let’s suppose that you notice a spike in the graph to 3.5; this spike means that traffic is 3.5 times the average for 2006.
Now let's look at the other information Google gives us. Conveniently, we also get information on the geographic regions that are searching for these two topics. The top cities break down as follows:
Again we look to Google's FAQ to understand how these results are calculated and interpreted.
To rank the top regions, cities, or languages, Google Trends first looks at a sample of all Google searches to determine the areas or languages from which we received the most searches for your first term. Then, for those top cities, Google Trends calculates the ratio of searches for your term coming from each city divided by total Google searches coming from the same city.
It's possible that Google uses the IP address of the searcher to tailor results to the geographic location of the user, but others can test this by performing the same Trends search and seeing what they get.
Interpreting these results is a little bit tricky, so bear with me. The first thing to notice is that everything is scaled to the relative search popularity of "University of Michigan" in Ann Arbor. Essentially they use the percentage of all Ann Arbor Google searches that are for "University of Michigan" as a baseline and compare everything else to this number. Again though, the search popularity in each city is scaled by the total searches coming from that city, so the population of other cities does not skew results. If they didn't do it this way, places like New York City would always be on top just because there are so many people Googling for everything.
What are some qualitative things we can learn just from the bar graph? Well, first of all, people in Ann Arbor are searching for "University of Michigan" way more than people everywhere else, and people in East Lansing are searching for "Michigan State University" more than any place else. Furthermore, it seems that people in East Lansing are searching for "University of Michigan" more than people in Ann Arbor are searching for "Michigan State University". We can see this by comparing the blue bar next to East Lansing to the red bar next to Ann Arbor.
Even better, though, is the fact that Google will actually let you download text files with actual numbers. The following table shows Google's measure of search popularity coming out of these top cities. Again notice that Ann Arbor searching for "University of Michigan" is the baseline at 1.000.
|City||university of michigan||university of michigan (std error)||michigan state university||michigan state university (std error)|
|Ann Arbor (USA)||1||0%||0.1||3%|
|East Lansing (USA)||0.27||2%||0.52||2%|
|Bay City (USA)||0.175||2%||0.09||3%|
From these numbers we can see that "Michigan State University" is about ten times less popular than "University of Michigan" in Ann Arbor (not surprising), whereas East Lansing is searching for "University of Michigan" a little more than a quarter as much as Ann Arbor is. Comparing the search popularity of opposing universities in each town, we see that East Lansing is searching for "University of Michigan" a whole 2.7X as much as Ann Arbor is searching for "Michigan State University". Furthermore, search popularity of "Michigan State University" in East Lansing is about half that of "University of Michigan" in Ann Arbor, so maybe they just don't like searching for themselves (it's no fun reading more "SPARTY NOOO!" articles).
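The comparisons above are just ratios of the numbers in the table. Here is a minimal sketch in Python using only the three rows shown; the dictionary layout and the `cross_ratio` name are made up for illustration and are not part of anything Google provides.

```python
# Google Trends search popularity, copied from the table above.
# Everything is relative to "University of Michigan" in Ann Arbor (= 1.0).
popularity = {
    "Ann Arbor": {"um": 1.0, "msu": 0.1},
    "East Lansing": {"um": 0.27, "msu": 0.52},
    "Bay City": {"um": 0.175, "msu": 0.09},
}


def cross_ratio(data):
    """How much more East Lansing searches for UM than Ann Arbor searches for MSU."""
    return data["East Lansing"]["um"] / data["Ann Arbor"]["msu"]


# Each school's popularity in its home town versus the rival's popularity there.
home_vs_rival_aa = popularity["Ann Arbor"]["um"] / popularity["Ann Arbor"]["msu"]
home_vs_rival_el = popularity["East Lansing"]["msu"] / popularity["East Lansing"]["um"]

print(f"East Lansing -> UM vs Ann Arbor -> MSU: {cross_ratio(popularity):.1f}x")
print(f"Ann Arbor: UM is {home_vs_rival_aa:.0f}x as popular as MSU")
print(f"East Lansing: MSU is {home_vs_rival_el:.1f}x as popular as UM")
```

The first ratio is the 2.7X figure quoted above. The last one is a different cut of the same table: in East Lansing, the home school is only about 1.9 times as popular as the rival, versus 10 times in Ann Arbor.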
Finally, some disclaimers about this analysis. The entire argument hinges on the assumption that people's search behavior reflects something about things they are interested in, and particularly something they are obsessing over. This is probably a long shot conclusion. Even if we accept that people might Google things they are obsessed with, there is no guarantee that the search trends for people obsessing won't be washed out by the everyday searches of people who just need information. The fact is we have no idea WHY these people are searching for things, just that they are. Finally, there are probably tons of demographic differences between East Lansing and Ann Arbor that make these numbers really difficult to compare. If, for example, students make up a larger portion of the population of one city, it will skew the data because students use Google in a much different way than other demographics.
However, if you buy that people's search behavior is a reasonable proxy for things they are thinking about a lot, and that the demographic breakdown of two communities, largely driven by students and tech-savvy individuals, is similar, then these results are kind of neat. We can basically see that people in East Lansing are trying to get info on The University of Michigan nearly 3 times as much as Ann Arbor is trying to find out what's going on in East Lansing. I assume Brian keeps info on who is visiting his site and from where, so it might be possible to look at just people trying to get information on Michigan sports.
This is the first of many small projects I'd be interested in doing over the summer. If anyone has questions, comments, or ideas for the future, leave them in the comments.
I should probably post more about this in anticipation of the argument that doing this research is proof of obsession or whatever. My answer to this is mainly in two parts. First of all, I am an out-of-state student, so I don't have the complicated relationship with State that most people do. I don't have friends or family there and I don't plan on getting a job in Michigan, so I won't have to deal with co-workers.
My main point in this is that there are interesting ways to try to test some of the very subjective debates that go on in these parts. Plus I just plain find it interesting that you can attempt to make sociological claims from behavioral data being generated and made available these days.
|Friday 3:05pm, Ray Fisher Stadium|
|Alan Oaks (2-3, 2.67 ERA)||vs||TBA|
|Notes: Michigan is 5-0 all time against the Mastodons, including a 3 game sweep to open the home schedule in 2009|
|Saturday 1:05pm, Ray Fisher Stadium|
|Bobby Brosnahan (1-2, 5.14 ERA)||vs||TBA|
|Saturday 30min after Game 2, Ray Fisher Stadium|
|Notes: My guess is Miller starts. Game time change!|
Ah yes, home opening day. The true signal that spring has come to Ann Arbor is here at last. No longer shall the Michigan team have to travel from Lubbock, TX to St. Petersburg, FL to Chapel Hill, NC to Conway, SC to Port St. Lucie. This weekend, the baseball team returns to its friendly confines of Ray Fisher Stadium at the Wilpon Complex. They return home, where they haven't lost a home opener since 2000 (30-3 since 1975).
Preview and such after jump…
To follow up on the previous KenPom charts and graphs, I decided to pick my NCAA tourney bracket based on Ken's predictions and see how accurate he is. The way I used the data is as follows: I assumed that M = (AdjOffense - AdjDefense) * (AdjTempo / 100) gives an average margin of victory for each team in the dataset. Then M1 - M2 is the predicted margin of victory between the two teams playing. Applying this to the Michigan/Ohio State game gives
Michigan = (107.0 - 92.7) * (62.7/100) = 8.99
Ohio State = (118.9 - 89.8) * (65.8 / 100) = 19.1
Which predicts a 10.1 point margin of victory for OSU, pretty close to the actual KenPom prediction.
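The calculation is easy to reproduce in a few lines of Python. This is only a sketch of the formula as described above, not anything KenPom publishes, and the function names are made up for illustration. With the rounded inputs shown, it gives about 8.97 for Michigan and a spread of about 10.2, slightly off from the figures quoted; the small gap is most likely rounding in the inputs.

```python
def expected_margin(adj_offense, adj_defense, adj_tempo):
    """Average margin of victory: efficiency gap (per 100 possessions) scaled by tempo."""
    return (adj_offense - adj_defense) * adj_tempo / 100


def predicted_spread(team_a, team_b):
    """Positive means team_a is favored; each team is (offense, defense, tempo)."""
    return expected_margin(*team_a) - expected_margin(*team_b)


# Rounded ratings as quoted in the text above.
michigan = (107.0, 92.7, 62.7)
ohio_state = (118.9, 89.8, 65.8)

print(f"Michigan:   {expected_margin(*michigan):.2f}")
print(f"Ohio State: {expected_margin(*ohio_state):.2f}")
print(f"OSU by:     {predicted_spread(ohio_state, michigan):.1f}")
```

Plugging in a row from the data table, for example `expected_margin(107.47143, 88.3081, 69.45)` for Texas El Paso, reproduces the "difference" column.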
I'll save you all the eye chart of the data table. If you're interested, it's here:
The relevant data:
|team||adjusted tempo||adjusted offense||adjusted defense||difference||rd1|
|Texas El Paso||69.45||107.47143||88.3081||13.30893269|
|Nevada Las Vegas||67.3486||109.46974||90.69505||12.64449087|
|San Diego St.||64.6273||110.97383||92.09351||12.20184105|
Calculating the probable winners in this fashion gave a win/loss of 24/8. And, four of those that are wrong were predicted to be 3 point games, and ended up +/- 3. Here's the corresponding chart.
I calculated the total average margin of error (absolute value) for all games at 7.44, and margin of error in games correct at 7.07, and margin of error in games wrong at 10.6.
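For anyone who wants to repeat this with their own picks, here is a rough sketch of the error calculation. It assumes "margin of error" means the absolute difference between predicted and actual margin, and that a pick counts as correct when both margins have the same sign. The `error_summary` name and the sample numbers are made up purely for illustration; they are not the tournament results.

```python
from statistics import mean


def error_summary(games):
    """Summarize absolute prediction error for (predicted_margin, actual_margin) pairs.

    Both margins are measured for the same team, so a pick is correct
    when predicted and actual margins have the same sign.
    """
    errors = [(abs(pred - actual), pred * actual > 0) for pred, actual in games]
    correct = [err for err, ok in errors if ok]
    wrong = [err for err, ok in errors if not ok]
    return {
        "all": mean(err for err, _ in errors),
        "correct": mean(correct) if correct else None,
        "wrong": mean(wrong) if wrong else None,
    }


# Made-up margins for illustration only, NOT the actual tournament games.
sample_games = [(10.0, 14.0), (3.0, -2.0), (6.5, 1.5), (2.0, -9.0)]
print(error_summary(sample_games))
```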
I next calculated the distribution of error. Since I used absolute value in the previous calculation, I ended up with half a bell-curve distribution. Data:
What's interesting is that this is a better prediction than just using KenPom as a relative rating. By picking solely based on the higher ranked team, the record is 23/9.
If you can draw any conclusion from all this, it is that Ken is pretty accurate, except when he's not. I didn't expect to be 100%, because I don't think any system out there will predict Georgetown, or Kansas or Villanova to lose, based on the numbers. But, by this point in the season, the system is remarkably accurate in predicting probable outcomes. It has some margin for error in predicted close games, but I don't think there's any system that would be able to predict close games, either. They just come down to the luck of the draw.
So I was planning on interviewing Michigan baseball coach Rich Maloney tomorrow, but this morning's Michigan Insider with Sam Webb had Maloney on and asked pretty much all of my questions. So I'll summarize Maloney's comments here (full audio):
- The rotation is still up in the air. Oaks and Brosnahan appear to have solidified their spots in the weekend rotation. Burgoon is going back to the closer role for now, but if Miller continues to struggle, Burgoon will be the third guy. If for some reason we don't have to use Burgoon on Friday and Saturday, we could see him on Sunday as well.
- Ryan LaMarre is due to have the pins out of his thumb tomorrow (Wednesday). It should be at least a week of rehab to build up strength. Maloney expects LaMarre to battle to get in quicker against Indiana, "but that might be a bit of a reach."
- Coley Crank has been a surprise to Maloney. They knew he'd one day be an offensive force, but his explosion this early was surprising. He's also greatly improved his defense.
- "Chris Berset is playing at an unbelievable level right now. He's truly one of the best catchers in the country right now."
- "Dufek hasn't been hitting them out of the park, but he's been starting to come alive with the bat." We'll have some more on this later in the week here at mgoblog.
- We're playing at Stanford next year, at LSU in 2012.
- Rain delays in Atlanta pushed back getting into Ann Arbor from 10pm Sunday via airplane to 2pm Monday (with a 12 hour bus trip).
- Ten games will be broadcast this season, split between WTKA and WLBY.
So it sounds like LaMarre is still on schedule. I wouldn't be surprised to see Ryan get a few at bats against Indiana, at least in a DH or pinch hitter role, most likely the latter.