alternate headline: man does job
Apparently the Big Ten Network's website is running a poll to determine which Big Ten team has the "best home-field advantage". Popularity contests do not good data sets make, so I figured I'd apply a lot of counting and a little math and see what I came up with.
- For each Big Ten team, I tallied up their total wins over the last 11* years, and separately tallied how many of those wins came at home.
- I ignored nonconference games. Those will naturally boost home winning percentages as you invite the baby seals to get clubbed at your house, and play home-and-homes against teams that might actually beat you.
- I wanted to compare how well a team did at home compared to how well it did on average, rather than just totalling home wins and saying "golly, Ohio State must have the best home field advantage because they won at home a lot". Well, unfortunately, they won on the road a lot too, so it doesn't tell you much.
- Of course, the inverse of saying a team has a "Strong home field advantage" would be to say that same team "Sucks on the road". I'm looking at you, Indiana.
*I had planned to look at the last 10 years, but made my spreadsheet a bit too large and went on my merry way entering in data. I was all done by the time I realised my mistake, and I saw no reason to discard the 1999 season just because it was one more than I had planned to look at.
First, and just for the record, here's your overall Big Ten winning percentages for the last 11 years:
|Rank||TEAM||WINNING %||Home Wins||Home Wins Rank|
Yeah, I know. I don't like it any more than you. Anyhow, as you can see, there's not a lot of difference between a team's overall rank and its rank in terms of raw number of home wins. A bad team is a bad team at home or on the road, and ditto for a good team.
Surely there must be something to the fearsome reputations of such locations as Beaver Stadium and the Horseshoe though, right?
At first, I tried expressing home field advantage as the percentage increase of home winning percentage over total winning percentage. However, I found that this simply weighted the home success of bad teams much higher. Instead, I totaled the number of wins each team had at home, subtracted the number of wins each team had on the road, and averaged over 11 years to yield a number I'm calling the Expected Increase in Wins at Home (EIWH). In other words, every year each team plays 4 Big Ten home games and 4 Big Ten road games. How many more wins, on average, does a given team expect to claim at home than it will on the road? The results are as follows:
The results have some surprises. Iowa, a slightly-above-average team overall, earns an average of one more win at home than it does on the road, as does cellar-dwelling Indiana. Indiana has only won five Big Ten road games in the past 11 years. Iowa has a reputation as a tough place to play, especially at night, but the Indiana results are inexplicable.
On the other end of the spectrum, Illinois has only earned 16 of its 30 victories at home, which makes for an interesting contrast with Indiana in spite of the two schools' proximity at the bottom of the overall standings. Strangest of all, the feared Horseshoe in Columbus grants a very modest advantage to the hated Buckeyes. They have less of a home field advantage than such teams as Northwestern (a school which, from my personal experience, barely fills half its stadium with home fans) and Minnesota (who played in the sterile Metrodome for all of the period of this study).
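As a rough sketch of the EIWH arithmetic described above, here is the calculation in Python. The function name `eiwh` is mine, and the only inputs are the Illinois figures quoted in this post (16 home wins out of 30 conference wins over 11 seasons).

```python
def eiwh(home_wins: int, road_wins: int, seasons: int = 11) -> float:
    """Expected Increase in Wins at Home: average yearly surplus of home wins over road wins."""
    return (home_wins - road_wins) / seasons


# Illinois: 16 of its 30 conference wins came at home (figures quoted in the post).
illinois_home = 16
illinois_road = 30 - illinois_home
illinois_eiwh = eiwh(illinois_home, illinois_road)

print(f"Illinois EIWH: {illinois_eiwh:.2f} extra home wins per season")
```

A team that wins equally often at home and on the road scores 0, no matter how good or bad it is overall, which is why this measure does not simply reward the best teams.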
What's the message here? It seems that the level of hype attached to particular stadiums has little relation to the advantage those stadiums grant to the team playing there.
Much has been made of the recent UM record. However, when statisticians seek a more reliable measure of a team’s quality and the direction of a program, they look at the bigger picture by (1) comparing that season's record with records from other schools and (2) considering not a single year, but groups of years (called a moving average).
(1) I looked at the records of the two most recent coaches among our rivals. I found that ND had a three-win season, OSU had a four-win season, and MSU had three four-win seasons. Some of these occurred during coaching transitions, like UM’s. But others had no such excuse. http://cid-4bf9d75c782b05b1.skydrive.live.com/self.aspx/notre%20dame%20trends/ND+trends+vs+UM.jpg
(2) As in prior threads (see footnote*), I now report the analysis of the records of the ND coaches, based on the victories averaged over each of 4 successive seasons.**
Results: Under Lou Holtz, the trend was positive overall (with an increase of .125 victories per year). Yet, much as occurred during LC’s initial years, the gains all came early and were followed by a gradual decline. For all the subsequent coaches at ND, the trends were consistently negative (a decrease in average victories of .25 per season for Davie, .25 per season for Willingham, and .10 per season for Weis). Overall, the trends appear downward at a uniform rate, starting at Holtz’s peak.
1. The ND program is progressively deteriorating.
2. One wonders if the many coaching changes contributed to this. I have given mixed shades to the transition years, in which one coach has at least 2 years of the other one’s players. From this, one wonders whether Willingham would have continued the upward trend if he had been kept and could have played his recruits during what were the first two years of Weis’
3. ND faces massive losses next year, including the OL, the RB, and probably Clausen and Tate. In addition, its backup QB is completely inexperienced, will be unable to practice, and will be coming off ACL surgery next August. One must seriously wonder when (no, whether) the ND program will get back on track.
If UM uses ND as an example of what might happen to a program, the question for UM now is whether it will follow the pattern of Holtz, who began with a decline in average wins, similar to what is likely for RR (although Holtz did not have the big immediate dropoff in average wins from his predecessor, since that average was already quite low). The promising thing is that, unlike ND, UM has more, not fewer, starters coming back for the next two years. Clearly, it’s way too early to tell, as Brian has intimated today, but I can't help worrying that we might end up like ND if we keep getting rid of coaches before they can build their program.
* In two previous threads titled “Reasons for Hope” (for UM), and “reasons for MSU hopelessness.” Another interesting and pertinent link from another poster is: http://mgoblog.com/diaries/what-two-losing-seasons-start-tenure-means
** Note that it’s not a simple average. At the beginning of a coach's tenure, his record is shown as an average that includes the prior coach's average--which may be either better or worse than the current record. As such, the first two years of each coach’s tenure are shown as mixed colors, as they reflect the recruits of the previous coach as well as the performance of the current coach. (Just ask yourself: if Bo were alive and took over the coaching job of the perennial cellar-dweller Northwestern team in the 60's, would he be responsible for the first few years?)
In a previous thread titled “Reasons for Hope” (for UM), I looked at the trends in average victories from LC to RR (based on an average of four consecutive years). The conclusion was: that RR--after a significant hemorrhage that occurred during his first year of surgery on the program--is close to stopping the slow bleeding that actually began after LC's first three years.
One critic objected in a heated manner to the methods I used. A few posters rebutted the critic, pointing out that his tunnel vision of only the worst possible portion of UM’s recent record ignored the bigger picture. I will not speculate on the motivations for this tunnel vision. However, one supportive poster--whom I thank-- suggested looking at the record of MSU compared to UM. So, taking this excellent suggestion, I tried using similar methods to look at the trends in average win pct at MSU under various head coaches.
I found that under Nick Saban, the trend in average victories was positive (with an increase of .06 victories per year, much as occurred during LC’s initial years). But after that, the trends were consistently negative (a decrease of .17 victories per year under Williams and Smith and a decrease of .25 victories per year under Mark Dantonio).* So, MSU declined at a pretty steady rate.
The only way that Dantonio can stop the bleeding and just stay even with the average victory record of his esteemed predecessor, John Smith, is to win 2 out of the 3 next games. So, this analysis does not support the often voiced idea—some will call it wishful thinking—that MSU has turned the program around under MD.
To stop the bleeding (decline in average), RR also needs to become bowl eligible (winning 2 out of the next 4 games including a bowl). To be fair, however, his task is much more formidable. UM’s current average, which is at a low point for UM during this period (7.5 victories per season) is still 3 victories per season more than MSU’s average (4.5).
Methods of Analysis (repeated)
I looked at the trends since Saban took over in 1995 (based on a moving average involving each four year period).
Total wins and average wins for four successive seasons beginning in 1995 to present.
1995 6.5, 6, 7, 6 avg 6.25 Nick Saban trend +.06 per year
1996 6, 7, 6, 10 avg 7.25
1997 7, 6, 10,5 avg 7.0
1998 6, 10,5,7 avg 7.0
1999 10,5,7,4 avg 6.5
2000 5,7,4,8 avg 6.0 Bobby Williams -.17 per year
2001 7,4,8,5 avg 6.25
2002 4,8,5,5 avg 5.5
2003 8,5,5,4 avg 5.5 John Smith -.17 per year
2004 5,5,4,7 avg 5.5
2005 5,4,7, 5 avg 5.5
2006 4,7, 5,4 avg 5.0
2007-8 5,4 avg 4.5 until Mark Dantonio only -.25 per year (not including this year)
5,4,6 avg 5.0 -0.0 per year (assuming two more victories = 6 total this year)
*considering only his complete seasons---only if we assume he gets two more victories this year does he stay even with John Smith’s average when Smith left.
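For anyone who wants to check the four-year averages, here is a minimal sketch of the moving-average step in Python. The helper name `moving_average` is mine, and the season win totals are read off the 1996 through 2000 rows of the table above (each row lists four successive seasons), so treat them as transcribed rather than independently verified.

```python
def moving_average(values, window=4):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Season win totals as they appear in the 1996-2000 rows of the table.
msu_wins = {1996: 6, 1997: 7, 1998: 6, 1999: 10, 2000: 5, 2001: 7, 2002: 4, 2003: 8}

seasons = sorted(msu_wins)
averages = moving_average([msu_wins[year] for year in seasons])

# Each average is labelled by the first season of its four-year window.
for start, avg in zip(seasons, averages):
    print(start, avg)
```

Labelling each window by its first season follows the layout of the table; labelling by the last season would shift every row by three years without changing the averages.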
First off, I largely agree with ikestoys's diary (http://mgoblog.com/diaries/down-14-and-going-2). I have often thought that football is a game that rewards aggressive play calling, like going for two and on fourth down more often, and fake punts from your own 20... Eh...
Anyway, I disagree with a couple of points ikestoys made, both explicitly and implicitly, and I thought I'd chuck 'em out here.
Trials are not independent
This point was made by a commenter in the original diary, but the basic idea of treating the different sorts of trials (going for 2, going for 1, overtime) as independent events (and therefore as amenable to the application of the mathematics of garden-variety probability theory) is flawed.
In football the outcome of one trial affects the probability of another trial even occurring, and not in predictable ways. Let's say UM had made the first two-point conversion. Would State have played their next drive differently than they did? Maybe, maybe not. Perhaps they would have come out throwing and scored a field goal to go up by nine. We have no way of knowing how things would have unfolded in that alternative universe.
Relative frequencies are not probabilities
Second, and another point made by a commenter, is that ikestoys treats relative frequencies (the proportion of successful two-point conversions) as the same thing as probabilities of success. They are not. That's like saying that because 1% of adults die of lung cancer, you have a 1 in 100 chance of dying of lung cancer. Do you smoke? If so, then your probability is surely higher. If not, it's lower. The point here is that the probability of success of a two point conversion depends on many factors, as various people have noted.
Because relative frequencies =/= probabilities, I thought it would be interesting to see how the probabilities of winning fared if you didn't assume the probability of a successful two-point conversion was 0.44. So, two graphs for your viewing pleasure. The y-axis is the probability of winning the game after all events have unfolded (post-touchdown try after TD 1, TD 2, and possibly overtime). The x-axis is the probability of success of the two-point conversion (I limited the range of this probability to between 20 and 80%).
Graph the first
In the first graph, I have plotted the cumulative probabilities of winning for two strategies: going for 2 after scoring a TD to be down by 8 (ikestoys's strategy--the black line), and going for 1 (RichRod's strategy--the blue line). The only thing I have allowed to vary is the probability of success of a two-point conversion (on the x-axis).
- Note that I have reproduced the probability ikestoys does, where the dashed red line intersects with the black curve at about 57% when Pr(success) for a two-point conversion is 44%.
- Note also that despite ikestoys's implicit claim that going for two is always the better move, if the probability of success falls below 35.5%, it is better to go for 1, as RichRod did. (I'm not suggesting that this is what the probability would have been, though people's comments about a dog-tired Tate, a driving rain, etc., make this idea not too farfetched.)
There are two other variables in the process: the probability of a successful PAT (which I held constant at 0.95), and the probability of winning in OT. The latter probability doesn't change the black curve below much, so I left it at 50/50, as did ikestoys.
In the graph below, the three non-black curves represent three different probabilities of winning in overtime: 40% (orange), 50% (blue), and 60% (green).
The only thing to take away here is that if you believe your probability of winning in overtime is high (based on your style of play, being at home, etc.) and if you believe your probability of a successful 2-point conversion is less than 44%(ish), then you should adopt RichRod's strategy. If you believe that your chances of winning in OT is 50/50, and you believe your chances of scoring on a two-pointer are > 35%, then you should follow ikestoys's strategy.
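The comparison behind the two graphs can be sketched in Python. The post does not spell out the exact game tree, so the one below is an assumption: when going for 2 first, a made conversion is followed by a kick (and a missed kick still leaves a tie), while a failed conversion forces a second 2-point try just to reach overtime; when kicking first, a missed first kick is followed by a 2-point try after the second TD. The function names are mine. As in the second graph, the going-for-2 curve keeps overtime at 50/50 while the kicking curve's overtime probability varies.

```python
PAT = 0.95  # extra-point success rate, held constant as in the post


def win_going_for_two(p_two, p_ot=0.5, p_pat=PAT):
    """Go for 2 after the first TD; kick after the second only if the first try worked."""
    # First try succeeds: a made kick wins outright, a missed kick leaves a tie.
    made_first = p_two * (p_pat + (1 - p_pat) * p_ot)
    # First try fails: a second 2-point try is needed just to reach overtime.
    missed_first = (1 - p_two) * p_two * p_ot
    return made_first + missed_first


def win_kicking(p_two, p_ot=0.5, p_pat=PAT):
    """Kick after the first TD; go for 2 later only if that kick missed."""
    both_kicks_good = p_pat * p_pat * p_ot
    first_kick_missed = (1 - p_pat) * p_two * p_ot
    return both_kicks_good + first_kick_missed


def break_even(p_ot_kicking=0.5, lo=0.2, hi=0.8, tol=1e-9):
    """Bisect for the 2-point success rate at which the two strategies tie."""
    def gap(p):
        return win_going_for_two(p) - win_kicking(p, p_ot=p_ot_kicking)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for p_ot in (0.4, 0.5, 0.6):
    print(f"OT win prob {p_ot:.0%}: break-even 2-pt rate {break_even(p_ot):.1%}")
```

Under these assumptions the break-even points land close to the 35.5% and 44%(ish) figures quoted above, which suggests the tree is at least similar to the one used for the graphs.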
In conclusion (I know, finally)
Of course, coaches don't think this way in the heat of a game. Again, I basically agree with ikestoys, but the story is a bit more complex.
The situation: You are down 14 and probably only have 2 possessions left. Obviously, it will take two touchdowns to get back into the game. My question for you is, what combination of 2 point and 1 point conversions should you take to maximize your chance of winning the game?
Let's start off with a few assumptions. According to this rivals article, the average 2pt conversion rate in the NFL is 44%. I'll assume that it's about the same for CFB and that our team's conversion rate will be about the same in whatever specific situation we're in. We'll assume that we can estimate a PA kick as a sure thing. We'll also assume that we have a 50-50 chance of winning in OT.
So working with these assumptions, what is the optimal combination of 1pt/2pt tries?
Kicking 1pt tries only
This one is easy. Assuming we get 2 TDs to come back, taking 1pt each time will give us a 50-50 chance to win
In this situation, we get the first TD and take the 1pt. On the second TD, we 'man up' and go FTW BABY! Our chances of winning are equal to the chance of converting obviously, so 44%.
In this situation, we'll go for 2 after the first TD. If we convert, then we'll kick a 1pt try. If we do not convert, then we'll go for 2 again.
This is a slightly more complicated calculation, but here we go:
1.) 44% of the time we make the first 2pt conversion and go on to win the game.
2.) (.56)*(.56) = 31% of the time we miss both 2pt tries and lose despite making two TDs
3.) (.56)*(.44) = 25% of the time we miss the first but make the second 2pt. This ties the game and we go to overtime.
So what is our final equity? It is:
.44*1 + .31*0 + .25*(.5) = .57 or 57%
A quick explanation of this equation. We basically multiply the probability of an event by the outcome of the event. So 44% of the time we win (1), 31% of the time we lose (0) and 25% of the time we go to OT with a 50-50 shot (.5).
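The same equity calculation can be written as a short Python sketch. The names `equity` and `outcomes` are mine; the inputs are the assumptions stated earlier (44% two-point success, a sure PAT, and a 50-50 overtime). Without rounding the intermediate percentages, the result comes out a shade lower than the rounded 57%.

```python
def equity(outcomes):
    """Sum of probability * value over mutually exclusive outcomes."""
    return sum(prob * value for prob, value in outcomes)


p_two = 0.44  # assumed 2-point conversion rate
p_ot = 0.5    # assumed chance of winning in overtime

outcomes = [
    (p_two, 1.0),                    # make the first try, kick the second: win
    ((1 - p_two) * (1 - p_two), 0.0),  # miss both tries: lose
    ((1 - p_two) * p_two, p_ot),     # miss, then make: overtime
]

total_prob = sum(prob for prob, _ in outcomes)
win_equity = equity(outcomes)

print(f"probabilities sum to {total_prob:.2f}")
print(f"equity of going for 2 first: {win_equity:.4f}")
```

Checking that the branch probabilities sum to 1 is a cheap way to confirm that no outcome was left out of the tree.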
Now why isn't this done in the real world? Well part of it is that some of our assumptions aren't known. However, mostly it is coaches covering their ass. No one gets criticized for taking the safe route to force OT, only to lose. If you go for 2 twice and don't make it, you'll be torn apart in the press. Not to mention that football coaches don't focus much of their time on equity calculations.
The common belief of kicking 1pt to tie or going FTW! at the end with a 2pt conversion is clearly wrong, even if it is most commonly done.
While I am a relative neophyte when it comes to understanding how recruiting works, the one aspect that has really interested me is how the concentration of D-1 prospects breaks down amongst the states. Anecdotally, states like Florida, California, and Texas always seemed to create top-notch prospects, but that kind of made sense - those are three of the four most populous states in America. I always presumed, erroneously as it turns out, that fast, strong kids exist everywhere, and that the percentage of the population which embodied these desirable characteristics was pretty constant across the board. Thus, the reason the Big 3 fielded more D-1 football recruits than, say, Utah was more the result of population and "math" than something in the drinking water or the focus certain states place on football. Of course, there also seemed to be two glaring holes with this logic - the fact that many states in the Southeast (Alabama, Georgia, Mississippi, Louisiana, etc.) produce an inordinate number of recruits compared to their populations, and the fact that relatively populous states in the Northeast (New York and Massachusetts) produce far fewer recruits than their populations would predict. But was this really true, or did these two anomalies exist more as a figment of recruiting services and media hype than reality?
Now, I was going to do all of this research myself, but then I was luckily able to stumble upon this page that broke down each state by number of recruits, population, and ratio of people to recruits for 2004-2008. I then wondered how this translated to the NFL - in other words, were the states that produced a large number of D-1 prospects also sending kids to the NFL. So after some more scouring of the interwebs, I came upon this page, which provided a really awesome user-friendly chart. After some more finagling and Excel-assisted sorting, I came upon this chart:
Big Chart of recruits/NFL players home states 2004-2008
|State|College Recruits|State Pop.|State Citizens per Recruit|NFL Players|State Citizens per Pro|
|---|---|---|---|---|---|
|District of Columbia|27|591,833|21,920|3|197,278|
So that really wasn't that surprising. Presuming that the distribution of football players was constant across the population (i.e. for every x people, y recruits exist), the ratio should be 1:40,380 - in other words, the population at large holds about 1 D-1 recruit per 40,000 people. Similarly, of those kids who went to the pros, the number was truly astronomical - 1:241,575, an astounding number considering that some of those positions are held by international players that were not listed on my chart. And yes, this statistic is not perfect, since the actual number of high school boys every year who could become D-1 athletes, and thus future NFL players, is far less than the population at large, people move in and out of states, etc. But for illustrative purposes I think it still supports my points, and I don't have the time or inclination to peruse government population numbers for a more true number. Plus, I doubt the ratios would be so greatly skewed as to dramatically alter the clear trends present.
These results alone somewhat shocked me, mostly because they show how illogical the hopes of so many kids are of even becoming D-1 college recruits, let alone professional football players. To put this into perspective, there are about 3 people sitting in the stands during a Michigan home game, on average, who have become or will become D-1 recruits in their lifetimes. Put another way, my hometown of Royal Oak has a little over 60,000 people in it, or about 1.5 D-1 football recruits per year if the model holds true. As for those who go on to play in the NFL, the entire state of Vermont, if my model held true, would produce 3 NFL-quality players per year - and that really isn't even true over the 2004-2008 span (0 players over that span).
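The ratio arithmetic is simple enough to sketch in Python. The function names are mine; the inputs are the District of Columbia row from the chart, the 1:40,380 overall ratio, and the rough Royal Oak population mentioned above. The "expected" figure assumes talent is spread evenly across the population, which is exactly the assumption the rest of this post argues against.

```python
def people_per_player(population: int, players: int) -> float:
    """How many residents there are for each recruit (or pro) a state produced."""
    return population / players


def expected_players(population: int, ratio: float) -> float:
    """Players a population would produce if talent were spread evenly at 1 per `ratio` people."""
    return population / ratio


OVERALL_PEOPLE_PER_RECRUIT = 40_380  # overall ratio quoted in the post

# District of Columbia row from the chart.
dc_population = 591_833
dc_per_recruit = people_per_player(dc_population, 27)
dc_per_pro = people_per_player(dc_population, 3)

# Royal Oak, using the "a little over 60,000" figure from the post.
royal_oak_expected = expected_players(60_000, OVERALL_PEOPLE_PER_RECRUIT)

print(f"DC: one recruit per {dc_per_recruit:,.0f} people, one pro per {dc_per_pro:,.0f}")
print(f"Royal Oak: about {royal_oak_expected:.1f} expected recruits")
```

A state whose people-per-recruit number is well below 40,380 is producing more recruits than an even distribution would predict, and one well above it is producing fewer.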
But clearly, football talent is not evenly distributed across the country. While some more populated states come pretty close to the proposed distribution, such as California, Pennsylvania, and North Carolina, outliers exist in the expected regions of plenty (Southeast) and barren (NY, MA). Both Michigan and Illinois also seemed to produce far fewer recruits than their populations suggest while places like Hawaii and D.C. seem more fertile than expected, but not to an extreme degree that you see with some other states. And in Hawaii's case, a large percentage of those recruits are taken by University of Hawaii, so that situation is clearly atypical.
So what does this mean? - college
For one thing, some traditional "hotbeds" of talent may actually "under"perform their expected ratio of recruits given a linear distribution - I'm looking at you, Pennsylvania and California. At the same time, maybe some people are underselling certain areas, such as Virginia and Oklahoma/Kansas, who have decent-to-great in-state programs that recruit nationally but also seem to have pretty fertile backyards to pick from as well. But the real focus, though, must fall on the Southeast, where states like Mississippi, Alabama, and Georgia continually churn out top-notch kids at a far greater rate than their populations suggest.
Despite what some Freep "columnists" opine as RR's apparent idiocy in not recruiting in-home talent at MSU's rate, it clearly makes sense to focus more of the staff's efforts on Florida and the Southeast compared to other regions in America. Sure, California and Texas are hotbeds that should be scoured, but the Southeast is where the money tends to be. Michigan produces a decent number of recruits, but it is clear that outside of Ohio, the rustbelt just isn't a fount of top-notch talent the way some envision it. I'm sure there are a million reasons why this may be, and I'll leave it to people in the comments to hash them out. My guess is that high school/college football has always been a more communal activity in areas of the South compared to the North, especially considering how few professional teams used to be located below the Mason-Dixon line compared to the population. Simply put, people "care" more about football down there, and that fervor translates to the youngest of children. They see football as a way to make a living, as a way to succeed and be a "god" in the community, and their environments seem geared around making this dream a reality.
I don't think it has that much to do with the weather - sure, it helps to be able to play and practice outside more than in the north, but receivers can still catch balls, RBs can still squat and run wind sprints, and linemen can still work on their techniques indoors just as easily as outdoors. Plus, warm-weather states like New Mexico and Arizona produce recruits at a lower rate than expected, while some cold-weather states are relative factories. To put it bluntly, I think kids in the Southeast "care" more about football than kids in the North. Now, that doesn't mean high school boys in Michigan and New York don't work hard or lack a will to win, but by and large I don't think the community rewards kids in the North as much for the success they experience on the football field as it does in places like Mississippi and Florida. I'm sure there are some socio-economic undertones to it, and some will say that kids in the Southeast see football as a way to escape the communities they are "trapped" in - see the Pahokee (?) pipeline as an example of crushing poverty pushing kids toward sports. But irrespective of the cause, it is clear that if you want the biggest payoff for your recruiting efforts, learning to whistle Dixie might as well become a requirement for major college recruiters. Now, that might not seem like a revelation to some, but it is interesting to see that anecdote play out in the numbers. I'm interested, though, to see how others feel.
So what does this mean? - NFL
As I mentioned above, I think a big reason more D-1 recruits emerge from the Southeast and Texas has to do with the relative importance the community places on football as a means to succeed. For better or for worse, a ticket to a D-1 school is viewed as a stepping-stone to playing in the NFL, and all the millions of dollars and notoriety that entails. So it shouldn't come as any surprise that the states which produce the most D-1 recruits per person also generate the most NFL players per person. Louisiana leads the way, with approximately every 82,000 residents producing an NFL player - a ratio about 3X greater than the expected! The same held true for most of the Southeast, with those states sending far more players to the pros than they have any business doing. By comparison, Michigan is pretty average - it may be a little low on the D-1 recruits, but those who do emerge have a pretty average shot of making it to the NFL. So kudos to the Wolverine state.
By comparison, a pair of Ks - Kentucky and Kansas - seem to be the biggest "frauds" of the group in terms of overvaluing their D-1 recruits - both have a pretty average or above-average number of D-1 recruits per population, but about half as many of those recruits wind up making it to the NFL as expected. So once again, Kentucky and Kansas underwhelm. As for New York and Massachusetts, they might as well focus on baseball - they just don't know how to create top-notch football talent.
But overall, this analysis proved what I expected - the Southeast produces a disproportionate number of D-1 recruits, and an inordinate number of these recruits are high-caliber enough to break into the NFL. Again, I have no scientific proof for the cause of this inequity, but I have stated my guesses. I am intrigued to see what other people believe is the cause, and I welcome anyone with more statistical knowledge than my one 400-level probability and statistics course to prove me wrong/drill down deeper.
What I'd like to do in the future:
* Breakdown for each state by high-school-aged boys, not the state population as a whole.