There’s a widespread theory that Lloyd Carr’s career can be split into two phases: a “good” young Carr and a “senile” old Carr. But is it statistically sound?
If you look at straight winning percentage, this seems, well, inconclusive. But, the argument goes, there’s more to success or failure than just winning more than you lose. There’s whom you beat, and who beats you. There’s wins-versus-expectations-of-wins. There’s where you end up ranked. There’s whether you play in a major bowl game, and if you win it. Most importantly, there are those three pesky rivalries, particularly the one with Columbus.
As some have argued, Carr put really competitive teams on the field early on, but later ones tended to disappoint, to flag late in the game, and to underachieve. The four-game stretch between OSU 2006 and Oregon 2007, it has been said, is the worst in recent memory, and this is mentioned as proof that Lloyd Carr had lost it at the end. But did he really?
You could try answering this with winning percentages, bowl appearances and clever argumentation, but mgoblog is a well-known haven for quantification nerds, whose denizens crave robust new measures that capture things the dinostats can’t. After all, aren’t away wins more dramatic than home wins, and home losses more embarrassing than the away ones? Isn’t it more consequential to lose in-conference than outside of it? Doesn’t it feel just that much better to beat Sparty than Purdue? Notre Dame than Illinois? OSU than everyone? My mission was to create new indices of success, measured across the course of a season, that capture more than just wins and losses—also heights soared to, depths plumbed, the intangibles. I created two, which are related to one another, but capture somewhat different aspects of success or failure.
Constructing the Indices
Constructing the indices begins with regular season games. Each game gets a baseline score: 10 for a win, 0 for a loss. To this baseline, a series of intangible weights is added for every regular season game. The full scheme is a little long-winded for here, but I can make it available to anyone who wants to know. The categories are: 1) who the opponent is; 2) the opponent’s ranking relative to UM; 3) home/away; 4) margin of victory; and 5) performance versus expectations. All scores are ordinal, so it required some subjective decisions on the relative worth of these categories, but the same criteria were applied to each case, so it should be reasonably objective.
Let me break down a couple. The best single-game score for the period 1994-2010 was the ecstatic 1996 win at Ohio State, which received a score of 21:
10 (win) + 5 (OSU) + 3 (top 5 opponent) + 1 (away win) + 0 (win by less than 20) + 2 (performed well above expectations) = 21
The worst single-game score (surprise surprise) is The Horror, which received a score of -8. It breaks down like this:
0 (loss) - 0 (non-conference, non-BCS game) - 5 (lower league + FCS opponent) - 1 (home loss) - 0 (loss by less than 20) - 2 (performed well below expectations) = -8
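The two breakdowns above can be reproduced with a short Python sketch. It assumes only what is stated here: a baseline of 10 for a win and 0 for a loss, plus a list of signed adjustments. The function name `game_score` and the choice to pass adjustments as a plain list are hypothetical, for illustration only, and the adjustment values are simply the ones from the two worked examples, not the full weighting scheme.

```python
def game_score(won, adjustments):
    """Baseline (10 for a win, 0 for a loss) plus signed intangible adjustments."""
    baseline = 10 if won else 0
    return baseline + sum(adjustments)

# 1996 at Ohio State: opponent, top-5 opponent, away win, margin, expectations
osu_1996 = game_score(True, [5, 3, 1, 0, 2])

# The Horror: opponent type, FCS opponent, home loss, margin, expectations
horror_2007 = game_score(False, [0, -5, -1, 0, -2])

print(osu_1996, horror_2007)  # prints: 21 -8
```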
Bowl Games and Ranking Bonuses
Bowl games are treated somewhat differently. On the one hand, it’s not right to penalize a team for what’s basically a value-added bonus to the season. On the other, winning is still better than losing. So scoring looks like this:
+5: making any bowl game
+2: making a BCS bowl game
+5: winning the bowl game
+2: winning a BCS bowl game
+/-2: exceeding/failing to meet expectations (broadly defined)
Some examples: 1997 vs. Washington State = 14; 2001 vs. Tennessee = 3; 2004 vs. Texas = 7; and 2007 vs. Florida = 12.
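As a rough sketch, the bowl rules can be written as one small Python function. Two things here are assumptions rather than anything stated above: that the bonuses stack (a BCS win earns 5 + 2 + 5 + 2), and the specific expectation adjustments passed in for each game, which were chosen only because they reproduce the four example totals. The name `bowl_score` and its parameters are hypothetical.

```python
def bowl_score(made_bowl, bcs=False, won=False, expectations=0):
    """Bowl points under the rules listed above.

    expectations: +1 if the team exceeded expectations, -1 if it fell
    short, 0 otherwise (each step is worth 2 points).
    """
    if not made_bowl:
        return 0  # a missed bowl is worth nothing
    score = 5                      # making any bowl game
    if bcs:
        score += 2                 # making a BCS bowl game
    if won:
        score += 5                 # winning the bowl game
        if bcs:
            score += 2             # winning a BCS bowl game
    return score + 2 * expectations

# Expectation arguments below are guesses that match the quoted totals.
print(bowl_score(True, bcs=True, won=True))        # 1997 vs. Washington State
print(bowl_score(True, expectations=-1))           # 2001 vs. Tennessee
print(bowl_score(True, bcs=True))                  # 2004 vs. Texas
print(bowl_score(True, won=True, expectations=1))  # 2007 vs. Florida
```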
Ranking bonus averages the final BCS and AP rankings, or if prior to the BCS, the Coaches Poll and AP rankings. It works like this:
The resulting rank scores are then granted a bonus or penalty based on preseason expectations. So the 1996 team, which had a preseason ranking of 12/11 and ended up with a final rank of 20, gets a penalty of 1 for finishing below preseason expectations: 2 - 1 = 1. The 1997 team, which ended up with a rank of 1 (we all know the Coaches’ Poll was fixed), began with a preseason rank of 13/14, so it gets a bonus: 9 + 2 = 11.
EVG and IVG
Total points are added together, and then divided by the number of games played to produce the expected value per game (EVG). The intangible value per game (IVG) index subtracts the baseline value (10 per win and 0 per loss, with no intangibles) from total points, and then divides by the number of games played (with a 0 value for a missed bowl game). This measures the intangibles alone. Yes, wins produce more positive scores (and losses more negative ones), but this measure basically captures elation minus disappointment. As you’ll see, the two distributions are similar, but IVG actually varies more than EVG.
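Here is a minimal Python sketch of the two indices. It assumes the ranking bonus is counted in total points and that the caller decides how many wins count toward the 10-point baseline. Neither detail is spelled out above, so treat both as guesses. The function name `evg_ivg` and the sample numbers are hypothetical and are not data from any actual season.

```python
def evg_ivg(scores, baseline_wins, ranking_bonus=0):
    """Return (EVG, IVG) for one season.

    scores: one score per game played (a missed bowl contributes nothing)
    baseline_wins: number of wins that carry the 10-point baseline
    """
    games = len(scores)
    total = sum(scores) + ranking_bonus
    evg = total / games
    ivg = (total - 10 * baseline_wins) / games
    return evg, ivg

# Made-up four-game "season": three wins, one loss, ranking bonus of 1
evg, ivg = evg_ivg([12, 10, -3, 15], baseline_wins=3, ranking_bonus=1)
print(evg, ivg)  # prints: 8.75 1.25
```

Because IVG strips out the 10 points per win, two seasons with the same record can still end up far apart on it.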
Winning PCT, IVG and EVG by Year, 1995-2007
As you can see, EVG and IVG capture more fluctuation from season to season than straight winning percentage does. IVG is something of a counterbalance to Winning PCT, looking solely at the aforementioned intangibles. EVG takes both into account.
A number of things are immediately apparent.
1. EVG and IVG capture more fluctuations than Winning PCT. Carr had an average winning percentage of 0.753. There were 5 seasons when Carr’s teams beat this average, 2 which were basically at the average, and 6 below it.
By contrast, only 4 seasons beat the average EVG of 9.51, while 9 fell below it (for IVG, 5 seasons beat the average of 1.99 and 8 fell below it). As you can see, there are more discernible peaks and troughs in these indicators than with straight Winning PCT. EVG in particular appears to successfully capture the big picture while taking the significance of individual games into account.
2. Though The Horror was the single worst game of the Carr era, 2007 as a whole wasn’t Carr’s worst season. It was still in the bottom half of the Carr years, but in terms of EVG it was third worst, after 2001 and 2005. In terms of IVG, it was only fourth worst, after 2001, 2002 and 2005. By EVG, 2005 was Carr’s worst season; by IVG, it’s 2002.
3. Carr’s career does not divide neatly into a “good” early period and a “senile” later period. As the figures show, Carr’s career had four peaks—1997, 1999, 2003 and 2006. By both measures, 1997 was far and away his best season. I had thought that the intangibles might have elevated Tom Brady’s near-NC year in 1999 and/or the Navarre-led 2003 squad that lost to (compliance-dodging) AP national champion USC in the Rose Bowl above the 2006 squad, but they don’t. 2006 scores as Carr’s second best according to Winning PCT and IVG, and third according to EVG. What’s more, when I ran a regression of EVG and IVG by year for 1995-2007, neither produced a statistically significant result, meaning there’s no clear upward or downward trend over time during this period.* Unless we decide to completely ignore the great 2006 team, or some of the disappointing teams from earlier in his career, the good/senile theory looks like a myth we can safely bust.
4. Carr’s teams were most consistent in the middle of his tenure. In terms of EVG, we can say that the years 1995-2000 were more consistent, and less prone to dramatic fluctuations from year to year, than 2001-2007. While not quite good/senile, this does potentially lend itself to critical arguments. With IVG, there’s a sustained trough in the middle (1998-2002), which reflects higher expectations due to the 1997 national championship and too many losses to Michigan State, Notre Dame and marquee non-conference opponents. That makes it the most disappointing stretch of years, when solely considering results versus expectations. That jibes with what I remember, especially the 1999 team, my sentimental favorite of the Carr years and one that got so tantalizingly close, but just didn’t make it. 2005 and 2007 also factor in as IVG troughs, but are broken up by 2006, which got a very high IVG score.
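For anyone who wants to repeat the trend check from point 3, here is one way it could be sketched in plain Python: an ordinary least-squares fit of a per-season index on year, followed by a t-test on the slope. This is an assumed reconstruction, not necessarily the regression that was actually run. The function name `trend_t_stat` is hypothetical, and the season values are made-up placeholders, not the EVG figures from the chart.

```python
import math

# Two-sided 5% critical value of Student's t with 11 degrees of freedom
# (13 seasons minus 2 fitted parameters).
T_CRIT_DF11 = 2.201

def trend_t_stat(years, values):
    """Fit values = a + b * year by least squares; return (slope, t-statistic).

    Assumes the points do not lie exactly on a line (residuals are nonzero).
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

years = list(range(1995, 2008))  # 13 seasons
# Made-up placeholder values, NOT the real EVG numbers.
placeholder = [9.0, 8.5, 14.0, 9.5, 11.0, 9.0, 7.5, 8.0, 11.5, 9.0, 7.0, 12.0, 8.0]

slope, t = trend_t_stat(years, placeholder)
print("significant trend" if abs(t) > T_CRIT_DF11 else "no significant trend")
```

With only 13 points the test has little power, so a non-significant slope is weak evidence at best.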
So what does this all mean? Some things should already be obvious: Carr had some good years and some bad years, The Horror was horrific, 1997 was awesome, etc. On the other hand, the numbers strongly suggest the “early good/late senile” theory is a myth. Statistically speaking, it didn’t shake out that way. Doesn’t mean we can’t, or shouldn’t, criticize some aspects of Carr’s head coaching career, but let’s look at it dispassionately. The man gave us some great years, and some disappointing ones; they were just more evenly distributed than we remember them.
If enough people want, I’ll do a second round looking at only Big 10 games for Carr. Additionally, I’ve already collected the data for Rodriguez’s 3 years, and thought I could do Moeller’s 4 as well. It’s a lot of work, so I doubt I’ll ever expand to include other teams, though if anyone else finds it interesting enough, I’d be happy to share the methodology.
*True, this violates assumptions of sufficient randomness and sample size, so it’s not conclusive. But it does show that there’s no evident trend among the small number of data points we have.