Mythbusting: Lloyd Carr Edition

Submitted by Eye of the Tiger on July 30th, 2011 at 1:38 PM

There’s a widespread theory that Lloyd Carr’s career can be split in two phases: a “good” young Carr and a “senile” old Carr. But is it statistically sound?

If you look at straight winning percentage, this seems, well, inconclusive.  But, the argument goes, there’s more to success or failure than just winning more than you lose.  There’s whom you beat, and who beats you. There’s wins-versus-expectations-of-wins. There’s where you end up ranked. There’s whether you play in a major bowl game, and if you win it. Most importantly, there are those three pesky rivalries, particularly the one with Columbus.

As some have argued, Carr put really competitive teams on the field early on, but later ones tended to disappoint, to flag late in the game, and to underachieve. The four-game stretch between OSU 2006 and Oregon 2007, it has been said, is the worst in recent memory, and this is mentioned as proof that Lloyd Carr had lost it at the end. But did he really?

You could try answering this with winning percentages, bowl appearances and clever argumentation, but mgoblog is a well-known haven for quantification nerds, whose denizens crave robust new measures that capture things the dinostats can’t. After all, aren’t away wins more dramatic than home wins, and home losses more embarrassing than the away ones? Isn’t it more consequential to lose in-conference than outside of it? Doesn’t it feel just that much better to beat Sparty than Purdue? Notre Dame than Illinois?  OSU than everyone? My mission was to create new indices of success, measured across the course of a season, that capture more than just wins and losses—also heights soared to, depths plumbed, the intangibles. I created two, which are related to one another, but capture somewhat different aspects of success or failure. 

Constructing the Indices

Constructing the indices begins with regular season games. A baseline score is produced for wins and losses, valued at 10 and 0 respectively. To this baseline measure, a series of intangible weights is added for all regular season games. It’s all a little long-winded for here, but I can make it available to anyone who wants to know. The categories are: 1) Who the opponent is; 2) Relative ranking to UM; 3) Home/away; 4) Margin-of-victory; and 5) Performance versus expectations. All scores are ordinal, so it required some subjective decisions on the relative worth of these categories, but the same criteria were applied to each case, so it should be reasonably objective.

Let me break down a couple. The best single-game score for the period 1994-2010 was the ecstatic 1996 win at Ohio State, which received a score of 21:

10 (win) + 5 (OSU) + 3 (top 5 opponent) + 1 (away win) + 0 (win by less than 20) + 2 (performed well above expectations) = 21

The worst single-game score (surprise surprise) is The Horror, which received a score of -8.  It breaks down like this:

0 (loss) - 0 (non-conference, non-BCS game) -5 (lower league + FCS opponent) -1 (home loss) -0 (loss by less than 20) -2 (performed well below expectations) = -8

Bowl Games and Ranking Bonuses

Bowl games are treated somewhat differently. On the one hand, it’s not right to penalize a team for what’s basically a value-added bonus to the season. On the other, winning is still better than losing. So scoring looks like this:

+5: making any bowl game

+2: making a BCS bowl game

+5: winning the bowl game

+2: winning a BCS bowl game

+/-2: failing to meet/exceeding expectations (broadly defined)

Some examples: 1997 vs. Washington State = 14; 2001 vs. Tennessee = 3; 2004 vs. Texas = 7; and 2007 vs. Florida = 12. 
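For anyone who wants to check the arithmetic, here is a minimal sketch of the bowl scoring in Python. The function name and the flags passed for each bowl (whether it counts as BCS, and the expectations value) are my own guesses, picked so that the four totals quoted above come out right; they are not taken from the original spreadsheet.

```python
def bowl_score(made_bowl, bcs=False, won=False, expectations=0):
    """Score one bowl appearance with the point values listed above.

    expectations is -2, 0 or +2 (failed to meet / met / exceeded).
    """
    if not made_bowl:
        return 0                   # a missed bowl just contributes nothing
    score = 5                      # making any bowl game
    if bcs:
        score += 2                 # making a BCS bowl game
    if won:
        score += 5                 # winning the bowl game
        if bcs:
            score += 2             # winning a BCS bowl game
    return score + expectations


# Assumed inputs: guesses that reproduce the totals quoted in the post.
examples = {
    "1997 vs. Washington State": bowl_score(True, bcs=True, won=True),
    "2001 vs. Tennessee": bowl_score(True, won=False, expectations=-2),
    "2004 vs. Texas": bowl_score(True, bcs=True, won=False),
    "2007 vs. Florida": bowl_score(True, won=True, expectations=2),
}

for name, score in examples.items():
    print(name, "=", score)
```

Note that the lowest possible score for a team that makes a bowl is 3 (5 for the appearance, minus 2 for expectations), so a bowl never drags a season total below what a missed bowl would give.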

Ranking bonus averages the final BCS and AP rankings, or if prior to the BCS, the Coaches Poll and AP rankings.  It works like this:

Unranked: 0

+2: 20-25

+3: 19-15

+4: 14-10

+5: 9-5

+6: 4

+7: 3

+8: 2

+9: 1

These bonuses are then adjusted up or down based on preseason expectations. So the 1996 team, which had a preseason ranking of 12/11 and ended up with a final rank of 20, gets a penalty of 1 for ending up below preseason expectations: 2 – 1 = 1. The 1997 team, which ended up with a rank of 1 (we all know the Coaches’ Poll was fixed), began with a preseason rank of 13/14, so that team gets this bonus: 9 + 2 = 11.
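The ranking bonus can be sketched as a simple band lookup. This is only an illustration: the names are made up, the expectation adjustment is passed in by hand because the post does not give a formula for it, and putting a fractional average such as 4.5 into the band of its integer part is my assumption.

```python
# (lowest rank number in the band, bonus), checked from the weakest band up.
RANK_BANDS = [(20, 2), (15, 3), (10, 4), (5, 5), (4, 6), (3, 7), (2, 8), (1, 9)]


def ranking_bonus(final_rank, expectation_adjustment=0):
    """Return the ranking bonus for an averaged final ranking.

    final_rank is None for an unranked team. expectation_adjustment is the
    hand-assigned bonus or penalty versus the preseason ranking.
    """
    if final_rank is None or final_rank > 25:
        return 0 + expectation_adjustment          # unranked
    for lower_bound, bonus in RANK_BANDS:
        if final_rank >= lower_bound:
            return bonus + expectation_adjustment
    raise ValueError("final_rank must be 1 or greater")


print(ranking_bonus(20, -1))   # 1996: finished 20th, below preseason expectations
print(ranking_bonus(1, +2))    # 1997: finished 1st, above preseason expectations
```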


Total points are added together, and then divided by the number of games played to produce the expected value per game (EVG). The intangible value per game (IVG) index subtracts the baseline value (10 per win and 0 per loss, with no intangibles) from total points, and then divides by the number of games played (with a 0 value for a missed bowl game). This measures the intangibles solely. Yes, wins produce more positive scores (and losses negative scores), but this measure basically captures elation minus disappointment. As you’ll see, the IVG distribution is similar to EVG’s, but actually shows more variation.
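As a rough sketch of the two definitions (function names are hypothetical, and the season below is made up purely to show the arithmetic, not taken from the actual data):

```python
def evg(total_points, games_played):
    """Expected value per game: all points divided by games played."""
    return total_points / games_played


def ivg(total_points, wins, games_played, win_baseline=10):
    """Intangible value per game: strip the 10-per-win baseline first."""
    return (total_points - win_baseline * wins) / games_played


# Made-up season: 12 games, 9 wins, 108 total points.
season_evg = evg(108, 12)       # 9.0
season_ivg = ivg(108, 9, 12)    # (108 - 90) / 12 = 1.5
print(season_evg, season_ivg)
```

If this reading of the definitions is right, then for any single season EVG minus IVG is just ten times the winning percentage, which is one way to see why EVG "takes both into account."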

Winning PCT, IVG and EVG by Year, 1995-2007

As you can see, EVG and IVG capture more fluctuation from season to season than straight winning percentage does. IVG is something of a counterbalance to Winning PCT, looking solely at the aforementioned intangibles. EVG takes both into account.


A number of things are immediately apparent.

1. EVG and IVG capture more fluctuations than Winning PCT. Carr had an average winning percentage of 0.753. There were 5 seasons when Carr’s teams beat this average, 2 which were basically at the average, and 6 below it.

By contrast, only 4 seasons beat the average EVG of 9.51, while 9 fell below it (and 5 seasons beat the average IVG of 1.99, while 8 fell below it). As you can see, there are more discernible peaks and troughs in these indicators than with straight Winning PCT. EVG in particular appears to successfully capture the big picture while taking the significance of individual games into account.

2. Though The Horror was the single-worst game of the Carr era, 2007 as a whole wasn’t Carr’s worst season. It was still on the bottom half of the Carr years, but in terms of EVG it was third worst, after 2001 and 2005. In terms of IVG, it was only fourth worst, after 2001, 2002 and 2005. By EVG, 2005 was Carr’s worst season; by IVG, it’s 2002.

3. Carr’s career does not divide neatly into a “good” early period and a “senile” later period. As the figures show, Carr’s career had four peaks—1997, 1999, 2003 and 2006.  By both measures, 1997 was far and away his best season. I had thought that the intangibles might have elevated Tom Brady’s near-NC year in 1999 and/or the Navarre-led 2003 squad that lost to (compliance-dodging) AP national champion USC in the Rose Bowl above the 2006 squad, but they don’t. 2006 scores as Carr’s second best according to Winning PCT and IVG, and third according to EVG. What’s more, when I ran a regression of EVG and IVG by year for 1995-2007, neither produced a statistically significant result, meaning there’s no clear upward or downward trend over time during this period.* Unless we decide to completely ignore the great 2006 team, or some of the disappointing teams from earlier in his career, the good/senile theory looks like a myth we can safely bust.

4. Carr’s teams were most consistent in the early part of his tenure. In terms of EVG, we can say that the years 1995-2000 were more consistent, and less prone to dramatic fluctuations from year to year, than 2001-2007. While not quite good/senile, this does potentially lend itself to critical arguments. With IVG, there’s a sustained trough in the middle (1998-2002), which reflects higher expectations due to the 1997 national championship and too many losses to Michigan State, Notre Dame and marquee non-conference opponents. That makes those years the most disappointing stretch, when solely considering results versus expectations. That jibes with what I remember, especially the 1999 team, my sentimental favorite of the Carr years and one that got so tantalizingly close, but just didn’t make it. 2005 and 2007 also factor in as IVG troughs, but are broken up by 2006, which got a very high IVG score.

So what does this all mean?  Some things should already be obvious: Carr had some good years and some bad years, The Horror was horrific, 1997 was awesome, etc. On the other hand, the indices strongly suggest the “early good/late senile” theory is a myth.  Statistically speaking, it didn’t shake out that way. That doesn’t mean we can’t, or shouldn’t, criticize some aspects of Carr’s head coaching career, but let’s look at it dispassionately.  The man gave us some great years, and some disappointing ones; they were just more evenly distributed than we remember them.

Next Steps?

If enough people want, I’ll do a second round looking at only Big 10 games for Carr.  Additionally, I’ve already collected the data for Rodriguez’s 3 years, and thought I could do Moeller’s 4 as well. It’s a lot of work, so I doubt I’ll ever expand to include other teams, though if anyone else finds it interesting enough, I’d be happy to share the methodology.

*True, this violates assumptions of sufficient randomness and sample size, so it’s not conclusive. But it does show that there’s no evident trend among the small number of data points we have.
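For readers who want to try the trend check themselves, here is a minimal sketch of an ordinary least squares regression of an index on year, with a t-statistic for the slope. I am assuming a plain one-variable regression, which the post does not spell out, and the values in `placeholder_evg` are placeholders and NOT the actual EVG numbers; swap in the real series to reproduce the check.

```python
import math


def trend_test(years, values):
    """Regress values on years by least squares; return (slope, t_statistic)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(years, values)
    )
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, slope / std_error


years = list(range(1995, 2008))          # the 13 Carr seasons
# Placeholder values only: NOT the post's actual EVG figures.
placeholder_evg = [9.0, 8.5, 13.0, 9.5, 11.0, 9.0, 8.0,
                   8.5, 11.0, 9.5, 7.5, 11.5, 8.0]

slope, t_stat = trend_test(years, placeholder_evg)
T_CRITICAL_11_DF = 2.201                 # two-sided 5% cutoff, 13 - 2 = 11 d.f.
significant = abs(t_stat) > T_CRITICAL_11_DF
print(slope, t_stat, significant)
```

With only 13 points, a slope has to be fairly steep relative to the year-to-year noise before it clears that cutoff, which is the caveat the footnote is getting at.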



July 30th, 2011 at 1:53 PM ^

I commend the effort, but these aren't statistics. These are arbitrary indices of performance. What if I want to weight bowl games and games against Ohio State three times more heavily than you did, why shouldn't I be able to do that?

Eye of the Tiger

July 30th, 2011 at 2:46 PM ^

I decided not to give negative scores for bowl games, because that would penalize a team for making the bowl game versus one that did not. 

An alternative way of doing this would be to score the bowl games just like the rest, but then add a weight penalizing not making one, and rewarding making a BCS caliber game/making a game where the national championship is at stake. 

However, means for indices do not generally need to be zero, by the way.  They simply need to be normally distributed when N is greater than or equal to 200 and there is sufficient randomness.  I'll reach N=200 when I do Mo and RR's years.



July 31st, 2011 at 1:17 AM ^

I think it'd be more interesting to include the bowl game. You can ignore the issue of not making a bowl game in this case, since Lloyd never had that problem (to make the statistic more generalizable is good, but to answer the question you're posing it's not necessary).

There is a bowl-related issue, that always intrigues me with comparing seasons: if you're a better team, you ought to play a better opponent in the bowl. If you're a worse (but still bowl eligible team), you ought to play a worse team in the bowl. Comparing opponents across years always comes up with more or less difficult years, obviously, but it may be particularly difficult to know how to account for a bowl game that attempts to always match fairly even teams.

Eye of the Tiger

July 30th, 2011 at 2:41 PM ^

More than half of all indicators used in sports or social research rely on human, qualitative assessments.  What makes them statistically useful is when they are applied evenly across all cases.  For example, a public opinion poll might ask "do you feel more or less at risk of job loss than you did last year?"  In football, penalties are awarded by human beings, or "refs."  Batting average in baseball relies on decisions of umpires, not computer-accurate assessments of whether or not a ball was in or out of the strike zone. 

These are basic examples.  Indices are more complicated, but are standard practice when trying to compare things outside normal data collection, by creating new indicators based on standardized performance measures. 

If you are interested in looking deeper into this, here's a good example:…

Eye of the Tiger

July 30th, 2011 at 2:57 PM ^

The truism is that when you look deeper into how we performed, game by game and in terms of expectations vs. results, you'd see something different from existing statistics.  I found, when looking closely at the intangibles, that you do see some variation, but it doesn't support the idea of early = good/late = bad. 

So it's a "confirm other existing theory" kind of result.

Also, it's just something I did for fun--the fact that it's got neither an N of 200 nor sufficient randomness makes it problematic for real statistical analysis.  Could get those by expanding the data, but that's too much work for me :)


July 30th, 2011 at 2:29 PM ^

I think you took a really interesting look at those years, but I don't think I would interpret your methodology (or what I gather from what you explained of it) as measuring how good a team (or certainly a coach) is.  It seems more accurate to say that your method measures the fans' feeling of how good a team is, not how good it actually is.

That is not to say that it is not an interesting method, however. Your intangibles are all centered around the perception of a fan, so it could be interesting to look at a similar method centered around the perception of someone who is outside the fanbase, and doesn't care about OSU, etc.

Alternatively, applying similar ideas to other programs might be a good way to compare and see which program is the "easiest" to be a fan of.



July 30th, 2011 at 2:55 PM ^

This is a good way to measure how desirable the football team's performance was for the fans, not how talented the football team itself was.

Before you call me a hater, I think EotT did an excellent job with what he is doing, but as pointed out below, the process you used to weight bowl games and so forth is pretty debatable.  For example, I would rather lose at home to IOWA than lose an away game in Camp Randall.  My reasoning for this is that Camp Randall has a large, obnoxious student section.  It makes me feel happier, and the high from the victory lasts longer than the burn would last from losing to IOWA at home.  The problem with judging how desirable a team is to watch by a fan is that every fan desires to see something different. A Michigan fan who grew up in West Lafayette (like myself) would rather see Michigan beat Purdue than Illinois, so when I make my chart it would be skewed a little differently.

All that being said, you did an excellent job, and I appreciate the time you have spent on this.

Eye of the Tiger

July 30th, 2011 at 3:01 PM ^

So in-conference rivalries get +/-2 regardless of which one it is...except for MSU (+/-4) and OSU (+/-5) and ND (+/- 4), reflecting their rivalry status.  Home/away is also standardized, so a home loss is -1 extra over an away loss, while an away win is +1 over a home win. 

For "expectations," I looked at relative ranking, the point spread and so on in order to gauge whether we were favored or not.  Beating expectations got a bonus; failing to meet them got a penalty.



July 30th, 2011 at 3:13 PM ^

The 96 win against OSU is worth 21 points (okay) which was the greatest win in the Carr years (okay)
The 07 horror loss is only -8?!?!
That's a spread of +13 to the good so of course it's going to be weighted positively.
The horror should be worth -21 if not more. But that's just like my opinion man.

Eye of the Tiger

July 30th, 2011 at 3:31 PM ^

Whether the proverbial "0" is at 0 or not, is not important.  What matters is a normal distribution around it.  IVG is distributed around 0, though.  

The reason why the EVG scores are weighted as they are is to approximate the way a single game factors into winning pct, i.e. win = 1.000 and loss = 0.000.  Or, by a factor of 10, win = 10 and loss = 0.  

The index for EVG goes lower than 0 and higher than 10, in order to capture how intangibles make some losses "worse than 0," others "not quite as bad as 0," and some wins "still disappointing" and others "better than just a win."

The reason why the range doesn't--in practical terms--go from -8 to 18, or -11 to 21 is because when I tried to look at things through standardized awards/penalties applied evenly to all cases, I didn't find The Horror's penalties to quite match OSU 1996's awards.  

The disparity comes in large part because our rivalry game against OSU is, from year to year, much more important than a game against a non-conference opponent from an inferior conference.  That is a 5-point range right there.  Then, losing to FCS gets a -5 while beating a top 5 team gets a +3.  So now we're at 2 points farther above 10 than The Horror is below 0.  Add to that the fact that The Horror was a home loss and you get a range between the two games from -8 to 21.







Eye of the Tiger

July 30th, 2011 at 3:33 PM ^

Might as well post it:




W = 10; L = 0


Opponent type


Non-C = 0

BCS Non-C = +/-1

B10 = +/-2

ND/MSU = +/-4

OSU = +/- 5




Home win/away loss = 0

Home loss = -1

Away win = +1


Subjective: performance vs. expectations


Well below: -2

Below expectations: -1

Broadly within expectations: 0

Above expectations: +1

Significantly above: +2


Opponent vs. Ranking (all unranked treated as "26")


Loss against FCS: -5

Loss against lower FBS: -3

Loss against unranked when ranked/lower ranked: -1

Win/loss against opponent of equal ranking: 0

Win against ranked/higher ranked: +1

Win against top 5: +3

Win against #1: +5


Margin of victory/loss


Blowout loss: -1

Close loss/close win: 0

Blowout win: +1


[IVG is just the intangible stuff, without the baseline win/loss score that tracks Winning PCT for individual games]
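To show how the pieces above add up for a single game, here is a rough Python sketch. The function and key names are made up for illustration; the weights are the ones listed above, and the ranking adjustment and expectations values are passed in by hand, since those are judgment calls.

```python
# Opponent-type weights from the list above; the sign follows the result.
OPPONENT_TYPE = {
    "non_conf": 0,
    "bcs_non_conf": 1,
    "big_ten": 2,
    "nd_msu": 4,
    "osu": 5,
}


def game_score(won, opponent_type, ranking_adj, home,
               blowout=False, expectations=0):
    """Total score for one regular season game.

    ranking_adj: value from the "Opponent vs. Ranking" list (-5 to +5).
    expectations: subjective value from -2 to +2.
    """
    sign = 1 if won else -1
    score = 10 if won else 0                  # baseline: W = 10, L = 0
    score += sign * OPPONENT_TYPE[opponent_type]
    score += ranking_adj
    if won and not home:
        score += 1                            # away win
    elif not won and home:
        score -= 1                            # home loss
    if blowout:
        score += sign                         # blowout win +1, blowout loss -1
    return score + expectations


# The two games broken down in the post.
osu_1996 = game_score(True, "osu", ranking_adj=3, home=False, expectations=2)
the_horror = game_score(False, "non_conf", ranking_adj=-5, home=True,
                        expectations=-2)
print(osu_1996, the_horror)
```

Subtracting the baseline (10 for a win, 0 for a loss) from each game's total gives that game's contribution to IVG.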

skunk bear

July 30th, 2011 at 4:30 PM ^

During his retirement speech, Lloyd talked about leaving the program in better shape than he found it.

Clearly, he did not do this.

This, for me, is the biggest difference between "early" Lloyd and "late" Lloyd.

True Blue Grit

July 30th, 2011 at 6:28 PM ^

But I still feel the "young Carr" vs. "older Carr" theory described in the first sentence in the post is accurate - overall.  There were not many ugly losses in the earlier years.   More in later years.  In his first 6 years, he only had what I would call one embarrassing loss - the Syracuse game in 1998 against Donovan McNabb that still makes my skin crawl.  Then, you have to go to 2001 to find the next awful loss - the Citrus Bowl blowout against Tennessee.   The homecoming loss to Iowa in 2002 was extremely ugly.  But really, up until 2007, there weren't many bad losses.  Still, his abysmal record in the 2000's against OSU has to count heavily.

On the positive side, Lloyd ran a tight ship.  And he had a lot of great wins too over his tenure.  I think near the end, recruiting did fall off in the last few years.  And clearly, Big Ten defensive coordinators all had "the book" on how to stop Michigan's offense in the last number of years.  They couldn't always do it because we still had formidable talent much of the time. 

The debate with Lloyd will always center around whether he got the most out of what he had.   You can make strong arguments he didn't much of the time.  But, I still enjoyed a lot of the great games during his tenure. 

And we should all appreciate Lloyd's 13 years vs. the last three. 


July 30th, 2011 at 7:00 PM ^ this simple fact.

Michigan won a national championship in 1997 and didn't win another one for the remainder of Carr's tenure.

Look at my NACDA points chart. During the second half of his tenure (2002-07), Carr's team earned ~/>50 points 5 times with 2 Big Ten championships, 0 national championships, and 1 subpar season. During the first half of his tenure (1995-2001), his teams earned ~/>50 points 5 times with 3 Big Ten championships, 1 national championship, and 1 subpar season.

There's one striking difference between the first 7 years and the final 6 years.


July 30th, 2011 at 10:39 PM ^

Think I'm going to make a quantitative presentation on how much sex people who make quantitative presentations about football actually have.

Initial projection: Low

Elno Lewis

July 31st, 2011 at 10:00 AM ^

I think Carr was a great coach.  I just do.  Yeah, maybe wins and losses not great--maybe--but he was a stand up dude in what has proven to be a slimy business, and won an NC in the process.  It can't just be about wins and losses.  That's for the NFL.


July 31st, 2011 at 10:53 AM ^

All of the stats and graphs in the world can't erase the memory of Carr's last team starting out the season with the Horror, followed by the Oregon Debacle.  It also can't erase the memory of his record against Jim Tressel, which happened during the later part of his tenure.  

Sorry, but this "mythbusting" looks like just another opinion to me.  I admire your passion, though.


Eye of the Tiger

July 31st, 2011 at 12:19 PM ^

The idea here was to go past selective memory and see what happens if you rate every game according to standardized criteria on performance.  If you do that, The Horror and Oregon do stand out as the worst 2-game stretch under Carr (scores of -8 and -5).  But let's put it in perspective.  The Horror was sort of a unique loss, and the only one that even comes close in embarrassment terms is the 2008 loss to Toledo.  But it was also a game that didn't mean much in terms of our bowl trajectory.  Oregon was more consequential, but still less than an in-conference game.  Neither of those games had as much impact on our bowl trajectory as the losses to Wisconsin and OSU that year. 


That team also did beat ND 38-0, win 8 straight and beat Florida with Heisman winner Tim Tebow.  It still scores at the low end of Carr's tenure, but that makes it the second or third worst team, depending on which index you follow--not the worst.  






coastal blue

July 31st, 2011 at 1:24 PM ^

That loss stands alone as the single most destructive upset in college football history. So many of you are excited that Michigan is setting itself up with a "traditional guy" and at the same time fail to realize that that tradition fell away in 2007. 

You can look at a game like Stanford upsetting USC or Michigan losing to Toledo and say "See, Sagarin/Vegas/Whatever proves those were bigger upsets, so it's really not that big of a deal!", but it's not true.

While Stanford was a terrible team and USC was a very good team, they are still in the same conference. They play frequently. There is some semblance of equality. Two years later, Stanford beat USC again and then again last year. While it was a shocker, it didn't have the same effect that losing to Appalachian State did on Michigan.

That game effectively ended the whole idea that "tradition" means anything once you step on the football field. The winged helmets were there. The banner was there. The Big House was full. Our team was stocked with NFL talent. We were ranked #5 in the country. And we lost to an FCS team. In the Big House. That day ensured that no team would ever fear coming to Ann Arbor to play football for a long time. Until we rip off a streak of 25+ wins or so at home, that fear won't be back. 

That's why when people say that losing to Toledo was worse, it comes off as the worst kind of "Pro everything Lloyd, shit on everything Rodriguez" commentary you can think of. We were a 3-9 team. We lost to another 3-9 team. And I guarantee you that Toledo had thoughts of Appalachian State in their heads before they played us that day. Every underdog will.

To try and quantify that loss is futile. 


July 31st, 2011 at 3:46 PM ^

I think it is generally accepted that the apex of the Carr years starts in late 96 and ends after Henson left in 2000. Oddly, you have highlighted some of those very years as the worst. I believe that this circumstance can be explained by the false friction you've created from increased expectation and coaching ability.

Expectation is subjective and a sign of successful coaching. Your analysis uses the high expectations that Carr created against him. Those expectations were, as you note, highest after Carr's greatest success--the 97 season. The three years that followed were some of Carr's most successful seasons. Michigan went 29-8 over that three year span, was 2-1 over OSU, and ended each of those three seasons beating an SEC team in a major bowl. The fact that Michigan won it all in 97 and was so good that people expected more from them is not a knock on Carr. You've created a system that sets the successful up for failure because there is nowhere to go but down.

On a lighter note, purporting to quantitate intangibles is kind of oxymoronic.

Eye of the Tiger

August 1st, 2011 at 12:08 PM ^

Take a look at them again.  1999 is one of the four "peak years" according to both EVG and IVG (the others are 1997, 2003 and 2006), and the other years--1998 and 2000--are only very slightly below average.  This is also true if you just look at winning percentage.  

In terms of all three indicators, 1998 and 2000 score as rather average.  1998 wasn't really that great of a season, as we lost to ND, to Syracuse and to OSU.  Nor was 2000: we lost to UCLA, Purdue, Northwestern.  While both very good seasons if you compare them to 2005, 2007 or to the Rich Rodriguez years (which I haven't included here), they were average for Carr.  Actually slightly below average.     

That said, I don't know how you concluded that the graphs say 1997-2000 was "the worst."  1997 and 1999 are 2 of the 4 best seasons we had under Carr, according to all three graphs I  included.  

On the other hand, both EVG and IVG suggest that 2000-2002 was Carr's longest trough (though not necessarily deepest).  The index that only measures intangibles (like losing at home, losing to rivals or failing to meet expectations) says 2002 was our worst season under Carr.  Keep in mind, this ONLY measures "intangibles."

EVG, which takes both these and wins/losses into account, suggests it was the third worst (after 2005 and 2007).





August 1st, 2011 at 12:29 PM ^

First, please stop using the word “senile.” In Carr’s second-to-last year, he was one score away from an undefeated regular season. That doesn’t sound like a senile coach to me.

To establish the validity of your measurements, a longer evaluation period is necessary, going back at least to the Schembechler years. Moeller and Rodriguez were head coaches for only 5 and 3 seasons respectively, making it difficult to perceive significant trends in their tenures.

Another problem is that Michigan fans have outsized expectations. For upwards of 40 years, Michigan fans expected to win almost every game. Although you can never predict which games Michigan will lose, there has been only one undefeated season in the last 40, so the life of a Michigan fan is practically guaranteed to include a lot of disappointing years. How often has Michigan done better than its fans expected? Probably not more than 5 years in the last 40.

You didn’t publish your detailed formulas, but I would want to know whether you’re baking in unrealistic expectations, as fans tend (overwhelmingly) to do.

Eye of the Tiger

August 1st, 2011 at 10:03 PM ^

1. I was being sarcastic/humorous.  I don't think Carr was senile at all, but some people think he "lost it" towards the end.  I don't, and the data here doesn't support that.  

2. I mentioned in the post that I'm planning to go through Mo's and RR's tenures as well

3 and 4.  I posted the methodology in a reply above.  You can see "expectations" only accounts for a 4 pt range.  The other bits are whether game was played home or away, whether the opponent was higher or lower ranked, how vital the game was to our bowl trajectory (plus extra consideration for rivalry games), etc.