Probabilistic Simulation of '09 Football Season
Long-time reader, first-time contributor, frequent cliché abuser. I thought the mgomasses (or at least the mgonerds) might be interested in a probabilistic simulation of M’s 2009 season that I recently generated with SAS. This diary contains in order:
- Back story and discussion of methodology.
- Analysis of whether M will win at least 10 games (including post season).
- Analysis of whether M will have 4 losses in the first 8 games.
- Analysis of Vegas setting M’s over under at 6 regular-season wins.
And I saved the best for last:
- Simulation of 2008 season for comparison with 2009 simulation.
Win at Least 10 Games
In discussing M’s ’09 prospects, one friend guaranteed a 10-win season, and a second immediately proposed a wager. The optimistic friend was very interested in the action, but only if appropriate odds could be assigned. Unable to find such odds on the interwebs, I volunteered to compute them. My qualifications: B.S. Statistics (M ’05), M.S. Market Research (Northwestern ’08).
I used Brian's 2009 Season Prediction on Bucknuts as my guide, which I hope we all agree is an authoritative and relatively objective source.
Brian assigned each game to one of 5 groups based on likelihood of victory. I translated those groups into win probabilities, giving a worst, middle and best case.
Brian’s Win Group | Worst Case | Middle Case | Best Case |
Win or Ann Arbor Burns | 90% | 95% | 99% |
Should Be Victory | 70% | 75% | 80% |
Who Knows? | 45% | 50% | 55% |
Probably Not | 25% | 30% | 45% |
Eh … Not So Much | 10% | 20% | 30% |
This was the most subjective part of the exercise. If you take issue with any of the win probabilities, please opine in the comments. I’m always interested in constructive criticism. Next, I applied the win probabilities to M’s ’09 schedule.
Opponent | Brian's Win Group | Worst Case | Middle Case | Best Case |
WMU | Should Be Victory | 70% | 75% | 80% |
Notre Dame | Probably Not | 25% | 30% | 45% |
EMU | Win or Ann Arbor Burns | 90% | 95% | 99% |
Indiana | Should Be Victory | 70% | 75% | 80% |
MSU | Who Knows? | 45% | 50% | 55% |
Iowa | Probably Not | 25% | 30% | 45% |
DSU | Win or Ann Arbor Burns | 90% | 95% | 99% |
PSU | Eh … Not So Much | 10% | 20% | 30% |
Illinois | Probably Not | 25% | 30% | 45% |
Purdue | Should Be Victory | 70% | 75% | 80% |
Wisconsin | Who Knows? | 45% | 50% | 55% |
OSU | Eh … Not So Much | 10% | 20% | 30% |
Bowl Game | Unknown | N/A | 50% | 75% |
Expected Wins | | 5.75 | 6.95 | 8.18 |
Rounded | | 6 | 7 | 8 |
Summing the 13 probabilities for each of the 3 scenarios gives the expected number of wins. If the worst case befalls M in all of ’09’s contests, the expected win total is 6 games (5.75 rounded up). Note that I didn't give a bowl-game win probability in the worst-case column of the table above, as M's expected total falls short of the 6 wins needed to qualify for a bowl (5.75 < 6). In later parts of this analysis, if M wins at least 6 regular-season games in the worst scenario, a 25% bowl-game win probability will be used. In the middle case, M is expected to win 7 games, and in the best case, M is expected to win 8 games.
I then used the worst, middle and best win probabilities to simulate seasons. The simulations used random, independent Bernoulli trials to decide game outcomes, and I simulated each season type 50,000 times. The results are summarized in the tables below. Please note that a random trial for the bowl game was only conducted if M won at least 6 regular-season games.
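For anyone who wants to replay the procedure, here is a minimal sketch of it. The original runs were done in SAS and that code is not shown here, so this is a Python stand-in, not the actual program. The middle-case probabilities and the 6-win bowl rule come from the text and table above; the function names and the seed are made up for illustration.

```python
import random
from collections import Counter

# Middle-case win probabilities for the 12 regular-season games, in schedule
# order (WMU through OSU), copied from the schedule table.
MIDDLE_CASE = [0.75, 0.30, 0.95, 0.75, 0.50, 0.30,
               0.95, 0.20, 0.30, 0.75, 0.50, 0.20]
BOWL_WIN_PROB = 0.50   # middle-case bowl probability from the table
BOWL_THRESHOLD = 6     # regular-season wins needed before a bowl trial is run


def simulate_season(game_probs, bowl_prob, rng):
    """Return total wins for one season of independent Bernoulli trials."""
    wins = sum(rng.random() < p for p in game_probs)
    # The bowl game is only played if the team qualified.
    if wins >= BOWL_THRESHOLD and rng.random() < bowl_prob:
        wins += 1
    return wins


def simulate_many(game_probs, bowl_prob, n_seasons=50_000, seed=2009):
    """Tally total wins over many simulated seasons (seed is arbitrary)."""
    rng = random.Random(seed)
    return Counter(simulate_season(game_probs, bowl_prob, rng)
                   for _ in range(n_seasons))


def prob_at_least(counts, k):
    """Share of simulated seasons with at least k total wins."""
    total = sum(counts.values())
    return sum(n for wins, n in counts.items() if wins >= k) / total


if __name__ == "__main__":
    counts = simulate_many(MIDDLE_CASE, BOWL_WIN_PROB)
    for wins in sorted(counts):
        print(f"{wins:2d} wins: {counts[wins]:6d} seasons, "
              f"P(at least {wins}) = {prob_at_least(counts, wins):.2%}")
```

Swapping in the worst- or best-case column (and the matching bowl probability) should give tables in the same ballpark as the ones below, though individual counts will differ from run to run because the random draws differ.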
Worst Case Simulation | |||
Total Wins | Simulation Instances | % of Simulations | Probability of at Least That Many Wins |
1 | 40 | 0.08% | 100.00% |
2 | 447 | 0.89% | 99.92% |
3 | 2,302 | 4.60% | 99.03% |
4 | 6,611 | 13.22% | 94.42% |
5 | 12,069 | 24.14% | 81.20% |
6 | 10,199 | 20.40% | 57.06% |
7 | 10,626 | 21.25% | 36.66% |
8 | 5,491 | 10.98% | 15.41% |
9 | 1,804 | 3.61% | 4.43% |
10 | 368 | 0.74% | 0.82% |
11 | 37 | 0.07% | 0.09% |
12 | 5 | 0.01% | 0.01% |
13 | 1 | 0.00% | 0.00% |
Middle Case Simulation | |||
Total Wins | Simulation Instances | % of Simulations | Probability of at Least That Many Wins |
1 | 6 | 0.01% | 100.00% |
2 | 89 | 0.18% | 99.99% |
3 | 794 | 1.59% | 99.81% |
4 | 3,313 | 6.63% | 98.22% |
5 | 8,570 | 17.14% | 91.60% |
6 | 6,543 | 13.09% | 74.46% |
7 | 12,856 | 25.71% | 61.37% |
8 | 10,174 | 20.35% | 35.66% |
9 | 5,377 | 10.75% | 15.31% |
10 | 1,811 | 3.62% | 4.56% |
11 | 423 | 0.85% | 0.93% |
12 | 39 | 0.08% | 0.09% |
13 | 5 | 0.01% | 0.01% |
Best Case Simulation (75% Bowl Win Probability) | |||
Total Wins | Simulation Instances | % of Simulations | Probability of at Least That Many Wins |
1 | 1 | 0.00% | 100.00% |
2 | 7 | 0.01% | 100.00% |
3 | 142 | 0.28% | 99.98% |
4 | 895 | 1.79% | 99.70% |
5 | 3,626 | 7.25% | 97.91% |
6 | 2,027 | 4.05% | 90.66% |
7 | 9,489 | 18.98% | 86.60% |
8 | 12,709 | 25.42% | 67.63% |
9 | 11,252 | 22.50% | 42.21% |
10 | 6,815 | 13.63% | 19.70% |
11 | 2,472 | 4.94% | 6.07% |
12 | 518 | 1.04% | 1.13% |
13 | 47 | 0.09% | 0.09% |
Best Case Simulation (50% Bowl Win Probability) | |||
Total Wins | Simulation Instances | % of Simulations | Probability of at Least That Many Wins |
1 | 0 | 0.00% | 100.00% |
2 | 9 | 0.02% | 100.00% |
3 | 116 | 0.23% | 99.98% |
4 | 893 | 1.79% | 99.75% |
5 | 3,594 | 7.19% | 97.96% |
6 | 4,239 | 8.48% | 90.78% |
7 | 10,551 | 21.10% | 82.30% |
8 | 12,724 | 25.45% | 61.20% |
9 | 10,144 | 20.29% | 35.75% |
10 | 5,434 | 10.87% | 15.46% |
11 | 1,870 | 3.74% | 4.59% |
12 | 399 | 0.80% | 0.85% |
13 | 27 | 0.05% | 0.05% |
The simulations indicate that the probability of winning at least 10 games is 0.82% in the worst case, 4.56% in the middle case and 19.70% in the best case. I also recalculated the best case with a lower bowl-game win probability of 50%, and the resulting probability of winning at least 10 games is 15.46%. I did this because more wins mean a better bowl opponent, a fact I failed to account for in my original simulation strategy. So, M's chance of winning 10 or more games is around 5% but definitely not more than 20%. If I were making a line right now, I'd put it at 10% (halfway between the middle and recalculated best case, as I’m moderately optimistic about ’09), so the odds are 9:1.
4 Losses in the First 8 Games
Unfortunately, that action was too rich for the pessimistic friend, and they settled on 4:1. Then the optimistic friend suggested a hedge bet, whether the 10-win wager would be lost by the PSU game, and I was again called upon to set odds. The phrase “by the PSU game” is vague, so I investigated all possible meanings. The probabilities follow:
Lose 4+ Games in First 8 | |
Worst Case | 58% |
Middle Case | 43% |
Best Case Recalculated | 23% |
Lose Exactly 4 Games in First 8 | |
Worst Case | 33% |
Middle Case | 29% |
Best Case Recalculated | 18% |
PSU is 4th Loss | |
Worst Case | 31% |
Middle Case | 26% |
Best Case Recalculated | 16% |
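Turning a probability into betting odds, as with the 10% line that became 9:1 above, is just (1 - p) / p rounded to a friendly ratio. A small sketch of that conversion follows; the function names and the choice to round to halves (`max_denominator=2`) are assumptions for illustration, not something the diary specifies.

```python
from fractions import Fraction


def odds_against(prob, max_denominator=2):
    """Approximate odds against an event as a (lose, win) pair of small integers."""
    if not 0 < prob < 1:
        raise ValueError("prob must be strictly between 0 and 1")
    # Odds against = P(not happening) / P(happening), snapped to a simple ratio.
    ratio = Fraction((1 - prob) / prob).limit_denominator(max_denominator)
    return ratio.numerator, ratio.denominator


def implied_probability(lose, win):
    """Probability implied by odds of lose:win against."""
    return win / (lose + win)


if __name__ == "__main__":
    lose, win = odds_against(0.10)
    print(f"10% chance -> {lose}:{win} against")  # the 9:1 line quoted above
```

The same call can be used to sanity-check any line set from the probabilities in this section. Going the other way with `implied_probability` shows how much rounding the friendly ratio introduced.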
Given the probabilities above and my moderate optimism, I set the odds at:
- Lose 4+ Games in First 8: 3:2 (40%)
- Lose Exactly 4 Games in First 8: 3:1 (25%)
- PSU is 4th Loss: 7:2 (22%)
M’s Over Under at 6 Regular-Season Wins
Then, last Thursday, Brian’s post alerted us that Vegas released M’s over under, and I was asked to evaluate it against my model. The results:
Worst Case | |
Win Under 6 | 42.94% |
Win Exactly 6 | 27.13% |
Win Over 6 | 29.93% |
Middle Case | |
Win Under 6 | 25.54% |
Win Exactly 6 | 26.07% |
Win Over 6 | 48.39% |
Best Case Recalculated | |
Win Under 6 | 9.22% |
Win Exactly 6 | 16.85% |
Win Over 6 | 73.93% |
Comparing the over under with my middle case is probably best, as professional bookmakers are likely less optimistic than this mgofanboy (and certainly more objective than your average buck-stachioed troll or sparty brahsephus). The 6-win middle case isn’t exactly what casinos want to see, but the 7-win alternative below is slightly worse from their perspective, with a lower probability for the “Win Exactly” scenario (a.k.a. casino payday).
Middle Case | |
Win Under 7 | 51.61% |
Win Exactly 7 | 25.40% |
Win Over 7 | 22.99% |
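One way to line these probabilities up against the Vegas prices (-165 on the over, +135 on the under at 6 wins) is to convert each moneyline to its implied probability. The sketch below does that and also renormalizes the model's over/under numbers with the "exactly 6" outcome removed. That second step assumes a push simply refunds the stake, which is a common convention but is an assumption here, and the function names are illustrative.

```python
def moneyline_to_prob(line):
    """Implied probability of an American moneyline, with the vig still included."""
    if line < 0:
        # Favorite: risk -line to win 100.
        return -line / (-line + 100)
    # Underdog: risk 100 to win line.
    return 100 / (line + 100)


def no_push_probs(p_over, p_under):
    """Over/under probabilities conditional on the total not landing exactly on the number."""
    total = p_over + p_under
    return p_over / total, p_under / total


if __name__ == "__main__":
    # Middle-case model probabilities at 6 regular-season wins, from the table above.
    model_over, model_under = no_push_probs(0.4839, 0.2554)
    print(f"Vegas over  (-165): {moneyline_to_prob(-165):.1%} implied")
    print(f"Vegas under (+135): {moneyline_to_prob(135):.1%} implied")
    print(f"Model over, no push:  {model_over:.1%}")
    print(f"Model under, no push: {model_under:.1%}")
```

The two implied probabilities add up to more than 100% because the bookmaker's margin is baked into the prices.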
So despite Brian’s mild protest, it makes sense that Vegas has us at 6 regular-season wins, and the payouts are on target with the probabilities (i.e. less than even money, -165, on a 48% chance of M winning more than 6 games, and better than even, +135, on a 26% chance that M is under 6).
2008 Season Simulation and Comparison with 2009
After all that work, I decided to investigate if a similar simulation could explain last season’s disaster. Once again, I used Brian as my guide. Last year, he assigned each game to one of 3 groups based on likelihood of victory. Categories equating to “Win or Ann Arbor Burns” and “Eh … Not So Much” were conspicuously missing from his assessment, but given his use of “auto-win” and “auto-loss” in other ’08 predictions (e.g. PSU and Minn), I assume the omission was deliberate, owing to his uncertainty about the ’08 season. Probabilities and expected wins follow.
Opponent | Brian's Win Group | Worst Case | Middle Case | Best Case |
Utah | Tossup | 45% | 50% | 55% |
Miami(OH) | Probable Win | 70% | 75% | 80% |
Notre Dame | Tossup | 45% | 50% | 55% |
Wisconsin | Probable Loss | 25% | 30% | 45% |
Illinois | Probable Loss | 25% | 30% | 45% |
Toledo | Probable Win | 70% | 75% | 80% |
PSU | Probable Loss | 25% | 30% | 45% |
MSU | Tossup | 45% | 50% | 55% |
Purdue | Probable Win | 70% | 75% | 80% |
Minnesota | Probable Win | 70% | 75% | 80% |
NW | Probable Win | 70% | 75% | 80% |
OSU | Probable Loss | 25% | 30% | 45% |
Bowl Game | Unknown | N/A | 50% | 75% |
Expected Wins | | 5.85 | 6.95 | 8.20 |
Rounded | | 6 | 7 | 8 |
The number of expected wins is almost identical to ’09 in each category, so the Vegas prediction of 8 regular-season wins was inexcusably optimistic. Too bad my DeLorean’s in the shop; otherwise, I’d get straight Biff Tannen on that under bet. But I digress. On to the more important question, what’s the probability that M would have a 3-9 season? Let’s look at some more tables.
Worst Case Simulation | ||||
Total Wins | Probability of Exact Win Total (2008) | Probability of Exact Win Total (2009) | Probability of at Least That Many Wins (2008) | Probability of at Least That Many Wins (2009) |
0 | 0.02% | 0.00% | 100.00% | 100.00% |
1 | 0.22% | 0.08% | 99.98% | 100.00% |
2 | 1.38% | 0.89% | 99.76% | 99.92% |
3 | 5.17% | 4.60% | 98.39% | 99.03% |
4 | 12.97% | 13.22% | 93.22% | 94.42% |
5 | 21.87% | 24.14% | 80.24% | 81.20% |
6 | 18.47% | 20.40% | 58.38% | 57.06% |
7 | 20.52% | 21.25% | 39.91% | 36.66% |
8 | 12.20% | 10.98% | 19.39% | 15.41% |
9 | 5.40% | 3.61% | 7.19% | 4.43% |
10 | 1.47% | 0.74% | 1.79% | 0.82% |
11 | 0.29% | 0.07% | 0.31% | 0.09% |
12 | 0.02% | 0.01% | 0.03% | 0.01% |
13 | 0.00% | 0.00% | 0.00% | 0.00% |
Middle Case Simulation | ||||
Total Wins | Probability of Exact Win Total (2008) | Probability of Exact Win Total (2009) | Probability of at Least That Many Wins (2008) | Probability of at Least That Many Wins (2009) |
0 | 0.01% | 0.00% | 100.00% | 100.00% |
1 | 0.07% | 0.01% | 99.99% | 100.00% |
2 | 0.50% | 0.18% | 99.93% | 99.99% |
3 | 2.52% | 1.59% | 99.42% | 99.81% |
4 | 7.87% | 6.63% | 96.90% | 98.22% |
5 | 16.61% | 17.14% | 89.03% | 91.60% |
6 | 11.76% | 13.09% | 72.42% | 74.46% |
7 | 23.57% | 25.71% | 60.66% | 61.37% |
8 | 19.63% | 20.35% | 37.09% | 35.66% |
9 | 11.43% | 10.75% | 17.47% | 15.31% |
10 | 4.66% | 3.62% | 6.04% | 4.56% |
11 | 1.18% | 0.85% | 1.38% | 0.93% |
12 | 0.18% | 0.08% | 0.20% | 0.09% |
13 | 0.01% | 0.01% | 0.01% | 0.01% |
Best Case Simulation (75% Bowl Win Probability) | ||||
Total Wins | Probability of Exact Win Total (2008) | Probability of Exact Win Total (2009) | Probability of at Least That Many Wins (2008) | Probability of at Least That Many Wins (2009) |
0 | 0.00% | 0.00% | 100.00% | 100.00% |
1 | 0.01% | 0.00% | 100.00% | 100.00% |
2 | 0.07% | 0.01% | 99.99% | 100.00% |
3 | 0.60% | 0.28% | 99.93% | 99.98% |
4 | 2.56% | 1.79% | 99.33% | 99.70% |
5 | 7.52% | 7.25% | 96.77% | 97.91% |
6 | 4.12% | 4.05% | 89.25% | 90.66% |
7 | 17.63% | 18.98% | 85.12% | 86.60% |
8 | 23.60% | 25.42% | 67.50% | 67.63% |
9 | 22.06% | 22.50% | 43.90% | 42.21% |
10 | 13.99% | 13.63% | 21.84% | 19.70% |
11 | 6.19% | 4.94% | 7.85% | 6.07% |
12 | 1.49% | 1.04% | 1.66% | 1.13% |
13 | 0.17% | 0.09% | 0.17% | 0.09% |
Best Case Simulation (50% Bowl Win Probability) | ||||
Total Wins | Probability of Exact Win Total (2008) | Probability of Exact Win Total (2009) | Probability of at Least That Many Wins (2008) | Probability of at Least That Many Wins (2009) |
0 | 0.00% | 0.00% | 100.00% | 100.00% |
1 | 0.01% | 0.00% | 100.00% | 100.00% |
2 | 0.09% | 0.02% | 99.99% | 100.00% |
3 | 0.55% | 0.23% | 99.90% | 99.98% |
4 | 2.59% | 1.79% | 99.35% | 99.75% |
5 | 7.89% | 7.19% | 96.76% | 97.96% |
6 | 7.89% | 8.48% | 88.86% | 90.78% |
7 | 19.75% | 21.10% | 80.98% | 82.30% |
8 | 23.52% | 25.45% | 61.22% | 61.20% |
9 | 20.18% | 20.29% | 37.70% | 35.75% |
10 | 11.58% | 10.87% | 17.52% | 15.46% |
11 | 4.76% | 3.74% | 5.94% | 4.59% |
12 | 1.04% | 0.80% | 1.18% | 0.85% |
13 | 0.14% | 0.05% | 0.14% | 0.05% |
Anyone else see something strange? No? Scroll up and look at the tables again. I’ll wait. You got it now? Great! So yes, as you correctly pointed out, while M had a higher probability of winning 9 or 10 games in ’08 (or even an MNC), the probability of going 3-9 was also higher. Very curious. The reason is a lack of “auto-wins” padding M’s total, and given ’08’s actual outcome, Brian seems prescient in his unwillingness to guarantee victories. The absence of “auto-wins” and “auto-losses” increases the variance of the ’08 win distribution, which creates higher probabilities for both high and low win totals. The ’09 win distribution has less variance, increasing probabilities for middle win totals. For those who took Stats 350, think of this like the difference between the shape of a t-distribution (red) and a normal distribution (blue).
What does this all mean? Well, first, M’s ’08 season was a fluke, an anomaly, a statistical improbability. In the middle case, the odds against losing 9 or more games were 30:1. Second, and more importantly, the odds against a repeated calamity in ’09 are even longer, at 55:1. Awesome! Another catastrophe is highly improbable. Rejoice!
Now the bad news: I have no idea how accurate this model is. But applying it to ’06 and ’07, the expected number of total wins for both seasons was 9 in the worst case, 10 in the middle case and 11 in the best case. I’ll spare you the tables since we’re winding down, and I’m lazy, and I have a life. No, srsly, I do. I swear. But I digress, again. If we concede that ’06 qualifies as a best case, given M’s 11-0 start and relatively injury-free season, and all unanimously agree that, between the horror and the angry Michigan-health-hating god, ’07 defines a worst-case scenario, then the model is hella accurate.
So my call is, M will have 6 or 7 regular-season wins in ’09, and 7 wins total including the bowl game. Progress! I hope everyone found this interesting, or at least, not a total waste of time.
If you notice mistakes or have ideas for other analysis projects, hit up the comments. One thought is making this more of a true Monte Carlo model by randomizing the use of worst, middle and best win probabilities across the 13 games, simulating any luck or tragedy that might befall M over a season. Thoughts?
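A rough sketch of what that randomized version might look like, in Python: each game independently draws one of the three scenarios and then plays a Bernoulli trial with that scenario's probability. The probabilities are the ones from the ’09 schedule table; the equal scenario weights, the function names and the seed are placeholders, not anything I have actually run.

```python
import random

# Regular-season win probabilities by scenario, in schedule order (WMU to OSU),
# copied from the '09 schedule table.
SCENARIOS = {
    "worst":  [0.70, 0.25, 0.90, 0.70, 0.45, 0.25, 0.90, 0.10, 0.25, 0.70, 0.45, 0.10],
    "middle": [0.75, 0.30, 0.95, 0.75, 0.50, 0.30, 0.95, 0.20, 0.30, 0.75, 0.50, 0.20],
    "best":   [0.80, 0.45, 0.99, 0.80, 0.55, 0.45, 0.99, 0.30, 0.45, 0.80, 0.55, 0.30],
}
BOWL_PROB = {"worst": 0.25, "middle": 0.50, "best": 0.75}
NAMES = list(SCENARIOS)


def simulate_mixed_season(rng, weights=(1, 1, 1)):
    """One season where every game draws its own scenario (weights are assumed)."""
    wins = 0
    for game in range(12):
        scenario = rng.choices(NAMES, weights=weights)[0]
        wins += rng.random() < SCENARIOS[scenario][game]
    if wins >= 6:  # bowl trial only for bowl-eligible seasons
        scenario = rng.choices(NAMES, weights=weights)[0]
        wins += rng.random() < BOWL_PROB[scenario]
    return wins


def mean_wins(n_seasons=50_000, seed=2009, weights=(1, 1, 1)):
    """Average total wins over many mixed-scenario seasons."""
    rng = random.Random(seed)
    return sum(simulate_mixed_season(rng, weights) for _ in range(n_seasons)) / n_seasons


if __name__ == "__main__":
    print(f"Mean total wins with per-game scenario draws: {mean_wins():.2f}")
```

One caveat worth knowing before investing in this: when the scenario is drawn independently for every game, each game is still a single Bernoulli trial whose win probability is the weighted average of the three cases, so the results should look like a fourth fixed scenario sitting between worst and best. To capture season-long luck or tragedy, the draw would need to be shared across games (for example, one draw per season, or one that persists after an injury).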
Your optimistic friend should be commended for his courage
. . . and then bet against him.
Of course, if the optimistic friend proves prescient and RR wins 10-11 games and makes a BCS bowl, the only question remaining is how tall to build his statue and how many virgins to sacrifice to it in following seasons.
Lame attempts at humor aside, I think 10 wins is as likely as pigs flying, aged alumni wearing maize to Maize-Out games, and my wife voting for Palin in 2012.
This is a really impressive analysis, but I am at least as curious about the whole foundation for it - how you assigned the win/loss prospects for each game. You think that Purdue is a 'should be' victory? I'd like to know why, for example. This is a very thorough quantitative analysis built upon an unjustified (but *not* unjustifiable) qualitative assessment. Just my e-pinion. Thanks for the hard work.
But, unfortunately, unlike basketball the football season is too short to really construct a very accurate model of prediction. I don't know what else we can do besides apply some logical guesses.
Kenpom, on the other hand, doesn't begin to guess outcomes until he has data from games played that season (sometime in december, I think). By then, he has 100's of games worth of data with which to compare teams and predict outcomes.
Agreed. From what I can tell, he was drawing off of Brian's predictions, hoping that he had a pretty good feel for most of the games. It's flawed, but at this point, it's close to the best we can do.
Great read. Thanks.
Yes, I explicitly used Brian's predictions from his various pre-season wrap-ups (all hyperlinked in the diary) to categorize the likelihood of victory, as presumed pre-season, before any snaps are taken. He knows more about M football than anyone I know, so he's my authority.
"random, independent Bernoulli trials to decide game outcomes"
indep trials seems logical, but take 2007. a loss to ASU re-colored the entirety of the season. all of a sudden, we essentially took 10-20%-age points off every probability of victory for subsequent games.
Yeah, I would imagine the whole concept of momentum, emotion, and other psychological factors probably can't be figured into probability. But losing a close game to a good team, or, gasp, losing to an opponent you should not, has to greatly affect the actual chance of winning the following games.
Perhaps someone could pull up some historical data where this has happened, then compare the subsequent game results to the prediction.
This analysis is only trying to calculate probabilities and odds for win totals pre-season, before any snaps are taken.
But was the 10-20% drop a result of our losing to Appy State (momentum, etc) or was it the fact that we had a team that could lose to Appy State. I would argue it's the second, which is like thinking you're in the best case scenario and then suddenly realizing that you're in the worst case scenario (or possibly below). The independent Bernoulli assumption actually holds up pretty well there--we were 9-3 over the rest of the season.
The time using independent trials really fails is taking into consideration key injuries. With Oregon that year, they were very likely to win if Dixon was healthy and very likely to lose if he wasn't. So if you were to do a similar experiment, you would probably underestimate the odds of disaster since the team was quite good but only a step away from horrible.
You could theoretically use a Hidden Markov Model or something like that where a loss would be evidence that something terrible had happened to the team. I probably wouldn't, though--I think the independence assumptions are a pretty good idea here.
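A full Hidden Markov Model is more than fits in a comment, but the core idea (a loss is evidence about which scenario the team is really in) can be sketched with a single Bayes update. This is a much simpler stand-in, not an HMM: it assumes the team sits in one fixed scenario all season, and the equal prior over scenarios is made up for illustration. The win probabilities are the "Win or Ann Arbor Burns" row from the diary.

```python
def update_scenario_beliefs(prior, win_probs, won):
    """Posterior over scenarios after observing one game result (Bayes' rule)."""
    # Likelihood of the observed result under each scenario.
    likelihood = {s: (p if won else 1 - p) for s, p in win_probs.items()}
    unnormalized = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


if __name__ == "__main__":
    # Assumed prior: no idea which scenario we are in before the season.
    prior = {"worst": 1 / 3, "middle": 1 / 3, "best": 1 / 3}
    # "Win or Ann Arbor Burns" win probabilities from the diary's table.
    auto_win = {"worst": 0.90, "middle": 0.95, "best": 0.99}
    posterior = update_scenario_beliefs(prior, auto_win, won=False)
    for scenario, belief in posterior.items():
        print(f"P({scenario} | lost an auto-win game) = {belief:.1%}")
```

Under these assumptions, dropping a game like that shifts most of the belief onto the worst case, which is the "suddenly realizing you're in the worst case scenario" effect described above.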
But that was a great post. It just looks good.
As someone who took Stats 350 I loved this. I can see this used as an argument for our weak scheduling. If you take the two "Win or Ann Arbor Burns" games and replace them with "Who Knows?" or "Probably Not" teams I would estimate the projected win totals to drop by one maybe two games.
Thanks again this is great. I will now direct my sparty friends who say Michigan is going to win three or four games to this.
one thing we do know: Brian did a piss-poor job predicting last season. But, he's not alone there.
But it doesn't account for the fact that Mike Hart was a 3*, so...
Kidding, kidding. Actually, I would be interested in two things:
1. How sensitive is the model to your choice of probabilities for each category? I noticed that the best and worst case probabilities for the "should win" type games were less variable than for the "should lose" type games. Like the worst case for "should be victory" games is .7 and the best case for "probably not" is .45.
I wonder if that's some irrepressible homerism finding its way in there. That is, it's probably true that deep down most fans are more confident that their team will win against a cupcake than they are that they will lose against a titan. Hope springs eternal and whatnot.
2. Which leads me to wanting you to make it more Monte Carlo-esque, as you suggested. This would allow you to account for variability in the variation in the probabilities. So, best case against OSU could account for Pryor getting injured uh, prior to the game, and worst case against Purdue would be Tate getting injured.
All in all, a very interesting analysis, though.
I noticed that too (No. 1). If there is a 30% probability spread for victory, shouldn't there be the same for the other two categories?
Great analysis. If I could issue you a +1, I would. I do think the probabilities need to be tweaked though, paying particular attention to the outer ends of the range. At the upper end (Ann Arbor burns) probabilities range from 90 to 99%, which strikes me as reasonable since we're characterizing an auto-win. At the bottom of the range though (Eh - NSM), which is presumably meant to characterize an auto-loss, the probability ranges from 10 - 30%. A game we're almost certainly going to lose is a 30% win best case? Strikes me as a tad optimistic.
That said I think it's interesting that your results seem to parrot most of the common thinking (except Vegas) on U of M's chances this year of 7 wins plus/minus 1. Nice job.
My thinking is that M has a far better chance of beating OSU than Delaware State does of beating M. So, the difference between "Win or Ann Arbor Burns" and "Eh ... Not So Much" is a blatant bias, but I think it's justified.
As for "Probably Not" and "Should Be Victory," I totally get what you guys are saying. I just adjusted the probabilities in the spreadsheet where I'm tracking all of this.
The new, more balanced probabilities are:
Probably Not
Worst Case: 20%
Middle Case: 25%
Best Case: 30%
Should Be Victory
Worst Case: 70%
Middle Case: 75%
Best Case: 80%
After incorporating these new probabilities, the expected number of wins does not change significantly, and each still rounds to the same number: 6 in the worst case, 7 in the middle case and 8 in the best case. I'll try these with the simulation code later tonight and let you know if there is a major change.
That was a bit of work there, and I appreciate it. I really hope that UM blasts the statistical prediction out of the water and wins more than 7. That being said, it is encouraging that the odds of a repeated horrible season are 55:1 - I would bet those odds all day. We are now only 2 months away from the start of the season. The anticipation is palpable...Go Blue!!!
well done sir.
Good stuff. Even if the base case being used is subjective, it seems to jibe with Vegas oddsmakers and past results (to a point, but, hey, anomalies happen.)
Superb Post. Period.
Read. Thanks!
This was a truly epic post.
Great analysis. One quibble--you "test" the model using three past seasons (small sample size) and find that it generated an accurate point prediction in two and failed in one. Given that failure rate (33%) and small sample size I'd back off the "hella accurate" description until you have lots more data.
This whole model is built off of his detailed pre-season predictions, which don't exist prior to '06 (as far as I can tell). I would love to check out '05, '04, '03, etc. if someone can point me to his "auto win," "win," "tossup," "loss," "auto loss" predictions for those seasons.
Now that I am "Thumbs Up for Michigan" I must blindly assert against all odds and evidence that we will go undefeated and win the national championship.
Go Blue!
First, your post is fantastic and a shining example of the quality of posts we need more of here.
"This was the most subjective part of the exercise. If you take issue with any of the win probabilities, please opine in the comments. I’m always interested in constructive criticism."
I apologize for the brief comments to follow, and promise I'll flesh them out more constructively when I have the time. However, by using Bernoulli trials, you've guaranteed the outcome of the simulation is irrespective of actually running the simulation.
Expected wins = sum(Xi*1), where Xi is the % of winning a given game, and 1 is the value of a win. Therefore, if you add up the % to win as decimals for each of the three cases, the sum of the %s = your expected wins always. Basically, the independence of the trials is the limiting factor in your simulation. We know that there is strong correlation between games. Even if we used normal distributions instead of Bernoulli probabilities we'd end up in the same place due to the Central Limit Theorem, at least for the middle case. However, we SHOULD get different top and bottom percentiles.
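To make that concrete: with independent games, not just the expected wins but the whole win distribution can be computed exactly, no random numbers required. The sketch below does this for the regular season only, using the middle-case probabilities from the diary's ’08 and ’09 tables; the conditional bowl game is left out to keep it short, and the function names are illustrative.

```python
def win_distribution(probs):
    """Exact distribution of total wins for independent games (a Poisson binomial)."""
    dist = [1.0]  # before any games: 0 wins with certainty
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)   # lose this game
            new[k + 1] += mass * p     # win this game
        dist = new
    return dist


def mean_and_variance(probs):
    """Mean is the sum of p; variance is the sum of p * (1 - p)."""
    return sum(probs), sum(p * (1 - p) for p in probs)


# Middle-case regular-season probabilities from the diary's tables.
MIDDLE_2009 = [0.75, 0.30, 0.95, 0.75, 0.50, 0.30, 0.95, 0.20, 0.30, 0.75, 0.50, 0.20]
MIDDLE_2008 = [0.50, 0.75, 0.50, 0.30, 0.30, 0.75, 0.30, 0.50, 0.75, 0.75, 0.75, 0.30]

if __name__ == "__main__":
    for year, probs in (("2008", MIDDLE_2008), ("2009", MIDDLE_2009)):
        mean, var = mean_and_variance(probs)
        p_three_or_fewer = sum(win_distribution(probs)[:4])
        print(f"{year}: mean {mean:.2f}, variance {var:.4f}, "
              f"P(3 or fewer wins) = {p_three_or_fewer:.2%}")
```

Both seasons have the same expected regular-season wins, but the ’08 schedule carries the larger variance, which lines up with the diary's observation that both the 3-9 and the 9-to-10-win tails were fatter in ’08.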
Essentially we are faced with the issue that what we are simulating is binary in outcome. Being locked into a 0 or 1 discrete distribution for wins will yield ultimately predictable results. What I'd be interested in working on with someone is a model that estimates points scored for each team, then uses that to determine the winner. We'd need to factor in things like home/away, build in some correlations between games, and probably come up with many more additional factors, but it would be a very interesting exercise. If anyone's interested, hit me up at [email protected]
Again, great post.
woooooooooooooo FOOTBAW
In all seriousness....can this be done on the xbox by simulating a season a million times? I'm guessing the programmers are not that detailed?
It could, but who wants to write down the results of 10,000 simulations, then if new information needs to be included redo it?
that would be brutal
Also - on the binary nature of the outcome...wouldn't it be more reflective of a true season if you use this method to reflect a win and a loss? For instance, if you win a game 21 to 20, it counts as one win which is just the same as winning 45 to 10, right? The probability of getting the 1 in the win column just changes...
I'm not sure I'm understanding the question correctly, but I'll say yes. Regardless of how it's modeled, at some level the probability of getting a win can be stated as x%. If team Z has a mean points scored of j with standard dev l, and team Y has a mean points scored of k with standard dev m, then you can still derive probability x that team Z wins the game.
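As a rough sketch of that last step: if you assume (and this is my assumption, not something settled in this thread) that each team's points are normally distributed and independent of the other team's, the score difference is also normal, and the win probability x falls out of the normal CDF. The numbers in the example call are hypothetical.

```python
import math


def win_probability(mean_z, sd_z, mean_y, sd_y):
    """P(team Z outscores team Y), assuming independent normal point totals.

    The score difference Z - Y is then normal with mean (mean_z - mean_y)
    and variance (sd_z**2 + sd_y**2), so the win probability is the normal
    CDF evaluated at the standardized mean difference.
    """
    spread = math.sqrt(sd_z ** 2 + sd_y ** 2)
    z_score = (mean_z - mean_y) / spread
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z_score / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical inputs: Z averages 24 points (sd 10), Y averages 21 (sd 10).
    print(f"P(Z wins) = {win_probability(24, 10, 21, 10):.3f}")
```

Real football scores are neither normal nor independent, so treat this as a way to turn (j, l, k, m) into a single x, not as a calibrated model.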
The difference here is that rather than having a set of probabilities to win each game for a given scenario (worst, average, best), you derive the worst and best scenarios from simulation results of the expected case. In theory, what you want to get to is for a given input, say PPG, if average PPG increases by 2, what effect does this have on the outcome distribution?
Translation: Rather than build a model based on %chance to win, build a model that accepts inputs such as turnover rate, QB disaster probability, home/away, (or heart, love for Michigan, grit, and other intangibles if you're so inclined) and have the inputs modify the mean and std dev for something like PPG, which is then sampled to determine the winner of a game.
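Here is a minimal sketch of what such a model could look like. Everything in it is hypothetical: the `TeamInputs` fields, the 3-point home-field bump, and the choice of a normal distribution for points are placeholders I made up to show the shape of the idea, not a proposal for the actual factors or values.

```python
import random
from dataclasses import dataclass


@dataclass
class TeamInputs:
    """Hypothetical inputs; every field and default here is an assumption."""
    base_ppg: float                 # baseline mean points per game
    base_sd: float                  # baseline standard deviation of points
    turnover_penalty: float = 0.0   # points shaved off the mean by turnovers
    qb_volatility: float = 0.0      # extra std dev from shaky QB play


HOME_FIELD_POINTS = 3.0  # assumed blanket home-field bump, in points


def scoring_distribution(team, is_home):
    """Turn the inputs into the mean and std dev of points scored."""
    home_bump = HOME_FIELD_POINTS if is_home else 0.0
    mean = team.base_ppg - team.turnover_penalty + home_bump
    sd = team.base_sd + team.qb_volatility
    return mean, sd


def simulate_game(home, away, rng):
    """Sample both scores and return True if the home team wins.

    Scores are clamped at zero; an exact tie counts as a home loss here.
    """
    home_mean, home_sd = scoring_distribution(home, is_home=True)
    away_mean, away_sd = scoring_distribution(away, is_home=False)
    home_pts = max(0.0, rng.gauss(home_mean, home_sd))
    away_pts = max(0.0, rng.gauss(away_mean, away_sd))
    return home_pts > away_pts


def estimate_home_win_rate(home, away, n_games=20_000, seed=1):
    """Replay the same matchup many times and return the home win share."""
    rng = random.Random(seed)
    wins = sum(simulate_game(home, away, rng) for _ in range(n_games))
    return wins / n_games


if __name__ == "__main__":
    michigan = TeamInputs(base_ppg=24, base_sd=10, qb_volatility=3)
    opponent = TeamInputs(base_ppg=24, base_sd=10)
    print(f"home win rate: {estimate_home_win_rate(michigan, opponent):.3f}")
```

The point of the structure is that inputs move the mean or the spread of points, and the W or L only appears at the very end when the two sampled scores are compared.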
I thought you were arguing for a non-binary outcome system in the original comment, but you were really saying that the result of the trials in the system 'as is' equals the sum of the probabilities....right? So in your proposed system, a simulation is done for each game based on mitigating factors and an actual result (W or L, 1 or 0) is tallied. I'm definitely on board with that. Is PPG the way to go or do you calculate some other 'score' that gets compared for each matchup?
So...if resultant PPG is the measure - a few thoughts:
Do you start with an average PPG given up and scored for each team and then adjust on match-up? How do you assign this average? Scheme, Experience (# of starts & average year of projected starters)? Would freshmen be ranked on 'Star power' for projected effectiveness? Do you break it down by position group?
Home vs. Away - do you assign a different value to each stadium, or is this blanket? Also, do you ramp up home-field advantage in rivalry games and decrease it for cupcakes to reflect general level of raucousness?
Answer: It depends.
Basically, there is a need to start with something we wish to calculate, in this case I chose PPG because a win is binary in nature. From there, I'd find some sort of PPG expectation, then what factors affect it. After identifying the factors, I'd have to determine if they affect expectation "E[]" or variability "V[]". For example, a rivalry game probably doesn't affect E[ppg] but would affect V[ppg].
In a nutshell: Baseline --> Identify Factors --> Assumptions --> Rebaseline --> Assess Distributions of Factors --> Simulate --> Analyze --> Retest/Calibrate...etc
In general though, I think there's more at play than just picking %win and sampling it. jc's analysis is a good kickoff for a conversation though.
At the end of the entry, I suggest randomizing whether the probability for a given Bernoulli trial is drawn from the worst, middle or best case, to simulate luck or tragedy. In some simulations there would be multiple-game stretches of worst-case probabilities, similar to multiple games with injured players. Would this address your concern?
The only other way I can think of to address your concern is manipulating probabilities up or down based on whether the previous game is a win or loss, but I think that would be a flawed model, ignoring the randomness and independence that occurs in the real world. Functionally, it would just increase the probabilities of extreme outcomes (i.e. very high and very low win totals), which is not what is observed in actual NCAA seasons, where the majority of teams fall in the middle.
I think the issue is that three sets of binary distributions (worst, average, best) can't really be sampled by anything that isn't just self-reinforcing. If you say there's a 25% chance to be worst case, 50% chance to be average, and 25% chance to be best, the resulting range would then be not as severe as the worst case, nor as good as the best case, but instead would more than likely have the same average as the average case, with a SLIGHT increase in the std dev of the result set. A Monte Carlo, in this case, won't really affect anything as we're using binary distros as the input. On top of that, the sample from a given distro (0 or 1 wins) isn't put into a computation other than adding up the results.
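This is easy enough to check with a sketch. The probabilities below are placeholders (a made-up middle case, with worst and best set 10 points lower and higher), and the 25/50/25 weights are the ones from this comment. It compares three setups: middle case only, scenario re-drawn for every game, and one scenario drawn for the whole season.

```python
import random
import statistics

# Hypothetical win probabilities for 12 games; placeholders, not the post's values.
MIDDLE = [0.85, 0.70, 0.60, 0.50, 0.50, 0.40, 0.40, 0.35, 0.30, 0.25, 0.20, 0.20]
WORST = [p - 0.10 for p in MIDDLE]
BEST = [p + 0.10 for p in MIDDLE]
SCENARIOS = [WORST, MIDDLE, BEST]
WEIGHTS = [0.25, 0.50, 0.25]  # assumed chance of each scenario


def season_fixed(rng):
    """Every game uses the middle-case probability."""
    return sum(rng.random() < p for p in MIDDLE)


def season_mixed_per_game(rng):
    """The scenario is re-drawn independently for every game."""
    wins = 0
    for game in range(len(MIDDLE)):
        probs = rng.choices(SCENARIOS, weights=WEIGHTS)[0]
        wins += rng.random() < probs[game]
    return wins


def season_mixed_per_season(rng):
    """One scenario is drawn and applied to the whole season."""
    probs = rng.choices(SCENARIOS, weights=WEIGHTS)[0]
    return sum(rng.random() < p for p in probs)


def summarize(season_fn, n_seasons=50_000, seed=7):
    """Return (mean, std dev) of win totals over many simulated seasons."""
    rng = random.Random(seed)
    totals = [season_fn(rng) for _ in range(n_seasons)]
    return statistics.mean(totals), statistics.pstdev(totals)


if __name__ == "__main__":
    for fn in (season_fixed, season_mixed_per_game, season_mixed_per_season):
        mean, sd = summarize(fn)
        print(f"{fn.__name__:26s} mean={mean:.2f} sd={sd:.2f}")
```

With these symmetric placeholder numbers, all three setups share the same average. Re-drawing the scenario per game collapses to a single Bernoulli with the weighted-average probability, so it is indistinguishable from the middle case. Only the per-season draw widens the spread, because it makes the games within a season move together.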
I don't think games are actually independent trials. How a team plays says a lot about the next game. Losing to Toledo, fwiw, was a huge data point that no game is a gimme from then on out. If I predict 7-5 for next year with a win over WMU, then WMU beats us, I don't then expect 6-6 automatically. I also think that probabilities of extreme outcomes (especially to the downside) are a lot higher than people assume. Fat tail risk and all, or just look at your 401k for a painful example.
Great post, but I would make a couple of amendments to a couple of the games. If the Illinois game is going to be a "probably not" then MSU has to be too. MSU was so much better last year that you can't assume a four-game swing between the two teams. If Illinois was a "who knows" then MSU's grade would be legit, but there is no way that a 5-7 team should be ranked ahead of a 9-4 team.
again great post
lose the "Juice"? I can't imagine them being decent without him.
no, this is his last season
Good point, but I think you can rationalize the discrepancy by the fact that MSU will probably take a small step back with the loss of Ringer and an experienced QB. Illinois brings back Juice and his receiving corps which lit us up last season and now they get us in Champaign. I think Illinois is the tougher road game, not by a lot, but enough to put each game in a different category.
Wow, as an engineer I like numbers and this one most certainly had them! Great job with the analysis. It takes a lot to come up with the idea and then even more to see it through and rationalize your argument. Thank you (x10000) for this post!
Last year's record means nothing. Juice looked deadly at times last year and will almost certainly be better. Meanwhile, State lost several starters including Ringer and Hoyer. I don't see them as being "so much better" than last year.
**EDIT**
This was supposed to be a reply to mghorm. My bad.
What I meant to say was that MSU was so much better overall last year that Illinois shouldn't jump them in UM's chance of winning. I think that they both should get the same grade.
I think the premise is predicting the number of wins prior to the season starting. Therefore, no revisionist history is allowed by looking at how teams actually fared during the season in question.
You have to look back to see if your method is worth a damn. There is no point in running a simulation that had Indiana winning the Big Ten the last three years. The whole point of the simulation is to look at patterns from previous seasons and use those patterns to predict future outcomes. So you have to look at previous seasons.
As the subject says, I am not criticizing your post, which is very interesting. However, when one's findings suggest that an extremely improbable event has transpired, one must question the assumptions that led to that conclusion.
My eyeball tells me Michigan had no business being well over 50% to win any conference game last season (hindsight, clearly). I'll give them good chances for Toledo and Miami but that's it.
In other words, I would disagree with your conclusion that 3-9 was a statistical anomaly. I think the 'assumptions' are faulty. Michigan was in the range of what they deserved. Maybe on the lower end but still in the range.
I'd absolutely agree with that argument. You could argue that the anomaly was less the model and more the very poor predictions (both are assumptions but I think you were arguing more against the model).
Most years the predictions are going to be much better (for one, much more informed, with new coaches and players we were obviously working on very little data that year). I would at least rename the scenarios--"Good case", "Middle case" and "Bad case" are more accurate--2007 for one was far from the worst-case as painful as it was, it could have been a LOT worse.
Actually, if the "turnovers are essentially random events" and "if you lose more of your close games than you win" theories say anything, then 2008 was something of a statistical anomaly.