I was curious - is there a Run vs. Pass breakdown of the 3rd/4th & 1s that Michigan has faced to get to this number?
Mid-Week Metrics Re-Writes an Old Narrative
This week was essentially a pick-em so we’ll skip the spread and chart it straight up.
I have been making a few tweaks to the math behind the chart to take out some of the noise, especially on possession changes. Hopefully it’s becoming a better product.
Michigan jumps out early and is in good position, but can’t close it out in the second and third quarters before finally putting it away in the fourth.
1. +12.5%, Play 3, Toussaint for 65 on the opening drive.
2. +5.2%, Play 5, Robinson punches it in for the opening score.
3. +5.0%, Play 153, JT Floyd picks Scheelhaase (+1.5%) and takes it back 43 yards (+3.5%).
1. –5.8%, Play 65, Michigan stopped on 4th and goal from the 1.
2. –4.9%, Play 33, Robinson’s first fumble with Michigan driving.
3. –2.7%, Play 137, Scheelhaase runs for a 31 yard score to cut the lead to 10.
Other Notes from Illinois
Last year we had a consistent defense and hoped our offense was good enough to win the game for us. This year we have a consistent defense and hope our offense is good enough to win the game for us. The difference is last year the D was so bad that the offense had to hit home runs to pull it out. This year the defense has consistently held serve and the offensive variability has largely dictated our success or failure.
This week the narrative expanded: not only did the defense hold serve, it produced extra value. This is the first game all season in which the offense had a negative win percent added (-13%) and the team still won. Iowa and Michigan St both saw negative scores, but the defense couldn’t do enough to overcome them.
How you feel about this game is largely dependent on how you view Illinois. If you view them as a Ron Zook coached mid-level BCS conference team, you are probably part of the group that was uninspired by the showing on Saturday. You would probably point to stats like this one: Michigan had the best field position of any BCS conference team in any game so far this season and still scored only 31 points (vs an expected 39 based on field position).
If you view Illinois as a Ron Zook coached mid-level BCS conference team with a stout defense and some weapons on offense, you are probably Brian. I have Illinois’ defense ranked #14 nationally at +7 and third in the Big Ten behind Penn St and MSU. Scheelhaase is ranked as a top 20 BCS conference QB (+5) and AJ Jenkins is a top 10 receiver (+8, catches only) who was limited to +1 on a season-high 19 targets. I am in the second camp here. The offensive performance was far from perfect, but the total performance of the team, accounting for Illinois’ stout defense, probably makes Saturday the best/most important win of the year to date.
Rushing: –1, good and bad even out versus strong rush defense
Passing: +8, found some big plays and no meaningful interceptions
Rush defense: +1, decent day against a mediocre run game
Pass defense: +7, in JT we trust, apparently
Special Teams: +1, a narrow miss on a FG away from best score of the year
Denard: +1 overall, +4 pass -3 rush, no games higher than +3 since Northwestern
Devin: +3, +5 pass –2 rush, best number of the year
Toussaint: +4, 3rd best of the year for a Michigan back (Fitz vs Purdue and VS vs E Mich, +6)
Scheelhaase: +4, +0 pass, +4 rush
Jenkins: +8 but took a season high 19 targets to get there
Third and Done
So I wrote this up early in the week, only to find that it’s become a topic across Michigan blogdom this week. Hopefully there is something new for you here; if not, rest assured knowing that we own third and one.
Michigan hasn’t just been good on third (or fourth) and one, they have been amazing. Michigan has 15 stops in 27 competitive attempts against them this year. That’s a 44% offensive success rate. The national average is 72%. Michigan is literally getting twice as many stops as the average defense would on third or fourth and one.
Michigan is currently 12th over the last 9 seasons in conversion rate allowed in this situation. Only 1 of the 11 teams ahead of them has faced more than 18 attempts. The only comparison to what Michigan is doing so far this year is Boston College of 2008, who had 16 stops out of 25 attempts against. In fact, if Michigan gets two more stops on third or fourth and 1, they will have the most stops over the last 9 years in that situation. Michigan has ended a full 7 drives more than the average team would on super short yardage situations.
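For anyone who wants to check the third-and-one arithmetic, here is a minimal sketch using only the figures quoted above (27 attempts, 15 stops, 72% national conversion rate). The variable names are mine and are just for illustration.

```python
# Figures quoted in the post.
attempts = 27           # competitive 3rd/4th-and-1 attempts against Michigan
stops = 15              # attempts the defense stopped
avg_conversion = 0.72   # national average offensive success rate

conversions = attempts - stops
conversion_rate = conversions / attempts            # success rate allowed by Michigan
expected_stops = attempts * (1 - avg_conversion)    # stops an average defense would get
extra_stops = stops - expected_stops                # drives ended beyond the average
stop_ratio = stops / expected_stops                 # Michigan stops vs. average stops

print(f"Conversion rate allowed: {conversion_rate:.0%}")
print(f"Expected stops for an average defense: {expected_stops:.1f}")
print(f"Extra stops: {extra_stops:.1f}")
print(f"Stop ratio: {stop_ratio:.2f}x")
```

That works out to roughly 7.6 expected stops for an average defense, so about 7 extra stops and very nearly twice the average.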
B1G Championship Game
The loss to Iowa effectively ended Michigan’s chances. I have them listed at a 0.3% chance of making the inaugural title game, but that assumes that Indiana has a chance to win on the road against Michigan St. A Sparty No on Saturday and a Michigan win puts the odds up over 20%. In total, Michigan needs wins over Nebraska and Ohio and needs Michigan St to lose to both Indiana and Northwestern. The Spartans hold a commanding 91% chance of making the title game; a win by both Michigan teams on Saturday would clinch it. Nebraska stands at about 8% but needs to win out and have Michigan St slip to have any real chance. Iowa technically could win some 3 or 4 way tiebreakers, but at less than 1 in 2000 odds, things don’t look so bright.
On the other side of the standings both Wisconsin and Penn St control their own destinies. I give Wisconsin an 84% chance of winning the head to head matchup, so they have a 67% chance of reaching the title game. Penn St sits at 27%, and thanks to the Boilermakers’ upset over Ohio, those two schools sit at 3% apiece. For Purdue the path is win out, Ohio beat Penn St, Illinois beat Wisconsin, Michigan beat Ohio and Wisconsin beat Penn St. Ohio isn’t dependent on as many games, but the odds are the same: win out, Penn St lose out, and Wisconsin lose to Illinois.
My Heisman Thoughts
With some new chaos at the top of the polls, Wisconsin has got to be killing themselves for not being able to defend the deep ball late. It’s put them out of the National Championship race and buried Russell Wilson’s Heisman campaign. I think it should still be kicking. I have him leading in WPA (+3.0) ahead of Case Keenum (+2.8) and Brandon Weeden (+2.2), and in EV (+13) ahead of Keenum (+12) and RG3 (+11).
Kellen Moore: +1.5/+10
Andrew Luck: +1.7/+6
Trent Richardson: +.3/+4
Denard still holds up well on WPA at +2.07 but his EV is way down at +4 and barely in the top quarter of all QBs.
PAN, National Rank (leader), Big Ten Rank (leader)
Michigan: +4, 12th (Georgia Tech), 2nd (Wisconsin)
vs. Nebraska D: –0, 63rd, 8th
Fitz: +1 (now in top 30 RB’s)
Michigan: +1, 50th (Boise St), 5th (Wisconsin)
vs. Nebraska D: +2, 30th, 6th
Michigan: +2, 21st (Alabama), 4th (Illinois)
vs. Nebraska O: +0, 53rd, 7th
Taylor Martinez: +1
Rex Burkhead: –1 (24th of 28 backs that average 100 yards per game)
Michigan: +2, 39th (Oregon), 7th (Penn St)
vs. Nebraska O: +3, 25th, 4th
Taylor Martinez: +4
Michigan: –0, 79th (Florida St), 9th (Purdue)
Nebraska: +2, 27th, 3rd
Rex Burkhead has the yards and the carries, but my per play valuations point to a Nebraska team that pounds the ball on the ground for yards but puts points on the board through the air, courtesy of a throwing motion that makes Tebow look like Tom Brady. These teams are pretty evenly matched: mobile QBs who are flawed passers but can succeed on the back of the run game, and defenses that are far below historical precedent but not major deficiencies either. Michigan 31-28
is your predicted score a result of some algorithm or a more subjective analysis?
Also, do you have the probabilities of M beating Nebraska and Ohio?
Really enjoy your blog. Thanks!
The pick is based on the numerical projections with a slight adjustment to make the score match "normal" football scores.
Nebraska is calculated at a 62% chance, Ohio at 74%. About a 10% chance we lose both, and roughly 45% each that we win exactly one or win both.
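A quick sketch of how those combined odds fall out of the two single-game numbers. The big assumption here is that the two games are treated as independent, which the comment above does not state explicitly.

```python
# Single-game win probabilities quoted above.
p_nebraska = 0.62
p_ohio = 0.74

# Assumption: the two results are independent of each other.
p_win_both = p_nebraska * p_ohio
p_lose_both = (1 - p_nebraska) * (1 - p_ohio)
p_win_one = 1 - p_win_both - p_lose_both   # exactly one win

print(f"Win both:  {p_win_both:.1%}")
print(f"Win one:   {p_win_one:.1%}")
print(f"Lose both: {p_lose_both:.1%}")
```

Under that assumption the split is about 46% to win both, 44% to win exactly one and 10% to lose both.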
This makes it seem as though scoring a touchdown first and holding them to a 3 and out predicts us to win > 70% of the time? Seems a bit high. Are the plays normalized against the time remaining in the game in which the outcome can change? In other words, a touchdown with 2 seconds left to take the lead would bump a winning percentage from ~2% to 98%. Just curious.
I was about to ask the same thing. A 12.5% swing in our favor after a 60 yd run in the first drive seems high. There were still 59 minutes to play.
Most of the time, when a team is able to run off a 60 yard run on its third play from scrimmage, it's because that team is playing vastly inferior competition and will be able to run such plays consistently -- in other words, they're very likely to win. That, or it'll be a shoot-out, in which case scoring first still works in your favor. Even with the zillions of games in Mathlete's database, there probably aren't many with a fluke-y big play that early in the game between two otherwise well-matched teams, especially with the fluke play favoring the team that loses.
In other words, in a database full of games between evenly matched teams, a 60 yard run so early in the game is probably worth less than 12%. The 12% number is probably mostly a reflection of the fact that, when you start ripping off 60 yard runs that early in the game, you probably aren't playing against an evenly matched opponent.
Mathlete, if I'm wrong, please chime in.
Breaking a big play on the first possession is not that abnormal. Overcommitment by a defense and a missed assignment is all it takes. Hell, it's what the Lions depend on every week. Happens several times every Saturday and Sunday. Regardless of what the "data" says, it doesn't pass the eye test: a 65 yd gain that doesn't end in a score is simply a 65 yard gain. It speaks much more to a broken assignment than "inferior competition."
I guess I just don't see the point in grading the win percentage based on individual plays as though they were mutually exclusive of the greater situation. For instance, if we failed to convert that 65 yard gain into any points, would it still be considered the 12.5% positive play towards winning the game? That would argue that flipping the field is more important. This assumes of course that the percentage is a real-time calculation, not a retrospective review of the subsequent plays. If this graph is a retrospective review, then it holds more merit as the plays do not become mutually exclusive of one another.
I appreciate the objective of trying to calculate winning percentage due to individual plays based upon data from previous games as the predictors. It just seems that:
a) The predictive database size (although it seems large) is not powerful enough to give an accurate prediction, particularly when looking at specific plays which are less likely (ie, 65 yd gains)
b) There is an inherent flaw of comparing dissimilar teams (Michigan 2010 vs. Michigan 2006). If you have a good defense, a 7 point score will mean much more to the team with the better D because of the ability to prevent the other team from scoring. Unfortunately, it would take multiple games played until enough data on the current team would be useful enough to quantify any sort of normalization value for the offense and defense in comparison to the dataset of previous years.
Predicting the impact of football plays on the final result is an interesting statistical exercise, but it will inevitably lead to "tinkering" of the math until you see the results that your mind expects given its non-statistical review after watching the actual game.
Then again, I haven't been an engineer since I got my degree 14 years ago. So what the hell do I know?
A little clarification. Plays themselves aren't given values directly, but indirectly. The value for Fitz's run is calculated based on the difference between the future winning percent of 2nd and 10 at the 20 and 1st and 10 at the opponent's 15 in the first minute of the game. The difference between those situations is 12.5%. Whether it is one 65 yard play or thirteen 5 yard plays, the change in expected winning percent between those situations is 12.5%.
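To illustrate the "one 65 yard play or thirteen 5 yard plays" point, here is a rough sketch. Only the 12.5 point gap comes from the numbers above; the 50% starting value and the evenly spaced intermediate states are hypothetical placeholders, not values from the actual model.

```python
def win_prob_added(wp_before: float, wp_after: float) -> float:
    """Value credited to whatever happened between two game states."""
    return wp_after - wp_before

# Hypothetical state values; only their 0.125 difference is from the post.
WP_START = 0.500   # 2nd and 10 at own 20, first minute (assumed)
WP_END = 0.625     # 1st and 10 at opponent's 15, first minute (assumed)

# One 65 yard play covers the whole gap at once.
one_big_play = win_prob_added(WP_START, WP_END)

# Thirteen 5 yard plays: made-up, evenly spaced intermediate states.
steps = 13
path = [WP_START + (WP_END - WP_START) * i / steps for i in range(steps + 1)]
many_small_plays = sum(win_prob_added(a, b) for a, b in zip(path, path[1:]))

print(f"One big play:     {one_big_play:+.1%}")
print(f"Thirteen 5-yarders: {many_small_plays:+.1%}")
```

The per-play values telescope, so any route between the same two situations adds up to the same total.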
There is no fitting the data to meet expectations. I have over 1 million plays over nine seasons and have calculated winning percentages using two steps. Calculate actual winning percentages in real situations and then smooth the data so there aren't odd variances.
For those that are really interested, the general equation for calculating win probability is the following logistic function:
Where L is the lead for the offense, adjusted for field position and down and distance
Where T is the time in minutes elapsed
I started out with something much more complicated but after some thorough reviews of the data, I was able to fit the data to the above equation with a high level of confidence.
All of the data is based on nearly 10 years of actual results. I agree there is a lot of time to play, but a 7 point lead translates to a win over 70% of the time even in the opening minutes of the game.
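For readers who want to play with the general shape, here is a sketch of a logistic win probability in terms of L and T. To be clear, this is not the fitted equation from the post: the square-root-of-time-remaining scaling and the coefficient k are assumptions, with k picked only so that a 7 point lead at kickoff lands near the 70% quoted above.

```python
import math

def win_probability(lead: float, minutes_elapsed: float, k: float = 0.94) -> float:
    """Illustrative logistic win probability, NOT the author's fitted model.

    lead: adjusted lead for the offense (L in the text).
    minutes_elapsed: game time elapsed in minutes (T in the text).
    k: hypothetical scale factor, chosen so a 7 point lead at T=0 is about 70%.
    """
    # Assumption: a lead matters more as time runs out, scaled by sqrt(time left).
    minutes_left = max(60.0 - minutes_elapsed, 0.5)
    z = k * lead / math.sqrt(minutes_left)
    # Logistic function written with tanh so large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(z / 2.0))

early_lead = win_probability(7, 0)     # 7 point lead at kickoff
late_lead = win_probability(7, 55)     # same lead with 5 minutes left
tied = win_probability(0, 30)          # tied game at halftime

print(f"Up 7 at kickoff:     {early_lead:.0%}")
print(f"Up 7 with 5 to play: {late_lead:.0%}")
print(f"Tied at the half:    {tied:.0%}")
```

Any real version would have to be fit to actual game results, as described above.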
a TD favorite wins about 2/3rds of the time. (Not too carefully researched.)
Thanks for confirming my rule-of-thumb. (Taking a 7 point lead near the beginning of the game for a pick-em game is essentially the same as a 7 point favorite at the start of the game.)
Last year we had a consistent defense and hoped our offense was good enough to win the game for us.
The defense was consistently bad and the offense had to be great. This year the defense is consistently solid and the offense needs to be productive, but not necessarily great.
Ah, I didn't read it carefully enough. I read "consistent" and assumed you meant "good."
does sound like a good defense; in the quest for clarity I would suggest "consistently bad."
What are your thoughts on the decision to go for it on 4th and 1 from the 1? I plugged the numbers into the calculator Brian posted yesterday and came up with this:
That says to me that you have a 68% chance of scoring a TD and a 100% chance of scoring a FG if you kick. The EP total is about double for going for it, but do you really need the points? The total WP is higher for making the FG than for scoring the TD, so it seems like a good idea to kick when already up 14.
I have the cost of failure a little higher (86% WP), with 92% on a TD and 87% on a made FG. That would point to going for it even if your odds of success were sub 20%. Plus, in the second quarter it's still all about maximizing points; there is no point in counting possessions until the fourth quarter.
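The "sub 20%" break-even can be worked out directly from the three win probabilities in the reply above. This sketch assumes the field goal is good every time, as the question above does, and uses the 68% conversion chance from that question; the function name is just for illustration.

```python
def break_even_probability(wp_success: float, wp_failure: float, wp_kick: float) -> float:
    """Conversion probability at which going for it equals kicking.

    Solves p * wp_success + (1 - p) * wp_failure = wp_kick for p.
    """
    return (wp_kick - wp_failure) / (wp_success - wp_failure)

# Win probabilities quoted in the reply above.
WP_TD = 0.92       # after scoring the touchdown
WP_STOPPED = 0.86  # after being stopped on 4th down
WP_FG = 0.87       # after a made field goal (assumed automatic)

p_star = break_even_probability(WP_TD, WP_STOPPED, WP_FG)

# Expected win probability of going for it at the 68% success rate
# mentioned in the question above.
p_convert = 0.68
go_for_it_wp = p_convert * WP_TD + (1 - p_convert) * WP_STOPPED

print(f"Break-even conversion rate: {p_star:.1%}")
print(f"Go for it: {go_for_it_wp:.1%} vs kick: {WP_FG:.1%}")
```

The break-even comes out to about 1 in 6, so going for it wins out at anything close to normal conversion rates.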
I have noticed that Hoke prefers taking the ball to start the game if we win the coin toss. Is there a difference in win percentage for taking the ball first vs starting the game on defense?
Question: What software do you use for your data collection, storage, and manipulation?
Background: I am currently working on acquiring all play-by-play data for every available game from 2003 through present. I've finished collecting all of the 2003-2006 and 2010 data.
The purpose of this is to create my own metrics similar to FEI and other advanced stats to help me (and others) understand how good teams and each team's units are in comparison to each other. This is because I cannot trust any statistic which has a proprietary formula. Although the results might pass the "smell test" in the general sense, without knowing the formula it's difficult to determine how good each individual result is.
I therefore plan on making my own metrics, releasing the results AND the formulas to the public (or at least the MGoCommunity) and then using feedback to improve the formulas.
The data is currently being stored in Excel and is approximately 1 million lines per season.
I've already come up with some of my initial equations. But, getting all of the calculations to work correctly in Excel is a tall order. Even with a relatively speedy computer [that I built] it still takes an extremely long time to calculate anything. I'd therefore like to try some other application besides Excel at least for my data manipulation if not the collection too.
I was thinking about purchasing Mathematica simply because my equations are getting complex and I know that it has the capability to solve them. I have some small background in Mathematica, Maple, and Matlab from U of M but I don't think that any of those programs are what I'm looking for since they [probably] aren't designed to handle multimillion line arrays. I'm trying to avoid learning a new language like SQL (or even R, which I am more "aware of" than "familiar with").
Thus, any advice on what software to use would be appreciated.
For some reason, I just find stats and the processing of statistics to be fascinating.
1. MS Access - it's actually pretty powerful, very user friendly, and designed to handle larger amounts of data than a single excel spreadsheet. Further, if in the future you chose to migrate your data to a more dynamic database, you could use Access to generate SQL statements to move/manipulate the data.
2. MySQL - powerful, open-source SQL. Really, SQL is quite easy to use and it's going to be much more powerful than using access or excel. Similarly you could try to use MS SQL Server Express which Microsoft gives away for free to store and manipulate your data, depending on the size of the database(s) you're using.
also wanted to add that if you're willing to learn a few of the more advanced features of excel such as pivot tables, you'll be able to really cut down on the processing time.
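To give a feel for how little SQL is needed for this kind of work, here is a small sketch using SQLite through Python's built-in sqlite3 module. SQLite is swapped in here only because it needs no install; it is not one of the options suggested above. The table layout, column names and sample rows are all made up for illustration.

```python
import sqlite3

# In-memory database for the demo; pass a file path instead to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE plays (
        season    INTEGER,
        defense   TEXT,
        down      INTEGER,
        distance  INTEGER,
        converted INTEGER   -- 1 if the offense got the first down, else 0
    )
    """
)

# Made-up sample rows standing in for real play-by-play data.
sample_rows = [
    (2011, "Team A", 3, 1, 0),
    (2011, "Team A", 4, 1, 1),
    (2011, "Team A", 3, 1, 0),
    (2011, "Team A", 1, 10, 1),
    (2011, "Team B", 3, 1, 1),
    (2011, "Team B", 3, 7, 0),
    (2011, "Team B", 4, 1, 0),
]
conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?, ?)", sample_rows)

# Count short-yardage attempts and stops for each defense.
query = """
    SELECT defense,
           COUNT(*)           AS attempts,
           SUM(1 - converted) AS stops
    FROM plays
    WHERE down IN (3, 4) AND distance = 1
    GROUP BY defense
    ORDER BY defense
"""
results = conn.execute(query).fetchall()
for defense, attempts, stops in results:
    print(f"{defense}: {stops} stops in {attempts} attempts")
conn.close()
```

The same query shape should carry over to MySQL or SQL Server Express with little or no change.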
how accurate have your pre-game predictions been throughout the year?
Opp (Change vs last post)
Nebraska: 56% (-8%)
Ohio: 70% (-7%)
Do you still have the same game odds as you had last time (seen above)?
Have you, by any chance, calculated the drop in win percentage by Brady Hoke not going for it on 4th and 1 from inside Illinois's 45 yard line in the 1st quarter? I realize that it's not on the list because it's not an actual play. I'm guessing that dumb decision was worth at least a 5% hit to the win % (I'm surprised that this moronic choice hasn't gotten more discussion, but I guess that's what happens when you win).
I love that comparison, especially since that D-line featured two guys currently starting on championship-caliber defenses in the NFL (B.J. Raji with GB and Ron Brace with New England). This speaks well to RVB and Martin's chances in the league - and our chances for the rest of the year.
i've begun to run my life by asking what would the mathlete do to my percentages if i do something or other. like, what does clicking "confirm friend" with an old girlfriend on facebook do to the odds of my marriage not crumbling? Minus 6.2%? If she's hot? Minus 14.8%? What does taking these shrooms do to the odds of me waking up in jail? Plus 62%? Thank you mathlete for changing my life. Knowledge is power.
...but the predictive analysis the Mathlete uses probably exploits the patterns a coach or team demonstrates. This results in a percentage that "X" will happen given a sample size. For it to apply in YOUR life, we'd have to determine if you consistently do such dumb ass things, or if this is just a flash in the pan where the rest of us are left with the question... "What the f#ck was he thinking???"
but i wrote it in the first person, so unless you are using the royal "we" you all have to determine nothing regarding the consistency of my dumb assery. i am merely thanking the mathlete for providing the methodology for me to determine the percentages - the question marks were of course rhetorical. but your condescension is duly noted. cheers.
of condescension was intended. Mostly just being a smart ass, for humor's sake. Your explanation of the question marks puts that whole thing back in perspective. Apologies.
Good read, but I think your projected score is far too high. We almost never play high-scoring games anymore, and even Nebraska has played a bunch of low-scoring ones after some shootouts earlier in the year. If it's a 3 point margin, I think 20-17 is more likely.
If, instead of measuring by change in absolute win probability, you were to measure by % of loss probability removed, his play would score ~50% (~90% up to ~95%), while Fitz's early run would score ~25% (~50% up to ~62.5%).
If Floyd's INT had occurred near the beginning of the game, and Fitz's run had occurred up 10 in the 4th qtr, the Floyd INT's change in absolute win probability would also be much higher than that of the Fitz run.
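The alternative metric suggested in this comment is easy to sketch. The before and after win probabilities are the approximate ones given in the comment, and the function name is mine.

```python
def loss_probability_removed(wp_before: float, wp_after: float) -> float:
    """Share of the remaining chance of losing that a play wipes out."""
    return (wp_after - wp_before) / (1.0 - wp_before)

# Approximate win probabilities from the comment above.
floyd_int = loss_probability_removed(0.90, 0.95)    # late interception
fitz_run = loss_probability_removed(0.50, 0.625)    # early 65 yard run

print(f"Floyd INT: {floyd_int:.0%} of loss probability removed")
print(f"Fitz run:  {fitz_run:.0%} of loss probability removed")
```

On this scale the late interception grades out at twice the value of the early run, the reverse of the ordering by absolute win probability change.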