Mid-Week Statistical Nuggetry: Now With Twice the Comma Comment Count

The Mathlete

During the season I’ll be posting a weekly review/preview/tidbit piece. For now it’s named Mid-Week Statistical Nuggetry but I am open to better ideas. Within the article I will try and look at pertinent notes from the past game, and look forward to our next opponent. Occasionally a bigger topic might come out of the previous games that will get a full treatment.

As a side note, after Week 1’s games the database has now crossed the 1,000,000 play mark.

Plays That Made the Day

A new addition for this year. I went back over the last eight years of data and have added a “live” win percentage indicator based on score, down and distance and possession. It’s still being tweaked but for the most part it is in place. Using this, I will try and pick out the most valuable and least valuable plays for each game in terms of WPA (Win Percent Added).

Bad Play #3, 5% lost. Carder hits White for 17 yards to move Western into the Red Zone on their second drive of the day.

Bad Play #2, 6% lost. Denard misses Roundtree on third down on the opening drive of the second half, forcing a three and out punt after good starting field position.

Bad Play #1, 7% lost. On 3rd and 7 on the opening drive, Carder hits White for 14 yards to set up first and goal.

Big Play #3, 8% added. Western Michigan misses a 38 yard field goal with the score tied at 7 early in the second quarter.

Big Play #2, 9% added. Kovacs sacks Carder, forcing a fumble, and Herron scoops and scores to push Michigan’s lead to 17.

Big Play #1, 35% added. No brainer here. With the score still tied at 7, Jake Ryan hits Carder and Big Play Brandon Herron is there for the pick six, taking Michigan from a 38% chance of victory to a 73% shot.
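For anyone who wants to see the bookkeeping, here is a minimal sketch of how plays can be ranked by WPA. It is not the actual model: the win percentages themselves would come from the live indicator, and the function names here are made up for illustration. The deltas are the ones listed above, in percentage points, and only the pick six has a before and after figure quoted in the text.

```python
def win_percent_added(wp_before, wp_after):
    """WPA for one play: win percentage after the play minus before it."""
    return wp_after - wp_before


# (description, WPA in percentage points), taken from the list above.
plays = [
    ("Carder to White for 17 yards", -5),
    ("Denard misses Roundtree on third down", -6),
    ("Carder to White for 14 yards on 3rd and 7", -7),
    ("WMU misses 38 yard field goal", 8),
    ("Kovacs sack, Herron fumble return TD", 9),
    # The only play with both endpoints quoted: 38% -> 73%.
    ("Herron pick six", win_percent_added(38, 73)),
]


def top_and_bottom(plays, n=3):
    """Return (n most valuable plays, n least valuable plays), each most extreme first."""
    ordered = sorted(plays, key=lambda play: play[1], reverse=True)
    best = ordered[:n]
    worst = ordered[-n:][::-1]
    return best, worst


best, worst = top_and_bottom(plays)
for description, wpa in best + worst:
    print(f"{wpa:+d}%  {description}")
```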

Brandon Herron’s Big Day

With two defensive touchdowns, Brandon Herron added 9.1 points of value in just the returns, not to mention 8.5 points in value from the turnovers themselves. Since 2003 only one player has ever accounted for more value in what I call miscellaneous returns (any return that’s not a punt or kickoff) than the Michigan linebacker did Saturday. In 2008 Utah was playing at San Diego State and the Utes’ Deshawn Richard returned two Ryan Lindley passes for touchdowns, one for 89 yards and the second for 38. The two returns barely eked ahead of Herron with 9.2 points in added value.

Special Teams’ Bad Day

Michigan finished the day –2.8 PAN against the Broncos Saturday. All five special teams units were below zero.

Punt return was the closest to zero at –0.1. Kick return was also serviceable at –0.2. With Hagerup, the punt team was –0.5, the kicking unit with Gibbons was –0.9 thanks to the blocked PAT, and kickoff was the worst at –1.1. The kickoffs themselves weren’t great, 48th out of 75 teams on Saturday, but the coverage was even worse, coming in 71st out of 75.

Field Position and the Offense’s Short Day

Last week Michigan’s offense had the fewest relevant drives (4) of any team facing an FBS opponent. On average, those four possessions should have yielded 7 points; they yielded 14 (2 defensive TDs and a final touchdown after they already had a 17 point lead). Michigan was +1.66 points per drive (PPD – Expected PPD), which was tenth best for the week.

The sample size is extremely small here, so plenty of caveats apply, but considering how little opportunity the offense had, they didn’t do badly. They weren’t exactly Wisconsin, scoring 38 in six drives with an expectation of 13, either.
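The per-drive number is just actual points minus expected points, divided by drives. A minimal sketch, with a made-up function name and using the rounded figures quoted in this section: plugging in Michigan's rounded numbers (14 actual, 7 expected, 4 drives) gives 1.75 rather than the +1.66 above, presumably because the 7 is itself rounded.

```python
def ppd_over_expected(points, expected_points, drives):
    """Points per drive above expectation: (actual - expected) / drives."""
    if drives <= 0:
        raise ValueError("drives must be positive")
    return (points - expected_points) / drives


# Wisconsin: 38 points in six drives against an expectation of 13.
wisconsin = ppd_over_expected(38, 13, 6)

# Michigan, using the rounded expectation of 7 quoted in the text.
michigan_rounded = ppd_over_expected(14, 7, 4)

print(f"Wisconsin: {wisconsin:+.2f} per drive")
print(f"Michigan (rounded inputs): {michigan_rounded:+.2f} per drive")
```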

On the flip side, the defense faced an expected 13 points and gave up 10, and Western Michigan missed a field goal that is made about 68% of the time. This obviously doesn’t factor in the defensive touchdowns, which more than negated the points actually allowed.

Biggest Comebacks From Back in the Day

Even if Michigan hadn’t been able to score with their good field position when the game was eventually called, they probably would have at least taken the game into the fourth quarter. I ran fourth quarter comebacks of 24 or more points through the database and since 2003 there has been only one.

Last year Kansas’s big comeback over Colorado from 28 down in the fourth quarter is the only game I could find in the last eight years where a team came back from 24 or more in the fourth quarter, although TCU actually came back from 24 down to take the lead against Baylor Friday before losing it in the end. Ten teams have come back from 24 down prior to the fourth quarter, most notably Auburn in last year’s Iron Bowl.

It Was Over Before the Lightning Called it a Day

One of the benefits of the WPA metric is the ability to track the progress of the day and put it in you know what form:

[Image: chart of Michigan's win percentage over the course of the game]

The big interception return from Herron made a dramatic swing and Michigan’s win percent hit 100% for the first time after Michael Shaw’s long TD run.

*This is still a bit of a work in progress so some of the jaggedness in the chart isn’t real. I am having some challenges getting the win percent smooth across possession changes but overall the trends are right.

Notre Dame at Night

A quick mini-preview of Saturday’s history-making showdown. Numbers from last week aren’t opponent adjusted; numbers from last year are.

Michigan Rush

Michigan last week: +4

Notre Dame defense last week: +8

Michigan last year: +6

Notre Dame defense last year: +2

Michigan Pass

Michigan last week: +2

Notre Dame defense last week: +4

Michigan last year: +3

Notre Dame defense last year: +6

Notre Dame Rush

Notre Dame last week: +5

Michigan defense last week: –2

Notre Dame last year: +0

Michigan defense last year: –3

Notre Dame Pass

Notre Dame last week: –2

Michigan defense last week: +0

Notre Dame last year: +0

Michigan defense last year: –3

Special Teams

Michigan last week: –2.8, bad in kickoff and kicking

Notre Dame last week: –5.1, really bad in kicking and punting

Prediction

My numbers are slightly more favorable than Vegas but still tilt toward the Irish.

Notre Dame by 2

Comments

justingoblue

September 7th, 2011 at 12:41 PM ^

When you have your model tweaked a bit more, is it possible to give out a "cheat sheet" if you were willing to? Something to the tune of: interception by Michigan increases chances 2.5% or QB injured decreases chances by 15%. Of course, I could have just made myself look dumb if you're using a ton of different variables.

Either way, thanks for putting this together, always enjoy your work (even with M+2 at the bottom).

MGoNukeE

September 7th, 2011 at 12:50 PM ^

I'm not sure what your x-axis is supposed to be on your graph aside from arbitrary time units. Your graph might be clearer if WPA were plotted against plays, since WPA seems to change on each play. Another alternative, if WPA can change in between plays (like if a key player is removed from the game), is to plot WPA against game-clock time (in minutes or seconds, depending on how often WPA is measured).

In any case, I'm interested in seeing how the WPA chart evolves over time.

Jon06

September 7th, 2011 at 12:53 PM ^

"Statistical Nuggetry" isn't very catchy. Why not just "Mid-Week Nuggets" or "Wednesday Nuggets"? Alliteration would be nice but it's hard to come by. From the synonyms for nugget listed by thesaurus.com, you could try "Wednesday's Wad" but that's somewhere between nonsensical and gross. Similarly, "Midweek Mass" is alliterative but the first mass that jumps to mind is not the nugget sort. Statistics is kind of your religion, but it's still not catchy.

How about, and I know this is probably cringe-worthy, "The Mathlete's Athletes: By the Numbers" (with or without the colon) or--now that I think about it--just "By the Numbers"?

And now for an overdue shutting up.

Red is Blue

September 7th, 2011 at 12:54 PM ^

The WPA starts at 50%, so if I understand correctly, it doesn't account for inequalities between the teams. In other words, it is really tracking the likelihood a team wins assuming both teams are equal. Just before Herron's interception, my guess is that M still probably had a greater than 38% chance of winning. Obviously there is no way to tell, but in fact, even if WMU scores on that drive, I think M still has a better than 50% shot at winning.

Red is Blue

September 7th, 2011 at 1:12 PM ^

Thanks. I was wondering if it would be possible to factor in the inequality of the teams? Maybe somehow use the point spread as a predictor of the imbalance between the teams?

Also, I'm a bit confused about the value points associated with Herron's turnovers and returns. Is the 9.1 he had situationally dependent? And how is that different from the increase in WPA?

The Mathlete

September 7th, 2011 at 1:20 PM ^

The value added is independent of score or time; it is based only on down, distance and field position. He would have added 9.1 whether it was in the first quarter or in the fourth up by 4 TDs. The value would always be 9.1. The effect on WPA is dependent on score and time. The big return while tied in the second quarter was worth a lot. If he had done that in the fourth with Michigan up 3-4 TDs, the WPA would be effectively 0 because the game was already in the bag.
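To make that distinction concrete, here is a toy illustration. It is emphatically not the model behind the chart: `toy_win_prob` is a made-up logistic curve in lead and minutes remaining, chosen only so that the same seven points move the win percentage a lot when the game is tied early and barely at all when it is already decided.

```python
import math


def toy_win_prob(lead, minutes_left):
    """Made-up win probability: logistic in lead / sqrt(minutes left).

    Purely illustrative; the real indicator also uses down, distance
    and possession, none of which appear here.
    """
    if minutes_left <= 0:
        raise ValueError("minutes_left must be positive")
    return 1.0 / (1.0 + math.exp(-lead / math.sqrt(minutes_left)))


def toy_wpa(lead, minutes_left, points=7):
    """Change in toy win probability from adding `points` to the lead."""
    return toy_win_prob(lead + points, minutes_left) - toy_win_prob(lead, minutes_left)


# Same seven points, very different effect on win probability.
early = toy_wpa(lead=0, minutes_left=40)   # tied in the second quarter
late = toy_wpa(lead=24, minutes_left=10)   # up 24 in the fourth quarter

print(f"Tied early: {early:+.1%}")
print(f"Up 24 late: {late:+.3%}")
```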

MCalibur

September 7th, 2011 at 6:05 PM ^

The difference in the teams should sort itself out according to the score and field position. Presumably the better team will score more and control field position more. Only things that happen on the field impact actual win probability.

What you're talking about is more along the lines of "how concerned should Michigan be if it goes down 14-7 to WMU midway through the second quarter?" Apparently you would answer, "not very" because they had plenty of time to let their superiority manifest.

The TCU and ND games are other examples. Those teams were probably better in the long view than the teams that beat them, but on Saturday the things that transpired over the course of that particular game (5 turnovers in ND's case) dictated how difficult it would be to overcome those things.

I think starting out at 50-50 is the right way to engage this endeavor.

Red is Blue

September 7th, 2011 at 11:13 PM ^

Your comment about TCU and ND gets to my point exactly. Let's take your assumption that those teams are probably better in the long view than the teams that beat them. At what point does the game situation (the short view) start to overwhelm the long view? That is, the "inferior" team has reached a game situation where it seems like this may be their day.

To me, trying to answer the question "how concerned should a team be about winning given a certain game situation up to that point and given their likely relative strength?" is much more interesting than, "let's assume that the teams are even, what are the odds that one of the teams goes on to win from this point in the game?"

If you were betting on dice and you had reason to believe the dice might be loaded, would that change your betting patterns or would you just assume that the dice were fair?

MCalibur

September 8th, 2011 at 12:17 AM ^

Meh, that dice analogy is out of place. This isn't a gambling question. Yes, I would bet on the favorite to win. Note the difference between winning and covering. What does that have to do with what we are doing here?

I'll save you a lot of reading and just say this: one arbitrary starting point is as good as another. So, why make an assumption you don't need to make?

If you're hell bent on it, I have a suggestion. Sportsbooks. Michigan was a 14 point favorite against WMU; apparently we were expected to win about 85% of the time. By the time Western scored their first TD, they had gotten it down to about 65%. They never got closer than that.
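For what it's worth, one common back-of-the-envelope way to turn a spread into a pregame win percentage is to treat the final margin as roughly normal around the spread. The sketch below is an assumption on my part, not how any sportsbook or this site does it, and the standard deviation of 13.5 points is a guess picked for illustration. With that guess, a 14 point favorite comes out at roughly 85%.

```python
import math


def pregame_win_prob(spread, sigma=13.5):
    """Approximate win probability for a team favored by `spread` points.

    Assumes the final margin is normally distributed around the spread.
    sigma (in points) is an ASSUMED value, not taken from any sportsbook.
    """
    # Standard normal CDF evaluated at spread / sigma, via the error function.
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))


print(f"14 point favorite: {pregame_win_prob(14):.0%}")
print(f"Pick 'em: {pregame_win_prob(0):.0%}")
```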

But now there is a different problem: how do you know the spread is accurate? How often does the book accurately predict the exact final spread? I have no idea, but I doubt it's over 5% of the time.

So back to my question, what makes one sort of inaccuracy better than another; why make an unnecessary assumption?

Red is Blue

September 8th, 2011 at 7:44 AM ^

I do agree that you don't know that the spread is accurate, but it represents a consensus of folks voting with their wallets on what they feel is the likely outcome. Sure, the exact number of the spread is not hit that often, but that is irrelevant. If you analyzed many samples, I'd imagine that teams that go into a game with an 85% chance of winning (according to the odds) actually win at rates that are closer to 85% than 50%. So it seems likely that the 85% prediction (based on a consensus of bettors) is actually more accurate than an arbitrary 50% (based on the fact that we can't be sure that an imbalance exists and, if it does, we can't measure the imbalance with precision).

MCalibur

September 8th, 2011 at 12:57 PM ^

Auburn was a 24 point favorite over Utah State; expected to win 100% of the time (110% actually). Without a doubt, the bookie baseline would have given Auburn an absurdly high win expectation down by 10 points with 3:38 remaining in the fourth quarter with possession, then again down by 3 with 2:07 left in the fourth needing to recover an onside kick. That method would show UTST as a dog until the last play of the game, where its WPA would go from under 50% to 100% instantaneously. I'm guessing using a bookie bias would never have given UTST anything better than a 50% win probability at any point in that game. It just doesn't make sense to track WPA with an arbitrary bias. Maybe Mathlete can help us out with real numbers for that game. Chart?

I understand your point, it's just out of place in this endeavor. If a team is actually better, it will show as the game unfolds. Until that happens, it's anybody's game.

zlionsfan

September 8th, 2011 at 1:02 PM ^

is that it's a much more complex question, particularly because it's going to be based on assumptions that can frequently be wrong.

Yes, it seems safe to say that Michigan is considerably better than Western Michigan ... but even so, to give a meaningful answer to your question, we need to know by how much, and also how that differs, if at all, from the current situation. Early in the season, that's extremely difficult to determine, especially when coordinators/coaches/schemes have changed from one season to the next ... and even in the season finale, it can still be difficult to distinguish game-as-expected from unusual-game.

Now take Notre Dame and South Florida. The Irish ought to be the better team, and it makes sense that a good team that makes a lot of costly mistakes can lose to a worse team, but even so, that still doesn't mean that you have better information on how the rest of the game will go. If anything, modifying the model in those situations will give you less accurate information, not more accurate information, because as USF stays in the game, the model will say "but it's OK, Notre Dame is better, so they're more likely than average to come back."

The analogy to dice is interesting, but the problem is that you're talking about something different than the issue of average team vs. current teams ... even if you know that the dice are loaded, you are still betting (or should be betting) based on the average rolls that loaded dice will produce, not on what the dice just showed. You're still looking at independent events.

Red is Blue

September 8th, 2011 at 8:19 PM ^

My only point was that the model inherently imbeds an assumption that the teams are even.  When in reality, we have some evidence (which granted may be just bad perceptions) to suggest they likely aren't even.  That, of course, doesn't mean the "better" team always wins.  Hypothetically, substitute Delaware State for ND last weekend with the same situation with the same game events having transpired up to half time.  It  seems appropriate at that point that the model would yield a result that it is more likely that ND would come back than DSU.

Said another way, for the rest of the year I'll take all the teams that go off at 55% chance or better of winning, you can have their opponents. I'll take 100 points for each victory and I'll be generous and give you 106 points each time you win. Before the game starts, the model's 50% starting point would seem to predict that I would score an expected 50 points per game and you would score an expected 53. In other words, you would outscore me. But in the "real world" I would strongly expect to end up with the higher actual points per game average.
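The arithmetic behind that wager is easy to check. A minimal sketch, with made-up function names: each side's expected points per game is just its payout times its chance of winning, so the favorites' side comes out ahead whenever favorites win more often than 106/206 of the time, a little over 51%.

```python
def expected_points(p_favorite_wins, favorite_payout=100, underdog_payout=106):
    """Expected points per game for (favorites' backer, underdogs' backer)."""
    favorite_side = favorite_payout * p_favorite_wins
    underdog_side = underdog_payout * (1 - p_favorite_wins)
    return favorite_side, underdog_side


def break_even_rate(favorite_payout=100, underdog_payout=106):
    """Favorite win rate at which both sides expect the same points."""
    return underdog_payout / (favorite_payout + underdog_payout)


# Under the 50/50 starting assumption the underdog side looks better.
print(expected_points(0.50))
# If favorites actually win 55% of the time, the favorites' side comes out ahead.
print(expected_points(0.55))
print(f"Break-even favorite win rate: {break_even_rate():.1%}")
```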

I get that this would probably be difficult to implement in the model (it is well beyond my capabilities).  In a "perfect" (whatever that means) model at some point, the game experiences should start to overwhelm the starting point.  Thus going into the 4th quarter, I would have expected such a model to show Baylor having a >50% chance of winning even if TCU started with a >50% chance with TCU being perceived to be the "better" team. 

Referencing the M game, after WMU scored in the first quarter, the model seems to suggest that M had < 30% chance of winning at that point.  But, if I understand the percentage quoted earlier, the consensus (from bettors?) was that M had a 65% chance of winning.  Which was right?  Probably neither.  But, which seems more likely to be closer to being right?

We don't know whether it will be raining in A^2 next August 19 at noon, but could make an assumption of the likelihood. As we get more information about how the weather patterns are playing out, we can refine the predictions. I hold that even if it is not perfect, we could use a starting point assumption for rain that is better than 50/50.

FragglePac

September 7th, 2011 at 2:22 PM ^

I like the use of Nuggetry but overall it lacks a little luster.  How about...

"Mid-week Number Nuggets"

"Mid-week Nuggetnometry " 

"Mid-week Mathleticism"

That's what I've got on quick thought.

leftrare

September 7th, 2011 at 5:46 PM ^

Sorry, but I think this is just wrong:

"With two defensive touchdowns, Brandon Herron added 9.1 points of value in just the returns, not to mention 8.5 points in value from the turnovers themselves."

Herron was in the right places at the right times, and OK, I'll give him credit for having pretty good LB speed.  (I suppose RVB also got some kind of nominal credit for falling on a fumbled QB snap.)  But all football plays are team plays.  And if credit is going to be doled out to individuals, why not Ryan and Kovacs instead of Herron?

I'm sure this seems pedantic -- actually it IS pedantic -- but I have a bigger point to make. This kind of analysis, like fantasy football, tends to overweight the importance of players who touch the ball vs. players who make key physical plays away from the ball. I think to participate in fantasy football is to make an idiot of yourself because you become obsessed with the shiny objects and quit paying attention to or caring about the guys in the trenches. I like what Mathlete does in TEAM quantification but I also get bleary-eyed when he starts talking about this player or that player scoring "points" for his team.

Beavis

September 8th, 2011 at 1:14 PM ^

Challenge flag - any model that says Michigan was ~30% likely to win this game at any time is a model that is worth its weight in horseshit.  

tjyoung

September 9th, 2011 at 12:23 AM ^

I have a question about the chart.  If Michigan's win percentage hit 100% after Shaw's long touchdown, then how is it possible that it fell below 100% soon afterward?  If the model deems Michigan the winner at 100% at a certain point, then the model should not doubt itself later on, am I correct?  Am I making sense to anybody else?  Or am I just very confused?  Thanks for any help trying to explain this to me.