...talks about how UConn hasn't been in contact and how they're out. (HT: UMHoops)
A look at Michigan’s opening opponent through the eyes of PAN*.
When Michigan Rushes
Let’s kick the season off with a nice chart, Michigan Rush Offense PAN vs. UConn
Last year the gap between the two was worth nearly two points a game and this year it is projecting to narrow slightly. This projection is probably on the pessimistic side for Michigan as UConn has four consecutive years of decline on rush defense and nothing would indicate that Michigan would see a drop versus last season’s performance on the ground.
Since Rodriguez had experience against UConn while at West Virginia, those matchups provide another, better data point of comparison. In four games from 2004-2007 West Virginia averaged 6 PAN/game offensively and UConn averaged 0 PAN/game defensively. In other words, West Virginia’s ground game averaged 6 points per game more than the average team that played UConn, and the Huskies defended the Mountaineers about on par with the average team.
Based on both West Virginia and Michigan experience, the numbers indicate that Michigan should have an opportunity to do some damage on the ground on Saturday.
When Michigan Passes
Michigan was pretty average passing the ball last year, but UConn wasn’t great at covering the pass. The historical numbers are a bit all over the map: the Huskies had a 10 point negative swing from 2008 to 2009.
UConn returns a lot of their defense from last year but the one position group that will be replacing players is the secondary. In 2009 the team had to deal with the midseason murder of starting cornerback Jasper Howard, putting a little perspective on the mostly on-field issues Michigan’s secondary has faced. Of the top 11 UConn players in points taken last year, the only three not returning this year are cornerback Robert McClain, 25 PT, 2nd on team and first among DBs, DE Lindsey Witten, 20 PT, 4th on team and first among DL and S Robert Vaughn, 15 PT and 2nd among DBs.
With the year to year variance these two teams have shown in passing and defending the pass, it is difficult to tell who will pick up the advantage when Michigan puts the ball in the air.
When UConn Runs
Michigan saw their first dip into negative PAN against the run last year, while UConn is coming off back to back strong seasons on the ground.
The UConn running back situation is one where PAN sheds an interesting light that is hidden by traditional stats. Last year UConn split the carries almost evenly between Jordan Todman and Andre Dixon (235 vs 239). Todman ran for 1188 yards and 14 TDs while Dixon had 1093 yards and 14 TDs as well. Despite those very similar stat lines, Todman’s performance was worth 16 points and Dixon’s nearly offset the gains with –15 points.
Unfortunately for Michigan, Todman is back and Dixon is gone. The historical trend indicates that Michigan should have the advantage, but with a quality back in Todman returning, Michigan will need a much improved defensive performance to limit the UConn rushing attack.
When UConn Passes
After a dreadful stretch through the air in 2005-2008, UConn bounced back last year with their best showing in five years.
UConn has two QBs with starting experience coming back. Cody Endres, who took over in mid-season after an injury, was a modest 1.1 PAN, whereas this year’s starter Zach Frazer was a worse –1.5 PAN in action at the beginning and end of the season. Frazer posted a similar –1.6 in 4 games in 2008.
Despite Endres’s higher value, Frazer beat him out again for the job this season, and Endres went on to get suspended for the opener, leaving UConn with a sole experienced QB for Michigan. Unfortunately, Michigan’s secondary will make this matchup interesting, but at least the Huskies aren’t able to trot out a world beater at QB, even if he does have 2 years of experience.
History in Openers
When factoring in quality of opponent, Michigan’s best two games of the Rodriguez era have been the openers. 2008 felt very disappointing at the time, but taking an eventually undefeated, Alabama-crushing Utah team to the wire was the best performance of the season. 2009 saw a much, much weaker opponent in Western Michigan, but the utter dismantling Michigan displayed made the 2009 opener the highest-rated game Rodriguez has had at Michigan to date. Success in openers had been the norm for Rodriguez at West Virginia: 3 of his last 4 were double-digit PAN and two were over 20.
UConn’s sample size is much smaller. In 3 of the last 7 years they have opened with 1AA opponents, and the other four years have seen performance within 5 points or so of average.
Head to Head
In the last four meetings Rodriguez and West Virginia owned UConn. West Virginia averaged a PAN of 13 while UConn came in at –5 PAN. Even after giving the Huskies a break for how good West Virginia was for several years, they still did worse than average against them.
The 2007 game is a bit of an anomaly on this chart. It looks like UConn outplayed West Virginia, but the Mountaineers completely dominated the Huskies in the game. The PAN is off because two first half fumbles by UConn meant the offense didn’t have to do much heavy lifting to build a 17 point lead after the first drive of the second half. A 17 point lead means that the plays stop counting towards the PAN, but WVU just kept going, to the tune of nearly 400 yards and 29 PAN, all after they already had a 17 point lead. So in other words, 2007 looks like a good performance by UConn, but in reality a couple fluke plays got them in a hole and once they were there, West Virginia buried them.
The All In Look
The history is on Michigan’s side, the two year trend is on Michigan’s side, the strength in openers is on Michigan’s side, the head to head coaching matchup is on Michigan’s side and with homefield, I have Michigan pegged at about a touchdown favorite with about a 75% chance of starting the year off in the win column.
*PAN is calculated by assigning every play a value based on how much the play helped or hurt the offense’s chances of scoring. Every down, distance and line of scrimmage combination is assigned an expected value, the average points scored across college football in that same situation. If a play increases the expected value, the respective teams and players are credited with the amount of increase.
All plays are then adjusted based on strength of opponent. Plays against weak opponents are penalized and downgraded while plays against strong opponents are bumped to reflect the degree of difficulty.
Only games against FBS (D1A) opponents are included; games against FCS (1AA) opponents are excluded from all numbers used in this work.
Qualifying Plays (QP) are all plays in the first half and plays in the second half when the game is within two touchdowns. End of half run out the clock drives are also excluded.
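For anyone who wants to see the play-value idea in code, here is a minimal sketch of the first step only. The expected-points table below is made up for illustration (the real table is not given here), the function names are hypothetical, and reading "within two touchdowns" as a margin of 14 or less is an assumption. The strength-of-opponent adjustment and the end-of-half exclusion are left out because their details aren't spelled out above.

```python
# Hypothetical expected-points table, keyed by
# (down, yards_to_go, yards_from_opponent_goal_line).
# These values are invented for illustration only.
EXPECTED_POINTS = {
    (1, 10, 80): 1.0,
    (2, 4, 74): 1.4,
    (2, 10, 80): 0.6,
}


def play_value(before, after, ep_table=EXPECTED_POINTS):
    """Change in expected points produced by a single play."""
    return ep_table[after] - ep_table[before]


def is_qualifying(half, margin, max_margin=14):
    """First-half plays always count; second-half plays count only in close games.

    max_margin=14 is an assumed reading of "within two touchdowns".
    """
    return half == 1 or abs(margin) <= max_margin


# A 6-yard gain on 1st and 10 from the offense's own 20:
gain = play_value((1, 10, 80), (2, 4, 74))
# An incompletion from the same spot:
loss = play_value((1, 10, 80), (2, 10, 80))
```

With this toy table the 6-yard gain is credited with a positive value and the incompletion with a negative one, which is the whole idea: the play is judged by how it moved the offense's scoring chances, not by raw yardage.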
...we will beat Uconn.
Inspired by all of the great statisticatin’ done by such MGoUsers as Misopogon, the Mathlete, and, most recently (and in the past), MCalibur, I decided to look into something I’ve been wondering about for a while. Well, four somethings really, all related to the importance of yards on first down in determining eventual success at getting first downs and sustaining drives:
How much do varying numbers of yards on first down affect the probability of getting a first down (or a touchdown) in that series?
Here I was simply wondering about the “how much?” question. It goes without saying that losing 10 yards on first down (or starting first and 20) reduces the probability that you will get a first down, but by how much? Similarly, obviously the more yards you get on first down, the better your chances are of getting a first down on the series, but by how much? Are there thresholds beyond which your probability of getting a first down increases appreciably, or is it more or less a linear relationship, where every additional yard on first down increases your probability of getting a first down by the same amount?
How much variation is there between teams in their ability to recover from bad first down plays?
My assumption here was that with principally running teams like the RR-era WVU, it is harder to overcome a bad first down play than with more balanced teams like the Lloyd-era UM or the Vest’s OSU teams. Conversely, I assumed that when the RR-era WVU or UM teams got at least four yards on first down, a first down on the series was virtually a lead pipe cinch. But, as these were only assumptions, I was interested in doing some analysis to check this out.
How much variation is there across games in the ease with which first downs are gotten, and in the effects of various numbers of yards on the probability of getting a first down?
For example, you would think that statistically it would be easier to get a first down on any given series in a home game, other things equal, right? Or, it would be harder to get a first down on any given series in a game against an opponent with a better defense, right?
Based on my data, the answer to both of these questions is NSFMF. I’ll explain later.
How much do things other than the focus of the analysis, like field position and penalties, affect first down probabilities?
When I started, I knew I wanted to compare Lloyd-UM with RR-UM and RR-WVU, since I wanted to see how the spread n’ shred in its mature form would compare with the more anemic version (UM 2008-2009) and with the DeBordian “rock, rock, rock, rock, rock, ICBM, rock, rock, rock (also rock)” approach. In compiling the sample, I made several choices:
- I did not look at UM in 2008 because I thought it would unfairly penalize Michigan and/or RR, and also I’m pretty sure we invaded Grenada that year and they called off the season.
- I added another comparison team, the 2006-2009 OSU juggernaut. Damn, those guys won a lot of games in those years. Fuckers…
- I omitted all Baby Seal U games (e.g., OSU vs. Youngstown State, UM vs. Delaware State, and WVU vs. Eastern Washington) except UM vs. Appalachian State, which I included. I debated about this latter non-omission because I didn’t want to unfairly stack the deck against Lloyd, but I figured (1) omitting Baby Seal U from the other coaches actually (slightly) stacked the deck in favor of Lloyd, and (2) there are Baby Seal Us and then there are Appy State Us.
This is a picture of an actual baby seal.
As for the unit of analysis (or “record” or “case,” depending on your disciplinary background), you may or may not know that ESPN.com publishes the play by play for each game, with pretty detailed information on each play.
At the game level, the sample consists of 122 games played from 2005 to 2009 by three schools (OSU, WVU, UM) and three coaches (Tressel, Rodriguez, Carr). For each game, I recorded:
- the game number in the season (i.e., first, second, …, thirteenth);
- the opponent’s total defense ranking (from NCAA.org); and
- whether it was a home game or not (away- and neutral-field games were coded the same. In retrospect I probably should have distinguished between these, but it didn’t end up mattering anyway).
At the play/series level, the sample consists of 3,529 first down plays and the series these plays began. For the teams of interest (i.e., not the opponents), I recorded the following data for each first down play:
The dependent variable was whether the series ended in a first down.
The primary independent variable was the number of yards gained on first down.
The control variables were:
- the field position on first down;
- the yards to go on first down;
- whether there was an offensive or defensive penalty (or both) on the series (penalties on first down, where first down was repeated, figured into the “yards to go” variable);
- whether there was a turnover on the series;
- whether there was low time (less than a minute) in the second or fourth quarters;
- whether there was a pass or run on first down;
- the quarter the series took place in; and
- the number of previous first downs for the drive in which the first down took place (so, if it is the first first down play in a drive, this variable would be scored 0; if a team makes a first down, this variable would be scored 1 for the second first down play in the same drive).
Table 1 below shows the sample by season and team.
Hierarchical Linear Models
My initial plan was to run two-level hierarchical linear models (HLM), in which first-down plays/series are nested within games. Briefly, HLM allows you to calculate how much of the variation in the dependent variable is due to level-1 (play/series-level) factors like yards on first down, field position, etc., and how much is due to level-2 (game-level) factors like opponent defensive strength, home/away game, etc.
Essentially, HLM would calculate the average probability of getting a first down, as well as the effect of the level-one independent variables on that probability, for each of the 122 games, and then those parameters would be the dependent variables to be predicted as a function of level-2 (game-level) variables.
Fortunately for those of you who are about to stop reading, one of the things I discovered is that there is not significant variation from game to game, either in the probability of getting a first down or in the effects of the level-1 independent variables, to support an HLM analysis.
This does not mean that, for example, UM had exactly the same average success in getting a first down against OSU as they did against Eastern Michigan. What it does mean is that there is not so much variation from game to game in this average probability that it makes sense to predict that scant amount of variation with game-level factors.
The Probit Binary Response Model
Hence, the following is just a play/series-level analysis, which is probably more intuitive for the reader anyway. Because the dependent variable is dichotomous (0 if no first down on the series, 1 if first down or touchdown), I used the probit binary response model (PBRM). For those of you not steeped in this method, the PBRM is one of several regression-like methods for binary dependent variables.
Probit coefficients are in the metric of z-scores, the values at which the standard normal cumulative distribution function (CDF) is evaluated. When you evaluate the standard normal CDF at a given value, it tells you the probability of scoring a “1” on the dependent variable.
The sign and magnitude of probit coefficients are interpreted in the standard way: a negative effect means that the variable lowers the probability of scoring a “1” on the dependent variable, positive coefficients mean that the variable increases the probability, and larger coefficients (in absolute value terms) mean stronger effects.
Except for Table 3 below, I have transformed all coefficients into probabilities, so you don’t have to worry about the metric of the coefficients.
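If you want to see what that transformation looks like, here is a minimal sketch using only the Python standard library. The intercept and coefficient in the example are hypothetical numbers chosen for illustration, not estimates from this analysis, and the function names are mine.

```python
import math


def standard_normal_cdf(z):
    """Probability that a standard normal variable is less than or equal to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, coefficients, values):
    """Turn probit coefficients and variable values into a probability of a "1"."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return standard_normal_cdf(z)


# Hypothetical model: intercept 0.0 and a coefficient of 0.1 per yard on first down.
# A 5-yard gain gives z = 0.5, which the CDF maps to a probability of about 0.69.
p = probit_probability(0.0, [0.1], [5])
```

Note that the same change in z moves the probability by different amounts depending on where you start, which is why it is easier to read the results as probabilities than as raw coefficients.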
Several Words on Sampling Error
You may remember from some statistics course that it is generally good practice to report not just the point estimates from any statistical analysis, but also an estimate of sampling error. This is why when networks report polling data, they usually say something like “Candidate X is leading Candidate Y by 5 points [the point estimate], with a margin of error plus or minus 3 points [the sampling error estimate].”
Virtually all statistical software packages (I used Stata/SE 10) assume that the data were gathered via a simple random sample, in which all samples of a given size have an equal probability of selection. Clearly, my choice to non-randomly sample three teams and five seasons, and then take a census of all games (except for Baby Seal U games) and first down plays violates this assumption. Hence, this analysis isn’t necessarily representative of the nation-wide effects of first down yards (and other variables) on first-down probabilities. You should interpret all of these findings as merely relating to UM, OSU, and WVU for the years specified.
Figures 1 and 2 below show, respectively, the number of yards gained on first down and the starting field position for any particular series. Recall that there can be multiple series within a drive, so Figure 2 should not be interpreted as the starting field position for the drive.
Note from Figure 1 that the modal number of yards gained on first down is zero. Obviously, this can occur via an incomplete pass, a completed pass for no gain, or a rush for no gain. The distribution is right-skewed, although fairly normally distributed (excluding the zero yards bar) within a range of about a loss of 10 yards and a gain of about 20 yards.
Note from Figure 2 that the modal starting field position is 80 yards from the opponent’s goal line (or the offensive team’s 20). This is largely due to touchbacks on punts or kickoffs, of course.
Table 2 below shows the descriptive statistics by team for the variables used in the analysis. Note that the percentage of first down plays where the series ended in a first down or touchdown ranges from 66% for the 2009 UM team to about 76% for the 2006-2007 WVU teams. This should explain in part the 5-7 record of the former team and the shredding of opponents achieved by the mature WVU teams. Interestingly, OSU and Lloyd-era UM had about the same overall probability of getting a first down.
Time will tell if the RR UM teams can recapture that glory, or whether the spread n’ shred was simply more effective (1) in the Big East, (2) with Pat White/Steve Slaton, or (3) both (1) and (2).
One bit of hopeful evidence comes from the opponent total defense rank (near the bottom of Table 2). It doesn’t appear as though WVU played an appreciably easier average schedule than OSU, and if anything, WVU’s opponents finished their seasons with, on average, better-ranked defenses than either Lloyd-era or RR-era UM.
In terms of the primary independent variable of interest, Figure 3 shows the distribution of yards gained on first down, by team. Note that RR-UM was more likely than the other teams to lose from 1 to 4 yards on first down, less likely to gain from 3 to 5 yards, more likely to gain 6 or 7 yards (there may be a small sample size problem here), and less likely to hit a big play on first down (10 or more yards) than OSU or WVU.
Interestingly, RR’s WVU teams were less likely to gain 0 to 2 yards on first down, which is probably largely due to the lower percentage of passing plays on first down for WVU (17% vs. about 32-34% for the other three teams). This should demonstrate that RR/Magee understand that when you have Pat White, you run the ball on first down (and most downs thereafter). When you have Tate, you have to be more balanced. Say, maybe these guys do know about football…
Other points of interest from Table 2:
- Lloyd’s teams were more disciplined on offense with respect to penalties than the Vest’s teams--about 4.7% of OSU’s series had at least one post-first down offensive penalty (recall that the first down penalties were folded into the “yards to go” variable), compared to 2.8% for Lloyd-UM. RR’s teams fall in between.
- On the other hand, the Vest’s teams drew more post-first down defensive penalties than RR’s teams. Perhaps the passing attack invites more encroachment/pass interference calls than a more ground-based attack?
- Turnovers! About 7.6% of RR-UM’s series ended in turnovers, compared to 4.0 to 4.7% for the other teams. Yikes.
Figures 4-6 show some results from the regression analysis. First, Figure 4 shows the probability of getting a first down after selected numbers of yards on each first down play, assuming (1) it was first and 10, and (2) there was no penalty on the series.
Note that losing five or more yards on first down gives you about a 0.25-0.30 probability of getting a first down, whereas, obviously, gaining 10 or more yards is by definition a first down (on first and 10 at least).
In between these extremes, the first down returns to yards on first down are basically linear, though there are fairly noticeable inflection points between losing 5 or more and losing 1 to 4 yards (the first two points in the curves) and between gaining 3 to 5 and gaining 6 or 7 yards. By the way, I chose these categories based on exploratory analyses that showed that there was no statistically significant difference between gaining, say, 0, 1, or 2 yards.
Finally, notice the similarity between the OSU and Lloyd-UM curves. This shouldn’t be particularly surprising, since those teams pursued fairly similar offensive strategies--lots of off tackle to Hart/Wells interspersed with daggers to Manningham/Ginn.
I was interested to see that WVU dominated the story, at all categories of yards gained on first down. That is, it isn’t true that the WVU offense bogged down especially on small losses or gains on first down. A great offense will overcome.
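For anyone who wants to replicate the yardage categories used in these curves, a minimal sketch of the binning follows. The function name and labels are mine, and the "gained 8 or 9" bin is an assumption, since that category isn't named explicitly in the text.

```python
def yards_category(yards):
    """Map yards gained on first down to the categories used in the curves."""
    if yards <= -5:
        return "lost 5 or more"
    if yards <= -1:
        return "lost 1 to 4"
    if yards <= 2:
        return "gained 0 to 2"
    if yards <= 5:
        return "gained 3 to 5"
    if yards <= 7:
        return "gained 6 or 7"
    if yards <= 9:
        return "gained 8 or 9"  # assumed bin, not named in the text
    return "gained 10 or more"


# Example: categorize a handful of first-down results.
categories = [yards_category(y) for y in (-7, -2, 0, 4, 6, 12)]
```

Collapsing yards into a few bins like this trades some detail for more plays per category, which is what makes the probabilities at each point stable enough to compare across teams.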
Figure 5 shows the probability of getting a first down by field position on first down, in 10-yard increments. There are basically four points here:
- Being inside your own 20 reduces your probability of getting a first down, probably because of more conservative play calling;
- There is basically no difference between the 20 and the 50;
- Probabilities go up between the 50 and field goal range (a field goal attempt was coded 0 on the dependent variable, since there was no first down or touchdown);
- The probability goes way down in field goal range, probably because coaches elect to take the 3 points instead of going for it on 4th (see the Mathlete’s excellent diary on this).
Figure 6 shows basically the same trends, broken down by teams. There isn’t much to see here, except that WVU was awesome, RR-UM sucked, and OSU/Lloyd-UM were basically indistinguishable. It looks like a good rule of thumb is that WVU had a 10-percentage point better probability of getting a first down than RR-UM and a 5-percentage point advantage over the Vest and Lloyd.
Table 3 shows the full regression results. There isn’t much new here, but just to recap:
- Yards on first down matters a lot (duh I);
- WVU kicked ass;
- It’s harder to get a first down on first and 20 than first and 5 (duh II);
- Field position doesn’t matter as much as you might think;
- Offensive penalties make it harder to get a first down; defensive penalties make it easier (duh III); and
- Ceteris paribus, passing on first down increases the probability of getting a first down on that series (though in analysis not shown here, I found that, not surprisingly, it increases the chances of a turnover [see Hayes, W.]).
One other thing: in the note to Table 3, it says that the “Pseudo R2” is .3008. This is a statistic calculated in the PBRM that is analogous to the R2 (r-squared) statistic in linear regression, which is interpreted as the percentage of the variation in the dependent variable that is explained by the model. It’s hard to say whether 30% is a lot or a little; all I know from the coding is that there were lots of series in which a team would lose 10 on first down and still get a first down, and others where they would gain 9 on first down and fail to get a first down. So, there is still a large stochastic component to the process.
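For the curious, here is a minimal sketch of how a pseudo R2 of this kind can be computed, assuming the statistic in question is McFadden's version (which, as far as I know, is what Stata reports for probit models). The outcomes and predicted probabilities in the example are made up, not taken from my data.

```python
import math


def log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes given predicted probabilities of a 1."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def mcfadden_pseudo_r2(outcomes, probabilities):
    """1 - LL(model) / LL(null), where the null model predicts the base rate."""
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = log_likelihood(outcomes, probabilities)
    return 1.0 - ll_model / ll_null


# Made-up example: four series, two of which ended in a first down.
outcomes = [1, 0, 1, 0]
uninformative = mcfadden_pseudo_r2(outcomes, [0.5, 0.5, 0.5, 0.5])
informative = mcfadden_pseudo_r2(outcomes, [0.9, 0.1, 0.9, 0.1])
```

A model that predicts no better than the overall first-down rate scores 0, and the value climbs toward 1 as the predicted probabilities get closer to what actually happened.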
Stuff You’d Think Might Matter but Didn’t, Statistically
Statistically, variables that had no significant (but see “Several Words on Sampling Error” above) effect on the probability of getting a first down (net of the other variables included in the model shown in Table 3) included:
- Home vs. not home game;
- Which game of the season it was;
- The quarter of the game;
- The drive number (these last two suggest that there is not a robust effect of either “bursting out of the gate” or “starting sluggishly.” Sometimes teams start strong and finish weak, other times the reverse happens);
- Number of previous first downs on a drive. This was interesting to me, because one often thinks, I think, that teams get “hot” on a drive. In other words, each first down makes it successively easier to get the next first down. My analysis suggests this is not true, at least in these data. There are a couple of explanations for this: one is that it does get slightly easier to get a first down the closer you get to your opponent’s goal line (though not in the field goal zone), so the two effects are collinear--the more first downs you get on a drive, the better your field position is, and it is that latter issue that affects first down probabilities. The second goes back to the stochastic component--there are just as many drives where a team will gain 3 first downs and then stall as ones where they will gain 3 first downs and then 2 more.
I have few conclusions beyond the things I’ve already mentioned. Basically, yards on first down are incredibly important, but not in any surprising way. The more yards you get, the better your chances are of getting a first down. However, there is a large random component to getting first downs, so yards aren’t everything.
In terms of UM football, it is clear that the mature spread n’ shred is lethal. But you already knew that. The question is whether UM can recapture that WVU magic. I guess I’m optimistic, for several reasons:
- The RR offense requires experienced, athletic players, really at all offensive positions. This we now have, and/or are quickly cultivating.
- A heavily run-based offense is slightly less likely to turn the ball over and much less likely to suffer no gain on first down (due to the lack of incomplete passes). This bodes well for sustained drives.
- WVU played, on average, slightly better defenses (at least if you think total defense rank at the end of the season is a good indicator of defensive strength) than UM, and defenses that were as good as those played by OSU. So, at least by this figuring, there is no reason to think that UM’s current schedule is too good for us to be successful.
Obviously, the $1M unanswered question is whether the RR offense will be as successful at UM as it was at WVU. The analysis I have done can’t really speak to this question, but neither does it suggest obvious reasons why it won’t be successful. It does show how powerful the WVU version was, and I for one support giving RR enough time to have a reasonable chance to put that offense into place.
Comments, suggestions, critiques? Let's have ‘em.
If you thought WVU and its fans were insane, check out Cornell's reaction thus far to Boston College hiring Steve Donahue to replace recently-fired Al Skinner as its next basketball head coach.
"I told the folks at BC that they made a great hire," Cornell athletic director Andy Noel said. "Our university really wanted to keep Steve. I'm a little heartbroken, but we turn the page and become a BC fan forever. ... We're appreciative that we had a decade with Steve Donahue."
Ouch. It gets even worse. I personally would not want to be a student in Chestnut Hill right now, or anywhere else in the Boston area for that matter. Brace for impact:
Dale said the players understood why Donahue left.
"It's good for him, and we all know that," Dale said. "The decision he made is based on him and his family. He had to do what's best for him."
I feel bad for Donahue. The guy's reputation is ruined, and he's going to be in a long, bitter legal battle with Cornell. I hope BC has his back in this fight. I also sure hope that Donahue can have a safe move out of Ithaca without too much trouble from those savages. Good luck, Coach Donahue, and godspeed...
So, it is possible that this video has made it through here before and if it has - my apologies. However, a friend sent this to me today and I liked it. Obviously the Michigan hype part is taken from games where we weren't playing the best teams (EMU, Indiana, ND), but I respect the video maker for leaving out the Baby Seal U game.
The interesting part was the second half of the video where it showed clips of Rod's team in WV and some plays that looked very similar to what some of our guys were doing when things were clicking.
A search came back empty, so when I noticed Tim didn't mention anything about Davion Rogers possibly coming up for an official, I thought I'd post something. Apparently we were down there visiting his school recently.
I don't think anyone around here would be upset if we stole another recruit from West Virginia.