Inspired by all of the great statisticatin’ done by such MGoUsers as Misopogon, the Mathlete, and, most recently (and in the past), MCalibur, I decided to look into something I’ve been wondering about for a while. Well, four somethings really, all related to the importance of yards on first down in determining eventual success at getting first downs and sustaining drives:
How much do varying numbers of yards on first down affect the probability of getting a first down (or a touchdown) in that series?
Here I was simply wondering about the “how much?” question. It goes without saying that losing 10 yards on first down (or starting first and 20) reduces the probability that you will get a first down, but by how much? Similarly, obviously the more yards you get on first down, the better your chances are of getting a first down on the series, but by how much? Are there thresholds beyond which your probability of getting a first down increases appreciably, or is it more or less a linear relationship, where every additional yard on first down increases your probability of getting a first down by the same amount?
How much variation is there between teams in their ability to recover from bad first down plays?
My assumption here was that with principally running teams like the RR-era WVU, it is harder to overcome a bad first down play than with more balanced teams like the Lloyd-era UM or the Vest’s OSU teams. Conversely, I assumed that when the RR-era WVU or UM teams got at least four yards on first down, a first down on the series was virtually a lead pipe cinch. But, as these were only assumptions, I was interested in doing some analysis to check this out.
How much variation is there across games in the ease with which first downs are gotten, and in the effects of various numbers of yards on the probability of getting a first down?
For example, you would think that statistically it would be easier to get a first down on any given series in a home game, other things equal, right? Or, it would be harder to get a first down on any given series in a game against an opponent with a better defense, right?
Based on my data, the answer to both of these questions is NSFMF. I’ll explain later.
How much do things other than the focus of the analysis, like field position and penalties, affect first down probabilities?
When I started, I knew I wanted to compare Lloyd-UM with RR-UM and RR-WVU, since I wanted to see how the spread n’ shred in its mature form would compare with the more anemic version (UM 2008-2009) and with the DeBordian “rock, rock, rock, rock, rock, ICBM, rock, rock, rock (also rock)” approach. In compiling the sample, I made several choices:
- I did not look at UM in 2008 because I thought it would unfairly penalize Michigan and/or RR, and also I’m pretty sure we invaded Grenada that year and they called off the season.
- I added another comparison team, the 2006-2009 OSU juggernaut. Damn, those guys won a lot of games in those years. Fuckers…
- I omitted all Baby Seal U games (e.g., OSU vs. Youngstown State, UM vs. Delaware State, and WVU vs. Eastern Washington) except UM vs. Appalachian State, which I included. I debated about this latter non-omission because I didn’t want to unfairly stack the deck against Lloyd, but I figured (1) omitting Baby Seal U from the other coaches actually (slightly) stacked the deck in favor of Lloyd, and (2) there are Baby Seal Us and then there are Appy State Us.
This is a picture of an actual baby seal.
As for the unit of analysis (or “record” or “case,” depending on your disciplinary background), you may or may not know that ESPN.com publishes the play by play for each game, with pretty detailed information on each play.
At the game level, the sample consists of 122 games played from 2005 to 2009 by three schools (OSU, WVU, UM) and three coaches (Tressel, Rodriguez, Carr). For each game, I recorded:
- the game number in the season (i.e., first, second, …, thirteenth);
- the opponent’s total defense ranking (from NCAA.org); and
- whether it was a home game or not (away- and neutral-field games were coded the same. In retrospect I probably should have distinguished between these, but it didn’t end up mattering anyway).
At the play/series level, the sample consists of 3,529 first down plays and the series these plays began. For the teams of interest (i.e., not the opponents), I recorded the following data for each first down play:
The dependent variable was whether the series ended in a first down.
The primary independent variable was the number of yards gained on first down.
The control variables were:
- the field position on first down;
- the yards to go on first down;
- whether there was an offensive or defensive penalty (or both) on the series (penalties on first down, where first down was repeated, figured into the “yards to go” variable);
- whether there was a turnover on the series;
- whether there was low time (less than a minute) in the second or fourth quarters;
- whether there was a pass or run on first down;
- the quarter the series took place in; and
- the number of previous first downs for the drive in which the first down took place (so, if it is the first first down play in a drive, this variable would be scored 0; if a team makes a first down, this variable would be scored 1 for the second first down play in the same drive).
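For the code-inclined, here is a minimal sketch of how one play/series record and the "previous first downs" counter described above could be represented. The class name, field names, and helper function are hypothetical, made up for illustration; they are not the coding scheme actually used for this diary, and only a few of the control variables are shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FirstDownPlay:
    """One record: a first-down play plus the series it began (hypothetical fields)."""
    yards_gained: int          # yards on first down (negative means a loss)
    yards_to_go: int           # 10 for first and 10, 20 for first and 20, etc.
    field_position: int        # yards from the opponent's goal line
    is_pass: bool              # True for a pass on first down, False for a run
    series_converted: bool     # dependent variable: first down or touchdown on the series
    prev_first_downs: int = 0  # filled in by number_drive()


def number_drive(plays: List[FirstDownPlay]) -> List[FirstDownPlay]:
    """Score prev_first_downs for the first-down plays of ONE drive, in game order."""
    earned = 0
    for play in plays:
        play.prev_first_downs = earned  # 0 for the first first-down play of the drive
        if play.series_converted:
            earned += 1                 # the next first-down play sees one more
    return plays


# A made-up three-series drive: two conversions, then a stall.
drive = number_drive([
    FirstDownPlay(4, 10, 80, False, True),
    FirstDownPlay(0, 10, 68, True, True),
    FirstDownPlay(-2, 10, 55, False, False),
])
print([p.prev_first_downs for p in drive])
```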
Table 1 below shows the sample by season and team.
Hierarchical Linear Models
My initial plan was to run two-level hierarchical linear models (HLM), in which first-down plays/series are nested within games. Briefly, HLM allows you to calculate how much of the variation in the dependent variable is due to level-1 (play/series-level) factors like yards on first down, field position, etc., and how much is due to level-2 (game-level) factors like opponent defensive strength, home/away game, etc.
Essentially, HLM would calculate the average probability of getting a first down, as well as the effect of the level-one independent variables on that probability, for each of the 122 games, and then those parameters would be the dependent variables to be predicted as a function of level-2 (game-level) variables.
Fortunately for those of you who are about to stop reading, one of the things I discovered is that there is not enough variation from game to game, either in the probability of getting a first down or in the effects of the level-1 independent variables, to support an HLM analysis.
This does not mean that, for example, UM had exactly the same average success in getting a first down against OSU as they did against Eastern Michigan. What it does mean is that there is not so much variation from game to game in this average probability that it makes sense to predict that scant amount of variation with game-level factors.
The Probit Binary Response Model
Hence, the following is just a play/series-level analysis, which is probably more intuitive for the reader anyway. Because the dependent variable is dichotomous (0 if no first down on the series, 1 if first down or touchdown), I used the probit binary response model (PBRM). For those of you not steeped in this method, the PBRM is one of several regression-like methods for binary dependent variables.
Probit coefficients are in the metric of z-scores, which are the inputs to the standard normal cumulative distribution function (CDF). When you evaluate the standard normal CDF at the z-score the model predicts for a given play, it tells you the probability of scoring a “1” on the dependent variable.
The sign and magnitude of probit coefficients are interpreted in the standard way: a negative effect means that the variable lowers the probability of scoring a “1” on the dependent variable, positive coefficients mean that the variable increases the probability, and larger coefficients (in absolute value terms) mean stronger effects.
Except for Table 3 below, I have transformed all coefficients into probabilities, so you don’t have to worry about the metric of the coefficients.
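If you want to see what that transformation looks like, here is a minimal sketch in Python (the analysis itself was done in Stata, so this is only an illustration). The intercept and coefficient in the example are invented for the demonstration and are not the Table 3 estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, written with the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept, coefficients, values):
    """Turn a probit linear predictor (a z-score) into a probability of scoring a 1."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return normal_cdf(z)


# Illustrative numbers only (NOT the fitted model): an intercept of 0.2
# and a coefficient of 0.1 per yard gained on first down.
for yards in (-5, 0, 5):
    p = probit_probability(0.2, [0.1], [yards])
    print(yards, round(p, 3))
```

Because the CDF is S-shaped, the same coefficient moves the probability more when the starting probability is near 0.5 than when it is near 0 or 1, which is one reason to report probabilities rather than raw coefficients.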
Several Words on Sampling Error
You may remember from some statistics course that it is generally good practice to report not just the point estimates from any statistical analysis, but also an estimate of sampling error. This is why when networks report polling data, they usually say something like “Candidate X is leading Candidate Y by 5 points [the point estimate], with a margin of error plus or minus 3 points [the sampling error estimate].”
Virtually all statistical software packages (I used Stata/SE 10) assume that the data were gathered via a simple random sample, in which all samples of a given size have an equal probability of selection. Clearly, my choice to non-randomly sample three teams and five seasons, and then take a census of all games (except for Baby Seal U games) and first down plays violates this assumption. Hence, this analysis isn’t necessarily representative of the nation-wide effects of first down yards (and other variables) on first-down probabilities. You should interpret all of these findings as merely relating to UM, OSU, and WVU for the years specified.
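For reference, the textbook margin of error for a proportion can be sketched as below. Note that it leans on exactly the simple-random-sample assumption that this sample violates, so treat it as an illustration of the idea rather than a valid interval for these data. The sample size in the example is hypothetical; the per-team counts live in Table 1.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical example: a 66% first-down rate observed over 500 series.
moe = margin_of_error(0.66, 500)
print(f"0.66 plus or minus {moe:.3f}")
```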
Figures 1 and 2 below show, respectively, the number of yards gained on first down and the starting field position for any particular series. Recall that there can be multiple series within a drive, so Figure 2 should not be interpreted as the starting field position for the drive.
Note from Figure 1 that the modal number of yards gained on first down is zero. Obviously, this can occur via an incomplete pass, a completed pass for no gain, or a rush for no gain. The distribution is right-skewed, although fairly normally distributed (excluding the zero yards bar) between a loss of about 10 yards and a gain of about 20 yards.
Note from Figure 2 that the modal starting field position is 80 yards from the opponent’s goal line (or the offensive team’s 20). This is largely due to touchbacks on punts or kickoffs, of course.
Table 2 below shows the descriptive statistics by team for the variables used in the analysis. Note that the percentage of first down plays where the series ended in a first down or touchdown ranges from 66% for the 2009 UM team to about 76% for the 2006-2007 WVU teams. This should explain in part the 5-7 record of the former team and the shredding of opponents achieved by the mature WVU teams. Interestingly, OSU and Lloyd-era UM had about the same overall probability of getting a first down.
Time will tell if the RR UM teams can recapture that glory, or whether the spread n’ shred was simply more effective (1) in the Big East, (2) with Pat White/Steve Slaton, or (3) both (1) and (2).
One bit of hopeful evidence comes from the opponent total defense rank (near the bottom of Table 2). It doesn’t appear as though WVU played an appreciably easier average schedule than OSU, and if anything, WVU’s opponents finished their seasons with, on average, better-ranked defenses than either Lloyd-era or RR-era UM.
In terms of the primary independent variable of interest, Figure 3 shows the distribution of yards gained on first down, by team. Note that RR-UM was more likely than the other teams to lose from 1 to 4 yards on first down, less likely to gain from 3 to 5 yards, more likely to gain 6 or 7 yards (there may be a small sample size problem here), and less likely to hit a big play on first down (10 or more yards) than OSU or WVU.
Interestingly, RR’s WVU teams were less likely to gain 0 to 2 yards on first down, which is probably largely due to the lower percentage of passing plays on first down for WVU (17% vs. about 32-34% for the other three teams). This should demonstrate that RR/Magee understand that when you have Pat White, you run the ball on first down (and most downs thereafter). When you have Tate, you have to be more balanced. Say, maybe these guys do know about football…
Other points of interest from Table 2:
- Lloyd’s teams were more disciplined on offense with respect to penalties than the Vest’s teams--about 4.7% of OSU’s series had at least one post-first down offensive penalty (recall that the first down penalties were folded into the “yards to go” variable), compared to 2.8% for Lloyd-UM. RR’s teams fall in between.
- On the other hand, the Vest’s teams drew more post-first down defensive penalties than RR’s teams. Perhaps the passing attack invites more encroachment/pass interference calls than a more ground-based attack?
- Turnovers! About 7.6% of RR-UM’s series ended in turnovers, compared to 4.0 to 4.7% for the other teams. Yikes.
Figures 4-6 show some results from the regression analysis. First, Figure 4 shows the probability of getting a first down after selected numbers of yards on each first down play, assuming (1) it was first and 10, and (2) there was no penalty on the series.
Note that losing five or more yards on first down gives you about a 0.25-0.30 probability of getting a first down, whereas, obviously, gaining 10 or more yards is by definition a first down (on first and 10 at least).
In between these extremes, the first down returns to yards on first down are basically linear, though there are fairly noticeable inflection points between losing 5 or more and losing 1 to 4 yards (the first two points in the curves) and between gaining 3 to 5 and gaining 6 or 7 yards. By the way, I chose these categories based on exploratory analyses that showed that there was no statistically significant difference between gaining, say, 0, 1, or 2 yards.
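The yardage categories can be sketched as a simple binning function. Most of the cut points come straight from the text (lose 5 or more, lose 1 to 4, gain 0 to 2, 3 to 5, 6 or 7, 10 or more); the "8 or 9" bin is an assumption made here to fill the gap between 7 and 10, and the function name and labels are hypothetical.

```python
def yards_category(yards: int) -> str:
    """Bin yards gained on first down into the categories used in the figures."""
    if yards <= -5:
        return "lost 5 or more"
    if yards <= -1:
        return "lost 1 to 4"
    if yards <= 2:
        return "gained 0 to 2"
    if yards <= 5:
        return "gained 3 to 5"
    if yards <= 7:
        return "gained 6 or 7"
    if yards <= 9:
        return "gained 8 or 9"  # assumed bin; not named explicitly in the text
    return "gained 10 or more"


print([yards_category(y) for y in (-8, -2, 0, 4, 7, 12)])
```

In a regression, each category would typically enter as its own indicator variable, with one category left out as the baseline.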
Finally, notice the similarity between the OSU and Lloyd-UM curves. This shouldn’t be particularly surprising, since those teams pursued fairly similar offensive strategies--lots of off tackle to Hart/Wells interspersed with daggers to Manningham/Ginn.
I was interested to see that WVU dominated the story, at all categories of yards gained on first down. That is, it isn’t true that the WVU offense bogged down especially on small losses or gains on first down. A great offense will overcome.
Figure 5 shows the probability of getting a first down by field position on first down, in 10-yard increments. There are basically four points here:
- Being inside your own 20 reduces your probability of getting a first down, probably because of more conservative play calling;
- There is basically no difference between the 20 and the 50;
- Probabilities go up between the 50 and field goal range (a field goal attempt was coded 0 on the dependent variable, since there was no first down or touchdown);
- The probability goes way down in field goal range, probably because coaches elect to take the 3 points instead of going for it on 4th (see the Mathlete’s excellent diary on this).
Figure 6 shows basically the same trends, broken down by teams. There isn’t much to see here, except that WVU was awesome, RR-UM sucked, and OSU/Lloyd-UM were basically indistinguishable. It looks like a good rule of thumb is that WVU had a 10-percentage point better probability of getting a first down than RR-UM and a 5-percentage point advantage over the Vest and Lloyd.
Table 3 shows the full regression results. There isn’t much new here, but just to recap:
- Yards on first down matters a lot (duh I);
- WVU kicked ass;
- It’s harder to get a first down on first and 20 than first and 5 (duh II);
- Field position doesn’t matter as much as you might think;
- Offensive penalties make it harder to get a first down; defensive penalties make it easier (duh III); and
- Ceteris paribus, passing on first down increases the probability of getting a first down on that series (though in analyses not shown here, I found that, not surprisingly, it also increases the chances of a turnover [see Hayes, W.]).
One other thing: in the note to Table 3, it says that the “Pseudo R2” is .3008. This is a statistic calculated in the PBRM that is analogous to the R2 (r-squared) statistic in linear regression, which is interpreted as the percentage of the variation in the dependent variable that is explained by the model. It’s hard to say whether 30% is a lot or a little; all I know from the coding is that there were lots of series in which a team would lose 10 on first down and still get a first down, and others where they would gain 9 on first down and fail to get a first down. So, there is still a large stochastic component to the process.
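For the curious, here is a rough sketch of how a pseudo R2 of this kind can be computed, assuming the reported statistic is McFadden's version (which, as far as I know, is what Stata's probit output reports). It compares the log-likelihood of the fitted model with that of a model that predicts the overall first-down rate for every series, so it is only loosely comparable to "percent of variation explained." The outcomes and predicted probabilities below are made up.

```python
import math


def log_likelihood(outcomes, probabilities):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def mcfadden_pseudo_r2(outcomes, probabilities):
    """1 - LL(model) / LL(null), where the null model predicts the base rate.

    Assumes the outcomes contain both 0s and 1s and that every predicted
    probability is strictly between 0 and 1 (otherwise log(0) is hit).
    """
    base_rate = sum(outcomes) / len(outcomes)
    ll_model = log_likelihood(outcomes, probabilities)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    return 1.0 - ll_model / ll_null


# Made-up data: four series, two converted, with fairly confident predictions.
print(round(mcfadden_pseudo_r2([1, 1, 0, 0], [0.9, 0.9, 0.1, 0.1]), 3))
```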
Stuff You’d Think Might Matter but Didn’t, Statistically
Statistically, variables that had no significant (but see “Several Words on Sampling Error” above) effect on the probability of getting a first down (net of the other variables included in the model shown in Table 3) included:
- Home vs. not home game;
- Which game of the season it was;
- The quarter of the game;
- The drive number (these last two suggest that there is not a robust effect of either “bursting out of the gate” or “starting sluggishly.” Sometimes teams start strong and finish weak, other times the reverse happens);
- Number of previous first downs on a drive. This was interesting to me, because one often thinks, I think, that teams get “hot” on a drive. In other words, each first down makes it successively easier to get the next first down. My analysis suggests this is not true, at least in these data. There are a couple of explanations for this: one is that it does get slightly easier to get a first down the closer you get to your opponent’s goal line (though not in the field goal zone), so the two effects are collinear--the more first downs you get on a drive, the better your field position is, and it is that latter issue that affects first down probabilities. The second goes back to the stochastic component--there are just as many drives where a team will gain 3 first downs and then stall as ones where they will gain 3 first downs and then 2 more.
I have few conclusions beyond the things I’ve already mentioned. Basically, yards on first down are incredibly important, but not in any surprising way. The more yards you get, the better your chances are of getting a first down. However, there is a large random component to getting first downs, so yards aren’t everything.
In terms of UM football, it is clear that the mature spread n’ shred is lethal. But you already knew that. The question is whether UM can recapture that WVU magic. I guess I’m optimistic, for several reasons:
- The RR offense requires experienced, athletic players, really at all offensive positions. This we now have, and/or are quickly cultivating.
- A heavily run-based offense is slightly less likely to turn the ball over and much less likely to suffer no gain on first down (due to the lack of incomplete passes). This bodes well for sustained drives.
- WVU played slightly better defenses, on average, than UM did (at least if you think total defense rank at the end of the season is a good indicator of defensive strength), and defenses about as good as those OSU played. So, at least by this figuring, there is no reason to think that UM’s current schedule is too good for us to be successful.
Obviously, the $1M unanswered question is whether the RR offense will be as successful at UM as it was at WVU. The analysis I have done can’t really speak to this question, but neither does it suggest obvious reasons why it won’t be successful. It does show how powerful the WVU version was, and I for one support giving RR enough time to have a reasonable chance to put that offense into place.
Comments, suggestions, critiques? Let's have ‘em.