Some Analysis of the Importance of Yards on First Down
Inspired by all of the great statisticatin’ done by such MGoUsers as Misopogon, the Mathlete, and, most recently (and in the past), MCalibur, I decided to look into something I’ve been wondering about for a while. Well, four somethings really, all related to the importance of yards on first down in determining eventual success at getting first downs and sustaining drives:
How much do varying numbers of yards on first down affect the probability of getting a first down (or a touchdown) in that series?
Here I was simply wondering about the “how much?” question. It goes without saying that losing 10 yards on first down (or starting first and 20) reduces the probability that you will get a first down, but by how much? Similarly, obviously the more yards you get on first down, the better your chances are of getting a first down on the series, but by how much? Are there thresholds beyond which your probability of getting a first down increases appreciably, or is it more or less a linear relationship, where every additional yard on first down increases your probability of getting a first down by the same amount?
How much variation is there between teams in their ability to recover from bad first down plays?
My assumption here was that with principally running teams like the RR-era WVU, it is harder to overcome a bad first down play than with more balanced teams like the Lloyd-era UM or the Vest’s OSU teams. Conversely, I assumed that when the RR-era WVU or UM teams got at least four yards on first down, a first down on the series was virtually a lead pipe cinch. But, as these were only assumptions, I was interested in doing some analysis to check this out.
How much variation is there across games in the ease with which first downs are gotten, and in the effects of various numbers of yards on the probability of getting a first down?
For example, you would think that statistically it would be easier to get a first down on any given series in a home game, other things equal, right? Or, it would be harder to get a first down on any given series in a game against an opponent with a better defense, right?
Based on my data, the answer to both of these questions is NSFMF. I’ll explain later.
How much do things other than the focus of the analysis, like field position and penalties, affect first down probabilities?
When I started, I knew I wanted to compare Lloyd-UM with RR-UM and RR-WVU, since I wanted to see how the spread n’ shred in its mature form would compare with the more anemic version (UM 2008-2009) and with the DeBordian “rock, rock, rock, rock, rock, ICBM, rock, rock, rock (also rock)” approach. In compiling the sample, I made several choices:
- I did not look at UM in 2008 because I thought it would unfairly penalize Michigan and/or RR, and also I’m pretty sure we invaded Grenada that year and they called off the season.
- I added another comparison team, the 2006-2009 OSU juggernaut. Damn, those guys won a lot of games in those years. Fuckers…
- I omitted all Baby Seal U games (e.g., OSU vs. Youngstown State, UM vs. Delaware State, and WVU vs. Eastern Washington) except UM vs. Appalachian State (2007), which I included. I debated about this latter non-omission because I didn’t want to unfairly stack the deck against Lloyd, but I figured (1) omitting Baby Seal U from the other coaches actually (slightly) stacked the deck in favor of Lloyd, and (2) there are Baby Seal Us and then there are Appy State Us.
As for the unit of analysis (or “record” or “case,” depending on your disciplinary background), you may or may not know that ESPN.com publishes the play-by-play for each game, with pretty detailed information on each play.
At the game level, the sample consists of 122 games played from 2005 to 2009 by three schools (OSU, WVU, UM) and three coaches (Tressel, Rodriguez, Carr). For each game, I recorded:
- the game number in the season (i.e., first, second, …, thirteenth);
- the opponent’s total defense ranking (from NCAA.org); and
- whether it was a home game or not (away- and neutral-field games were coded the same. In retrospect I probably should have distinguished between these, but it didn’t end up mattering anyway).
At the play/series level, the sample consists of 3,529 first down plays and the series these plays began. For the teams of interest (i.e., not the opponents), I recorded the following data for each first down play:
The dependent variable was whether the series ended in a first down.
The primary independent variable was the number of yards gained on first down.
The control variables were:
- the field position on first down;
- the yards to go on first down;
- whether there was an offensive or defensive penalty (or both) on the series (penalties on first down, where first down was repeated, figured into the “yards to go” variable);
- whether there was a turnover on the series;
- whether there was low time (less than a minute) in the second or fourth quarters;
- whether there was a pass or run on first down;
- the quarter the series took place in; and
- the number of previous first downs for the drive in which the first down took place (so, if it is the first first down play in a drive, this variable would be scored 0; if a team makes a first down, this variable would be scored 1 for the second first down play in the same drive).
Table 1 below shows the sample by season and team.
Hierarchical Linear Models
My initial plan was to run two-level hierarchical linear models (HLM), in which first-down plays/series are nested within games. Briefly, HLM allows you to calculate how much of the variation in the dependent variable is due to level-1 (play/series-level) factors like yards on first down, field position, etc., and how much is due to level-2 (game-level) factors like opponent defensive strength, home/away game, etc.
Essentially, HLM would calculate the average probability of getting a first down, as well as the effect of the level-1 independent variables on that probability, for each of the 122 games, and then those parameters would be the dependent variables to be predicted as a function of level-2 (game-level) variables.
Fortunately for those of you who are about to stop reading, one of the things I discovered is that there is not enough variation from game to game, either in the probability of getting a first down or in the effects of the level-1 independent variables, to support an HLM analysis.
This does not mean that, for example, UM had exactly the same average success in getting a first down against OSU as they did against Eastern Michigan. What it does mean is that there is not so much variation from game to game in this average probability that it makes sense to predict that scant amount of variation with game-level factors.
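For readers who want a number to attach to “not much game-to-game variation,” one common summary in a two-level probit model is the latent-scale intraclass correlation (ICC). Because the play/series-level error in a probit is standard normal, its variance is fixed at 1, so the share of variance lying between games is τ²/(τ² + 1), where τ² is the between-game variance of the game intercepts. The post does not report τ²; the values below are hypothetical and only show how the calculation works.

```python
def latent_icc(between_game_variance: float) -> float:
    """Share of latent first-down propensity variance that lies between games.

    In a random-intercept probit model the level-1 (play/series) error is
    standard normal, so its variance is fixed at 1.
    """
    if between_game_variance < 0:
        raise ValueError("variance must be non-negative")
    return between_game_variance / (between_game_variance + 1.0)


# Hypothetical between-game variances, NOT estimates from these data.
for tau_sq in (0.01, 0.05, 0.25):
    print(f"tau^2 = {tau_sq:.2f} -> ICC = {latent_icc(tau_sq):.3f}")
```

An ICC near zero would be consistent with the finding that there is too little between-game variation to model with game-level predictors.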
The Probit Binary Response Model
Hence, the following is just a play/series-level analysis, which is probably more intuitive for the reader anyway. Because the dependent variable is dichotomous (0 if no first down on the series, 1 if first down or touchdown), I used the probit binary response model (PBRM). For those of you not steeped in this method, the PBRM is one of several regression-like methods for binary dependent variables.
Probit coefficients are in the metric of z-scores, that is, the scale of the input to the standard normal cumulative distribution function (CDF). When you evaluate the standard normal CDF at a given value of the model’s linear predictor (the intercept plus each coefficient times its variable), it tells you the probability of scoring a “1” on the dependent variable.
The sign and magnitude of probit coefficients are interpreted in the standard way: a negative effect means that the variable lowers the probability of scoring a “1” on the dependent variable, positive coefficients mean that the variable increases the probability, and larger coefficients (in absolute value terms) mean stronger effects.
Except for Table 3 below, I have transformed all coefficients into probabilities, so you don’t have to worry about the metric of the coefficients.
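The transformation from coefficients to probabilities amounts to plugging the linear predictor into the standard normal CDF. A minimal sketch, assuming made-up coefficients and hypothetical variable names (`gain_3_to_5`, `offensive_penalty`), not the actual values or names in Table 3:

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float, coefs: dict[str, float], x: dict[str, float]) -> float:
    """P(first down or TD) = Phi(intercept + sum of coefficient * value).

    Variables missing from x are treated as 0 (i.e., the reference category).
    """
    z = intercept + sum(beta * x.get(name, 0.0) for name, beta in coefs.items())
    return std_normal_cdf(z)


# Hypothetical coefficients, for illustration only.
coefs = {"gain_3_to_5": 0.60, "offensive_penalty": -0.50}
clean = probit_probability(0.20, coefs, {"gain_3_to_5": 1})
flagged = probit_probability(0.20, coefs, {"gain_3_to_5": 1, "offensive_penalty": 1})
print(f"no penalty: {clean:.3f}, offensive penalty: {flagged:.3f}")
```

The negative penalty coefficient lowers z and therefore lowers the probability, which is the sign interpretation described above.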
Several Words on Sampling Error
You may remember from some statistics course that it is generally good practice to report not just the point estimates from any statistical analysis, but also an estimate of sampling error. This is why when networks report polling data, they usually say something like “Candidate X is leading Candidate Y by 5 points [the point estimate], with a margin of error of plus or minus 3 points [the sampling error estimate].”
Virtually all statistical software packages (I used Stata/SE 10) assume that the data were gathered via a simple random sample, in which all samples of a given size have an equal probability of selection. Clearly, my choice to non-randomly sample three teams and five seasons, and then take a census of all games (except for Baby Seal U games) and first down plays, violates this assumption. Hence, this analysis isn’t necessarily representative of the nationwide effects of first down yards (and other variables) on first-down probabilities. You should interpret all of these findings as merely relating to UM, OSU, and WVU for the years specified.
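To make the polling analogy concrete, the usual 95% margin of error for a proportion under simple random sampling is 1.96 × sqrt(p(1 − p)/n). The sketch below uses a hypothetical 70% first-down rate across all 3,529 first-down plays. Given the non-random sample, and the fact that plays cluster within games, treat any such number as a rough guide rather than a valid error bar.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion, assuming simple random sampling."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("need 0 <= p_hat <= 1 and n > 0")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Illustrative only: a hypothetical 70% first-down rate over 3,529 plays.
moe = margin_of_error(0.70, 3529)
print(f"70% +/- {100 * moe:.1f} points")
```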
Figures 1 and 2 below show, respectively, the number of yards gained on first down and the starting field position for any particular series. Recall that there can be multiple series within a drive, so Figure 2 should not be interpreted as the starting field position for the drive.
Note from Figure 1 that the modal number of yards gained on first down is zero. Obviously, this can occur via an incomplete pass, a completed pass for no gain, or a rush for no gain. The distribution is right-skewed, although fairly normally distributed (excluding the zero yards bar) between a loss of about 10 yards and a gain of about 20 yards.
Note from Figure 2 that the modal starting field position is 80 yards from the opponent’s goal line (or the offensive team’s 20). This is largely due to touchbacks on punts or kickoffs, of course.
Table 2 below shows the descriptive statistics by team for the variables used in the analysis. Note that the percentage of first down plays where the series ended in a first down or touchdown ranges from 66% for the 2009 UM team to about 76% for the 2006-2007 WVU teams. This should explain in part the 5-7 record of the former team and the shredding of opponents achieved by the mature WVU teams. Interestingly, OSU and Lloyd-era UM had about the same overall probability of getting a first down.
Time will tell if the RR UM teams can recapture that glory, or whether the spread n’ shred was simply more effective (1) in the Big East, (2) with Pat White/Steve Slaton, or (3) both (1) and (2).
One bit of hopeful evidence comes from the opponent total defense rank (near the bottom of Table 2). It doesn’t appear as though WVU played an appreciably easier average schedule than OSU, and if anything, WVU’s opponents finished their seasons with, on average, better-ranked defenses than either Lloyd-era or RR-era UM.
In terms of the primary independent variable of interest, Figure 3 shows the distribution of yards gained on first down, by team. Note that RR-UM was more likely than the other teams to lose from 1 to 4 yards on first down, less likely to gain from 3 to 5 yards, more likely to gain 6 or 7 yards (there may be a small sample size problem here), and less likely to hit a big play on first down (10 or more yards) than OSU or WVU.
Interestingly, RR’s WVU teams were less likely to gain 0 to 2 yards on first down, which is probably largely due to the lower percentage of passing plays on first down for WVU (17% vs. about 32-34% for the other three teams). This should demonstrate that RR/Magee understand that when you have Pat White, you run the ball on first down (and most downs thereafter). When you have Tate, you have to be more balanced. Say, maybe these guys do know about football…
Other points of interest from Table 2:
- Lloyd’s teams were more disciplined on offense with respect to penalties than the Vest’s teams: about 4.7% of OSU’s series had at least one post-first-down offensive penalty (recall that the first down penalties were folded into the “yards to go” variable), compared to 2.8% for Lloyd-UM. RR’s teams fall in between.
- On the other hand, the Vest’s teams drew more post-first-down defensive penalties than RR’s teams. Perhaps the passing attack invites more encroachment/pass interference calls than a more ground-based attack?
- Turnovers! About 7.6% of RR-UM’s series ended in turnovers, compared to 4.0 to 4.7% for the other teams. Yikes.
Figures 4-6 show some results from the regression analysis. First, Figure 4 shows the probability of getting a first down after selected numbers of yards on each first down play, assuming (1) it was first and 10, and (2) there was no penalty on the series.
Note that losing five or more yards on first down gives you about a 0.25-0.30 probability of getting a first down, whereas, obviously, gaining 10 or more yards is by definition a first down (on first and 10 at least).
In between these extremes, the first down returns to yards on first down are basically linear, though there are fairly noticeable jumps between losing 5 or more and losing 1 to 4 yards (the first two points in the curves) and between gaining 3 to 5 and gaining 6 or 7 yards. By the way, I chose these categories based on exploratory analyses that showed no statistically significant difference between gaining, say, 0, 1, or 2 yards.
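The yardage categories above can be built from raw play-by-play yardage with a simple binning function, followed by dummy coding for the regression. This is a sketch under assumptions: the post names bins for losing 5 or more, losing 1 to 4, gaining 0 to 2, 3 to 5, 6 or 7, and 10 or more yards, but never mentions 8 or 9, so that bin is a guess, as is using 0 to 2 yards as the omitted reference category.

```python
BINS = ["lost 5+", "lost 1-4", "gained 0-2", "gained 3-5",
        "gained 6-7", "gained 8-9", "gained 10+"]


def yards_category(yards: int) -> str:
    """Map yards gained on first down to a category label."""
    if yards <= -5:
        return "lost 5+"
    if yards <= -1:
        return "lost 1-4"
    if yards <= 2:
        return "gained 0-2"
    if yards <= 5:
        return "gained 3-5"
    if yards <= 7:
        return "gained 6-7"
    if yards <= 9:
        return "gained 8-9"  # assumption: this bin is not named in the post
    return "gained 10+"


def dummy_code(yards: int, reference: str = "gained 0-2") -> dict[str, int]:
    """One-hot encode the category, dropping the (assumed) reference bin."""
    category = yards_category(yards)
    return {b: int(b == category) for b in BINS if b != reference}


print(dummy_code(4))  # only the "gained 3-5" indicator is 1
```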
Finally, notice the similarity between the OSU and Lloyd-UM curves. This shouldn’t be particularly surprising, since those teams pursued fairly similar offensive strategies: lots of off-tackle runs to Hart/Wells interspersed with daggers to Manningham/Ginn.
I was interested to see that WVU dominated across all categories of yards gained on first down. That is, it isn’t true that the WVU offense bogged down especially on small losses or gains on first down. A great offense will overcome.
Figure 5 shows the probability of getting a first down by field position on first down, in 10-yard increments. There are basically four points here:
- Being inside your own 20 reduces your probability of getting a first down, probably because of more conservative play calling;
- There is basically no difference between the 20 and the 50;
- Probabilities go up between the 50 and field goal range (a field goal attempt was coded 0 on the dependent variable, since there was no first down or touchdown);
- The probability goes way down in field goal range, probably because coaches elect to take the 3 points instead of going for it on 4th (see the Mathlete’s excellent diary on this).
Figure 6 shows basically the same trends, broken down by teams. There isn’t much to see here, except that WVU was awesome, RR-UM sucked, and OSU/Lloyd-UM were basically indistinguishable. It looks like a good rule of thumb is that WVU’s probability of getting a first down was about 10 percentage points higher than RR-UM’s and about 5 percentage points higher than the Vest’s and Lloyd’s.
Table 3 shows the full regression results. There isn’t much new here, but just to recap:
- Yards on first down matters a lot (duh I);
- WVU kicked ass;
- It’s harder to get a first down on first and 20 than first and 5 (duh II);
- Field position doesn’t matter as much as you might think;
- Offensive penalties make it harder to get a first down; defensive penalties make it easier (duh III); and
- Ceteris paribus, passing on first down increases the probability of getting a first down on that series (though in analysis not shown here, I found that, not surprisingly, it increases the chances of a turnover [see Hayes, W.]).
One other thing: in the note to Table 3, it says that the “Pseudo R2” is .3008. This is the statistic Stata reports for the PBRM (McFadden’s pseudo R2, which compares the log-likelihood of the full model with that of a model containing only an intercept). It is loosely analogous to the R2 (r-squared) statistic in linear regression, which is interpreted as the percentage of the variation in the dependent variable that is explained by the model, but it is not literally a percentage of variance explained. It’s hard to say whether .30 is a lot or a little; all I know from the coding is that there were lots of series in which a team would lose 10 on first down and still get a first down, and others where they would gain 9 on first down and fail to get a first down. So, there is still a large stochastic component to the process.
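As an illustration of the formula (not a re-estimation), McFadden’s pseudo R2 is 1 − lnL(model)/lnL(null), where the null model contains only an intercept and so predicts the overall first-down rate for every series. The 70% base rate and the fitted log-likelihood below are hypothetical.

```python
import math


def null_loglik(base_rate: float, n: int) -> float:
    """Log-likelihood of an intercept-only binary model predicting base_rate for all n cases.

    Assumes base_rate is the observed share of 1s (the intercept-only MLE).
    """
    if not 0.0 < base_rate < 1.0 or n <= 0:
        raise ValueError("need 0 < base_rate < 1 and n > 0")
    return n * (base_rate * math.log(base_rate) + (1.0 - base_rate) * math.log(1.0 - base_rate))


def mcfadden_pseudo_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R^2: 1 - lnL(model) / lnL(null)."""
    if loglik_null >= 0.0:
        raise ValueError("null log-likelihood of a binary model must be negative")
    return 1.0 - loglik_model / loglik_null


ll_null = null_loglik(0.70, 3529)  # hypothetical base rate
ll_model = 0.70 * ll_null          # hypothetical fitted log-likelihood
print(f"pseudo R2 = {mcfadden_pseudo_r2(ll_model, ll_null):.2f}")
```

A model that improves on the intercept-only log-likelihood by 30% yields a pseudo R2 of about .30, which says nothing directly about how much “variance” is explained.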
Stuff You’d Think Might Matter but Didn’t, Statistically
Statistically, variables that had no significant (but see “Several Words on Sampling Error” above) effect on the probability of getting a first down (net of the other variables included in the model shown in Table 3) included:
- Home vs. not home game;
- Which game of the season it was;
- The quarter of the game;
- The drive number (these last two suggest that there is not a robust effect of either “bursting out of the gate” or “starting sluggishly.” Sometimes teams start strong and finish weak; other times the reverse happens);
- Number of previous first downs on a drive. This was interesting to me, because people often assume that teams get “hot” on a drive; in other words, each first down makes it successively easier to get the next one. My analysis suggests this is not true, at least in these data. There are a couple of explanations for this. One is that it does get slightly easier to get a first down the closer you get to your opponent’s goal line (though not in the field goal zone), so the two effects are collinear: the more first downs you get on a drive, the better your field position is, and it is field position that affects first down probabilities. The second goes back to the stochastic component: there are just as many drives where a team will gain 3 first downs and then stall as ones where they will gain 3 first downs and then 2 more.
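The “independent trials” explanation can be made concrete with a deliberately simple model: if every series converts with the same probability p, regardless of what happened earlier on the drive, the chance of at least k more first downs is p^k no matter how many first downs the drive already has. This is an illustrative assumption, not the model estimated in Table 3; it ignores field position and the fact that touchdowns also end drives, and p = 0.70 is hypothetical.

```python
def prob_at_least_k_more(p: float, k: int) -> float:
    """P(at least k more first downs) if each series converts independently with probability p."""
    if not 0.0 <= p <= 1.0 or k < 0:
        raise ValueError("need 0 <= p <= 1 and k >= 0")
    return p ** k


def prob_exactly_k_more(p: float, k: int) -> float:
    """P(exactly k more first downs, then a failed series) under the same assumption."""
    return prob_at_least_k_more(p, k) * (1.0 - p)


p = 0.70  # hypothetical per-series conversion rate
for already in (0, 3, 6):
    # Conditional on having reached `already` first downs, the chance of 2 more
    # is p**(already + 2) / p**already, i.e. the model has no memory ("no hot hand").
    conditional = prob_at_least_k_more(p, already + 2) / prob_at_least_k_more(p, already)
    print(f"after {already} first downs: P(at least 2 more) = {conditional:.3f}")
```

Each line prints 0.490: under independence, a drive that already has several first downs is no more likely to keep going than a fresh one.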
I have few conclusions beyond the things I’ve already mentioned. Basically, yards on first down are incredibly important, but not in any surprising way. The more yards you get, the better your chances are of getting a first down. However, there is a large random component to getting first downs, so yards aren’t everything.
In terms of UM football, it is clear that the mature spread n’ shred is lethal. But you already knew that. The question is whether UM can recapture that WVU magic. I guess I’m optimistic, for several reasons:
- The RR offense requires experienced, athletic players, really at all offensive positions. This we now have, and/or are quickly cultivating.
- A heavily run-based offense is slightly less likely to turn the ball over and much less likely to suffer no gain on first down (due to the lack of incomplete passes). This bodes well for sustained drives.
- WVU played, on average, slightly better defenses (at least if you think total defense rank at the end of the season is a good indicator of defensive strength) than UM on average, and defenses that were as good as those played by OSU, on average. So, at least by this figuring, there is no reason to think that UM’s current schedule is too good for us to be successful.
Obviously, the $1M unanswered question is whether the RR offense will be as successful at UM as it was at WVU. The analysis I have done can’t really speak to this question, but neither does it suggest obvious reasons why it won’t be successful. It does show how powerful the WVU version was, and I for one support giving RR enough time to have a reasonable chance to put that offense into place.
Comments, suggestions, critiques? Let's have ‘em.
Two comments -
1) It'd be great to see error bars on the PBRM Results.
2) Are you controlling for opponent strength? Maybe WV played against lame defenses.
"WVU played, on average, slightly better defenses (at least if you think total defense rank at the end of the season is a good indicator of defensive strength) than UM on average, and defenses that were as good as those played by OSU, on average. So, at least by this figuring, there is no reason to think that UM’s current schedule is too good for us to be successful."
One thing I would really like to see in these articles that talk about any sort of strength of schedule calculation would be to drop the bottom half of the schedule and see how things change.
For an elite team, which is more likely to result in a loss: a whole season against solid but middling competition, or the one or two games against national title contenders? Generally the test of a season for Michigan has not been whether they can beat Purdue consistently but whether they can beat Ohio State and Penn State.
I'm willing to believe that the middle and bottom of West Virginia's schedule in the Big East was comparable or even superior... but whom did they play with defenses in the same league as Michigan, OSU, Penn State, Wisconsin, or Iowa, all teams that consistently field top 10 or top 20 defenses?
Scheduling Indiana instead of Delaware State shouldn't really change the outcome of the game for a competitive Division 1 team, but it will definitely help your average opponent ranking.
In 2007, the Big East in general seemed to have pretty solid defense. This year you have to get past the fourth Big Ten team to hit the Big East's first. I haven't taken the time to do a more detailed analysis, but I would find one interesting.
But good work anyway.
I would have to contest your claim that a team getting a string of first downs is not due to the team "getting hot." You state that previous first downs on a possession are irrelevant to the current set of downs (in other words, each set of downs is independent of other first downs on the current drive), but that your numbers suggest that as you get closer to the goal line, the probability of getting a first down increases. The better your field position is, the more likely you are to get a first down.
My response to this is simply that a team is more likely to achieve a first down when closer to the goal line because of the rhythm it has gained from previous first downs on the current drive. That is, your increased probability of a first down is because most sets of downs that occur in that area of the field occur there because the team has most likely moved the ball there to begin with, and has established a rhythm.
I'm now questioning whether what I just said makes any sense. I also may have missed something in your entry explaining my point.
What you are saying makes sense. Statistically, the issue is that when I control for both previous first downs and field position, the previous first downs variable is trivial in magnitude and statistically nonsignificant (though again, this isn't a simple random sample).
My sense is that this is mostly due to the fact that drives are made up of basically independent trials, despite the fact that color commentators love to crow about how the momentum is with team X or how team Y's defense is on its heels.
But your explanation makes sense as well--that better field position is one of the mechanisms by which "momentum" or "rhythm" gets converted into multiple first downs.
The probability of a first down starting within 5 yards of the goal line is higher than all other probabilities. But you only have to get 5 yards to get success, instead of 10.
The conventional wisdom that seems supported to me is the rather low probability in the 20-10 range. This is where the field is "compressed" and the defense doesn't have to cover deep. Add on the fact that it's easy to get a field goal and the play calling on average gets very conservative, and you have less probability of a 1st down in the Red Zone than you do at 80-70 yards to go (your own 20-30 on the graph).
Unless I'm reading Figure 5 incorrectly, it doesn't look at all like rhythm is dictating success. Instead what I see is that success is determined by the play calling, and field position influences the play calling.
Now, a possible explanation about the conventional wisdom of getting in a rhythm is that it usually happens when it's very important for the offensive team to score, and thus their play calling is more aggressive throughout the drive.
For myself, picturing one game, I just think back to the 1988 game against Miami, where Michigan built up almost a 21-point lead and then pissed it all away by calling conservative offensive plays and punting on 3-and-outs, thus putting the entire burden of winning on the defense.
Defense's job is to get the ball back for the offense. The offense's job is to take the ball into the end zone, thus gaining the maximum points. This should be the goal for every offensive series independent of the score or the time left on the clock, since if you score 7 points every possession, the best the other team can do is tie.
Punting is just giving an extra opportunity to the other team. After all, a touchdown is equal to or better than two field goals. If you have better than a 50/50 chance at making a first down on a series, you should use all 4 downs, even if the other team starts in field goal range, since it's likely they'll make conservative play calls and kick the field goal.
Part of the system we run is to establish a fast rhythm, thereby not allowing the defense to counter by substitution(s).
Could this be part of the play calling success?
I understand that you are interpreting the data - but I am questioning this...
because you're working knowing you have four downs instead of three to pick it up.
Ahhh okay, thanks for the replies
I was expecting something a little shorter and more summary:
"Yards good. First downs sometimes marginally less good."
I appreciate all the extra effort, though.
I think a reasonable time in this case is 6 years. I realize that is probably a minority view, but with a complete changeover in style of play, it will take that long. Without such a complete change, 4 years might well be reasonable.
I believe a reasonable chance might be 4 years in a situation where the style of play of the incoming coach does not vary that much from the previous style of play. With a complete change in style of play, players leaving as a result, leading to losing seasons in the beginning years, a reasonable chance in this case is 6 years.
If the offense is not getting it done in 4 years, with three full years of recruiting and Barwiscizing his type of players, then we have issues. That is the offense getting it done ...
The defense hit nadir last year (or maybe 2008), so he has say three years including this season to deliver there.
I guess that stacks to 6 years. But honestly, if the offense doesn't really click this year, and really really click next year (a la Henne, Hart and Long as juniors in 2006), then we really have to start questioning the level of offensive genius ...
That said, I think the offense really does click this season, and even more next. So it becomes a question as to whether the defense can do well enough that offensive (over) production wins the day.
Does the chart series above show that we were actually more likely to gain 6-7 yards on 1st down than WVU, OSU and LC UofM? Is that showing shred plus Tate is better than pro-set plus Chad?
El Jeffe, just can't quit you!
... to anyone who read every single word of the entire article and understood it in its entirety. I think it was good for us, but I could be wrong.
would be to see the sample size in games the same for OSU, WVU and the Carr era. Maybe do WVU and Carr era UM from 2003-2007.
I just can't picture RR saying at a presser, "The hierarchical linear models suggest that successful first downs are marginally important to the likelihood of subsequent first downs and hence scoring."