a terrible blight on our fine country
fourth down decisions
The article at the link makes 4 points about the hazards of decision-making in football. The study used data from the NFL, but it's definitely relevant to the college game.
In particular, it seems relevant to discussion on MGoBlog about:
1. going for it on 4th down (do it!)
2. whether it helps the team to fire the coach (not really, but there needs to be a GERG exception, IMO)
3. winning with someone else's talent (WOO! go Brady, Borges, and Mattison)
Why Rich Rod And Other Coaches Don't (And Shouldn't) Always Make Decisions Based On Those Statistical Percentages – And A Proposed Decision Table Coaches Could Use
Before beginning, let me state that (to the best of my knowledge) the statistical calculations that all of these analyses use are absolutely correct. However, further review and some additional analysis reveal that there are very good reasons coaches do not (and should not) always use the statistical percentages to make decisions.
Synopsis: Using probability and statistics, several different analyses have concluded that football teams should Go For It far more often on 4th down (even when they are in their own end of the field). However, virtually no football coach (at the college or pro level) even comes close to following this scientific decision-making criterion. So, the obvious question is, "Why don't coaches make decisions during a game based on the statistical percentages?" There are three shortcomings of using the statistical analysis:
- The statistical analyses set decision thresholds that are too precise and too low.
- Using statistical analysis to make decisions about a very few events within a football game is a mathematical fallacy (especially at the precision proposed).
- The application of expected value to make decisions in football is problematic.
Therefore, to compensate for these shortcomings, decision thresholds that are less precise and significantly higher should be used.
Revised Decision Threshold Table: The analyses that have been documented to date have not included a decision table showing the magnitude of the net advantage used to make recommendations. Here is a decision table that shows the magnitude of the net advantage and uses less precise and significantly higher thresholds. While this significantly reduces the number of situations in which a team should Go For It on 4th down, the results are more credible, more likely to be followed in actual games, and more likely to benefit the offense.
In the table, ONLY the 10 events in the larger blue font in the light blue shaded cells (the upper right-hand corner of the table) are recommended to Go For It on 4th down. They are:
- Fourth and 1 yard to go from your own 40 yard line to the 50 yard line
- Fourth and 3 yards to go (or less) from the opponent's 49 yard line to the opponent's 40 yard line
- Fourth and 5 yards to go (or less) from the opponent's 39 yard line to the opponent's 30 yard line
The recommendation is to Go For It under these criteria when the score is relatively close and time is not a factor. Specific game conditions (such as weather conditions, earlier possessions during the game, current score, time remaining, confidence in having a play that will get the first down, etc.) may provide insight to modify the recommendations. I stopped at the opponent's 30 yard line because inside the 30 yard line, field goals become reasonable for most college football teams. In college football, teams have significantly different field goal success rates and any analysis should be based on that team's anticipated field goal success rate.
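As a rough sketch, the three shaded regions of the proposed table can be written as a lookup. It assumes the score is close and time is not a factor, as stated above. Field position is encoded as yards to the opponent's goal line (your own 40 = 60, midfield = 50, opponent's 30 = 30). That encoding and the function name `proposed_go_for_it` are hypothetical choices for this illustration, not part of the original table.

```python
def proposed_go_for_it(yards_from_opp_goal, yards_to_go):
    """Return True if the proposed decision table recommends Go For It.

    yards_from_opp_goal: distance to the opponent's goal line
    (own 40 = 60, midfield = 50, opponent's 30 = 30); this encoding
    is an assumption of the sketch.
    """
    # 4th and 1 from your own 40 to midfield
    if yards_to_go == 1 and 50 <= yards_from_opp_goal <= 60:
        return True
    # 4th and 3 or less from the opponent's 49 to the opponent's 40
    if yards_to_go <= 3 and 40 <= yards_from_opp_goal <= 49:
        return True
    # 4th and 5 or less from the opponent's 39 to the opponent's 30
    if yards_to_go <= 5 and 30 <= yards_from_opp_goal <= 39:
        return True
    # Outside the shaded cells the table does not recommend going for it
    return False

print(proposed_go_for_it(60, 1))  # own 40, 4th and 1 -> True
print(proposed_go_for_it(45, 4))  # opponent's 45, 4th and 4 -> False
```

Anything inside the opponent's 30 falls through to False here. This matches the cutoff described above, where the field goal decision depends on the team's own kicker.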
Comments on the Decision Table: Key points to consider when reviewing the decision table are:
- I deliberately smoothed the data, looked only at 10 yard increments, and rounded the results to reflect the margin of error inherent in the analysis.
- If the decision threshold is set at breakeven (+0.00), comparisons to other analyses such as The Mathlete's indicate a high level of consistency. (BTW, the red (0.0) numbers in the table are actually very small negative numbers rounded to (0.0). The black 0.0 numbers are very small positive numbers rounded to +0.0.)
- I took the numbers in the table and calculated one standard deviation (not sure how valid this is). One standard deviation is 0.9 and I decided to use 1.00 as the threshold. So, the threshold can be described as, "Go For It whenever your expected points are 1.00 or greater".
- Glass half full or half empty? Take a look at the values in the table. Except for the positive values in the upper right-hand corner and the negative values in the lower right-hand corner, the others are very close to 0.00. From a conservative standpoint (half empty), the conclusion is to only Go For It in a limited number of cases. From an aggressive standpoint (half full), the conclusion would be to almost always Go For It except for a limited number of cases.
- The "probability of scoring" was added to provide another view of the impact of decisions. For example, if you Go For It and make it from your own 40 yard line, you still only have a 32% chance of scoring (7 points). However, if you don't make it, the opponent gets the ball at your 40 yard line (i.e. Opp 40 in the table) and has a 46% chance to score. If you punt to the 20 yard line, the opponent has just a 20% chance of scoring. This relationship holds true for all the other yard lines – if you Go For It and do not make it, you more than double the opponent's chance of scoring (versus punting).
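The "probability of scoring" row is simply expected points divided by 7. The sketch below reproduces the failed-attempt vs. punt comparison from the bullet above, using the opponent expected points quoted in this diary: 3.2 with the ball at your 40 and 1.4 at their own 20. The function name is hypothetical.

```python
TD_POINTS = 7.0

def probability_of_scoring(expected_points):
    """The table's 'probability of scoring': expected points divided by 7."""
    return expected_points / TD_POINTS

# Opponent expected points quoted in the diary
opp_after_failed_attempt = probability_of_scoring(3.2)  # ball at your 40
opp_after_punt = probability_of_scoring(1.4)            # ball at their own 20

print(f"{opp_after_failed_attempt:.0%} vs {opp_after_punt:.0%}")  # 46% vs 20%
print(f"{opp_after_failed_attempt / opp_after_punt:.1f}x")        # 2.3x
```

The ratio of about 2.3 is the "more than double" in the text.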
An Example of An Advantage of 1.1 Expected Points Per 4th Down Attempt:
Since it is obviously impossible to score 1.1 points, what does an advantage of 1.1 expected points mean? Well, it does not mean you are going to be successful and make the 1st down. Even if you make the 1st down, it does not mean you are going to score on this possession or that you will score before your opponent. It only means that over a large number of similar possessions over several games the net points you score divided by the number of 4th down attempts will equal 1.1 (each specific attempt may result in: turnover on downs, a punt, a FG, a missed FG, a TD, a subsequent turnover, etc.). Here is one possible scenario of what actual game results could look like with 4th down and 1 at your own 40 yard line:
This is, deliberately, a very simple scenario. It will likely take many more attempts than just 4 to approach the expected points in the table and there are many other possible scoring combinations. However, it does provide examples of how expected points translate to actual game results. Items to note in this example:
- The offense is at a net disadvantage until the fourth attempt.
- Net Offensive Points Per 4th Down Attempt (1.0) is close to the expected points per attempt (1.1) in the decision table.
- Total Offensive Points Per Possession (2.3) is close to the expected points for an offense from its own 40 yard line (2.2).
- Total Opponent Points Per Possession (3.0) is close to the expected points for a drive starting from the opponent's 40 yard line (3.2).
This example also illustrates the significant risks involved in the decision: if the game ends prior to the 4th attempt, the offense is at a net disadvantage of 3 points and the result may be that you lose the game (even though the expected point analysis does not directly state this as even a possibility).
Development of the Proposed Decision Table: The decision table was derived from two basic sources: Football Outsiders' "Figure 1: Offensive Efficiency From Field Position" and the Mathlete's "Never Punt With Denard? Fourth Down Strategy Revisited". I used the FO table (shown below) to calculate Expected Offensive Points by field position and the Mathlete's table for 4th down conversion rates. Note that many of the data points I used are not the exact numbers from these two sources. I smoothed the actual data to eliminate some minor anomalies (BTW, this does not affect the results – it is just a pet peeve of mine).
The results in the decision table appear to be reasonable based on a comparison to these two sources as well as Advanced NFL Stats' "When Should We Go For It On 4th Down?" and David Romer, "Do Firms Maximize? Evidence From Professional Football", 2005. I did not use dynamic programming but used what I believe is a reasonable approximation. If anyone has a decision table with different or more accurate numbers, it would be great to compare and contrast the results. The decision table consists of:
Column 1: Yards To Go
Column 2: 4th Down Success Rate. This is based on The Mathlete's data for an average college football team. This introduces the first potential for a significant margin of error. Even if this were the actual success rate for a specific team over the first 10 games of the current season, does anyone believe it is the exact success rate for the 11th game? Of course not. But this does not completely invalidate the analysis. It does mean the decision should use less precise and higher thresholds.
Yellow Row At Bottom of Table: Starting Field Position (on the 4th down play).
Light Blue Row At Bottom of Table: Expected Points Per Offensive Possession (from that field position)
Orange Row At Bottom of Table: Probability of Scoring (7 Points). This is (Expected Offensive Points / 7), provided as a reference only – not used in the calculations.
Columns 3-9: Expected Offensive Points (EP) Per 4th Down Attempt. This is based on the probability of making the first down, the starting field position, the expected offensive points from each specific field position, and the average net punting distance. The decision table provides the net expected offensive points per 4th down attempt of Going For It versus Punting. A positive number indicates a net advantage for the offense and a negative number indicates a net advantage to the opponent. I'll use a team's own 40 yard line with 4th down and 1 as the example.
EP of Decision To Go For It = EP (Make It) + EP (Fail)
Expected Offensive Points of Making It on 4th Down is straightforward:
EP (Make It) = Probability of Making It X Expected Points At This Field Position = 72% X 2.2 = 1.58
Expected Points of Failing to Make It on 4th Down is obviously negative but also a bit tricky. You are still going to give the opponent the ball if you decide to punt rather than Go For It. So, the opponent would have some expected points anyway but based on a different field position. Therefore, I use the NET Expected Points in the calculation:
Net Expected Points = Expected Points After Failure To Convert – Expected Points After Punt
Expected Points After Failure To Convert = (3.2) Points (they are now on your 40, not their own 40)
Field Position After Punt = their own 20 yard line
Expected Points After Punt = (1.4) Points (they are now on their own 20)
Net Expected Points = (3.2) – (1.4) = (1.8) Points
EP (Fail) = Probability of Failing To Make It On 4th Down X NET Expected Points = 28% X (1.8) = (0.50)
EP of Decision To Go For It = EP (Make It) + EP (Fail) = 1.58 + (0.50) = 1.08
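The worked example above can be reproduced in a few lines. This sketch follows the arithmetic exactly as the diary sets it up: the make-it branch uses the offense's EP at the new spot, and the fail branch uses the net of opponent EP after a failed attempt minus opponent EP after a punt. The function and parameter names are hypothetical.

```python
def ep_go_for_it(p_convert, ep_after_convert, opp_ep_after_fail, opp_ep_after_punt):
    """Net EP of Going For It vs. punting, following the diary's steps.

    Opponent expected points are passed as positive numbers; they count
    against the offense, so they enter the fail branch with a minus sign.
    """
    # EP (Make It) = probability of making it x EP at this field position
    ep_make = p_convert * ep_after_convert
    # Net expected points: opponent EP after a failed attempt, minus the
    # EP they would have had anyway after a punt (negative for the offense)
    net_fail = -(opp_ep_after_fail - opp_ep_after_punt)
    # EP (Fail) = probability of failing x net expected points
    ep_fail = (1 - p_convert) * net_fail
    return ep_make + ep_fail

# 4th and 1 at your own 40: 72% conversion, 2.2 EP, opponent 3.2 vs 1.4
ep = ep_go_for_it(0.72, 2.2, 3.2, 1.4)
print(f"{ep:.2f}")  # 1.08
```

Swapping in a team's own conversion rate or punting numbers only changes the arguments. The structure of the calculation stays the same.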
Background: The folks at Football Outsiders analyze college football using two systems (FEI and S&P+), Advanced NFL Stats provides analysis of pro football, the Mathlete has his analysis, and I am sure there are several others. The claim to fame for most of these systems is that a computer can take advantage of a statistical analysis of huge amounts of data: "nearly 20,000 possessions every season in FBS college football" or "every play of all 800+ of a season's FBS college football games (140,000 plays)", etc. A computer analysis is required because the human brain is simply incapable of processing this amount of data.
In addition to the primary result of ranking college football teams, these systems provide other analysis such as Never Punt With Denard? Fourth Down Strategy Revisited, the success rate of scoring in college football from every starting position on the field, or When Should We Go For It On 4th Down?
Here is the FO table that I used to calculate Expected Offensive Points by field position.
The Statistical Analyses Set Decision Thresholds That Are Too Precise and Too Low: The decision threshold for all of these analyses appears to have been set at breakeven (+0.00). This ignores the inherent margin of error and assumes a coach should take significant risks even when the rewards are essentially zero. (One example is the recommendation that teams should Go For It on 4th and 1 from their own 15 yard line!) The result is a loss of credibility in the analysis and a reluctance to believe and/or follow any of the recommendations. Here are three examples of the recommendations of when to Go For It on 4th down. The first is from the Mathlete:
The second is from Advanced NFL Stats:
The third is from the seminal investigation of the choice in football between kicking and trying for a first down on fourth down, David Romer, "Do Firms Maximize? Evidence From Professional Football", 2005.
Notice that all three of these analyses recommend that a team should Go For It on 4th and 1 (Mathlete) or even 4th and 2 (Advanced NFL Stats and Romer) from your own 10-13 yard line! The reason? All three of these use the very precise and very low criterion that any value above 0.00 is an advantage to the offense and, therefore, warrants going for it on 4th down. This would be analogous to ticketing everyone who is going 0.01 miles per hour over the speed limit – technically correct but impractical in the real world. IMO, anyone presented with the recommendation, "Go For It with 4th and 1 yard every time on your own 13 yard line" would be in disbelief and would dismiss any and all other recommendations from the same analysis.
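Here is a minimal sketch of the difference between the breakeven rule and the diary's proposed 1.00 threshold. The 1.08 value is the diary's worked example (4th and 1 at your own 40). The 0.05 value is a hypothetical near-breakeven cell, not a number from any of the published tables. For simplicity the sketch treats a value exactly at the threshold as a "go".

```python
BREAKEVEN_THRESHOLD = 0.00  # threshold attributed to the published analyses
PROPOSED_THRESHOLD = 1.00   # diary's proposal (about one standard deviation, 0.9, rounded up)

def go_for_it(net_expected_points, threshold):
    """Go For It when the net expected-points advantage reaches the threshold."""
    return net_expected_points >= threshold

examples = {
    "4th and 1, own 40 (worked example)": 1.08,
    "hypothetical near-breakeven cell": 0.05,
}
for label, net_ep in examples.items():
    print(label,
          "breakeven:", go_for_it(net_ep, BREAKEVEN_THRESHOLD),
          "proposed:", go_for_it(net_ep, PROPOSED_THRESHOLD))
```

Both rules agree on the worked example. Only the breakeven rule says to go for it on the near-zero cell, and that is the kind of recommendation the diary argues costs the analysis its credibility.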
In addition, the end result of these decisions is that "This evidence suggests that a rough estimate of the potential gains from going for it more often on fourth downs over the whole game is …an increase of about 2.1 percentage points in the probability of winning." (David Romer, "Do Firms Maximize? Evidence From Professional Football" 2005, Page 28). With a 12 game college football season, this corresponds to just one additional win every four seasons! Thus, you would expect a coach to Go For It on 4th down in hundreds of different scenarios (depending on field position, yards to go, expected conversion rates, expected net punting distance, expected field goal distance, game circumstances, etc.) on the prediction that every 4 years the team will win one extra game.
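The "one extra win every four seasons" figure is straightforward arithmetic. This sketch assumes, as the diary does, that Romer's NFL estimate of +2.1 percentage points per game carries over to each game of a 12-game college season. By linearity of expectation, the expected extra wins simply add across games.

```python
win_prob_gain_per_game = 0.021  # Romer: about +2.1 percentage points per game
games_per_season = 12

# Expected extra wins add up across games (linearity of expectation)
extra_wins_per_season = win_prob_gain_per_game * games_per_season
seasons_per_extra_win = 1 / extra_wins_per_season

print(f"{extra_wins_per_season:.3f} extra wins per season")      # 0.252
print(f"one extra win every {seasons_per_extra_win:.1f} seasons")  # 4.0
```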
Using statistical analysis to make decisions about a very few events within a football game is a mathematical fallacy (especially at the precision proposed): It is somewhat ironic that the advantages gained through the statistical analysis of tens of thousands (or hundreds of thousands) of data points are, in fact, why the results are not, cannot, and should not be used to make decisions during a football game. In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
The law of averages is a term used to express a belief that outcomes of a random event will "even out" within a small sample. As invoked in everyday life, the "law" usually reflects bad statistics or wishful thinking rather than any mathematical principle. While the law of large numbers does state that a random variable will reflect its underlying probability over a very large sample, the law of averages typically assumes that unnatural short-term "balance" must occur.
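A small Monte Carlo sketch makes the law of large numbers concrete. It assumes the two-outcome model implied by the diary's 4th and 1 example: 72% chance of +2.2, and otherwise the net -1.8 from a failed attempt. It tracks the running average of net points per attempt. The seed and attempt counts are arbitrary choices for illustration.

```python
import random

P_CONVERT = 0.72
NET_IF_MADE = 2.2     # EP after converting at your own 40
NET_IF_FAILED = -1.8  # net EP after a failed attempt vs. punting

def simulate_running_average(n_attempts, seed=0):
    """Running average of net points per attempt over n simulated attempts."""
    rng = random.Random(seed)
    total = 0.0
    averages = []
    for i in range(1, n_attempts + 1):
        total += NET_IF_MADE if rng.random() < P_CONVERT else NET_IF_FAILED
        averages.append(total / i)
    return averages

expected = P_CONVERT * NET_IF_MADE + (1 - P_CONVERT) * NET_IF_FAILED  # 1.08
avgs = simulate_running_average(100_000)
for n in (1, 4, 100, 100_000):
    print(f"after {n} attempts: {avgs[n - 1]:.3f}")
```

After one attempt the average is always +2.2 or -1.8, never 1.08. Only over a very large number of attempts does it settle near the expected value.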
The 4th down analysis relies on the law of averages and not the law of large numbers. Decisions based on the law of averages (also called the gambler's fallacy) are a recipe for failure.
One of the critical inputs of 4th down analysis is the conversion rate that is anticipated on 4th down for various yards to go. Because there are so few 4th down attempts, all of the analyses use 3rd down conversion rates instead. Let's assume that the anticipated conversion rate on 4th and 1 is 75% and that this is based on actual data from thousands of 3rd and 1 attempts. The law of large numbers predicts that, over a large number of 4th and 1 attempts, a team should expect 3 out of every 4 attempts to be successful. Unfortunately, there are likely to be only a very few 4th and 1 attempts in a single football game – often fewer than 4 in total. If you have just one 4th and 1, it is either successful or unsuccessful. If successful, the result will be better for the offense than the expected value in the analysis (hooray!). If unsuccessful, the result may be that you lose the game even though the expected value analysis does not directly state this as a possibility (it may be stated in a footnote – it may not).
The application of expected value to make decisions in football is problematic. The concept of expected value originated in the 17th century, was defined explicitly in 1814 by Pierre-Simon Laplace, and is used extensively in probability and statistics. However, the expected value is only a theoretical value and may be unlikely or even impossible (such as having 2.5 children). Expected value is difficult to reconcile in football since the only possible outcomes of a possession are 0, 3, or 7 points (ignoring safeties, missed PATs, or 2-point conversions). It is difficult to make decisions based upon "a net advantage of 1.1 expected points per 4th down attempt".
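To see how far a handful of attempts can stray from "3 out of 4", the binomial distribution gives the chance of each number of conversions in 4 attempts at the assumed 75% rate. The sketch assumes the attempts are independent, which is a simplification.

```python
from math import comb

def prob_successes(k, n, p):
    """Binomial probability of exactly k conversions in n independent attempts."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.75  # assumed 4th and 1 conversion rate from the text
n = 4     # a generous number of 4th and 1 attempts for one game

for k in range(n + 1):
    print(f"{k} of {n} converted: {prob_successes(k, n, p):.3f}")
```

Exactly 3 of 4 comes up only about 42% of the time. Even a generous single-game sample therefore usually misses the "expected" split.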
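As a sketch of why expected points rarely match any real outcome, take a single possession with hypothetical outcome probabilities. These are illustrative numbers chosen only for this example, not data from any source. The expected value is a weighted average of 0, 3, and 7, and it lands on a value that no possession can actually produce.

```python
# Hypothetical probabilities for one possession (illustrative only)
outcome_probs = {0: 0.60, 3: 0.15, 7: 0.25}

expected_points = sum(points * prob for points, prob in outcome_probs.items())
print(round(expected_points, 2))         # 2.2
print(expected_points in outcome_probs)  # False: no possession scores 2.2
```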
"I don't mean much to you, but you mean everything to me."
- Nearly Every Michigan Opponent, 2010.
After reading the excellent diaries prescribing proper fourth down etiquette according to the numbers, odds, and expected values, it makes sense to me that when you have the nation's leading rusher on a 4th and 1, you go for it. Period. But this isn't about our offense's decisions.
Most defenses must put together three consecutive good plays. Michigan's defense has a hard enough time doing that as is, but for whatever reason I've noticed this year that our defense has faced an unusually large number of aggressive opponents making the (tactically correct) decision to go for it on 4th down. Part of me thought this was just me being paranoid. But tonight after the game I wanted to see the actual numbers, and boy was this hunch right.
Here is a chart with Michigan and our Big10 opponents in 2010 and the number of 4th downs each team has had to face:
Chart of Opponents Going For It On 4th Down (columns: Average Per Game, Total, Converted)
Does that jump off the page at you? It should.
Michigan, through five games, is facing almost twice as many fourth down conversion attempts as the next team (MSU) and three times or more as many as the rest of our Big10 opponents. Our defense is facing an enormous task of shutting down these hyper aggressive teams.
Does that mean our defense is just bad?
Contrary to that, I assert that it is an underlying trend in the games that Michigan plays, one that reconciles nicely with another statistical blip coming from my stats on Normalized PPG and YPG, wherein our opponents typically far exceed their season standards when playing against Michigan.
Our opponents most of the time play their best game of the year against us.
Michigan is by far and away the red-letter, circled-twice, highlighted, make-or-break game of the year for every single team we have played. This will likely continue through Ohio State. Our opponents each and every week have thrown (and will throw) the whole playbook at us, and take risks when they normally would not - for a chance at knocking off Michigan.
- UConn wanted ever so badly to bust open its season as a Big East Title contender.
- UMass wanted to be The Horror II.
- BGSU wanted to be Toledo.
- Indiana was absolutely out for blood big time.
Only Notre Dame, with their new head coach, coming off of a win and playing us at home (despite us being rivals, neither ND nor Michigan believes the other to be THE big rival), doesn't fit the bill of someone willing to sell their own mother in order to beat Michigan....and Notre Dame was 0 for 0 on 4th downs this year.*
There is playing to win and then there is playing as if the season ends today, and that is what we often times find ourselves facing on defense.
Can anyone really argue that the four teams listed above weren't playing lights out when they played Michigan this year? Indiana's season, for all intents and purposes, is now over. They had hopes for an eight-win season; now it's likely they will struggle to reach six. I have a hard time believing Indiana is going to come out anything but flat next week @osu.
Looking ahead, can we take some positive away from this?
Our opponents being hyper aggressive against us this far into the season has directly inflated their PPG, YPG, and TOP. Don't get me wrong, what UConn, UMass, and Indiana did was absolutely the correct strategy - but from a Michigan perspective we don't want our opponents to play correctly by the math. We would much rather they settle for 3 or punt the ball back to Denard. All of these things result in less of our defense on the field and fewer points for the opposition.
And if our big remaining opponents do that, we will allow fewer PPG, and this gives our offense a better chance to equalize for the win. PSU, Iowa, Wisconsin, osu - all of these teams could fit the bill as a more "stodgy" and "conservative" Big10 school. (MSU has already shown a propensity for trick plays and going for it on fourth down.)
Ok. Sounds good to me, but I'm still pissed off about our defense!
Fine. Do yourself a favor and only read the offensive UFR and only watch the YouTube highlights of Every Offensive Snap. It does wonders for the blood pressure. Understand that our defense performed precisely to expectations today, but so did our offense!
But seriously, in the meantime, take a deep breath. We now have three road victories in as many years. Road games in the Big10 are brutal (PSU lost, Wisconsin lost, OSU/UM/NW all nearly were upset). And for godssakes get excited! It's MSU week!
*In other years, the UM/ND game builds up differently and everyone lets loose, but this particular year it did not set up that way.
[Ed: This week's Mathlete column expands on fourth down decision-making. I haven't seen a graph anywhere near as clear as those included below about how shifting the parameters of the offenses and defenses in question has a major impact on what the correct decision is. This is not a situation where you can just read the decision off a chart. Feel and personal preference will always play a role. It's a complex decision.]
Last week I wrote on the value of special teams but a very interesting side topic arose: fourth down decision making. It started with this chart:
About which I remarked:
The going for it actually peaks between 30 and 35 as more coaches don’t really know what to do so they just go for it.
So I decided to look and see what the decision chart should look like on an expected points basis.
Anything close to two different colors is a virtual toss-up. Any gains near a color transition are negligible and not worth noting, but there are very real gains to be made in the heart of the yellow section, where coaches are taking their offenses off of the field far too quickly.
A couple of quick rules of thumb:
- Don’t punt on the opponent’s side of the field.
- Really consider going for it on 4th down after crossing your own 40.
- Field goals only make sense if there are more than 5 yards to go and you are between the 10 and 30 yard lines. If you’re in opponent territory and these two criteria aren’t true, you should be going for it.
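The three rules of thumb can be sketched as a small function. Field position is encoded as yards to the opponent's goal line (opponent's 30 = 30, midfield = 50, your own 40 = 60). That encoding, the function name, and the returned labels are hypothetical choices for this illustration. "After crossing your own 40" is read as strictly past it.

```python
def rule_of_thumb_call(yards_from_opp_goal, yards_to_go):
    """Rough 4th-down call from the three rules of thumb above.

    yards_from_opp_goal: 1-99, distance to the opponent's goal line
    (an assumed encoding: opponent's 30 = 30, midfield = 50, own 40 = 60).
    """
    if yards_from_opp_goal < 50:  # opponent's side of the field: never punt
        # Field goal only with more than 5 to go and between the 10 and the 30
        if yards_to_go > 5 and 10 <= yards_from_opp_goal <= 30:
            return "field goal"
        return "go for it"
    if yards_from_opp_goal < 60:  # past your own 40, up to midfield
        return "consider going for it"
    return "punt"

print(rule_of_thumb_call(25, 8))  # field goal
print(rule_of_thumb_call(45, 10)) # go for it
```

This is deliberately coarse, like the rules themselves. Anything near a color boundary in the chart is close to a toss-up either way.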
I know this is not the first time a topic like this has been presented; David Romer was mostly criticized for his paper on the topic a couple of years back (thanks for the reminder Colin). [Ed: Not around here.] Of course there was the great Patriot debate last season when the Patriots elected to go for it on 4th and 2 with the lead in their own territory. Even though the majority of the arguments against this work amount to "people like David Romer and The Mathlete don’t know anything about football and just live in their parents’ basement" I did want to look at the main objections and see if they had any validity.
Objection 1: Does not account for “quick change” momentum
Below you’ll see a chart of the expected points on a drive based on field position, and how teams have actually fared. I also included drives obtained by turnover as comparison to the other “quick change” drive source.
There could be a case that drives started on a short field due to a 4th down stop generate more points than normal drives, but the small sample size reduces how strongly that argument can be made. From 2007-2009, the total points scored on drives obtained by 4th down stops (2523) is less than the projected points for drives starting at the same field positions (2580). This difference is statistically meaningless, which is very damaging to the idea that "momentum" helps the opposing offense after their defense gets a fourth down stop.
Adding in the turnovers does nothing to build a case for momentum after big defensive stops or turnovers. The turnover-started drive line tightly hugs the average line. As a whole, the turnover expected points line is slightly higher than the average line, but only by enough to generate an extra touchdown every 50 drives. That's about one every two years or so.
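For scale, here is the arithmetic behind the two comparisons above. The point totals are the 2007-2009 figures quoted in the text. "An extra touchdown every 50 drives" is converted to a per-drive edge.

```python
actual = 2523      # points on drives started by 4th-down stops, 2007-2009
projected = 2580   # expected points for drives from the same starting spots

gap = actual - projected
print(gap, f"{gap / projected:.1%}")  # -57 -2.2%

# "An extra touchdown every 50 drives" as a per-drive edge
print(round(7 / 50, 2))  # 0.14
```

Both effects are small. The first even runs in the opposite direction of the momentum theory.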
Although it can often feel like there is a big momentum swing after a big stop or turnover, there is scant evidence that it is more than our memories selecting the most traumatic or exhilarating scenes to hold onto. [Ed: for an example of this human tendency to ascribe meaning to unusual events where there is none, see any of the zillion "hot hand" studies.]
Objection 2: It assumes all offenses and defenses are average
To get a gauge on what “good” can mean in comparison to average, I plotted the best offense and best defense of the last three years against the average team’s expected points per drive.
As a rough approximation, the best offense is about 1 point per drive better than average and the best defense makes offenses about a point worse per drive.
Scenario 1: Good offense
If your offense is as good as Florida, you should never punt against an average defense. Maybe if you are deep in your own territory, but only in the most extreme situations. This assumes that a new first down gives the Florida offense an extra point over an average team in expected value and a 10 percentage point increase in the likelihood that they convert.
A punt is conceding any chance of scoring and an offense this good should not give up that right so easily. This is the basic philosophy behind the vaunted no-punting HS coach in Arkansas. His team isn’t necessarily good because he doesn’t punt. He doesn’t punt because his offense is good. Why waste another scoring opportunity?
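To get a feel for the size of the good-offense adjustment, one option is to borrow the first diary's net-EP formula and its own-40 numbers purely for illustration, since the column's own model is not spelled out here. The sketch applies "+10 percentage points to convert, +1 expected point after a new first down" and holds the failure branch (-1.8 net) fixed. That assumption is in keeping with facing an average defense.

```python
def net_ep_go_for_it(p_convert, ep_after_convert, net_ep_if_failed):
    """Net EP of going for it, using the same structure as the first diary's formula."""
    return p_convert * ep_after_convert + (1 - p_convert) * net_ep_if_failed

NET_IF_FAILED = -1.8  # first diary's example: (3.2) - (1.4), held fixed here

average = net_ep_go_for_it(0.72, 2.2, NET_IF_FAILED)
# Good-offense adjustment described above: +10 percentage points, +1 point
good_offense = net_ep_go_for_it(0.72 + 0.10, 2.2 + 1.0, NET_IF_FAILED)
print(f"average offense: {average:.2f}, good offense: {good_offense:.2f}")
```

Under these assumptions the edge for 4th and 1 at your own 40 roughly doubles, from about 1.08 to about 2.30. That is consistent with the column's point that a very good offense should rarely punt.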
Scenario 2: Going against a good defense
Playing against a good defense changes the dynamic extensively but it does not mean forgoing the fourth down attempt altogether. With a reduced likelihood of success on 4th down and a reduced payout if the conversion is successful, the 4th down attempt still is an optimal strategy more than is currently utilized. Even against a top national defense, you should still not punt in opponent territory. The field goal becomes a more viable option against the stronger defense and punting becomes a much better idea all the way out to midfield.
[Ed: I think this is moving towards correct strategy since it takes a caveman or a seriously long-yardage situation for someone to punt from inside the opponent's 40 these days. That range from midfield to the opponent 40 is a spot we might see move towards fourth-down aggression in the next few years.
Also note that conventional current strategy gets way less wrong once you ramp up the ability of the defense. If we jacked it up even farther, it might get to the point where punting from the 36 (or even on third down) is a good idea. The flaws in strategy here are leftovers from an era when punting was actually the best option. Thinking has not kept pace with scoring since.]
Scenarios 3/4: Good defense or opponent good offense
The conventional wisdom is that if you trust your defense, you don’t go for it on fourth down. [Ed: In my experience the conventional wisdom is remarkably malleable on this point. If you have a good D and the announcer agrees with the call, the good D will be cited as a reason why.] In reality, the strength of your own defense (or the strength of the opposing offense) is largely irrelevant to the decision. Fourth down decisions are all about offensive opportunity. A 4th down decision to punt is the decision to take the ball out of your offense’s hands, leaving the relative impacts on your defense to negate each other. A 4th down failure puts your defense in a worse situation, but it doesn’t guarantee points for the other team; a good defense is still a major asset in stopping or limiting the other team with good field position. A punt doesn’t guarantee that the other team is going to be stopped, but a good defense makes it more likely. In the end, it’s still all about the offense.
Objection 3: Does not account for game specific situations
This objection does ring true, but its application is much narrower than most people believe. The main flaw with the expected points model is that for most of the game all points are largely equal, but at the end of the game, a field goal or even time can become crucially important. If a field goal can tie a game, take the lead, or move said lead from one possession to two (or vice versa), the decision-making process suggested above can shift radically. This could mean punting near midfield to prevent a short field goal drive for the other team or taking a field goal instead of going for it on fourth in field goal range.
These situations are rare, however, and only come into effect in the fourth quarter. When there are likely to be even 2-3 additional possessions, the expected points model still holds up.
Another potential game situation not accounted for above is the presence of a high quality field goal kicker. A very accurate field goal kicker will move the blue field goal “bubble” in the above charts down, making field goals more practical in short yardage situations. An above average kicker from long range will move the bubble left. Even a great kicker won’t make kicking inside the 5 practical in very many situations.
Conclusion: In Which Romer Is Re-Iterated
Teams need to be using kickers and punters less and their offenses more, especially teams with good offenses. If you have a good offense, bringing out the punter should only be done in long distance situations or when deep in your own territory. Scoring touchdowns is the valuable thing in football, and giving away a quarter of your plays to kick on fourth down greatly reduces your ability to score them. The gain in field position from a punt is worth less than it is currently perceived to be, and the momentum obtained from a quick change of possession appears to be slight at best and most likely non-existent.
One final thought I haven’t been able to quantify yet: if you switch to a fourth down mindset, what opportunities does it open up in play calling during the first three downs of a series? Planning on four plays for a first down instead of three would surely have some value for an offense to adjust and re-optimize their play calling, and the total offensive value could become even greater.
Note: apparently Brian Burke at Advanced NFL Stats and I have been having some of the same offseason thoughts as he just put up another piece on 4th down decision making, and this after we both introduced similar defensive player evaluation metrics within a month of each other.