I will read this shortly, but I'm trying to find some way to make the title shorter now. Current Title will soon be the sub-title.
here's one vote for "John Beilein's head in a Futurama jar"
Before beginning, let me state that (to the best of my knowledge) the statistical calculations that all of these analyses use are absolutely correct. However, further review and some additional analysis reveal there are very good reasons coaches do not (and should not) always use the statistical percentages to make decisions.
Synopsis: Using probability and statistics, several different analyses have concluded that football teams should Go For It far more often on 4th down (even when they are in their own end of the field). However, virtually no football coach (at the college or pro level) even comes close to following these scientific decision-making criteria. So, the obvious question is, "Why don't coaches make decisions during a game based on the statistical percentages?" There are three shortcomings of using the statistical analysis:
Therefore, to compensate for these shortcomings, decision thresholds that are less precise and significantly higher should be used.
Revised Decision Threshold Table: The analyses that have been documented to date have not included a decision table showing the magnitude of the net advantage used to make recommendations. Here is a decision table that shows the magnitude of the net advantage and uses less precise and significantly higher thresholds. While this significantly reduces when a team should Go For It on 4th down, the results are more credible, more likely to be followed in actual games, and more likely to benefit the offense.
In the table, ONLY the 10 events in the larger blue font in the light blue shaded cells (the upper right hand corner of the table) are recommended to Go For It on 4th down. They are:
The recommendation is to Go For It under these criteria when the score is relatively close and time is not a factor. Specific game conditions (such as weather conditions, earlier possessions during the game, current score, time remaining, confidence in having a play that will get the first down, etc.) may provide insight to modify the recommendations. I stopped at the opponent's 30 yard line because inside the 30 yard line, field goals become reasonable for most college football teams. In college football, teams have significantly different field goal success rates and any analysis should be based on that team's anticipated field goal success rate.
Comments on the Decision Table: Key points to consider when reviewing the decision table are:
An Example of An Advantage of 1.1 Expected Points Per 4th Down Attempt:
Since it is obviously impossible to score 1.1 points, what does an advantage of 1.1 expected points mean? Well, it does not mean you are going to be successful and make the 1st down. Even if you make the 1st down, it does not mean you are going to score on this possession or that you will score before your opponent. It only means that over a large number of similar possessions over several games the net points you score divided by the number of 4th down attempts will equal 1.1 (each specific attempt may result in: turnover on downs, a punt, a FG, a missed FG, a TD, a subsequent turnover, etc.). Here is one possible scenario of what actual game results could look like with 4th down and 1 at your own 40 yard line:
This is, deliberately, a very simple scenario. It will likely take many more attempts than just 4 to approach the expected points in the table and there are many other possible scoring combinations. However, it does provide examples of how expected points translate to actual game results. Items to note in this example:
This example also illustrates the significant risks involved in the decision: if the game ends prior to the 4th attempt, the offense is at a net disadvantage of 3 points and the result may be that you lose the game (even though the expected point analysis does not directly state this as even a possibility).
Development of the Proposed Decision Table: The decision table was derived from two basic sources: Football Outsiders' "Figure 1: Offensive Efficiency From Field Position" and the Mathlete's "Never Punt With Denard? Fourth Down Strategy Revisited". I used the FO table (shown below) to calculate Expected Offensive Points by field position and the Mathlete's table for 4th down conversion rates. Note that many of the data points I used are not the exact numbers from these two sources. I smoothed the actual data to eliminate some minor anomalies (BTW, this does not affect the results; it is just a pet peeve with me).
The results in the decision table appear to be reasonable based on a comparison to these two sources as well as Advanced NFL Stats' "When Should We Go For It On 4th Down?" and David Romer, "Do Firms Maximize? Evidence From Professional Football", 2005. I did not use dynamic programming but used what I believe is a reasonable approximation. If anyone has a decision table with different or more accurate numbers, it would be great to compare and contrast the results. The decision table consists of:
Column 1: Yards To Go
Column 2: 4th Down Success Rate. This is based on The Mathlete's data for an average college football team. This introduces the first potential for a significant margin of error. Even if this were the actual success rate for a specific team over the first 10 games of the current season, does anyone believe it is the exact success rate for the 11th game? Of course not. But this does not completely invalidate the analysis. It does mean the decision should use less precise and higher thresholds.
Yellow Row At Bottom of Table: Starting Field Position (on the 4th down play).
Light Blue Row At Bottom of Table: Expected Points Per Offensive Possession (from that field position)
Orange Row At Bottom of Table: Probability of Scoring (7 Points). This is (Expected Offensive Points / 7) and is provided as a reference only; it is not used in the calculations.
Columns 3-9: Expected Offensive Points (EP) Per 4th Down Attempt of Decision. This is based on the probability of making the first down, the starting field position, the expected offensive points from each specific field position, and the average net punting distance. The decision table provides the net expected offensive points per 4th down attempt of Going For It versus Punting. A positive number indicates a net advantage for the offense and a negative number indicates a net advantage to the opponent. I'll use a team's own 40 yard line with 4th down and 1 as the example.
EP of Decision To Go For It = EP (Make It) + EP (Fail)
Expected Offensive Points of Making It on 4th Down is straightforward:
EP (Make It) = Probability of Making It X Expected Points At This Field Position = 72% X 2.2 = 1.58
Expected Points of Failing to Make It on 4th Down is obviously negative but also a bit tricky. You are still going to give the opponent the ball if you decide to punt rather than Go For It. So, the opponent would have some expected points anyway but based on a different field position. Therefore, I use the NET Expected Points in the calculation:
Net Expected Points = Expected Points After Failure To Convert – Expected Points After Punt
Expected Points After Failure To Convert = (3.2) Points (they are now on your 40 not their own 40)
Field Position After Punt = their own 20 yard line
Expected Points After Punt = (1.4) Points (they are now on their own 20)
Net Expected Points = (3.2) – (1.4) = (1.8) Points
EP (Fail) = Probability of Failing To Make It On 4th Down X NET Expected Points = 28% X (1.8) = (0.50)
EP of Decision To Go For It = EP (Make It) + EP (Fail) = 1.58 + (0.50) = 1.08
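For anyone who wants to check the arithmetic, here is a minimal Python sketch of the calculation above. The function and parameter names are my own (hypothetical, not taken from any of the sources), and the inputs are just the example values for 4th down and 1 at your own 40 yard line. Values shown in parentheses above are negatives, so the failure term is subtracted.

```python
def go_for_it_net_ep(p_convert, ep_offense_here, ep_opp_after_fail, ep_opp_after_punt):
    """Net expected points of Going For It versus punting (illustrative only)."""
    # EP (Make It): chance of converting times the offense's expected points here.
    ep_make = p_convert * ep_offense_here
    # The opponent gets the ball either way, so only the EXTRA points they
    # gain from the better field position count against the decision.
    net_cost_of_failure = ep_opp_after_fail - ep_opp_after_punt
    # EP (Fail): chance of failing times the net cost, counted as a negative.
    ep_fail = -(1 - p_convert) * net_cost_of_failure
    return ep_make + ep_fail


# Example values from the text: 72% conversion rate, 2.2 EP at your own 40,
# 3.2 EP for the opponent at your 40, 1.4 EP for the opponent at their own 20.
result = go_for_it_net_ep(0.72, 2.2, 3.2, 1.4)
print(round(result, 2))  # 1.08
```

Changing any one input (say, a lower conversion rate or a longer net punt) shows how quickly the net advantage shrinks, which is part of why a threshold well above 0.00 seems prudent.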
Background: The folks at Football Outsiders analyze college football using two systems (FEI and S&P+), Advanced NFL Stats provides analysis of pro football, the Mathlete has his analysis, and I am sure there are several others. The claim to fame for most of these systems is that a computer can take advantage of a statistical analysis of huge amounts of data: "nearly 20,000 possessions every season in FBS college football" or "every play of all 800+ of a season's FBS college football games (140,000 plays)", etc. A computer analysis is required because the human brain is simply incapable of processing this amount of data.
In addition to the primary result of ranking college football teams, these systems provide other analysis such as Never Punt With Denard? Fourth Down Strategy Revisited, the success rate of scoring in college football from every starting position on the field, or When Should We Go For It On 4th Down?
Here is the FO table that I used to calculate Expected Offensive Points by field position.
The Statistical Analyses Set Decision Thresholds That Are Too Precise and Too Low: The decision thresholds for all of these analyses appear to have been set at breakeven (+0.00). This ignores the inherent margin of error and assumes a coach should take significant risks even when the rewards are essentially zero. (One example is the recommendation that teams should Go For It on 4th and 1 from their own 15 yard line!) The result is a loss of credibility in the analysis and a reluctance to believe and/or follow any of the recommendations. Here are three examples of the recommendations of when to Go For It on 4th down. The first is from the Mathlete:
The second is from Advanced NFL Stats:
The third is from the seminal investigation of the choice in football between kicking and trying for a first down on fourth down, David Romer, "Do Firms Maximize? Evidence From Professional Football", 2005.
Notice that all three of these analyses recommend that a team should Go For It on 4th and 1 (Mathlete) or even 4th and 2 (Advanced NFL Stats and Romer) from its own 10-13 yard line! The reason? All three use the very precise and very low criterion that any value above 0.00 is an advantage to the offense and, therefore, warrants going for it on 4th down. This would be analogous to ticketing everyone who is going 0.01 miles per hour over the speed limit: technically correct but impractical in the real world. IMO, anyone presented with the recommendation, "Go For It with 4th and 1 yard every time on your own 13 yard line" would be in disbelief and would dismiss any and all other recommendations from the same analysis.
In addition, the end result of these decisions is that "This evidence suggests that a rough estimate of the potential gains from going for it more often on fourth downs over the whole game is …an increase of about 2.1 percentage points in the probability of winning." (David Romer, "Do Firms Maximize? Evidence From Professional Football" 2005, Page 28). With a 12-game college football season, this corresponds to roughly one additional win every four seasons! Thus, you would expect a coach to Go For It on 4th down in hundreds of different scenarios (depending on field position, yards to go, expected conversion rates, expected net punting distance, expected field goal distance, game circumstances, etc.) on the prediction that every 4 years the team will win one extra game.
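The "one additional win every four seasons" figure is simple arithmetic. Here is a minimal sketch, assuming (as the paragraph above does) that Romer's 2.1 percentage point estimate, which comes from pro football data, carries over unchanged to every game of a 12-game college season. The variable names are mine.

```python
# Assumption: the 2.1 percentage point gain applies equally to every game.
WIN_PROB_GAIN_PER_GAME = 0.021
GAMES_PER_SEASON = 12

# Expected extra wins in one season, and seasons needed for one extra win.
extra_wins_per_season = WIN_PROB_GAIN_PER_GAME * GAMES_PER_SEASON
seasons_per_extra_win = 1 / extra_wins_per_season

print(round(extra_wins_per_season, 3))  # 0.252
print(round(seasons_per_extra_win, 1))  # 4.0
```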
Using statistical analysis to make decisions about a very few events within a football game is a mathematical fallacy (especially at the precision proposed): It is somewhat ironic that the advantage gained through the statistical analysis of tens of thousands (or hundreds of thousands) of data points is, in fact, why the results are not, cannot, and should not be used to make decisions during a football game. In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
The law of averages is a term used to express a belief that outcomes of a random event will "even out" within a small sample. As invoked in everyday life, the "law" usually reflects bad statistics or wishful thinking rather than any mathematical principle. While the law of large numbers does state that a random variable will reflect its underlying probability over a very large sample, the law of averages typically assumes that unnatural short-term "balance" must occur.
The 4th down analysis relies on the law of averages and not the law of large numbers. Decisions based on the law of averages (also called the gambler's fallacy) are a recipe for failure.
One of the critical inputs of 4th down analysis is the conversion rate that is anticipated on 4th down for various yards to go. Because there are so few 4th down attempts, all of the analyses use 3rd down conversion rates instead. Let's assume that the anticipated conversion rate on 4th and 1 yard to go is 75% and that this is based on actual data from thousands of 3rd down and 1 attempts. The law of large numbers predicts that, over a large number of 4th and 1 attempts, a team should expect that about 3 out of every 4 attempts will be successful. Unfortunately, there are likely to be only a very few 4th and 1 attempts in a single football game, often fewer than 4 in total. If you have just one 4th down and 1, it is either successful or unsuccessful. If successful, the result will be better for the offense than the expected value in the analysis (hooray!). If unsuccessful, the result may be that you lose the game even though the expected value analysis does not directly state this as a possibility (it may be stated in a footnote; it may not).
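To put a number on how far a handful of attempts can stray from 75%, here is a minimal sketch using the binomial distribution. It assumes each 4th and 1 attempt is independent and has exactly the same 75% chance of success (a simplification; real attempts depend on the opponent, the play call, and so on). The function and variable names are mine.

```python
from math import comb


def prob_successes(n_attempts, k_successes, p_convert):
    """Binomial probability of exactly k conversions in n independent attempts."""
    return (comb(n_attempts, k_successes)
            * p_convert ** k_successes
            * (1 - p_convert) ** (n_attempts - k_successes))


P_CONVERT = 0.75   # assumed 4th and 1 conversion rate from the text
N_ATTEMPTS = 4     # a handful of attempts, as in the text

# Probability of each possible number of conversions, 0 through 4.
distribution = {k: prob_successes(N_ATTEMPTS, k, P_CONVERT)
                for k in range(N_ATTEMPTS + 1)}
for k, prob in distribution.items():
    print(f"{k} of {N_ATTEMPTS} converted: {prob:.1%}")

# Chance of converting half or fewer of the four attempts.
prob_two_or_fewer = sum(distribution[k] for k in range(3))
print(f"2 or fewer converted: {prob_two_or_fewer:.1%}")  # 26.2%
```

Under these assumptions, the "expected" 3 out of 4 happens less than half the time (about 42%), and roughly one time in four a team with a true 75% conversion rate converts only half of its attempts or worse.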
The application of expected value to make decisions in football is problematic. The concept of expected value originated in the 17th century, was defined explicitly in 1814 by Pierre-Simon Laplace, and is used extensively in probability and statistics. However, the expected value is only a theoretical value and may be unlikely or even impossible (such as having 2.5 children). Expected value is difficult to reconcile in football since the only possible outcomes of a possession are 0, 3, or 7 points (ignoring safeties, missed PATs, and 2-point conversions). It is difficult to make decisions based upon "a net advantage of 1.1 expected points per 4th down attempt".
Why Coaches Shouldn't Make Decisions based on Statistics
Sorry if I just un-did any edits you made in the last few minutes -- I had it open for awhile doing my copyediting.
The one thing I would add is that of practicality in today's non-statistical-understanding environment. Over a small sample, if you are the coach who went for it when it was absolutely the right thing to do statistically, and because of small sample you ended up losing two more games in four years rather than winning one, you're probably going to be "that crazy coach who was fired for doing stupid shit like going for it on 4th down all the time."
I was being too generous. I was thinking that at least an individual team (coach) would approach the expected points over several games/years. But, as you point out, it may be that one team is on the positive end of the results over several years while another team is on the negative end of results.
Overall, the two teams in combination are close to the expected values but individually one coach gets fired and the other gets a contract extension.
i have read this before its a good idea but last year when good ole RR went for the fake punt it didnt go our way so i think its best not to go for it in our own end
I'm going to continue with my tried and true way of analyzing these kinds of things.....
.....let the coaches make the decisions. I'll then say, "yeah" or "nyet."
End of analysis. A lot quicker too.
something like it. The way my far less statistically-based manner of cognition suggested all this to me was: Yes, odds are good--but you ARE going to base this decision on game conditions (where you ARE on the field, score, even team psychology at this point in game), personnel, and statistical tendencies to date, aren't you???
That said, I think that boldness often pays in football--even pig-headed boldness (viz. Dantonio and Miles). Even when teams come to expect trick plays they don't know WHAT tricks. And a first on fourth can really change momentum. Maybe once our personnel and mastery of system are a bit more sound. . .
GodFather of the spread or no, I think that RichRod has a little bit of Lloyd in him, unfortunately. The psychology tends to be--"let's punt and regroup. . . "
Even if spellcheck allows "analyzes" because that happens to be the present tense singular verb, and even if the plural of analysis is, for some reason, pronounced "analy-seize". (That is to say, I can see how this would have slipped through, but it was a little distracting through the first 10 paragraphs of an epic amount of... analysis?)
This is a seriously impressive piece of work. Nice job.
Strict obedience to the expected value numbers would only garner you (on average!) one more win every four years?! Wow. Any idea what the standard deviation in number of wins over 4 years would be? E.g., how likely is it that sticking with expected value results would cost you two wins in the same span of time?
There's another, more practical, nonmathematical reason not to strictly follow expected results tables: very few people in the general public are statistically literate. Remember last year's Patriots-Colts game? Belichick made what was, statistically, the right call on a 4th down, but they didn't make it and lost the game. How many sports announcers did you hear criticizing the call, and how many defended it? Since most of the population is not particularly sports-savvy, and therefore depends on the media "experts" to explain these things, how good for your job security is it if you're the coach and you make the "right" decision statistically but it doesn't work out? Belichick has enough rings to get away with it, but not every coach has that luxury. More often than not, a coach's job depends more on the opinions of the statistically uneducated. That alone is reason to be more cautious than the statistics recommend.
And the colors in the graphs are pretty.
I agree with you that a pure EV-based analysis will lead to some regrettable decisions. I also think you're on the right track when introducing the cost of failure. However, I think your dismissal of the conclusion in the Mathlete's and others' work and the credit you give to the current conservatism among coaches are exaggerated.
I'm pretty sure that by using down, distance, score, some factor for quality of defensive vs offensive strength, quality of kicking/punting game and the time left in the game, you can create an iPhone/Android app that can give you the optimal decision every time. I'm pretty sure that you wouldn't be going for it on 4th and 1 inside your own 20 in the first quarter up 3, but you would definitely see an increase in risk taking if you sold this.
Price it at $3 and tons of football fans at all levels would buy it to see if their coaches are making the right 4th down decisions. Sooner or later the coaches would catch on and buy it, too :)
I hope I did not leave the impression that I was advocating that coaches never go for it on 4th down or that I was "dismissing" the work that the Mathlete and others have done. I believe all the previous work was excellent.
My proposal was just that a higher threshold should be applied to their analysis. The higher threshold basically skews the decision to increase the rewards and decrease the risks.
Hopefully Rich Rod won't have to worry about 4th downs altogether. I know the 3rd down conversion % has dipped the past two games, but hopefully a healthy Denard will get that back up
I find it interesting that you conclude that it is inherently preferable to raise the decision-making threshold in favor of a more conventional, conservative strategy. One could equally make the case that you could lower the threshold, since the boundary cases don't much matter one way or another.
The deciding factor seems mostly based on perceptual, psychological considerations. Yes, bucking the conventional wisdom is likely to open a coach up to more criticism and second-guessing. Perhaps instances of failed, unconventional strategy could demoralize a team and embolden the enemy. On the other hand, one might postulate that a highly aggressive, unpredictable strategist could befuddle opposing coaches and intimidate opposing players: "Those guys are crazy, unrelenting offensive maniacs!" Going on fourth down might inspire greater confidence in an offensive unit, since they know the coach trusts them to get the job done in situations where other coaches would surrender. Anyway, it's just an alternative theory; one we will probably never learn the validity of, since very few coaches seem willing to risk their jobs defying the conventional wisdom.
The ultimate decision analysis would be based on win probability. This would require a large, multidimensional matrix of variables including distance, field position, score, and time remaining. The question would be: does the weighted probability of outcomes increase or decrease the likelihood of victory? A difficult problem, but doable. Even so, psychological impacts of strategy decisions are a real-world consideration and can never be accurately factored into a definitive mathematical analysis. The only thing for certain is that second-guessing is eternal and great fun at that.