Football is not Baseball: Why You Should Ignore Success Rates
Warning, this post is meta-stat nerd.
What is Success Rate, and How Did It Come To Be?
The first question is pretty straightforward and the second I can only guess.
Success Rate is an attempt to measure how good a player or team is at the traditional concept of “staying ahead of the chains.” The calculations vary slightly, but for the most part a success is defined as gaining at least 40-50% of the yards to go on first down, at least 50-70% of the yards to go on second down, and a first down on third or fourth down. Typically the target is a 50% success rate.
Although I doubt there is any recorded history on how this came to be (I believe its origin or at least its popularization comes from Football Outsiders), I have two theories. The first is that this is how football fans, players, and coaches have been conditioned to think, especially old school, grind-it-out football folks. You still hear it often among clichéd commentators: the offense’s number-one priority is to stay ahead of the chains, don’t put yourself in bad down and distance, stay away from obvious passing downs. All of these things are good things for a football team to do.
The second reason I think it came to be is that advanced football stats arrived after advanced metrics for baseball had come a long way. One of the key tenets of the Moneyball/SABR revolution in baseball is that On Base Percentage >>> Batting Average. On top of that, one of the fundamental advanced baseball stats is OPS, On Base Percentage Plus Slugging Percentage, a combination of Success and Magnitude, and one paralleled by Football Outsiders* in their S&P metric.
*I want to be clear that this is not a critique of Football Outsiders. They do tremendous work and are at the forefront of advanced football analysis.
Why Football is Not Baseball
Good OBP is critical for baseball because you are dealing with a finite, irreplaceable resource: outs. You get 27 of them per game. Once you generate an out there is no way to get it back; you are one step closer to the end of your chance to score, and you only have 27 total steps per game. OBP measures a team or individual’s ability to avoid outs when they come to the plate. Not getting out will always improve your chances of winning while getting an out will almost always decrease your odds of winning (this is not an article about the sacrifice bunt).
Contrast that with football, where the only finite resource is time. Even if the quarterback gets sacked and loses 10 yards, one play later the effect of that loss can be wiped out. In a sense a set of downs is a finite resource, but an individual down is not. If there were a team-level parallel to OBP, first downs converted would be more appropriate, and I don’t really see a true individual equivalent.
The Goal Is To Score Points
Consistently being in good down and distances is not a bad thing, but it’s not nearly as important for today’s offenses. Modern offenses have a much greater ability to convert unfriendly down and distances than offenses of old. Plus, the offense’s goal is to score points, not get first downs. Getting first downs obviously helps score points, but a metric like EV/PAN that directly accounts for how each play contributes to scoring is a much stronger measure, not just a complementary stat like Slugging Percentage. In baseball the complementary stat is needed because of the finite nature of outs. In football, everything is a sliding scale, and categorizing plays as pass-fail is simply too black and white for a sport that has more gray.
A couple of examples of how success rate can be misleading (first down gain, second down gain, third down gain):
4,3,2: This is a 67% success rate but is a three and out.
3,3,4: This is a 33% success rate but a first down. Plus, the first two plays of each group are nearly identical, yet the first two downs of the first group are both successes and the first two downs of the second group are both failures. Over a large group of data some of these will iron themselves out, but why put such a black and white metric over something that is not? 2nd and 7 is almost the same as 2nd and 6, but 2nd and 1 is very different from 2nd and 6. Success rate completely misses the magnitude of plays.
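The two drives above can be checked with a short sketch. It assumes the low end of the ranges quoted earlier (40% of yards to go on first down, 50% on second down, a conversion on third down) and a fresh first and ten; the function name and parameters are made up for illustration and are not anyone's official calculation.

```python
def drive_success(gains, first_down_pct=40, second_down_pct=50, start_to_go=10):
    """Walk one set of downs and return (success_rate, converted).

    gains: yards gained on each successive down, starting with first down.
    The percentage thresholds are assumptions taken from the low end of
    the ranges described in the post.
    """
    to_go = start_to_go
    results = []
    for down, gain in enumerate(gains, start=1):
        if down == 1:
            needed = first_down_pct * to_go / 100
        elif down == 2:
            needed = second_down_pct * to_go / 100
        else:
            needed = to_go  # third or fourth down: only a conversion counts
        results.append(gain >= needed)
        to_go -= gain
        if to_go <= 0:
            # The line to gain was reached, so the set of downs is over.
            return sum(results) / len(results), True
    return sum(results) / len(results), False


for gains in ([4, 3, 2], [3, 3, 4]):
    rate, converted = drive_success(gains)
    outcome = "first down" if converted else "no first down"
    print(f"{gains}: success rate {rate:.0%}, {outcome}")
```

Under those assumed thresholds the first drive grades out at two successes in three plays without moving the chains, and the second at one success in three plays with a conversion.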
This is why for football, an Expected Value model is much more valuable. With enough data, you can get a pretty good description of the expected points based on all down, distance and yardline combinations. Once you have this you can evaluate the shades of gray for each play. A three yard carry on first and ten is nearly as good as a four yard one. A nine yard carry is even better. Expected Value can quantify the subtle and substantial differences between plays. The value difference between first and ten at the twenty and first and ten at the thirty will be the same whether it was one ten yard play or three runs totaling ten yards, although the value per play will justifiably be better for the single ten yard play. Success rates can vary wildly based on how you get from point A to point B; EV only cares where you start and where you finish.
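A minimal sketch of that start-and-finish property follows. The expected-point values in `TOY_EP` are invented purely for illustration (they are not from any real model), and the state format of (down, yards to go, yardline) is likewise an assumption. Each play is credited with the change in expected points between the situation before it and the situation after it.

```python
# Hypothetical expected points by (down, yards to go, own yardline).
# Every number here is made up; a real table would be fit from play-by-play data.
TOY_EP = {
    (1, 10, 20): 0.60,
    (2, 7, 23): 0.45,
    (3, 4, 26): 0.40,
    (1, 10, 30): 1.20,
}


def ev_added(path):
    """Return the EV change credited to each play along a list of game states."""
    return [TOY_EP[after] - TOY_EP[before] for before, after in zip(path, path[1:])]


one_play = [(1, 10, 20), (1, 10, 30)]                            # one ten yard gain
three_runs = [(1, 10, 20), (2, 7, 23), (3, 4, 26), (1, 10, 30)]  # gains of 3, 3, 4

for name, path in (("one play", one_play), ("three runs", three_runs)):
    deltas = ev_added(path)
    total = sum(deltas)
    print(f"{name}: total {total:+.2f}, per play {total / len(deltas):+.2f}")
```

Because the intermediate states cancel out, both paths sum to the same total, while the per-play average favors the single ten yard gain.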
What is Success Rate Good For?
It is an interesting stat and isn’t totally without value; I just think that it is unnecessary and shouldn’t be a fundamental part of team evaluation. There are lots of stats that fit this characterization. For a lot of teams it’s how they mentally operate, especially in the running game. Success rate does a good job evaluating running backs in traditional ground games. It might not totally align with scoring points and winning games, but it does align well with accomplishing a team's offensive objectives. Running backs often get tightly bunched near the mean in an EV model, but success rate can be a way to further separate individual backs. Success rate will hold up between-the-tackles pounders but knock down the home run threat. EV may consider them the same (or more likely the home run threat will be higher), but the consistency of the old school back will be valued better by success rates.
I don’t think success rate has much value for the passing game. Completion percentage and YPA are more than adequate to indicate both explosiveness and consistency.
Coming Next: The Wisconsin Case Study and Optimal Offense and Defense Response
The underlying context of “ignore success rates” is that the traditional running game is overrated. If your main goal as an offense is to avoid bad third downs, and you are good at it, you will likely end up with a lot of third and short or third and manageable. Even if they are all “good” third downs, each third down is a chance for the defense to get off the field. We all remember the classic drives with multiple third down conversions, but we forget all the ones that couldn't beat the odds and failed after giving the defense one too many chances to get off of the field. Explosive plays are essential to a productive modern offense, and unless you are running a Chip Kelly or RichRod style ground attack, explosive plays are much more likely through the air than on the ground.
Next week I will follow up with a detailed look at the relative values of Russell Wilson and Montee Ball to Wisconsin’s 2011 offense. Ball had the TDs and the hype and Wilson was considered a quality second option. I’ll dig deep into the numbers and show why Wilson was the real threat of the Wisconsin offense.
Following that, I’ll have the final article in this series, looking at how offenses (and maybe more so defenses) can effectively maximize their expected points for and against through a better perspective on managing offensive output versus managing each down’s success or failure.
For a "meta stat" post, this is very light on data. Instead of debating the merits of EV vs SR, why not just show some data on which one is more predictive of wins?
I'm looking forward to the case study.
upvoted as soon as I saw meta: stat nerd :-D
Mathlete, if I had 1/10 as much talent as Pat Stansik I would write you your own love song!
Look forward to the followup posts. Need Moar Mathlete posts.
I agree that the "staying ahead of the chains" adage is like "establish the ground game", "win between the tackles", or any number of meaningless announcer-isms. Staying ahead of the chains is only meaningful based on your offensive strategy. If you're playing Three Yards and a Cloud of Dust, then you're expecting 3 plays of 3 or so yards to move the ball. If you're playing Mike Leach's Air Raid, you're expecting 1 play of 10 yards. West Coast/Spread/Run and Shoot fall in between, say, 2 plays of 5 yards to keep moving. I guess it's a way to try and quantify offensive efficiency, but not much else.
Imagine an offense where all you do is throw Hail Marys. Your success rate would be terrible, but your effectiveness would probably be reasonable (complete 2 or 3 a game, and you've got a shot).
Looking forward to some numbers.
Actually, your argument seems to suggest football is EXACTLY like baseball.
Football's success rate is just like batting average: a useful but limited data point, because it only measures Success. What you need is something like OPS that measures both Success and Magnitude.
I would think an augmented Success Rate with Yards Gained would do it.
hit or out, success or no. There is no partial or extra credit. It's more like W/L on a smaller scale: adequate for a quick glance, but not as descriptive as a "weighted" stat (which is not as descriptive as a set of stats).
If you're looking at FO stats for more detail, at the "individual" level (because nothing, as of yet, is actually individual), then it would be DYAR or DVOA, depending on whether you want counting or rate. At the team level, you'd have DVOA.
And that's still only one way of looking at things. There are other sites that do similar work, and as with other sports, you'll get the best picture by looking at them collectively and matching that with your own observations.
... when establishing their numbers, afaik. [That is, if everyone scores on first-and-goal, then scoring isn't really a big play over average, but failure is a big play under average. And so forth.]
The biggest criticism I have of the FO methods is that while they claim to be descriptive and predictive, there are numerous examples where they cannot be descriptive of a single game - most notably long returns and other big plays. In brief, FO has found two things about 40+ yard plays. First, past a certain point it's more about where you started than what you did e.g. Denard going for 45 against tOSU; he would have scored from 90, too, so a 90-yard run shouldn't be twice as much as a 45-yard run. Second, long returns in particular are non-repeatable events e.g. James Harrison returning an interception 100 yards in the Super Bowl; if he picks that ball 100 times, how many does he get all the way to the end zone? Discounting this play improves future prediction accuracy at the cost of present-game descriptiveness.
A descriptive methodology would assign that play (or Brandon Herron's TD returns) a much higher value (perhaps as much as 9 points - 3 for taking away the almost-certain field goal and 6 for the TD itself), and likely conclude it was the single biggest point swing in the game (even over the TD catch at the end to win it).
I too look forward to the numbers.
FO based their work on "The Hidden Game of Football." Per the FO Almanac, on 1st down you need to gain 40% of yards for another 1st down; on 2nd down 60% of remaining yards; and on 3rd or 4th down 100%.
They're not only rating offense, but skill players. Beyond success rate, they also have DVOA.
I don't know where someone brought this up, but I have bookmarked this Advanced NFL Stats post on First Down Probability, that says that FO is too low on required 1st & 2nd down yards.
If the probability of getting another 1st down is 66%, for 2nd down to be the same, you need to get to 2nd and 5.5 yds, which is .5 yds more than FO (at minimal needed gain). And you need to get to 3rd and 1.5 yds, which is .9 yds more.
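Spelling out that arithmetic as a quick sketch, assuming a fresh 1st and 10, the 40% and 60% thresholds quoted earlier in this thread from the FO Almanac, and the 5.5 and 1.5 yard targets cited from the Advanced NFL Stats post (the variable names are just for illustration):

```python
# Thresholds as quoted from the FO Almanac (percent of yards to go).
FIRST_DOWN_PCT = 40
SECOND_DOWN_PCT = 60

# Yards-to-go targets cited from the Advanced NFL Stats post.
TARGET_SECOND_TO_GO = 5.5
TARGET_THIRD_TO_GO = 1.5

# Minimal "successful" gain on 1st and 10 leaves 2nd and 6.
second_to_go = 10 - FIRST_DOWN_PCT * 10 / 100
# Minimal "successful" gain on 2nd and 6 leaves 3rd and 2.4.
third_to_go = second_to_go - SECOND_DOWN_PCT * second_to_go / 100

gap_first = second_to_go - TARGET_SECOND_TO_GO
gap_second = third_to_go - TARGET_THIRD_TO_GO

print(f"After 1st down: 2nd and {second_to_go:.1f}, {gap_first:.1f} yds short of the target")
print(f"After 2nd down: 3rd and {third_to_go:.1f}, {gap_second:.1f} yds short of the target")
```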
It would be interesting to see a re-evaluation of 1st down probability to see if it's changed. Also, I wonder how college differs from the NFL.
Of course, if you're more interested in the probability of an offense scoring points, then these differences are moot.