[Ed: MCalibur, apparently an economist, found himself collateral damage on today's shotgun blast at "X is stupid" sports economists. Maybe I should have come up with a label like "freakonomists" so as to not implicate people who are just interested in the numbers without the look-at-me pub. Anyway, here's an excellent diary on what your goals should be on second and third down. Implications for a second and medium are interesting.]
A while back The Mathlete sent out a Thundercat signal for some help shucking data for his database; at least that’s how I remember it. Any un-lame kid of the 80’s knows that when you see the Thundercat Crest you put on your spiked suspenders, pick up your laser-shooting panther paw nunchucks, jump into the tank you built singlehandedly, and you roll; that’s all there is to it. I had no choice.
Anyway, we voltroned* our abilities together and came up with something pretty sweet. I have put together my own database, with Mathlete’s help, and can now do some of the same tricks he can. I’ve focused on BCS-BCS matchups, extending the idea of excluding mismatches; Michigan v. Eastern Michigan is still a significant mismatch.
*Oops, wrong cartoon but, then again, you simply cannot over-reference 80’s cartoons/shows. I pity the fool that disagrees. I feel bad for youngins that don’t know the glory of 80’s children’s programming. Also, am I the only one who thinks that Voltron and Zoltan might be related?
When I’m not eliciting unreasonable responses from otherwise reasonable people, I’m usually crunching numbers of some kind as if they were a motley band of mutants and aliens led by a grody and ancient mummy demon priest. Very often the numbers have something to do with football in general and, most often, Michigan football specifically. This time I wondered “how do we know if a play was successful or not?” This question has been asked and answered by some smart people before, but being the curious little twit that I am, I wanted to gauge it on my own.
One way to go about it is Mathlete Style: Expected Points (EP), a good but abstract method. One potential problem with focusing on EP is that doing so can drive you toward scoring points, whereas the real goal is to win. It’s a subtle but important distinction. Depending on the situation, maximizing EP might not be the same as maximizing the probability that you will win. Maybe you would rather not score if doing so means giving Peyton Manning the ball back with 25 seconds left and less than a one-score deficit. Besides, The Mathlete has this beat covered.
Another method is to use 1st Down Probability (1DP), the likelihood that a team will convert a new set of downs given the current down and distance. I think this is more appropriate to the microcosm of a play because the goal of a play is not necessarily to score; it is to keep the ball and move it forward, in that order. Scoring is the goal of an entire drive. To calculate 1DP, you do the same thing you would to derive EP, except you keep track of first downs instead of points.
Whenever you have a mountain of data, you need a way to focus your attention on what matters while still maintaining the value of having so much data in the first place. For this study, I’ve filtered on the following criteria:
Exclude plays involving a penalty of any kind.
The game must be close. My arbitrary definition is: all plays in the first and second quarters, third quarter plays where the lead is less than 17, and fourth quarter plays where the lead is less than 10. These values are arbitrary, but there are so many plays available that the sample sizes are still large enough that any additional precision is of negligible value. Also, any unimportant plays are swamped by a large number of plays that are important, and the math deals with the noise.
Results of the play are limited to between –10 and +25 yards. The logic here is twofold. On the negative side, the average sack is good for about 6 to 8 yards; anything bigger than that is a fluke play (botched snap, for example). On the positive side, most plays aren’t designed to go for huge gains. However, there are instances when an OC calls a play like that in order to exploit an advantage and not necessarily as part of a base strategy. Though relatively infrequent, both types of plays happen with enough regularity that they significantly shift the averages even though they are vastly outnumbered by more typical gains. This filter only excludes about 0.5% of all plays to the negative side and about 5.3% to the positive side.
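For the code-inclined, the three filters above can be sketched roughly like this. It is a minimal sketch, not my actual database query: the Play fields are hypothetical names, "lead" is assumed to mean the absolute score margin before the snap, the yardage bounds are assumed to be inclusive, and overtime is lumped in with the fourth quarter.

```python
from dataclasses import dataclass


@dataclass
class Play:
    # All field names are hypothetical; the real database layout is not shown here.
    quarter: int       # 1-4 (anything past 4 is treated like the 4th quarter)
    lead: int          # absolute score margin before the snap (assumption)
    yards_gained: int  # result of the play in whole yards
    penalty: bool      # True if any penalty was called on the play


def is_close_game(quarter: int, lead: int) -> bool:
    """Close-game rule: whole first half, lead < 17 in Q3, lead < 10 in Q4."""
    if quarter <= 2:
        return True
    if quarter == 3:
        return lead < 17
    return lead < 10


def keep_play(play: Play) -> bool:
    """Apply the three filters: no penalties, close game, -10 to +25 yards."""
    if play.penalty:
        return False
    if not is_close_game(play.quarter, play.lead):
        return False
    # Bounds assumed inclusive; the text only says "limited to".
    return -10 <= play.yards_gained <= 25
```

Something like `[p for p in plays if keep_play(p)]` would then give the filtered sample.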
Each play in the database has been assigned a 0 or 1 depending on whether or not it was part of a series that earned a first down; touchdowns are counted as first downs in this survey. Essentially, every play in a four down sequence is counted as being part of a 1st down unless a punt or turnover occurs before a new set of downs is achieved. Filtering the plays that made the cut (over 105k) by down results in the following scatter plot:
Every point on the chart above has at least 15 samples, most have several hundred, some have several thousand, and 1st and 10 has almost 42,000 samples. The trends are self-evident and really, really strong. A few comments on other decisions I’ve needed to make here:
The small black dots represent 4th down plays. They are essentially overlaid with the 3rd down plays, which makes sense: the objective in both cases is the same, convert to a 1st Down. If you’re in a 4th down decision, use the 3rd down line.
The curves for 1st and 2nd Down were both pegged to 100% probability of converting a new set of downs at zero yards to go; pretty obvious as to why, it’s the rules. On 3rd Down however, I opted not to peg it to y3 = 1 at x = 0 because even though the R-squared value doesn’t suffer by much (0.005 lower), the resulting curve significantly overestimates 3rd down success inside of 3rd and 5. Also, I think the gap could be real; how much error is there in spotting the ball (especially on QB sneak type plays)? To me this data implies that the ball is mis-spotted to deny a 1st Down conversion approximately 9% of the time. The incremental error of spotting the ball doesn't matter until you end up at 4th and inches.
For 1st down plays, I intervened on behalf of noise reduction by only including plays where the distance was in multiples of 5. The reason is that the rules say you start at 1st and 10 and the only way you end up with 1st and something other than a multiple of 5 is A) you’re inside the opponent’s 10, or B) multiple penalties or 1st down repeats after spot fouls. Plays that were rejected are largely noise; the legitimate plays (ex. 1st and X inside the opp. 10) act like 2nd down plays, so use that in those cases.
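The bookkeeping behind the chart can be sketched as below. This is only an illustration under assumptions: the (down, distance, converted) tuple layout and the function name are hypothetical, and it produces the raw points only, not the fitted curves.

```python
from collections import defaultdict


def first_down_probability(plays, min_samples=15):
    """Raw 1DP for each (down, distance) bucket.

    plays: iterable of (down, distance, converted) tuples, where converted
    is the 0 or 1 flag described above. The tuple layout is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # (down, distance) -> [conversions, samples]
    for down, distance, converted in plays:
        # 1st down plays are only kept when the distance is a multiple of 5.
        if down == 1 and distance % 5 != 0:
            continue
        bucket = totals[(down, distance)]
        bucket[0] += converted
        bucket[1] += 1
    # Drop thin buckets; every point on the chart has at least 15 samples.
    return {
        key: conversions / samples
        for key, (conversions, samples) in totals.items()
        if samples >= min_samples
    }
```

Each surviving bucket is one dot on the scatter plot; the curves are then fit through those dots, down by down.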
Generating Hard Targets
Now that we have a survey, we can use the information to answer the question I asked: “what makes a successful play?” The question has been tackled before in the seminal tome The Hidden Game of Football. The DVOA system developed by Football Outsiders is based on concepts discussed in Hidden Game. Hidden Game presents the following goal schedule:
On first down, a play is considered a success if it gains 45 percent of needed yards; on second down, a play needs to gain 60 percent of needed yards; on third or fourth down, only gaining a new first down is considered success.
So, the goal schedule by down should be 4-ish yards on 1st and 10, 3 yards on 2nd and 6, and 3 yards on 3rd and 3. I haven’t read Hidden Game but this doesn’t look right, particularly in short yardage situations. For example, 2nd and 1 is a failure if you do not convert a new set of downs. Sure, the consequences of that failure are small because you are virtually guaranteed another chance to convert, but gaining zero yards (we only have whole yard resolution) is failure by definition.
Brian Burke of Advanced NFL Stats fame has a better definition: a play is a success as long as your chances to convert a new set of downs are not hurt by the result of the play. The great thing about this definition is that it considers the opportunity cost of running a play. This simple idea probably explains why a lot of OCs call conservative plays on 1st and 10: if you don’t advance the ball by about 4 yards, you’re worse off than you started. Burke focuses his work on the NFL and has done this work for the League, but he stopped at the first chart, leaving the answer to the question abstract: don’t hurt your chances of getting a new set of downs. OK, but how do you avoid that?
Running an optimization routine on our curves gives us the concrete answer, a goal schedule by down and distance in chart form.
3rd down is obvious, you need to gain all of the yards remaining or you’ve failed. Fourth down decisions should be avoided.
The 1st down requirement is virtually flat at a 37% yield, lower than what Hidden Game suggested.
The 2nd down requirement is asymptotic to 65% yield but reaches a requirement of 80% yield by 5 yards to go. Essentially, you need at least 4 yards on 2nd and 5 to not have wasted the down.
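One simple way to get a schedule like this out of the curves is a brute-force search: for each down and distance, find the smallest whole-yard gain that leaves you with at least the same 1DP on the next down. The sketch below does that. It is a stand-in and not necessarily the routine used for the chart, and the `curves` argument (a dict of down to fitted-curve function) is a hypothetical interface.

```python
def required_gain(down, distance, curves):
    """Smallest whole-yard gain that does not hurt the chance of converting.

    curves: dict mapping down (1, 2, 3) to a function that takes yards to go
    and returns 1DP. This interface is an assumption for illustration.
    """
    if down >= 3:
        # On 3rd (or 4th) down, only gaining all remaining yards is a success.
        return distance
    current = curves[down](distance)
    next_curve = curves[down + 1]
    for gain in range(distance):
        # After gaining `gain` yards you face the next down with less to go.
        if next_curve(distance - gain) >= current:
            return gain
    # No partial gain keeps you whole, so the play has to convert.
    return distance
```

Dividing the result by the distance gives the percent yield quoted above.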
One last thing this data allows us to think about is a set of guidelines for when to be aggressive (B, E, aggressive) with the play call. Flea Flicker, anyone? Transcontinental (4:50)? Fumblerooski?
First down is all business, you must move the ball 37% of the way or you’re screwing yourself. Third down is also all business, you need to convert or risk deciding which poison tastes the best. Second down however, depending on the situation, that’s a down you can get jiggy with.
On a generic 1st and 10, there’s a 64% chance of converting a new set of downs. So, as long as you end up with about a 64% chance of converting on 3rd down, you can do whatever you want on second down as long as you don’t lose yards or give the ball away. That means you need to end up at 3rd and 3 or better. On 2nd and 3 or better call in the B2s and Outkast, baby, ‘cause it’s time to drop bombs (over Baghdad).
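The "3rd and 3 or better" cutoff is just a lookup against the 3rd down curve: find the longest distance where the curve still sits at or above the 1st and 10 baseline. A rough sketch, assuming you have the fitted 3rd down curve as a function (the function and parameter names here are hypothetical):

```python
def longest_safe_third_down(baseline, third_down_curve, max_distance=10):
    """Longest 3rd down distance whose 1DP still meets the baseline.

    third_down_curve: function taking yards to go and returning 1DP.
    Returns None if no distance from 1 to max_distance qualifies.
    """
    safe = [
        distance
        for distance in range(1, max_distance + 1)
        if third_down_curve(distance) >= baseline
    ]
    return max(safe) if safe else None
```

Feed it 0.64 as the baseline along with the 3rd down curve, and any 2nd down distance at or inside the answer is a free shot, provided you don't lose yards or give the ball away.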