[Author note: This thing is long and pretty technical. That said, I think there will be sufficient payoff and value for you the reader. Still, be ye warned.]
Have you ever wished there were a convenient way to rate rushers the same way we rate passers? Sure, passer rating has its weaknesses (all mathematical formulas do), but despite its issues, I've come to appreciate passer rating as a very useful framework for evaluating a player or team when it comes to passing the ball. In the same way that finding a corner piece of a jigsaw puzzle helps you figure out its entire quadrant, once you have an idea of what to expect from the passing game you can leap to other touchstones to determine what to expect from the running game. A rusher rating would be just the sort of touchstone needed to really start messing around, for those of us who are so inclined. This diary lays out what I think should work for these purposes.
To recap some of my previous work: passer rating combines four important factors (completion percentage, yards per attempt, interception rate, and touchdown rate) and blends them into one number. For rushing stats, the information needed to come up with an analogous metric was hard to come by until cfbstats.com came along. Tons of fascinating and useful data, for free. God bless the internet.
To come up with the rating, I looked only at positions that would be considered normal rushers (QB, RB, TB, FB, HB, SB, WR) and only at players with an average YPC greater than zero. If you can’t meet those criteria, then you can’t represent a normal rusher, thus sayeth the me. Other positions register rushing attempts, but allowing the odd rush by a punter to color your view of what normal looks like would be dumb. See the chart below for more information. Also, if a guy averages negative YPC, uh, find something else to do, kthx. Other than that, no other filter was applied, but some math wonk tricks were, and I’ll talk about those as we go.
Completion Percentage → Gain Percentage: Parsed play by play is necessary to generate a replacement for completion percentage. I opted to go for Gain Percentage: the percentage of attempts that resulted in more than zero yards. I figured the basic goal of a pass is to complete it (brilliant insight, I know) and the basic goal on a rush attempt is to gain positive yardage so…any gain of more than zero yards is mission accomplished. This parameter is as much about team skill as it is about player skill but the same can be said for Completion Percentage.
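For the code-inclined, Gain Percentage as defined above can be sketched in a few lines. This is a minimal illustration, not the actual parsing pipeline: the function name and the eight-carry sample are made up, and it assumes the play by play has already been boiled down to a list of yards gained per rush.

```python
def gain_percentage(gains):
    """Percentage of rush attempts that gained more than zero yards."""
    if not gains:
        raise ValueError("need at least one rush attempt")
    successes = sum(1 for yards in gains if yards > 0)
    return 100.0 * successes / len(gains)


# Hypothetical yardage on eight carries (not real play-by-play data).
carries = [3, 0, -2, 7, 1, 12, 0, 4]
print(gain_percentage(carries))  # 5 of the 8 carries gained yardage
```

Note that a carry for exactly zero yards counts as a miss, per the "more than zero yards" definition.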
Yards Per Attempt: Direct analogue.
Interception Rate → Fumble Rate: The direct analogue would of course be fumbles lost per attempt but that’s not the right way to do it IMO. The luck factor that influences whether or not the team actually loses possession has nothing to do with the fact that bringing possession into question is a terrible idea. So, all fumbles whether lost or not are counted in the calculation.
There is a bit of mathematical wonkiness deployed here as well. Mike Hart is famous, at least around here, for his deftness at protecting the rock. It was awesome: 991 carries, 5 loose balls, 3 losses of possession. That was an aiight career, but these guys were kinda, sorta, maybe, better (!) at protecting the rock:
OK, so the wonkiness…a lot of people who register meaningful rushing attempts do so at a pretty low level of opportunity. Even stud RBs often split carries with other backs: Eddie Lacy siphoned off carries from Mark Ingram before becoming the man, and T.J. Yeldon did the same to Eddie Lacy. So in order for fumbles to make sense for players that get meaningful carries in low doses, we need to consider the question: at which point does a low fumble rate cross the threshold from wait-and-see to holy-crap-check-that-dude-for-stickem?
What we have here is a chart comparing the observed percentage (red dots) and the mathematical probability (blue line) that a player will have at least 1 fumble versus the number of carries he has registered. The red dots are binned in increments of 1 carry, so the sample sizes out past 150 are pretty thin, but if bigger bins were used you’d see a scatter of points that more closely follows the mathematical fit, because… math. The blue line was derived using logistic regression.
The weirdness at zero for the mathematical expectation might be concerning as it suggests that there’s a 20% chance you’ve fumbled despite not having a single carry to your credit. However, that is just an artifact of the data. It is possible to fumble on your one and only carry, as actual observations show. What the math does, though, is consider the sample size of the observations and then find the best fit possible to the overall dataset. There are ways of dealing with that issue, but…I’d rather talk about football. Also, KISS. This is good enough for my intended purpose.
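To see where that nonzero value at zero carries comes from, here is a minimal sketch of a logistic fit on simulated players. Everything in it is an assumption for illustration: the 1.5% per-carry fumble chance, the 2,000 fake players, the 1 to 200 carry range, and the helper names. It is not the fit behind the chart, and it uses a hand-rolled Newton's method rather than whatever tool produced the blue line.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) with Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the negated Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated players: ASSUMED 1.5% fumble chance per carry, 1 to 200 carries each.
random.seed(0)
PER_CARRY = 0.015
carries = [random.randint(1, 200) for _ in range(2000)]
fumbled = [1 if random.random() < 1 - (1 - PER_CARRY) ** n else 0 for n in carries]

b0, b1 = fit_logistic(carries, fumbled)
for n in (0, 16, 50, 100, 150):
    print(n, round(sigmoid(b0 + b1 * n), 3))
```

The true probability at zero carries is zero by construction, but the S-shaped curve cannot pass through zero, so the fitted value at n = 0 comes out positive. That is the same kind of artifact described above.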
Anyway, the point of doing all that is it allows me to apply what I’ll call the Ghost Protocol. Basically, I take that curve, subtract it from 1, and add the resulting value to the player’s fumble total as a phantom fumble. As the number of carries increases, the phantom fumble recedes, which lets us evaluate players with low sample sizes as best we can instead of dismissing them with the low-sample-size red card. The result of this bit of data manipulation is that a guy with no fumbles in 16 carries is assigned an average fumble rate, and by the time 100 carries are registered the penalty is not perceptible. Below 16 carries, the assigned penalty is pretty stiff. Sure, 16 carries is still a low sample, but at least the rating self-corrects for the fact that fumbles take time to manifest.
Most importantly though, the protocol adequately acknowledges players who keep their fumble rates low over a lot of carries. It’s easier to have a 1% fumble rate after 100 carries than it is to have the same rate after 789 carries. That said, after a while the fumble rates should be allowed to speak for themselves. Quizz Rodgers and Mike Hart need their proper allocation of DAP; nothing more, nothing less. I think the ghost protocol concept accomplishes exactly that.
Touchdown Rate: This one is also directly analogous, but here again I’ve deployed the ghost protocol, this time to credit guys with low samples with the expectation of an eventual TD. TDs come about much more freely than fumbles do, what with goal line attempts and the like, so this credit vanishes very quickly. But fair is fair: the protocol giveth and it taketh away.
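The phantom adjustment described for fumbles and touchdowns can be sketched as follows. Treat this as a rough illustration under stated assumptions: the logistic coefficients are invented so the curve starts near 20% and flattens out by about 100 carries, and the function names are hypothetical. The real fitted values are not given in this post.

```python
import math


def prob_at_least_one(attempts, b0, b1):
    """Logistic curve: chance of at least one event by this many carries."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * attempts)))


def ghost_adjusted_rate(events, attempts, b0, b1):
    """Add the phantom event (1 minus the curve) to the total, then divide by carries."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    phantom = 1.0 - prob_at_least_one(attempts, b0, b1)
    return (events + phantom) / attempts


# ASSUMED coefficients for illustration only, not the fitted values.
FUMBLE_B0, FUMBLE_B1 = -1.4, 0.08
TD_B0, TD_B1 = -1.0, 0.20  # steeper, since TDs show up sooner than fumbles

# A fumble-free back at increasing carry counts: the penalty fades out.
for carries in (5, 16, 50, 100, 300):
    rate = ghost_adjusted_rate(0, carries, FUMBLE_B0, FUMBLE_B1)
    print(carries, f"{100 * rate:.3f}%")

# Mike Hart's career line from above: 5 fumbles on 991 carries.
print(f"{100 * ghost_adjusted_rate(5, 991, FUMBLE_B0, FUMBLE_B1):.2f}%")

# Same helper, touchdown flavor: a phantom TD credit for a 10-carry guy.
print(f"{100 * ghost_adjusted_rate(0, 10, TD_B0, TD_B1):.2f}%")
```

With a big enough workload the phantom term rounds to nothing, so a career like Hart's is rated on its actual fumble rate alone.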
Those are the components directly analogous to the ones used in passer rating and these would be enough to go about the business at hand. However, whereas a passer’s job is to get the ball into the hands of a playmaker, players that are given the ball, whether by pass or handoff, are called upon to be the playmaker. Certainly the scheme, play call RPS, and execution of the supporting cast all have major influence on the results of a play, but the ball carrier can do things that elevate the call from good to great. I wanted to be all formal-like and call this the Impact Run Rate but this [stuff] is s’posed to be fun, man. Hence:
Another Dimension: the Dilithium Quotient
The 20 yard threshold is usually referenced as the mark of a big play. That would certainly qualify as a big play by any standard, but that threshold seems to have been established somewhat arbitrarily in my opinion. On average, a generic runner on a generic team in a generic game gains about 4 yards per attempt with a standard deviation of about 7.5. It’s called the standard deviation for a reason, as a huge swath of observations (about 2/3) occur within 1 SD of the mean, or between –3 and +11 (remember: discrete data). The other 1/3 of observations get split evenly, with 1/6 below –3 yards and 1/6 at 12 or above. I’ve used an objective criterion, you know, math, to define Impact Runs as those that register 12 yards or more. To register one of these the player’s entire team has to execute the play correctly, then the carrier has to do something special (i.e. juke a dude, break a tackle, be fast). This is the real life manifestation of the Madden Circle Button and it’s informative. It’s the difference between Barry Sanders and Emmitt Smith.
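The arithmetic behind the 12 yard cutoff, and the resulting Dilithium Quotient, can be sketched like so. The mean of 4 and SD of 7.5 come from the paragraph above; the function names and the ten-carry sample are hypothetical.

```python
import math


def impact_threshold(mean_gain, sd_gain):
    """Smallest whole-yard gain that lies more than one SD above the mean."""
    return math.floor(mean_gain + sd_gain) + 1


def dilithium_quotient(gains, threshold):
    """Percentage of carries that gained at least `threshold` yards."""
    if not gains:
        raise ValueError("need at least one carry")
    return 100.0 * sum(1 for yards in gains if yards >= threshold) / len(gains)


threshold = impact_threshold(4.0, 7.5)  # 4 + 7.5 = 11.5, so the next whole yard up
# Hypothetical ten-carry sample, not real play-by-play.
sample = [2, 5, -1, 14, 3, 0, 22, 4, 11, 6]
print(threshold, dilithium_quotient(sample, threshold))
```

In the sample, the 11 yard run just misses and only the 14 and 22 yarders count as impact runs.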
Denard Robinson was great at this but it might be surprising to hear that he wasn’t the best. Percy Harvin in the spread option was ridiculous in this category. Percy touched the ball a lot when he was a Gator and 27% of the time, he darted for an impact run. By contrast, Denard’s DQ% was ‘only’ about 15%. Could you imagine Denard breaking loose almost twice as often? Of course, the scheme, the team’s execution of the scheme, and the player’s deployment within the scheme have a lot to do with this number. Florida circa Percy Harvin was galaxies away from Michigan circa Denard Robinson. Percy Harvin was the 3rd rushing option in Florida’s spread and shred; Denard Robinson was options 1-10. Also, being the QB in the spread-option means you are concern #1 for defenses: the cornerstone. That was triply the case when facing Michigan with Denard in the captain’s chair. Harvin was usually one-on-one with a guy 10 times slower than he was who was also probably pooping his pants.
Denard’s DQ% was pretty stable around 15% (scheme be damned) but his utility rate (723 career carries) was second to none save minor conference QBs. His closest proxy, Pat White (684 career carries), broke loose at a 19% clip in RichRod’s scheme. However, the Big EEEast sans Miami and Virginia Tech wasn’t quite the Big TEEEN. Denard went up against stout defenses way more often than Pat White did, and did so without the benefit of Steve Slaton, Noel Devine, or a revolutionary offensive scheme. When Pat White lost RichRod, his DQ% dropped to under 12%; Denard didn’t bat an eye. Everyone *knew* they had to stop Denard and only him on *every play* and they still had their hands full trying to actually do it. The fact that Michigan could never position itself for him to win the Heisman trophy will always be one of my sports fan laments. For ever and ever and ever. He better get a Legends Jersey or I’m qui’in’. I don't care if that’s silly. You’re silly. Where’s my bourbon?
Blending It All Together
Passer Rating was developed such that an average QB would end up with a rating of 100 according to the data set that was used to develop it, which was gathered two, maybe three, football eras ago when linemen couldn’t really block and scholarship limits weren’t so much. I’m not sure how they went about the process of pinning the rating to average==100 and I don’t have the data to try and replicate the results…so, I kinda, sorta, you know, pulled something outta my [hat]. That is to say: I did what I think is correct or at least valid. I normalized each parameter by its par value, summed them together, then forced the resulting rating to equal 100 for a par player. Ultimately the 100 thing is completely arbitrary, but negative numbers are weird, I guess. All said, a rating of 100 means the player was a solid runner but not special; below that you wonder if he should be running at all.
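One way the normalize, sum, and pin-to-100 recipe could look in code is sketched below. Heavy caveats apply: the par values are placeholders (only the 4 YPC and the roughly 1-in-6 impact run share are taken from numbers mentioned in this post), and the equal weighting and the choice to subtract the fumble term are my assumptions about the bookkeeping, not a statement of the exact formula.

```python
# ASSUMED par values for illustration only; the post does not list them.
PAR = {"gain_pct": 80.0, "ypc": 4.0, "fumble_rate": 0.02, "td_rate": 0.05, "dq_pct": 16.7}
# ASSUMED signs: fumbles count against the runner, everything else counts for him.
SIGN = {"gain_pct": 1, "ypc": 1, "fumble_rate": -1, "td_rate": 1, "dq_pct": 1}


def rusher_rating(stats):
    """Normalize each component by par, sum with signs, scale so par equals 100."""
    total = sum(SIGN[key] * stats[key] / PAR[key] for key in PAR)
    par_total = sum(SIGN.values())  # what `total` equals for an exactly-par runner
    return 100.0 * total / par_total


# An exactly-par runner lands on 100 by construction.
print(rusher_rating(PAR))

# A hypothetical back with better YPC and half the par fumble rate.
better = dict(PAR, ypc=6.0, fumble_rate=0.01)
print(round(rusher_rating(better), 1))
```

Because every component is divided by its own par, the units drop out and a percentage like Gain% can be added to a per-carry number like YPC.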
Where in the World is Carmen San Diego, er, Mario Mendoza
Now that we have a calibrated formula, it’s time to get down to business: application. I calibrated the rating so that 100 was a normal guy, but figuring out what par should be is a little more complicated. I mentioned earlier that if you can’t get to a rating of 100 I don't think you should be a primary running option, and I also think we should only look at primary running options to establish our benchmark. But being a primary running option means different things depending on where you’re lining up.
When trying to crack a nut like this I often find that the data itself will help you figure out where to chop it. In the chart below I have plotted Average Rating vs. Amount of Carries. Obviously, the better runner you are, the more carries you should see, but runners that are REALLY good are few and far between…this chart shows that dichotomy very nicely. I like to look for population gaps and/or inflection points in a performance curve. Those are usually good places to drop an anchor as far as I’m concerned. When they are near each other it’s a dead giveaway. Based on the data itself I’m using 115 for RBs, 70 for QBs, and 120 for WRs as performance benchmarks.
So, this is all well and good but the real test is whether or not things make sense. Here are the values for the B1G in 2013:
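For anyone keeping their own spreadsheet, the position benchmarks above amount to a simple lookup. A tiny sketch, where the names are hypothetical and treating a rating exactly at the benchmark as clearing it is my assumption:

```python
# Benchmarks quoted above, keyed by position group.
MENDOZA_LINE = {"RB": 115, "QB": 70, "WR": 120}


def clears_mendoza_line(rating, position):
    """True if the rating meets or beats the benchmark for that position."""
    return rating >= MENDOZA_LINE[position]


# The same rating of 100 reads very differently by position.
for position in ("RB", "QB", "WR"):
    print(position, clears_mendoza_line(100, position))
```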
This generally looks pretty reasonable to me in terms of an overall ranking as well as a relative ranking. The players/teams you’d expect to be at the top and bottom of the list are where they are supposed to be. If anything I’d criticize the Mendoza line at 115 given how we all feel about Michigan’s running game last year. Maybe 115 is just the threshold of suicide and 130 or better is what we fans really want from our teams. But, even this jibes with what I think.
As with passer rating, this rating depends on player skill, surrounding support, and offensive scheme. Toussaint’s YPC and Gain%, components heavily influenced by surrounding support (i.e. the O-Line), are way under par. So is his Dilithium %, which is a skill/talent/speed thing, but the dude had a bum knee and he’s not that far off of par there. Makes sense. So, he hit the Mendoza line even though he had bad support in front of him, sorta like Gardner. These numbers make sense to me.
Re: Smith Vs. Green
I mentioned in my last diary that it was interesting to hear grumblings about De'Veon Smith being ahead of or competitive with Derrick Green because I think the numbers bear this out. Check this out:
These guys played with the same support and in the same system so the differentiators on display here are essentially Skill and Opportunity. Neither Green nor Smith actually registered a fumble but the Ghost Protocol affects Smith’s rating more because he has far fewer carries. Indeed, Smith’s rating is also bolstered by a phantom touchdown, but this effect dissipates faster because TDs occur more frequently. So the math is screwing Smith over here a bit. Meanwhile, Smith’s Gain % and YPC (hitting the right hole at the right time) and DIL% (juking, speed, whatever) were the highest on the team last season. Yep, small samples yadda yadda. Just sayin’.
Anyway, that's a lot of words and I hope this was worth the read. Of course, I will be referring to this information in future diaries. Thanks for reading, and please provide any criticisms or comments you might have in, uh, the comments section.