[Author note: This thing is long and pretty technical. That said, I think there will be sufficient payoff and value for you the reader. Still, be ye warned.]
Have you ever wished there were a convenient way to rate rushers the same way we rate passers? Sure, passer rating has its weaknesses (all mathematical formulas do), but despite its issues, I've come to appreciate passer rating as a very useful framework to evaluate a player/team when it comes to passing the ball. In the same way that finding a corner piece to a jigsaw puzzle helps you figure out its entire quadrant, once you have an idea of what to expect from the passing game you can leap to other touchstones to determine what to expect from the running game. A rusher rating would be just the sort of touchstone needed to really start messing around for those of us who are so inclined. This diary lays out what I think should work for these purposes.
To recap some of my previous work: passer rating combines four important factors—completion percentage, yards per attempt, interception rate, and touchdown rate—and blends them into one number. For rushing stats, important information for coming up with an analogous metric has been hard to come by until cfbstats.com came along. Tons of fascinating and useful data, for free. God bless the internet.
To come up with the rating, I looked only at positions that would be considered normal rushers (QB, RB, TB, FB, HB, SB, WR) that have an average YPC greater than zero. If you can’t meet those criteria, then you can’t represent a normal rusher, thus sayeth the me. Other positions register rushing attempts, but allowing the odd rush by a punter to color your view of what normal looks like would be dumb. See the chart below for more information. Also, if a guy averages negative YPC, uh, find something else to do, kthx. Other than that, no other filter was applied, but some math wonk tricks were and I’ll talk about those as we go.
Completion Percentage → Gain Percentage : Parsed play by play is necessary to generate a replacement for completion percentage. I opted to go for Gain Percentage: the percentage of attempts that resulted in more than zero yards. I figured the basic goal of a pass is to complete it (brilliant insight, I know) and the basic goal on a rush attempt is to gain positive yardage so…any gain of more than zero yards is mission accomplished. This parameter is as much about team skill as it is about player skill but the same can be said for Completion Percentage.
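A minimal sketch of how Gain Percentage could be computed from parsed play-by-play, assuming each rush has already been reduced to a whole-number yardage. The function name and the sample carries are hypothetical and only illustrate the definition above.

```python
def gain_percentage(carry_yards):
    """Share of rush attempts that gained more than zero yards."""
    if not carry_yards:
        raise ValueError("need at least one carry")
    gains = sum(1 for yards in carry_yards if yards > 0)
    return gains / len(carry_yards)


# Hypothetical carries, not taken from any real game log.
example_carries = [3, 0, -2, 7, 1, 12, 0, 4]
print(f"Gain%: {gain_percentage(example_carries):.1%}")
```

Note that a carry for exactly zero yards counts as a miss here, matching the "more than zero yards" wording.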
Interception Rate → Fumble Rate: The direct analogue would of course be fumbles lost per attempt but that’s not the right way to do it IMO. The luck factor that influences whether or not the team actually loses possession has nothing to do with the fact that bringing possession into question is a terrible idea. So, all fumbles whether lost or not are counted in the calculation.
There is also a bit of mathematical wonkiness deployed as well. Mike Hart is famous—at least around here—for his deftness at protecting the rock. It was awesome: 991 carries, 5 loose balls, 3 losses of possession. That was an aiight career, but these guys were kinda, sorta, maybe, better (!) at protecting the rock:
|Jacquizz Rodgers||Oregon State||789||1|
OK, so the wonkiness…a lot of people who register meaningful rushing attempts do so at a pretty low level of opportunity. Even stud RBs often split carries with other backs: Eddie Lacy siphoned off carries from Mark Ingram before becoming the man, and T.J. Yeldon did the same to Eddie Lacy. So in order for fumbles to make sense for players that get meaningful carries in low doses, we need to consider the question: at which point does a low fumble rate cross the threshold from wait-and-see to holy-crap-check-that-dude-for-stickem?
What we have here is a chart comparing the observed percentage (red dots) and the mathematical probability (blue line) that a player will have at least 1 fumble versus the number of carries he has registered. The red dots are binned in increments of 1 so the sample sizes out past 150 are pretty thin but if bigger bins are used, you’d see a scatter of points that more closely follow the mathematical fit, because… math. The blue line was derived using logistic regression.
The weirdness at zero for the mathematical expectation might be concerning as it suggests that there’s a 20% chance you’ve fumbled despite not having a single carry to your credit. However, that is just an artifact of the data. It is possible to fumble on your one and only carry as actual observations show. What the math does, though, is it considers the sample size of the observations and then finds the best fit possible to the overall dataset. There are ways of dealing with that issue, but…I’d rather talk about football. Also, KISS. This is good enough for my intended purpose.
Anyway, the point of doing all that is it allows me to apply what I’ll call the Phantom Protocol. Basically, I take that curve, subtract it from 1, and add the resulting value to the player’s fumble total. As the number of carries increases, the effect of the phantom fumble recedes thus leveling the playing field and letting us evaluate players with low sample size as best we can. The result of this bit of data manipulation is that a guy with no fumbles in 16 carries is assigned an average fumble rate and by the time 100 carries are registered, the penalty is not perceivable. Below 16 carries, the assigned penalty is pretty stiff but this trick levels the playing field to let us look at guys with few carries and not just dismiss them with the low sample size red card. Sure, 16 carries is still a low sample but at least the rating self corrects for the fact that fumbles take time to manifest.
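Here is a rough sketch of the phantom fumble adjustment as described: take the logistic curve for "at least one fumble", subtract it from 1, and add that to the fumble count before dividing by carries. The coefficients are assumptions. `B0` is chosen only so the curve reads about 20% at zero carries, as noted above, and `B1` is a made-up slope, so the outputs will not reproduce the actual ratings.

```python
import math

# Hypothetical coefficients: B0 makes the curve read 20% at zero carries
# (as described in the text); B1 is an invented slope for illustration only.
B0 = math.log(0.2 / 0.8)
B1 = 0.03


def prob_at_least_one_fumble(carries):
    """Logistic curve: chance a player with this many carries has fumbled."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * carries)))


def adjusted_fumble_rate(fumbles, carries):
    """All fumbles (lost or not) plus the phantom fumble, per carry."""
    if carries <= 0:
        raise ValueError("need at least one carry")
    phantom = 1.0 - prob_at_least_one_fumble(carries)
    return (fumbles + phantom) / carries


# The phantom penalty shrinks as carries pile up.
for fumbles, carries in [(0, 16), (0, 100), (5, 991)]:
    rate = adjusted_fumble_rate(fumbles, carries)
    print(f"{fumbles} fumbles in {carries} carries -> {rate:.4%}")
```

With any positive slope, the phantom term fades toward zero at high carry counts, so a career like 5 fumbles in 991 carries is left essentially at its raw rate.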
Most importantly though, the protocol adequately acknowledges players with low fumble rates even though they have a lot of carries. It’s easier to have a 1% fumble rate after 100 carries than it is to have the same rate after 789 carries. That said, after a while the fumble rates should be allowed to speak for themselves. Quizz Rodgers and Mike Hart need their proper allocation of DAP; nothing more, nothing less. I think the ghost protocol concept accomplishes exactly that.
Touchdown Rate: This one is also directly analogous but here again I’ve deployed the ghost protocol to credit guys with low sample the expectation of an eventual TD. TDs come about much more freely than fumbles do with goal line attempts and the like so this credit vanishes very quickly. But fair is fair: the protocol giveth and it taketh away.
Those are the components directly analogous to the ones used in passer rating and these would be enough to go about the business at hand. However, whereas a passer’s job is to get the ball into the hands of a playmaker, players that are given the ball, whether by pass or handoff, are called upon to be the playmaker. Certainly the scheme, play call RPS, and execution of the supporting cast all have major influence on the results of a play, but the ball carrier can do things that elevate the call from good to great. I wanted to be all formal-like and call this the Impact Run Rate but this [stuff] is s’posed to be fun, man. Hence:
Another Dimension: the Dilithium Quotient
The 20 yard threshold is usually referenced as registering a play as a big play. That would certainly qualify as a big play by any standard but that threshold seems to have been established somewhat arbitrarily in my opinion. On average, a generic runner on a generic team in a generic game gains about 4 yards per attempt with a standard deviation of about 7.5. It’s called the standard deviation for a reason as a huge swath of observations (about 2/3rds) occur within 1 SD of the mean, or between -3 and +11 (remember: discrete data). The other 1/3 of observations get split evenly with 1/6 below -3 yards and 1/6 at 12 or more. I’ve used objective criteria, you know, math, to define Impact Runs as those that register 12 yards or more. To register one of these the player’s entire team has to execute the play correctly, then the carrier has to do something special (i.e. juke a dude, break a tackle, be fast). This is the real life manifestation of the Madden Circle Button and it’s informative. It’s the difference between Barry Sanders and Emmitt Smith.
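To make the cutoff arithmetic concrete, here is a small sketch that derives the 12-yard line from the mean and standard deviation quoted above and then computes the share of carries that clear it. The function name and the sample carries are hypothetical.

```python
import math

MEAN_YPC = 4.0  # average gain per attempt, from the text
SD_YPC = 7.5    # standard deviation, from the text

# One SD above the mean is 11.5 yards. Play-by-play yardage comes in whole
# yards, so the first value past that cut is 12.
IMPACT_THRESHOLD = math.ceil(MEAN_YPC + SD_YPC)


def dilithium_quotient(carry_yards, threshold=IMPACT_THRESHOLD):
    """Share of carries that went for at least `threshold` yards."""
    if not carry_yards:
        raise ValueError("need at least one carry")
    impact_runs = sum(1 for yards in carry_yards if yards >= threshold)
    return impact_runs / len(carry_yards)


# Hypothetical carries, not taken from any real game log.
example_carries = [2, 12, 5, 30, -1, 11, 4, 0, 3, 15]
print(f"Threshold: {IMPACT_THRESHOLD} yards")
print(f"DQ%: {dilithium_quotient(example_carries):.0%}")
```

An 11-yard carry falls just short under this definition, which is the discrete-data wrinkle mentioned above.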
Denard Robinson was great at this but it might be surprising to hear that he wasn’t the best. Percy Harvin in the spread option was ridiculous in this category. Percy had touched the ball a lot when he was a Gator and 27% of the time, he darted for an impact run. By contrast, Denard’s DQ% was ‘only’ about 15%. Could you imagine Denard breaking loose almost twice as often? Of course, the scheme, the team’s execution of the scheme, and the player’s deployment within the scheme have a lot to do with this number. Florida circa Percy Harvin was galaxies away from Michigan circa Denard Robinson. Percy Harvin was the 3rd rushing option in Florida’s spread and shred, Denard Robinson was options 1-10. Also, being the QB in the spread-option means you are concern #1 for defenses: the cornerstone. That was triply the case when facing Michigan with Denard in the captain’s chair. Harvin was usually one-on-one with a guy 10 times slower than he was who was also probably pooping his pants.
Denard’s DQ% was pretty stable around 15% (scheme be damned) but his utility rate (723 career carries) was second to none save minor conference QBs. His closest proxy Pat White (684 career carries) broke loose at a 19% clip in RichRod’s scheme. However, the Big EEEast sans Miami and Virginia Tech wasn’t quite the Big TEEEN. Denard went up against stout defenses way more often than Pat White did and did so without the benefit of Steve Slaton or Noel Devine and the benefit of a revolutionary offensive scheme. When Pat White lost RichRod his DQ% dropped to under 12%; Denard didn’t bat an eye. Everyone *knew* they had to stop Denard and only him on *every play* and they still had their hands full trying to actually do it. The fact that Michigan could never position itself for him to win the Heisman trophy will always be one of my sports fan laments. For ever and ever and ever. He better get a Legends Jersey or I’m qui’in’. I don't care if that’s silly. You’re silly. Where’s my bourbon?
Blending It All Together
Passer Rating was developed such that an average QB would end up with a rating of 100 according to the data set that was used to develop it, which was gathered two maybe three football eras ago when linemen couldn’t really block and scholarship limits weren’t so much. I’m not sure how they went about the process of pinning the rating to average==100 and I don’t have the data to try and replicate the results…so, I kinda, sorta, you know, pulled something outta my [hat]. That is to say: I did what I think is correct or at least valid. I normalized each parameter by its par value, summed them together, then forced the resulting rating to equal 100 for a par player. Ultimately the 100 thing is completely arbitrary, but negative numbers are weird, I guess. All said, a rating of 100 means the player was a solid runner but not special; below that you wonder if he should be running at all.
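One way the blending step might look, as a sketch only. Several things here are assumptions rather than the actual formula: the par values are invented placeholders (apart from the roughly 4 yards per carry quoted earlier), the fumble term is assumed to subtract the way interceptions do in passer rating, and all components are assumed to carry equal weight.

```python
# Hypothetical par values for illustration; only the ~4.0 YPC figure
# comes from the text.
PAR = {
    "ypc": 4.0,
    "gain_pct": 0.75,
    "td_pct": 0.04,
    "dq_pct": 0.15,
    "fum_pct": 0.015,
}

POSITIVE_COMPONENTS = ("ypc", "gain_pct", "td_pct", "dq_pct")


def rusher_rating(stats, par=PAR):
    """Normalize each component by par, sum, and scale so par equals 100."""
    total = sum(stats[key] / par[key] for key in POSITIVE_COMPONENTS)
    # Assumption: fumbles count against the runner, like interceptions.
    total -= stats["fum_pct"] / par["fum_pct"]
    # A par player sums to (number of positive components - 1).
    par_total = len(POSITIVE_COMPONENTS) - 1
    return 100.0 * total / par_total


par_player = dict(PAR)
big_play_back = dict(PAR, ypc=6.0, dq_pct=0.25)
print(f"Par player: {rusher_rating(par_player):.1f}")
print(f"Big-play back: {rusher_rating(big_play_back):.1f}")
```

Whatever the exact weights, the point of the scaling step is the same: a player sitting exactly at par on every component lands on 100.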
Where in the World is
Carmen San Diego Mario Mendoza
Now that we have a calibrated formula it’s time to get down to business: application. I calibrated the rating so that 100 was a normal guy, but figuring out what par should be is a little more complicated. I mentioned earlier that if you can’t get to a rating of 100 I don't think you should be a primary running option and I also think we should only look at primary running options to establish our benchmark. But being a primary running option means different things depending on where you’re lining up.
When trying to crack a nut like this I often find that the data itself will help you figure out where to chop it. In the chart below I have plotted Average Rating vs. Amount of Carries. Obviously, the better runner you are, the more carries you should see but runners that are REALLY good are few and far between…this chart shows that dichotomy very nicely. I like to look for population gaps and/or inflection points in a performance curve. Those are usually good places to drop an anchor as far as I’m concerned. When they are near each other it’s a dead giveaway. Based on the data itself I’m using 115 for RBs, 70 for QBs, and 120 for WRs as performance benchmarks.
So, this is all well and good but the real test is whether or not things make sense. Here are the values for the B1G in 2013:
|Team Name||Player Name||RB Rat||Attempt||Yds/ATT||TD%||FMB%||Gain%||Dilithium%|
This generally looks pretty reasonable to me in terms of an overall ranking as well as a relative ranking. The players/team you’d expect to be at the top and bottom of the list are where they are supposed to be. If anything I’d criticize the Mendoza line at 115 given how we all feel about Michigan’s running game last year. Maybe 115 is just the threshold of suicide and 130 or better is what we fans really want from our teams. But, even this jibes with what I think.
As with passer rating, this rating depends on player skill, surrounding support, and offensive scheme. Toussaint’s YPC and Gain%, components heavily influenced by surrounding support (i.e. the O-Line), are way under par. So is his Dilithium %, which is a skill/talent/speed thing, but the dude had a bum knee and he’s not that far off of par there. Makes sense. So, he hit the Mendoza line even though he had bad support in front of him, sorta like Gardner. These numbers make sense to me.
Re: Smith Vs. Green
I mentioned in my last diary that it was interesting to hear grumblings about De'Veon Smith being ahead/competitive with Derrick Green because I think the numbers bear this out. Check this out:
|Player Name||Att||TD||Fum||Gain %||Yds/ATT||TD%||Fum%||DIL%||RB Rat|
These guys played with the same support and in the same system so the differentiators on display here are essentially Skill and Opportunity. Neither Green nor Smith actually registered a fumble but the Ghost Protocol affects Smith’s rating more because he has far fewer carries. Indeed, Smith’s rating is also bolstered by a phantom touchdown, but this effect dissipates faster because TDs occur more frequently. So the math is screwing Smith over here a bit. Meanwhile, Smith’s Gain % and YPC (hitting the right hole at the right time) and DIL% (juking, speed, whatever) were the highest on the team last season. Yep, small samples yadda yadda. Just sayin’.
Anyway, that's a lot of words and I hope this was worth the read. Of course, I will be referring to this information in future diaries. Thanks for reading and please provide any criticisms or comments you might have in, uh, the comments section.
Congrats to Fitz!
The Ravens have 7 running backs on the roster, although fourth round pick Lorenzo Taliaferro was arrested for public drunkenness and destruction of property last weekend.
The Detroit News has an article today about Michigan's offensive line. Some interesting comments were from Fitz Toussaint, who said he knew the line needed to develop. He also mentioned that he will continue training around campus for Pro Day so he can continue working with Derrick Green and De'Veon Smith. He feels he has more he can help teach them. Kudos to Fitz for sticking around.
Kyle Kalis mentions that he suffered a severely sprained ankle in the Minnesota game, which was the reason for his reduced time.
There is also a discussion about how the line has gelled. Let's hope it shows Saturday.
Edit: MLive has a more in-depth article on Kalis. I'll add it here:
This article has a different story and says Kalis didn't want to sit. He was hurt but didn't want to come out. He sat down with Funk and Borges and was told he was the guy and would be the guy but there were reasons he needed to sit. He took it as a reason to work harder and it paid off.
I love the attitude.
In the aftermath of the CMU game, I’ve seen a few comments about running backs that go something like this: “If you took out X’s long run, his YPC would have only been Y, so he really wasn't that effective,” or variations thereof. This got me thinking a little about the limitations of using YPC to summarize running back performance, so I've put together a couple ways of looking at running back performance against Central.
First off, sample size concerns are rampant. Statisticians frown on many, many things, but they take particular umbrage when you do anything with a really small sample (read: less than 30). But, like our beloved coaches, we live in the real world where we have to make decisions based on incomplete information; so we continue on despite the limitations of the dataset.
Strength of competition is also suspect. We don't know for sure how good CMU will be this year, but we do know they were outscored by fifty points in the only game they've played this year. They may not be great this year.
Yards per carry is calculated by summing all rushing yards for a player and dividing by number of carries, making it an average (or sample mean). A sample mean is a very useful way of summarizing data with one nagging flaw: it is particularly vulnerable to outliers. The median, on the other hand, as the most central value, can be interpreted as a more typical expectation for a dataset. One extremely high or low value will have virtually no impact on the value of the median. Here's an example: Derrick Green's YPC for the CMU game was 6.1, 2 whole yards higher than Toussaint's 4.1. But Green's median carry of 3 is an entire yard shorter than Toussaint's 4. The YPC might lead you to conclude Derrick Green was a better bet for getting yards than Toussaint, but the median says at least 50% of Toussaint's carries went for 4 or more yards in comparison with Green's 3 or more yards. If you needed four yards for a first down, you may want to give it to Toussaint. That's potentially valuable information not contained in the YPC. Then there's the pesky fact that TD runs have a maximum length. If we're two yards out from the end zone, that's the maximum the player can get for that carry. This artificially lowers the YPC of a player who gets the ball over the line; in particular Toussaint's YPC would probably have been higher.
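A quick sketch of the mean-versus-median point, using made-up carries rather than the actual CMU game log. One long run lifts the YPC of the second back well above the first, while his median stays lower and barely moves when the long run is removed.

```python
from statistics import mean, median


def summarize(carry_yards):
    """Return (yards per carry, median carry) for a list of carries."""
    return mean(carry_yards), median(carry_yards)


# Made-up carries for illustration only.
steady_back = [4, 5, 3, 4, 6, 2, 4, 5]
boom_or_bust_back = [1, 2, 30, 3, 0, 2, 3, 1]

for name, carries in [("steady", steady_back), ("boom-or-bust", boom_or_bust_back)]:
    ypc, mid = summarize(carries)
    print(f"{name}: YPC {ypc:.2f}, median {mid}")

# Drop the single longest run and compare again.
without_long_run = sorted(boom_or_bust_back)[:-1]
ypc, mid = summarize(without_long_run)
print(f"boom-or-bust minus long run: YPC {ypc:.2f}, median {mid}")
```

This is the "take out X's long run" argument in miniature: the mean swings a lot, the median does not.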
The table below contains a few measures of central tendency for the players who had at least 3 carries (three is still too small, but a line had to be drawn somewhere and Rawls' touchdown seemed to merit his inclusion in this list). Rawls gets no standard deviation because three is a small number.
QB Devin Gardner wins the YPC sweepstakes with a blistering 7.4 YPC bolstered by a median carry of 6 yards. I would advocate getting this man some more carries, but that's a) already happening and b) potentially troublesome for our passing game. Regardless, Gardner does a good job here no matter what metric you use: no negative yardage, a great longest run and two touchdowns on only 7 carries. At least for this game, our shiny "more passing-oriented" quarterback was our most effective running back, which speaks a bit to the value of athleticism at that position.
Among the running backs, Toussaint and Green duke it out for maximal effectiveness depending on which measure you use. Green wins on YPC, longest run, and least negative minimum run. Toussaint had a higher median, most touchdowns, and most carries. Rawls has the highest median of the RB's, but since he only had three carries, sample size tells us to pay no heed.
____ Yards and a Cloud of Dust
Hearkening back to the days of Three Yards and a Cloud of Dust (TYaaCoD), I wanted to know who was more reliable if you need three yards every time you rush. The table below contains the percent of carries on which the player gained at least three yards, embodying the spirit of slightly-in-jest Schembechlerian Michigan Football.
Personally, though, I find three yards slightly lacking. If you run three yards every rushing play and you rush every play, you end up facing 4th and 1 every series. Our Fearless Leader would still go for it on fourth down every time (Heil Hoke!), but it's not an optimal situation to find yourself in. What you really want is someone who can pick up 3.5 yards or so every play, so you get a new set of downs after every three. The play-by-play is unhelpful in this regard, however, only listing integer values for yards. So I also calculated the Four Yards and a Cloud of Dust (FYaaCoD) metric, which is how the table below is sorted. If you get four yards every carry, you can go on rushing forever.
I did make a slight modification to the success rates of both metrics: I counted a touchdown as a success regardless of how many yards the play was because there is no further to go.
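The two success rates, including the touchdown modification, could be sketched like this. The tuple layout, the function name, and the sample carries are hypothetical and chosen only to show the rule.

```python
def success_rate(carries, target_yards):
    """Share of carries that reached target_yards or scored a touchdown.

    `carries` is a list of (yards, scored_touchdown) tuples.
    """
    if not carries:
        raise ValueError("need at least one carry")
    successes = sum(
        1 for yards, scored in carries if scored or yards >= target_yards
    )
    return successes / len(carries)


# Hypothetical carries; the 2-yard touchdown counts as a success for both.
example = [(5, False), (2, True), (3, False), (1, False), (4, False), (-1, False)]
print(f"TYaaCoD: {success_rate(example, 3):.0%}")
print(f"FYaaCoD: {success_rate(example, 4):.0%}")
```

Without the touchdown rule, the 2-yard score would drag both rates down even though the carry did everything it could.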
|Row Labels||Total Yds||Carries||TYaaCoD||FYaaCoD|
For TYaaCoD, you would want the following players rushing in order: 1. Green 2. Gardner 3. Rawls 4. Toussaint 5. Smith 6. Johnson. All players are between 50% and 75% successful at getting 3 yards against CMU, which is heartening. Moving to FYaaCoD, you would want 1. Gardner 2. Rawls 3. Toussaint 4. Green 5. Johnson 6. Smith.
There's some shuffling when you move to FYaaCoD: Derrick Green drops from first to fourth, and Smith falls to sixth at a slightly disappointing 29% success rate. Rawls still has only three carries, but two of them pass the FYaaCoD test, so he has a terrific success rate of 67%. Almost as good as Devin Gardner, who had over twice as many carries. Devin's ability to scramble is probably for real. Toussaint's actual strength as a running back comes through a bit more on the FYaaCoD metric. On his 14 carries, he hit 4+ yards 57% of the time, and he often surpassed four. That increases the chance of success for future plays, as the distance to the first down marker is smaller.
I thought about running the same analysis with passing yards, but it didn't feel right since yards per catch vary widely based on the play. Your wideout running the deep route will end up with more yards per target than the slot ninja you toss the bubble screens to. That is more schematic than based on individual skill. It is true that running plays are also not all created equal. But every running play starts behind the line of scrimmage and heads as far as possible into enemy space, making comparison a reasonable exercise.
Any statistical summary is just that: a summary. We lose information when we look at average, median, min, max, total yds, TYaaCoD, FYaaCoD, etc. that is available to us in the actual dataset. Our lizard brains just can't process significant amounts of data in numerical form in any reasonably quick fashion. But there is one thing we are great at: reading charts. So I've assembled the information from each rushing effort for everyone with 3+ rushes in order from least yards gained to most. I've colored the touchdowns Highlighter Yellow™ so you can include/exclude them from your mental calculations as needed.
For recent time's sake, Drake Johnson. Fare thee well, 2013 Drake. We hardly knew ye.
A. We were completely misguided to push for Devin-Gardner-to-wide-receiver last year when his natural position is clearly running back. The fact that QB's get an extra blocker has no bearing on this.
B. At this exact moment in time, the staff's decision to go 1. Toussaint 2. Green 3. The Field. is pretty justified. We saw flashes of brilliance from both of them—maybe even more from Green—but Toussaint overall had a better day. If Green sheds a few pounds and picks up just a hair more speed in the process, though—and I think we all expect that to happen— he could become the clear #1 even by mid-October. De'Veon Smith is not yet ready for world-beating, but he did display that vaunted balance. Hold off on judgment on him at this point.
C. Charts are indeed fun to look at.
D. Norfleet had one rushing effort for 38 yds, which I didn't include in this analysis because dividing by zero is difficult and because his YPC would make Brian cry.
Contrary to popular opinion a couple days ago, Brady Hoke said that Derrick Green is now the number 2 back behind Toussaint.
"Fitz (is) No. 1," Hoke said. "Then it'd be Green No. 2, then (Thomas) Rawls and (De'Veon) Smith and then Justice Hayes."
Obviously a lot of this has to do with the injury to Drake Johnson. It is interesting to me that he moved up the depth chart quickly. Sure, he had that great run against Central, but that had more to do with great blocking. Overall, his running seemed pretty "meh" that game.
According to this scuttlebutt (edit: not scuttlebutt but direct quote), Fitz would be the starter tomorrow but the 2 freshmen would be behind the returning RB crew of Rawls, Hayes, and Drake. Since the season starts in 10 days that seems unlikely to change dramatically in a short period of time. Could be a nod to the veteran players but seems the "Green is 2nd in line" talk is premature.
In fact, Hoke said his hierarchy right now would probably feature Toussaint, redshirt freshman Drake Johnson and juniors Thomas Rawls and Justice Hayes -- with Green and fellow freshman De'Veon Smith falling somewhere behind the pack.