Evaluating coaches is tricky. Ultimately it comes down to wins and losses, but even comparing one situation to another in the unbalanced world of college football is a difficult proposition. Mike Shula has a higher career winning percentage as a head coach than Brady Hoke. However, Hoke has spent all but the last year at non-BCS schools, whereas Shula was at Alabama. School prestige, resources and recruiting all play major roles in team success along with coaching. Many of them go hand in hand, but I think I am finding some ways to parse out different pieces of the puzzle independently. This is my first of hopefully many off-season looks at coaches, and who excels at what parts of coaching.
To evaluate how coaches develop and evaluate talent I needed a way to separate the inputs (recruits) from the outputs (team success and draft placement). Team success is a viable way to look at it, and at some point I would like to circle back and compare PAN to recruiting, but for today’s exercise I am going to compare recruiting ranking to draft position.
The main challenge with this method is that draft placement lags recruiting by several years. Since only some of the 2007 recruits have been drafted and most from 2008 on have yet to get the chance, I am only looking at recruiting classes from 2002-2006.
I have now been able to add all four recruiting services to my database. Since we are only looking at classes up until 2006, that means just Scout and Rivals for all years except 2006 when ESPN came on board, as well. Recruits are given a number value based on national rank, position rank and stars. Each year has 25,000 points assigned across all players so the early years with fewer players have their individual ceilings a bit higher. Consensus 5 star players are typically 50-60 pts. Generic three stars are in the low teens and below. Anyone without a position rank or less than 3 stars is zero points.
Here is Michigan’s 2012 class for reference.
Evaluating Draft Picks
Because higher draft picks carry so much more value, the draft pick values are fitted using an exponential formula.
This works out to about 500 for the first pick and then each round is half of the same pick in the previous round (1st pick in second round about 250, 1st pick in the third about 125, etc.). This puts the total points for a 255 player draft at 24,600, almost identical to the total for a year’s worth of recruits.
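As a rough sketch of the shape described above (not the actual fitted formula, which isn't reproduced here), a curve that starts at 500 and halves every round looks like this. The 32-pick round length and the exact 500 starting value are assumptions made for illustration.

```python
# Hypothetical parameters picked to match the description in the text,
# not the author's fitted values.
FIRST_PICK_VALUE = 500.0  # "about 500 for the first pick"
PICKS_PER_ROUND = 32      # assumed round length; real rounds vary in size


def draft_pick_value(overall_pick: int) -> float:
    """Value of a 1-based overall pick that halves every PICKS_PER_ROUND picks."""
    if overall_pick < 1:
        raise ValueError("overall_pick must be 1 or greater")
    return FIRST_PICK_VALUE * 0.5 ** ((overall_pick - 1) / PICKS_PER_ROUND)


if __name__ == "__main__":
    # First pick of rounds one, two and three.
    for pick in (1, 33, 65):
        print(pick, round(draft_pick_value(pick), 1))
    # Total value of a 255-player draft under these assumptions.
    print(round(sum(draft_pick_value(p) for p in range(1, 256))))
```

With these assumed parameters the 255-pick total lands a little over 23,000, in the same neighborhood as the 24,600 quoted above; the gap suggests the fitted curve starts slightly higher than 500 or uses different round sizes.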
Players are counted towards the coach that recruited them. This is only partly an evaluation of player development, since a coach gets “credit” for the players he recruited even if he leaves the next year. I have also restricted the search to coaches with at least 1,000 total recruiting points over the five year period. This is about equal to two top 15 classes or five top 50 classes. This gives us 43 qualifying coaches to review.
First thing I did was look at each coach and how many recruiting points they accumulated versus how many draft points they had.
|Rank||Coach||Recruit Pts||Draft Pts||Ratio|
|41||John L Smith||1,187||273||0.23|
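The ratio column and the 1,000-point cutoff can be sketched in a few lines. Only figures quoted in this post are used; Ted Roof's 515 is assumed to be unadjusted recruiting points, and the function and variable names are made up for the example.

```python
MIN_RECRUIT_POINTS = 1_000  # qualifying cutoff described in the post

# (coach, recruit points, draft points), using only numbers quoted in the post.
coaches = [
    ("John L Smith", 1_187, 273),
    ("Ted Roof", 515, 0),  # assumed to be unadjusted points, so below the cutoff
]


def draft_ratio(recruit_pts: float, draft_pts: float) -> float:
    """Draft points produced per recruiting point brought in."""
    return draft_pts / recruit_pts


# Keep only coaches who clear the cutoff, then compute the ratio column.
qualifying = [
    (name, round(draft_ratio(recruit_pts, draft_pts), 2))
    for name, recruit_pts, draft_pts in coaches
    if recruit_pts >= MIN_RECRUIT_POINTS
]

if __name__ == "__main__":
    print(qualifying)
```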
The first thing that jumped out at me was that there seemed to be a strong correlation between total recruit points and total draft points. This is going to be true to some extent, but it seemed that the ability of the top schools to load up wasn’t properly accounted for. So I plotted the two against each other and found a very strong correlation.
Since we are looking more for talent evaluators and developers than MOAR 5 stars, I used the correlation between the two to adjust recruiting points and give a fairer comparison between the lower end and the top end. This allows for a more common evaluation tool between elite programs/recruiters and the rest.
|Rank||Coach||Adj Recruit Pts||Draft Pts||Adj Multiplier|
|35||John L Smith||474||273||0.57|
Now we have something to talk about.
One thing that jumped out at me was that NFL guys did seem to have a bit more success. Maybe their buddies were just doing them favors, but there are a lot more guys with NFL experience at the top than the bottom. Oh, except for the big guy coming in last at #43. Weis’s monster class of 2006 (934 team points, my #7 class of the last 11 years) yielded two 6th round draft picks. His first class which was much less regarded still only yielded a single fourth round draft pick. In the words of our fearless leader, #MissYouBigGuyXOXO.
Lloyd Carr comes in just below average on the adjusted scale. Barry Alvarez checks in at #1 among Big Ten coaches and #2 overall. Wisconsin’s lineman machine is real. The evil genius Nick Saban is #3 based on his last three classes at LSU. Ohio coaches new and old round out the top ten.
Of the nine elite recruiters (3,000 or more adjusted recruiting points) Pete Carroll and Jim Tressel come out on top, with Phillip Fulmer close behind. The bottom three are all southern coaches: Bobby Bowden, Larry Coker and Mark Richt. Bob Stoops, Mack Brown and Lloyd Carr make up the middle third.
Ted Roof takes home the prize for most recruiting points without a single draft pick, with 515 points and nothing to show for it. Top performers who missed the cutoff included Dan Hawkins, Bret Bielema’s first class, Ed Orgeron, Mike Stoops and Greg Schiano.
Many thanks to all who have helped populate the recruit database. We are 25% of the way done.
Still have lots of ideas for future posts including the final post on how to use game theory to maximize success based on the overvalued running back and success rates. If there is interest, I would like to do a retrospective on previous seasons through the eyes of advanced analytics and throw up some of the best WPA graphs of the season. Hopefully I can start with 2003 in the next month. I am open to any ideas you have out there, as well.
If you are on the twitters follow me at @the_mathlete. I am trying to post little snippets that aren’t quite column worthy there. Recently I have tweeted about which state’s recruits stay in-state the most (Utah and Arkansas) and least (NY/NJ and Hawaii) and used my recruiting points ranking to list the top 4 Michigan high schools in producing 3* or better talent (Cass Tech, OLSM, Detroit Renaissance & FHH), correctly guessed by @Joshua_Block.
Thanks to all that helped build the coaching database. Now it's time to move on to recruits. I have uploaded all available recruiting sites' databases back to 2002 in an effort to connect them to team rosters. Of 16,865 recruits, I have connected most of them to players in the database. However, there are about 2,500 players still unconnected. Some of them were academic or legal casualties, some of them were transfers. Most of them are offensive linemen who never showed up in the play by play in the first place.
For those who consider themselves recruiting and/or Google ninjas, I can use your help. I have listed the players, the school they signed with and the year they signed for all the missing entries. Whatever info you can fill in would be a great help. There are more instructions in the spreadsheet, and feel free to contact me with any questions you might have. My email is in the instructions. Thanks again for everyone's help.
Before signing day I took a look at how team recruiting rankings were predictive of future success. I found that good defenses almost always come with good recruits, but a great offense often comes without being fully stocked with top recruits, although it doesn’t hurt.
This week I wanted to look more at the individual level by comparing recruiting rankings to draft success. For most positions college success is going to translate well into future draft status. Michigan might have the biggest exception to that rule in Denard Robinson (although some think he might be a top WR pick). For almost everywhere on the field but rushing quarterback, college success and production are highly correlated to NFL stock. It’s not perfect but it’s a great place to start.
The debate over whether recruit rankings matter rages on. Dr. Saturday, may he blog in peace, annually refreshed his look to affirm their accuracy. Rarely do you find anything resembling an analytical takedown, but even the best writers on college football can produce the anecdotal dismissal. Hopefully those of us who prefer to use data have already won you over and this can be a nice look at some of the ups and downs within the overall success of recruiting rankings. If you’re not there yet, hopefully you will be after you read this.
The Data Sets
On the recruit side, the pool of players will be the recruiting classes of 2002-2006. All but 2-3 of those players have had their shot to be drafted between the 2005 and the 2011 drafts. I will only be looking at the players who were ranked for their position, as well. This means I have all 4 & 5 stars and the best of the 3 stars. I excluded fullbacks and specialists because the numbers are pretty low and they are mostly all 3 stars or less.
It’s All in How You Word It
There are two key arguments against recruiting rankings. The first is the one used by Bruce Feldman in his recent article on Stanford linked above. It’s the “yeah, but what about…” argument. Ignore recruiting rankings because Stanford is good. Ignore recruiting rankings because JJ Watt is good. There are of course exceptions. There are plenty of flameouts and come-from-nowhere success stories, but this is a volume game and the exceptions don’t disprove the rule.
The second argument is the famed failure to divide. Here are two true statements:
If you are drafted, you are more likely to be a three star or less recruit than four or five star.
The more stars you have the more likely you are to be drafted.
The first statement is used by opponents of rankings but isn’t really a relevant statement. The second is the key point. If every single five star was drafted, there would still be six times more three stars and below drafted than five stars. Because four stars and above are so selective they can’t win the quantity game, but they dominate the likelihood game. The NFL is full of unheralded recruits, but for every five star there are literally hundreds of unheralded recruits playing college football. The pool just starts much bigger.
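The failure to divide is easy to see with a toy example. Every number below is hypothetical, chosen only to show how both statements can be true at once; none of it comes from the recruit database.

```python
# Hypothetical (recruits, drafted) counts per tier, for illustration only.
pools = {
    "5 star": (30, 15),
    "4 star": (300, 60),
    "3 star or less": (3000, 240),
}

# Statement 2: likelihood of being drafted given your star rating.
draft_rate = {tier: drafted / recruits for tier, (recruits, drafted) in pools.items()}

# Statement 1: share of all drafted players that came from each tier.
total_drafted = sum(drafted for _, drafted in pools.values())
share_of_draftees = {tier: drafted / total_drafted for tier, (_, drafted) in pools.items()}

if __name__ == "__main__":
    for tier in pools:
        print(f"{tier}: drafted {draft_rate[tier]:.0%} of the time, "
              f"{share_of_draftees[tier]:.0%} of all draftees")
```

In this made-up pool the lowest tier supplies most of the draftees even though each of its players has by far the lowest chance of being drafted, which is the whole point of dividing by the size of the pool.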
Tell Me Something I Don’t Know
So at this point we can all agree that recruiting rankings matter, right? If you’ve made it this far you’ve earned a chart.
Percent of Recruits Drafted
|Position*||5 star||4 star||3 star|
*Position based on recruited position, not drafted position
Across all positions, each additional star more than doubles your likelihood of being drafted. It’s not only true in the aggregate but at the position level, as well. There isn’t a single position where a 3 star recruit is more likely to be drafted than a four star. And this is a self-selected group of 3 stars and not the entire pool. In almost every case, a fifth star is another large bump from 4 stars. OLB, OT and WDE are virtually equivalent between 4 and 5 stars. Even a largely college specific position like Dual-Threat QB (RQB) and undefined positions like Athlete show the same trend.
The top positions for 5 star success are Athlete, DT, ILB and Safety at over 60% and the tight end position which was a perfect 4/4 in getting 5 stars drafted.
But getting drafted is only half the story, the other is draft position.
Average Pick For Drafted Players
|Position||5 star||4 star||3 star|
At the position level, the draft spot doesn’t hold up quite as well as the previous chart, but overall there is a strong trend favoring the higher starred players. On average, a drafted five star player will be picked in the middle of the third round, nearly a full round ahead of the average four star player and another 17 picks ahead of ranked three star players.
On twitter on Friday I teased a question about the position at which five stars underperform their four star counterparts. There is actually a position on each side of the ball. On defense it’s outside linebackers that don’t follow the trend and on offense it’s the tackles.
I think it’s interesting that Rivals has struggled to match top high school talent at positions like tackle, outside linebacker and defensive end at the rate they have at other positions. Despite the weakness at these positions, similar positions like guard, inside linebacker and defensive tackle have had their rankings hold up quite well.
Don’t get too hung up on the magic of the fourth or fifth star. They are a nice aggregation, but there isn’t going to be much difference between the last five star and the first four star. The bottom line is the higher ranked a recruit is the better they are likely to be, with plenty of exceptions. At positions like tackle, weakside d-end and outside linebacker the difference between a four star and a five is almost negligible. And there are no guarantees. Loading up on top talent gives you the highest likelihood of having team success and successful individuals, but when you get down to the specific player level it becomes a crapshoot. More 5 star players never hear their names called than do. For four stars it’s still a nearly 4:1 chance against getting drafted.
And now back to our regularly scheduled programming…
Previously: Parts 1a, 1b, 1c
I have done a terrible job of branding this series. The idea behind it is that football has changed and coaches haven’t. The game used to be about managing down and distance, putting yourself in a makeable third down, and hoping your defense can win with 17 points. Now offenses are more sophisticated at both running and passing. Third downs that used to be virtually out of reach are still tough but more possible and the upsides of going for bigger chunks of yardage on first and second down have begun to outweigh the risks of longer third downs. This changes how both offensive and defensive coaches need to think and how they allocate resources and personnel. Some pieces are now worth more and others less.
The traditional running game used to be the focal point of this philosophy. The traditional running game is the best football tool for limiting variance on a down by down basis. The quarterback’s job is to hand the ball off, throw a couple of beautiful play action deep balls a game, bail out a third down or two, then feed words like "focused" to the media.
As I spent the last several years combing through nearly ten years of play by play data, I kept coming back to the same question: Why do teams run the ball so much? I parsed the data time after time to try and find something I had missed and I couldn’t find it. Of the top individual PAN seasons among QBs and RBs since 2006, only 3 running backs (Boise St’s Ian Johnson in 2006 and Montee Ball and Trent Richardson this year) cracked the top 100. But PAN doesn’t take into account burning the clock at the end of a game. So I switched to WPA (Win Percent Added) which accounts for the clock. Under WPA rankings, Toby Gerhart in 2009 is the only running back to break into the top 200 seasons. 199 quarterback seasons and only 1 running back season.
Now this isn’t to say that a running game isn’t valuable. Of my ten highest rated offensive seasons noted below only Oklahoma, Hawaii and Houston didn’t feature prominent rushing attacks. In fact of the ten, I would categorize 5 as rushing spreads, 3-4 (Baylor is tough to categorize) as college passing spreads and Wisconsin as a traditional run-first offense.
The running game is alive and well but the traditional running back is harder to justify.
The Wisconsin Case
Montee Ball had an outstanding season and, along with Trent Richardson, was clearly a top 2 back in the country. But was he the most valuable player on his own offense? Here are the traditional numbers for Ball and Russell Wilson
Ball: 307 att, 1923 yards & 33 TDs rushing (NCAA record 39 overall TD)
Wilson: 225/309, 3175 yards & 33 TD & 4 INT (NCAA record 191.8 pass efficiency)
and the advanced metrics
Ball: +6.1 PAN and 0.10 WPA/Game
Wilson: +11.4 PAN and 0.37 WPA/Game
The Wisconsin offense was a thing of beauty that could have been a national title contender if their –1 defense didn’t lead them to three losses while scoring at least 29 points in each of them.
So who was more responsible, Wilson or Ball? Wilson averaged more yards/play, had almost no turnovers and significantly higher advanced metrics. But let's dig down a bit and compare the two.
Nearly half of all Russell Wilson’s plays (rushes and passes) went for 7 yards or more. Ball had 28% of his plays go for the same distance. On negative plays, they are nearly even if sacks are counted; without sacks, it is all Ball. The area where Montee Ball’s plays piled up was the 0-3 yard range, i.e. the manage-the-down-and-distance range. This obviously wasn’t a bad season for Ball; it was a great season, and he was still dominated by his quarterback in terms of output.
Now this doesn’t take down and distance into consideration, so I put together a similar slide with EV.
Montee Ball had 15% of his plays go for at least a half standard deviation above average. Russell Wilson’s number was twice that at 30% with minimal negative offset.
Looking at it a second way, here are their play EV values ranked.
As good as Montee Ball was last year, the offense should have gone to Wilson even more.
RIP Running Back?
Obviously not as a position, but as a premier position I have a hard time justifying the running back’s historical standing at nearly the same level as the quarterback. Even at their best, great running backs have similar value to decent quarterbacks. Two offseasons ago I did a study on returning starters and found that returning starts by running backs had the least effect of any position on future team success. Before signing day, when I looked at the value of recruiting ranking to future team success, running back recruiting had one of the lowest correlations to future offensive success.
It’s not that running backs can’t be valuable. Montee Ball’s +6 PAN is outstanding. It’s more that a big upside for a running back is rare, hard to predict and is still less than you can get from a quarterback. Of the 29 QB’s and RB’s that were +3 or better last year only five were running backs, the rest were quarterbacks. Running back has become a low marginal production position.
Wrapping This Up Next Week
There is a good argument to be made that Wilson’s success is a byproduct of the attention paid to Ball. It obviously didn’t occur in a vacuum and I have no doubt that Wilson benefited from the attention paid Ball more than vice versa. In next week’s final part of this series we’ll look at how teams can adjust their strategies on both sides of the ball to maximize the new realities.
We now return you to your commitments in progress
I don’t think Success Rate is a misguided stat as much as I think it is a misguided strategy. I think the overall concept of S&P that Bill uses is very sound, I just think the emphasis should be more on the P than the S.
My biggest problem with the stat is that it is black and white. As comments on his article note, a metric that works on a sliding scale would be a significant step in the right direction. On 1st and 10, losses and gains of 4 aren’t the same and shouldn’t be treated the same, just as gains of 5 and up are all valuable, just not equally valuable. For my metric the sliding scale is factored into the expected points at any play. So there is some element of success rate built into PAN, but it is an integrated, sliding scale as opposed to a separate, black and white component.
There are only three things that matter for evaluating a team on a drive: where you started, how many points you scored, and in what position you gave the ball back to your defense/special teams. Plays taken to achieve results and time elapsed off of the clock can be valuable in certain situations, but in general those three data points are the key. If we can effectively measure each play in how it contributes to those three key factors at once, why break it up into two pieces and why make it black and white?
Even though there are some differences and I got things off on a bit of the wrong foot, I think there is more in common than different with the two approaches. What I think is the ultimate issue, however, is coaches calling plays with success rate in mind. Advanced NFL Stats did a great article on this very subject (especially the Importance of Run Success Rate section). He found evidence at the NFL level that coaches are coaching to down by down success rate as opposed to drive success rate. Coaches appear to be attempting to win each battle and at times losing sight of the war.
The battle/war concept is what I think is the most interesting part of this, so you’ll have to wait until part 3 of this series, where I’ll look at how strategy can adapt to score more points while risking a bit of short term success rate. Early next week I’ll post part 2, a look at how Wisconsin’s offense runs and how Russell Wilson was really the most dangerous part of the Badger offense.
Warning, this post is meta-stat nerd.
What is Success Rate, and How Did It Come To Be?
The first question is pretty straightforward and the second I can only guess.
Success Rate is an attempt to measure how good a player or team is at the traditional concept of “staying ahead of the chains.” There are some slightly different calculations, but for the most part a success is defined as gaining at least 40-50% of the yards to go on first down, at least 50-70% of the yards to go on second down, and a first down on third or fourth down. Typically the target is a 50% success rate.
Although I doubt there is any recorded history on how this came to be (I believe its origin or at least its popularization comes from Football Outsiders) I have two theories. The first is that this is how football fans, players, and coaches have been conditioned to think, especially old school, grind-it-out football folks. You still hear it often among clichéd commentators: the offense’s number-one priority is to stay ahead of the chains, don’t put yourself in bad down and distance, stay away from obvious passing downs. All of these things are good things for a football team to do.
The second reason I think it came to be is that advanced football stats arrived after advanced metrics for baseball had come a long way. One of the key tenets of the Moneyball/SABR revolution in baseball is that On Base Percentage >>> Batting Average. On top of that, one of the fundamental advanced baseball stats is OPS, On Base Percentage Plus Slugging Percent, a combination of Success and Magnitude, one paralleled by Football Outsiders* in their S&P metric.
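A minimal sketch of that definition, for anyone who wants to compute it. The 50/70/100 thresholds are one assumed choice from within the ranges above, and the function names are invented for this example rather than taken from any published implementation.

```python
# Required share of yards to go, as whole percentages, by down.
# 50/70/100 is an assumed choice inside the ranges quoted above.
SUCCESS_PCT_BY_DOWN = {1: 50, 2: 70, 3: 100, 4: 100}


def is_success(down: int, yards_to_go: int, gain: int) -> bool:
    """True if the gain meets the threshold for this down (integer math, no rounding)."""
    return gain * 100 >= SUCCESS_PCT_BY_DOWN[down] * yards_to_go


def success_rate(plays) -> float:
    """Share of successful plays; each play is (down, yards_to_go, gain)."""
    plays = list(plays)
    if not plays:
        raise ValueError("need at least one play")
    return sum(is_success(*play) for play in plays) / len(plays)


if __name__ == "__main__":
    # 5 yards on 1st and 10, 3 yards on 2nd and 5, 2 yards on 3rd and 2.
    print(success_rate([(1, 10, 5), (2, 5, 3), (3, 2, 2)]))
```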
*I want to be clear that this is not a critique of Football Outsiders. They do tremendous work and are at the forefront of advanced football analysis.
Why Football is Not Baseball
Good OBP is critical for baseball because you are dealing with a finite, irreplaceable resource, outs. You get 27 of them per game. Once you generate an out there is no way to get it back; you are 1 step closer to the end of your chance to score, and you only have 27 total steps per game. OBP measures a team or individual’s ability to avoid outs when they come to the plate. Not getting out will always improve your chances of winning while getting an out will almost always decrease your odds of winning (this is not an article about the sacrifice bunt).
Contrast that with football, where the only finite resource is time. Even if the quarterback gets sacked and loses 10 yards, one play later the effect of that loss can be wiped out. In a sense a set of downs is finite, but an individual down is not. If there were a team equivalent, first downs converted would be more appropriate, and I don’t really see a true individual equivalent.
The Goal Is To Score Points
Consistently being in good down and distances is not a bad thing, but it’s not nearly as important for today’s offenses. Modern offenses have a much greater ability to convert unfriendly down and distances than offenses of old. Plus, the offense’s goal is to score points, not get first downs. Getting first downs obviously helps score points, but a metric like EV/PAN that directly accounts for how each play contributes to scoring is a much stronger measure, not just a complementary stat like Slugging Percent. In baseball the complementary stat is needed because of the finite nature of outs. In football, everything is a sliding scale and categorizing plays as pass-fail is simply too black and white for a sport that has more gray.
A couple of examples of how success rate can be misleading (first down gain, second down gain, third down gain):
4,3,2: This is a 67% success rate but is a three and out.
3,3,4: This is a 33% success rate but a first down. The first two plays of each group are nearly identical, yet the first two downs of the first group are both successes and those of the second group are both failures. Over a large group of data some of these will iron themselves out, but why put such a black and white metric over something that is not? 2nd and 7 is almost the same as 2nd and 6, but 2nd and 1 is very different from 2nd and 6. Success rate completely misses the magnitude of plays.
This is why, for football, an Expected Value model is much more valuable. With enough data, you can get a pretty good description of the expected points based on all down, distance and yardline combinations. Once you have this you can evaluate the shades of gray for each play. A three yard carry on first and ten is nearly as good as a four yard one. A nine yard carry is even better. Expected Value can quantify the subtle and substantial differences between plays. The value difference between first and ten at the twenty and first and ten at the thirty will be the same whether it was one ten yard play or three runs totaling ten yards, although the value per play will justifiably be better for the single play. Success rates can vary wildly based on how you get from point A to point B; EV only cares about where you start and where you finish.
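The two sequences can be checked with a short walk through a set of downs. The sketch assumes the low end of each threshold range (40% on first down, 50% on second), since those are the values that reproduce the 67% and 33% figures; the function name and its defaults are invented for illustration.

```python
def drive_summary(gains, first_down_pct=40, second_down_pct=50, yards_to_go=10):
    """Walk up to three plays starting at 1st down; return (success_rate, converted).

    Assumes the series is not converted before the last play in `gains`.
    """
    if not 1 <= len(gains) <= 3:
        raise ValueError("expects one to three plays")
    pct_by_down = {1: first_down_pct, 2: second_down_pct, 3: 100}
    successes = 0
    remaining = yards_to_go
    for down, gain in enumerate(gains, start=1):
        # Success is judged against the yards still needed before the snap.
        if gain * 100 >= pct_by_down[down] * remaining:
            successes += 1
        remaining -= gain
    return successes / len(gains), remaining <= 0


if __name__ == "__main__":
    print(drive_summary([4, 3, 2]))  # two successes, no first down
    print(drive_summary([3, 3, 4]))  # one success, first down
```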
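To make the start-and-finish point concrete, here is a sketch using a made-up expected points function. The formula in `toy_expected_points` is a stand-in invented for this example and is not fitted to any data; only the bookkeeping around it is the point.

```python
import math


def toy_expected_points(down, yards_to_go, yards_from_goal):
    """Made-up stand-in for an expected points lookup; NOT fitted to real data."""
    return 6.0 * (100 - yards_from_goal) / 100 - 0.4 * (down - 1) - 0.05 * yards_to_go


def play_values(states):
    """EV added by each play, from consecutive (down, yards_to_go, yards_from_goal) states."""
    return [
        toy_expected_points(*after) - toy_expected_points(*before)
        for before, after in zip(states, states[1:])
    ]


# 1st and 10 at your own 20 to 1st and 10 at your own 30, two different ways.
one_play = [(1, 10, 80), (1, 10, 70)]
three_runs = [(1, 10, 80), (2, 7, 77), (3, 4, 74), (1, 10, 70)]

if __name__ == "__main__":
    for label, states in (("one 10 yard play", one_play), ("runs of 3, 3, 4", three_runs)):
        values = play_values(states)
        print(label, "total:", round(sum(values), 3),
              "per play:", round(sum(values) / len(values), 3))
    # The totals agree because the per-play differences telescope.
    print(math.isclose(sum(play_values(one_play)), sum(play_values(three_runs))))
```

Whatever function sits behind the lookup, the per-play values telescope, so the drive total depends only on the first and last states while the per-play average still rewards getting there in fewer snaps.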
What is Success Rate Good For?
It is an interesting stat and isn’t totally without value, I just think that it is unnecessary and shouldn’t be a fundamental part of team evaluation. There are lots of stats that fit this characterization. For a lot of teams it’s how they mentally operate, especially in the running game. Success rate does a good job evaluating running backs in traditional ground games. It might not totally align with scoring points and winning games, but it does align well with accomplishing a team's offensive objectives. Running backs often get tightly bunched near the mean in an EV model but success rate can be a way to further separate individual backs. Success rate will hold up between the tackle pounders but knock down the home run threat. EV may consider them the same (or more likely the home run threat will be higher) but the consistency of the old school back will be valued better by success rates.
I don’t think success rate has much value for the passing game. Completion percentage and YPA are more than adequate to indicate both explosiveness and consistency.
Coming Next: The Wisconsin Case Study and Optimal Offense and Defense Response
The underlying context of “ignore success rates” is that the traditional running game is overrated. If your main goal as an offense is to avoid bad third downs, and you are good at it, you will likely end up with a lot of third and short or third and manageable. Even if they are all “good” third downs, each third down is a chance for the defense to take the field. We all remember the classic drives with multiple third down conversions, but we forget all the ones that could jump the odds and failed after giving the defense one too many chances to get off of the field. Explosive plays are essential to a productive modern offense and unless you are running a Chip Kelly or RichRod style ground attack, explosive plays are much more likely through the air than on the ground.
Next week I will follow up with a detailed look on the relative values of Russell Wilson and Montee Ball to Wisconsin’s 2011 offense. Ball had the TDs and the hype and Wilson was considered a quality second option. I’ll dig deep into the numbers and show why Wilson was the real threat of the Wisconsin offense.
Following that, I’ll have the final article in this series looking at how offenses (and maybe more so defenses) can effectively maximize their expected points for and against through a better perspective on managing offensive output versus managing each down’s success or failure.