Recruiting Bias and Accuracy

Submitted by Gopherine on May 6th, 2011 at 3:00 PM

[ED: Moved to the diaries. This obviously took some work beyond the level of a standard post. ZL]

Brian recently commented on the awesome post by UpUpDownDown over at BHGP that analyzed the teams and conferences that are best at developing their recruits into NFL players. 

Part of UUDD’s argument is that player development (and, in particular, playstyle) is a driving factor behind the Big Ten outperforming (and the Big 12 underperforming) expectations with respect to defensive players and offensive linemen.  Brian had an alternative/additional explanation: a combination of recruiting service bias and difficulty in evaluating high school linemen.

I think there may be another element at work: scouting services overrating certain sections of the country and underrating others, particularly the Midwest. Rivals (the source of the rankings used) doesn't even have a Midwest analyst. Meanwhile, OL rankings are particularly inaccurate since many high school kids need to put on 50 pounds before they can play in college. The flipside, skill position players being more easily projectable, sees a much, much lower spread amongst conferences. The worst-performing conference is the ACC at 94% of expectation; the best is the Big East at 108%. That's a much lower spread than you see in the D and OL numbers, one that looks like an even distribution distorted by a little randomness.

If there was a regional bias in recruiting rankings, hard-to-evaluate OL would be the place it would show up most prominently. I think there is. Your ratings are just wrong when Wisconsin has two four-star linemen in the last five years, as they do on Rivals. They are not evaluating linemen correctly. I'm not sure what Big 12's hole of suck on defense represents but I'd be more convinced it was a playstyle thing if they were running 3-3-5s or something. Going up against Blaine Gabbert and a bunch of other passing spreads doesn't make much difference to anyone but a few linebackers, it seems.

Not content to let our fearless MGoLeader’s assertions hang out there without poking around the data a little bit, I asked Mr. UUDD for his dataset* and set to work determining (1) whether Midwestern recruits are underrated by the recruiting services, and (2) whether offensive linemen are comparatively more difficult to evaluate.

Specifically, I looked at (1) whether non-5 star Midwestern recruits outperform the “percent drafted” expectations for their star ranking,** suggesting that Midwestern recruits are underrated, and (2) whether the spread is smaller among the “percent drafted” numbers for offensive line recruits relative to all recruits, suggesting that the rankings are relatively less accurate.

Midwestern Recruits Slightly Outperform Expectations

The first piece is that there is a bias by the recruiting services against Midwestern recruits because the services spend relatively less time and resources tracking the Midwest. That bias translates into lower recruiting rankings for Midwest recruits, resulting in underrating of those recruits. Chart:

Midwest Recruits
Recruiting Stars | Overall Percent Drafted | Midwest Percent Drafted
5 Stars | 38.0% | 33.3%
4 Stars | 16.7% | 19.6%
3 Stars | 8.1% | 9.2%
2 Stars | 4.9% | 5.6%

Midwestern recruits of the 2-4 star variety slightly outperform draft expectations relative to their peers from other parts of the country.  However, the sample sizes here are way too small to reveal whether or not this difference is significant.
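
One rough way to check the significance question is a two-proportion z-test comparing Midwest recruits to everyone else at a given star level. The sketch below does this for 4 star recruits. The drafted counts are assumptions: they are backed out by multiplying the percentages in the chart by the N values listed at the end of the post and rounding to a whole player, so the result is approximate. The function name is made up for illustration.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one draft rate."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# 4 star recruits, 2002-2008 classes. Counts are ASSUMED: percentage * N, rounded.
midwest_n = 311
midwest_drafted = round(0.196 * midwest_n)
overall_n = 2120
overall_drafted = round(0.167 * overall_n)

# Compare the Midwest against the rest of the country, not against the overall
# pool, because the overall pool already contains the Midwest recruits.
rest_n = overall_n - midwest_n
rest_drafted = overall_drafted - midwest_drafted

z, p = two_proportion_z(midwest_drafted, midwest_n, rest_drafted, rest_n)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

A p-value above the usual 0.05 cutoff would be in line with the "not significant" reading above. The same call works for the 2 and 3 star rows by swapping in their percentages and N values.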

Of course, the chart doesn't disprove my mildly paranoid belief that Midwesterners are consistently being slighted by the jerks on the coasts, so let's call this a win. 

Note that the Midwestern 5 star recruits underperform the mean. This has no impact on the claim (5 star recruits can't be underrated), but it's interesting nonetheless. The really small sample of 5 stars is all the explanation I need.

Stars Matter Less for Offensive Line Recruits

The second piece is that the big boys are harder to evaluate because they are less prepared for college football than their smaller brethren.  Offensive linemen in particular often need a redshirt and a whole lot of S&C before they can show potential. Thus, recruiting rankings for offensive linemen are less accurate because the evaluation essentially comes down to "he's big and does not apparently soil himself."

OL Recruits
Recruiting Stars | Overall Percent Drafted | OL Percent Drafted
5 Stars | 38.0% | 20.6%
4 Stars | 16.7% | 14.2%
3 Stars | 8.1% | 7.3%
2 Stars | 4.9% | 5.0%

Once again, the data is consistent with the claim, but not at statistically significant levels. The spread between the chances of being drafted as a 2 star offensive lineman and a 5 star offensive lineman is much smaller than the spread for all positions. In other words, stars may matter less for the big guys, but we need more recruiting cycles to know for sure.
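
One simple way to put a number on "spread" is to compare the 5 star and 2 star draft rates, both as a gap in percentage points and as a ratio. This metric is an illustrative assumption, not something taken from UUDD's analysis; the inputs are copied from the two charts above.

```python
# Percent drafted by star rating, copied from the two charts above.
overall = {5: 38.0, 4: 16.7, 3: 8.1, 2: 4.9}
ol = {5: 20.6, 4: 14.2, 3: 7.3, 2: 5.0}

def spread(rates):
    """Return (5 star minus 2 star gap in points, 5 star to 2 star ratio)."""
    return rates[5] - rates[2], rates[5] / rates[2]

for label, rates in (("All positions", overall), ("Offensive line", ol)):
    gap, ratio = spread(rates)
    print(f"{label}: gap = {gap:.1f} points, ratio = {ratio:.1f}x")
```

On these figures a 5 star recruit is roughly 7.8 times as likely to be drafted as a 2 star recruit across all positions, versus roughly 4.1 times for offensive linemen.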

* Huge, huge thanks to UpUpDownDown for sharing his work. As I found out very quickly trying to replicate the dataset, the data is extremely difficult to cross reference because a lot of recruits have the same name or slightly modified their name during their college career. 

** Note one small wrinkle in the dataset: players that are eligible to declare for the draft, but haven’t, are counted as undrafted.  Thus, a number of players from the recruiting classes of 2008 and 2007 that will eventually be drafted are nonetheless included in the denominator, but not the numerator, in the percent drafted numbers.

Edit: More Fun

In response to comments, the following charts reflect the overall percent drafted for only the 2002-2006 recruiting classes, and the N values for each set. I agree that including '07 and '08 players that haven't declared isn't ideal, but I wanted to be able to compare apples to apples with UUDD's analysis.

2002-2006 Classes
Recruiting Stars | Overall Percent Drafted
5 Stars | 41.5%
4 Stars | 20.4%
3 Stars | 10.7%
2 Stars | 6.3%


N Values
Recruiting Stars | 02-08 Overall | 02-08 Midwest | 02-08 OL | 02-06 Overall
5 Stars | 258 | 42 | 34 | 188
4 Stars | 2120 | 311 | 323 | 1437
3 Stars | 4637 | 797 | 770 | 3211
2 Stars | 3859 | 646 | 681 | 2900
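
To see how much the 2007 and 2008 classes pull the overall numbers down, the figures above can be combined to estimate how many of those late-class recruits had been drafted so far. This is only a sketch: the drafted counts are assumptions, reconstructed as percentage times N and rounded, and the function name is made up for illustration.

```python
# N values and percent drafted by star rating, from the charts in this post.
n_02_08 = {5: 258, 4: 2120, 3: 4637, 2: 3859}
pct_02_08 = {5: 38.0, 4: 16.7, 3: 8.1, 2: 4.9}
n_02_06 = {5: 188, 4: 1437, 3: 3211, 2: 2900}
pct_02_06 = {5: 41.5, 4: 20.4, 3: 10.7, 2: 6.3}

def implied_late_classes(stars):
    """Estimate (recruits, drafted so far) for the 2007-2008 classes alone.

    Drafted counts are ASSUMED: percentage * N, rounded to a whole player.
    """
    drafted_all = round(pct_02_08[stars] / 100 * n_02_08[stars])
    drafted_early = round(pct_02_06[stars] / 100 * n_02_06[stars])
    return n_02_08[stars] - n_02_06[stars], drafted_all - drafted_early

for stars in (5, 4, 3, 2):
    n_late, drafted_late = implied_late_classes(stars)
    print(f"{stars} stars: {drafted_late} of {n_late} drafted so far "
          f"({100 * drafted_late / n_late:.1f}%)")
```

For example, this implies that roughly 20 of the 70 five star recruits from 2007-2008 had been drafted, well below the 41.5% rate for the 2002-2006 classes, which is the direction the footnote on undeclared players predicts.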




May 5th, 2011 at 5:59 PM ^

What's preventing a startup of Scouts 2.0 under a freeware license?  Blogs with Twitter followings have replaced local news, and are sometimes "better" at the college level than ESPN.  Will we ever see a blog network set up its own recruiting service?  Get some guys like Tim to cover games, we have YouTube for film, what else is there?

Can the Midwest just have some freelancing bloggers go to a few football games, post their impressions, and link to some YouTube?

Yes, I understand that $$$ is a limiting factor.

Zone Left

May 5th, 2011 at 7:57 PM ^

The only thing stopping anyone from starting a business is effort and money. Creating a really solid network and developing the relative levels of credibility would take some time.

A group of bloggers could certainly start evaluating talent on their own, but it would be really difficult to develop credibility because they wouldn't (at least at first) have access to the kids or any real experience evaluating talent, which are the two things Rivals and Scout provide. Anyone could see that Noel Devine was awesome in high school, but it takes some real skill to tell the difference between a Big 10 level OG and a Big East level OG.


May 5th, 2011 at 7:11 PM ^

I think there is a bias, especially for Rivals.  If you look at their content it seems like the majority of their resources are going to the South and California.

They have an analyst for the state of Florida alone.  I understand that it might be a priority since that state produces lots of talent, but I don't understand not having a guy for the Midwest.  I think it is kind of ridiculous for the Midwest not to have a dedicated analyst.

I think Scout does a better job with the Midwest; they have guys like Allen Trieu, who has lots of knowledge of the area.

True Blue in CO

May 5th, 2011 at 8:33 PM ^

If Midwest recruits do not get the same amount of quality time in outdoor athletics in high school as recruits at Southern schools, does this influence their skill levels when they come out of high school? If they then develop more in college and improve relative to their year-round counterparts coming out of the South, this could be part of the story. Just a thought.


May 6th, 2011 at 12:26 AM ^

Being the massive nerd that I am (I go to CMU, not that CMU), I decided to run a chi-squared hypothesis test on your data. This gave me a 72.1% association for the Midwest data and a 37.7% association for the O-line data. Neither of these numbers is very good, so it further backs up Brian's assumptions.


May 6th, 2011 at 9:33 PM ^

Yeah, even though I grew up in Ann Arbor, I decided to follow my dream of playing college football and head to Pittsburgh. It just turns out that playing college football and getting an engineering degree is hard as fuck, so I ended up quitting the team (and I wasn't that good even for D3), so mad props to Omameh.


May 6th, 2011 at 10:39 AM ^

I'd love to see N values on your chart, so I know exactly what you mean by "small sample size."


** Note one small wrinkle in the dataset: players that are eligible to declare for the draft, but haven’t, are counted as undrafted. Thus, a number of players from the recruiting classes of 2008 and 2007 that will eventually be drafted are nonetheless included in the denominator, but not the numerator, in the percent drafted numbers.

I'm wondering how the numbers would change if you left off all classes that have not made it to their senior years (i.e. only look at data regarding recruits from the class of 2006 and earlier).  Obviously none of the members of the recruiting class of 2010 have entered the draft, so draft percentages are artificially lowered.  At this point we don't care if someone left early and got drafted or graduated and then got drafted.  Either that, or remove all players with eligibility remaining who did not enter the draft.

The numbers just don't jibe with my impressions of what star ratings mean.  For example, 5-star is supposed to mean future first round pick...that is why there are only about 30 5 star recruits every year nationally.  4-star means future NFL player (probably gets drafted), which sets the number of 4 stars available nationally.  So the 5 star draft percentage should be HIGH (like 90%) in my mind.  And 4 star draft percentage should be up around (over?) 50%.


May 6th, 2011 at 2:12 PM ^

Interesting: 11%, or 1 out of 9, college kids recruited are drafted. I'd be curious to see how many actually make an NFL roster, since numerous kids get drafted and don't make the team...and some undrafted do make the team.

Great analysis regardless. I love how this site nerds out sometimes.


May 6th, 2011 at 3:16 PM ^

To determine significance, run a probit with drafted-or-not on the left-hand side. For the first hypothesis, the right-hand side would include rankings and a dummy variable for Midwest or not, just limited to linemen.  For the second hypothesis, do the same except include a dummy for offensive linemen times the rating.


May 6th, 2011 at 4:41 PM ^

The NFL Draft makes no sense at times. UNC, Pitt, Clemson, and USC sent a ton of kids to the NFL this Draft. It seemed like the entire Pitt and UNC defenses went pro. Funny, I don't remember them being very good. Two of the quarterbacks drafted won the National Championship as well: one was drafted 1st overall and has never taken a snap under center or called a play more complicated than "32"; the other (McElroy) almost went undrafted altogether, is a brainiac football junkie, and probably has the best shot of all the QBs taken at actually completing passes in an NFL offense. Go figure. When do we start keeping track of how many kids are draft busts per conference?


May 6th, 2011 at 5:09 PM ^

Is it possible that there is a lower proportion of offensive linemen drafted (not taking into account star ranking for a minute here)?  I suppose that would imply that more offensive linemen in the NFL are undrafted free agents (I don't think it is the case that there are more linemen per starting position in college than at other positions).  That could potentially skew the above analysis some.