Thanks for doing this. I have had the exact same thought about averaging the services, but never followed through. I definitely agree, ESPN can be left out, for a number of reasons. They are eccentric, to say the least. You've given me some interesting reading here . . .
REVISED "Veto-Based Aggregate Recruiting Rankings"
Note: I edited this since my original post to better utilize the ESPN data. Apologies if this makes some of the comments confusing.
Never having contributed anything of value to this site, I thought I’d take a shot at combining the Scout, Rivals, 247, and ESPN player rankings into one. The goal is to come up with a straightforward way to compare elite recruits’ status with the ranking services (i.e. without forcing people to juggle rankings and star ratings from four different sites in their heads).
Aggregating across the sites is not easy, partly because of data availability and especially because of the different methods used by different sites. There are countless ways to do this, with most requiring some kind of data imputation. Since no data imputation strategy would be liked by all, I’m proposing a different method that requires no imputation. Let’s call it the Veto-Based Aggregate Recruiting (V-BAR?) Rankings. (Crappy name & acronym? Check.)
The basic idea is that we restrict the rankings to players who appear in every site’s top X list (i.e. no one was unimpressed with the recruit) and then order them based on their average rankings across the sites. It’s “veto-based” because any site can prevent a player from appearing on the aggregated list.
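For anyone who wants to play along at home, here is a minimal Python sketch of the veto idea. The function name and the toy players are made up for illustration, and it is not necessarily how the list below was compiled.

```python
from statistics import mean

def veto_based_rankings(site_ranks):
    """site_ranks maps site name -> {player: rank on that site}."""
    # Veto step: keep only players who appear on every site's list.
    common = set.intersection(*(set(ranks) for ranks in site_ranks.values()))
    # Average each surviving player's rank across all sites.
    averages = {p: mean(ranks[p] for ranks in site_ranks.values()) for p in common}
    # Lower average is better; ties are broken by name for a stable order.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))

# Toy data (hypothetical players): 247 leaves Player C off its list.
toy = {
    "Rivals": {"Player A": 1, "Player B": 2, "Player C": 3},
    "Scout": {"Player A": 2, "Player B": 1, "Player C": 4},
    "247": {"Player A": 1, "Player B": 3},
    "ESPN": {"Player A": 3, "Player B": 2, "Player C": 1},
}
result = veto_based_rankings(toy)
for player, avg in result:
    print(player, avg)
```

Player C drops out entirely because one site left him off, which is exactly the veto at work.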
I see two primary objections to this:
(1) It eliminates highly regarded prospects when only one outlier site is unsold on them.
(2) It gives excessive veto power to ESPN (and eliminates a lot of players) just because the ESPN 150 only ranks 150 prospects while the other services rank 247-300.
First, for (1). This is just the design of this ranking. Basically, to get on this list, there is consensus that you’re outstanding, and the average rankings tease apart just how highly you’re regarded. Interestingly, for each recruiting service, one guy stands out, rankings-wise, as a glaring omission. For Rivals, it’s Sheldon Day (#65 to Scout, #80 to 247, #144 to ESPN). For Scout, it’s Amos Leggett (#104 to Rivals, #75 to 247, #95 to ESPN). For 247, it’s Avery Young (#38 to Rivals, #13 to Scout, #119 to ESPN). For ESPN, it’s Zeke Pike (#72 to Rivals, #33 to Scout, #18 to 247). In general, though, there aren't too many really serious outliers.
Now for (2). Giving ESPN excessive veto power seems problematic, especially since ESPN’s rankings are often questioned for their quality and objectivity. Therefore, in addition to using the ESPN 150, I grabbed the next 150 players from ESPN’s recruiting rankings (link: http://espn.go.com/college-football/recruiting/prospects). So we have rankings for 300 players from ESPN.
By my count, 147 players appear in Rivals’ top 250, Scout’s top 300, 247’s top 247, and ESPN’s top 300. Here they are in order of average ranking across these four sites:
VETO-BASED AGGREGATE RECRUITING RANKINGS (as of 6/19/11)
(incorporates Rivals, Scout, 247, and ESPN)
| Rank | Player | Avg. ranking | Pos. | School |
|---|---|---|---|---|
| 3 | Mario Edwards | 6.25 | DE | Florida State |
| 28 | Dante Fowler | 41.75 | DE | Florida State |
| 39 | Jarron Jones | 49.75 | DT | Penn State |
| 40 | Trey Williams | 50.5 | RB | Texas A&M |
| 44 | Ronald Darby | 57.5 | CB | Notre Dame |
| 45 | Kyle Kalis | 57.75 | OT | Ohio State |
| 49 | Chris Casher | 61.5 | DE | Florida State |
| 60 | Tee Shepard | 83.5 | CB | Notre Dame |
| 67 | Brock Stadnik | 91.25 | OT | South Carolina |
| 72 | Mario Pender | 98 | RB | Florida State |
| 80 | Se'von Pittman | 109.25 | DE | Michigan State |
| 86 | Kendall Sanders | 115 | CB | Oklahoma State |
| 90 | Michael Starts | 116.5 | OG | Texas Tech |
| 93 | Angelo Jean-Louis | 119 | WR | Miami (FL) |
| 102 | Matt Davis | 131.25 | QB | Texas A&M |
| 103 | Brionte Dunn | 131.75 | RB | Ohio State |
| 105 | P.J. Williams | 134.5 | S | Florida State |
| 109 | Bralon Addison | 137 | WR | Texas A&M |
| 132 | J.J. Denman | 173.5 | OT | Penn State |
| 134 | John Michael McGee | 173.75 | C | |
| 138 | Camren Williams | 185.25 | OLB | Penn State |
| 143 | Joshua Perry | 194.75 | OLB | Ohio State |
| 145 | Dalvon Stuckey | 207.25 | DT | Florida State |
Nice work. Now let's assign 5-star status to the top 20 or so, for now, as a 'universal 5 star'.
Very nice work and I'd love to see this method become our primary metric on this site.
If you wanted to find a way of accounting for ESPN's limited rankings without just totally ignoring them, you could make your "veto-based" approach a double elimination. For example, average the top 3 scores for any player from the 4 sites. If any player is missing 2 scores, then they're excluded. This would result in the 'universal average' being slightly above their actual average if they happened to be ranked by all four sites, but it'd be fair for all players and ranking sites.
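A rough sketch of that double-elimination idea, assuming each player's ranks come in as a list with None wherever a site left him unranked (the function name is made up):

```python
def best_three_average(ranks, keep=3):
    """Average a player's best `keep` ranks; None marks a site with no rank."""
    present = sorted(r for r in ranks if r is not None)
    if len(present) < keep:
        return None  # missing from two or more sites, so he's out
    return sum(present[:keep]) / keep

# Zeke Pike per the original post: Rivals 72, Scout 33, 247 18, ESPN unranked.
print(best_three_average([72, 33, 18, None]))  # 41.0
```

A guy missing from one site survives on his other three numbers, while a guy missing from two returns None and stays off the list.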
Great work. I think it would be smart to throw out the top and bottom rankings of the 4 sites. I see far too many examples of ESPN tailoring their high rankings to Under Armour All-American game (on ESPN) participants just so they can brag about the players that join that one. This would hopefully remove some noise involved with the rankings.
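Something like this, as a sketch only. It assumes the player has a number from all four sites, and the ESPN #422 for Zeke Pike is the count the original poster gives elsewhere in this thread.

```python
def middle_two_average(ranks):
    """Drop the best and worst of four ranks and average the two in between."""
    if len(ranks) != 4:
        raise ValueError("expects exactly four ranks, one per site")
    ordered = sorted(ranks)
    return (ordered[1] + ordered[2]) / 2

# Zeke Pike: Rivals 72, Scout 33, 247 18, ESPN 422.
print(middle_two_average([72, 33, 18, 422]))  # 52.5
```

Both the 18 and the 422 get tossed, so one site's outlier opinion can't move him in either direction.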
I don't think removing half the data set will improve the validity or the usefulness of the information.
The only time that would hurt is if the rankings for that player were scattered like buckshot. Given the choice between removing a datapoint or two and leaving players out altogether, I would think that the former would be preferred.
Another option is just to remove the lowest score. Everyone's score decreases and players not ranked in one poll don't get penalized.
Hey, thanks for the suggestion.
It looks like there's another, more conservative way to handle ESPN. Their website suggests that they actually rank guys well beyond 150; they just don't make a big deal of it. Here's the link: http://espn.go.com/college-football/recruiting/prospects/_/class/2012
When the procrastination bug bites next (soon, I'm sure), I'll extend ESPN's rankings to a top 300 and then compile a single list with all four sites (Rivals, Scout, 247, and ESPN).
More generally, thanks for the comments, everyone.
You could also try weighting the different sites. Maybe give Rivals and Scout 30% each of a player's score and 20% each to ESPN and 247. This way you don't completely remove the opinions of other sites, but you also decrease their relevance if you feel they are biased or misguided.
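Here's a quick sketch of the 30/30/20/20 weighting. The function and variable names are made up, it assumes the player has a rank from every site, and Pike's ESPN #422 is the count the original poster mentions elsewhere in the thread.

```python
# Weights as suggested above: 30% each for Rivals and Scout, 20% each for ESPN and 247.
WEIGHTS = {"Rivals": 0.30, "Scout": 0.30, "ESPN": 0.20, "247": 0.20}

def weighted_rank(ranks, weights=WEIGHTS):
    """ranks maps site -> rank; every weighted site must be present."""
    missing = [site for site in weights if site not in ranks]
    if missing:
        raise ValueError(f"no rank from: {missing}")
    total = sum(weights.values())  # normalize in case the weights don't sum to 1
    return sum(weights[site] * ranks[site] for site in weights) / total

pike = {"Rivals": 72, "Scout": 33, "247": 18, "ESPN": 422}
print(round(weighted_rank(pike), 2))
```

For Pike this works out to about 119.5, versus 136.25 for a straight average, because ESPN's low opinion counts for less.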
I'm with you - and thought about doing that - but I'm trying to avoid subjectivity to the greatest extent possible.
For someone who just started following recruiting, this really makes sense. Leaving out ESPN seems better since the other sites go more in-depth with the numbers (number of players). I wish there was a way to include the watch list for ESPN since they had more players to add to the list. Not sure how many players were on that list, but maybe all those players would get a standard 100 score so the total group could be more inclusive of the Rivals, Scout & 247 players.
This should go under 'useful stuff' if you ask me.
I'd revise the methodology to give an unranked guy a rank of, say, 350. That way it wouldn't be a total veto if 3 of the 4 sites love a guy.
Another thing worth doing would be giving an average star ranking (e.g., 3.25).
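To make the "rank of 350 for an unranked guy" idea concrete, here's a small sketch. The 350 is the arbitrary placeholder suggested above, and the function name is made up.

```python
def average_with_placeholder(ranks, placeholder=350):
    """Average ranks across sites, using `placeholder` where a site has no rank."""
    filled = [placeholder if r is None else r for r in ranks]
    return sum(filled) / len(filled)

# Zeke Pike per the original post: Rivals 72, Scout 33, 247 18, ESPN unranked.
print(average_with_placeholder([72, 33, 18, None]))  # 118.25
```

The one missing site costs him a lot of ground but no longer knocks him off the list.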
Your suggestion to use a rank of 350 gets at my imputed data worry. Basically, no matter which number you choose, it'd be pretty arbitrary and inaccurate. For example, what if a site believes that a prospect isn't worthy of the top 1000 but that's not clear from the site? We probably wouldn't want to drop that information.
Also, just to be clear, even if every site somehow ranked every player immediately outside of its top X (e.g., Rivals had Sheldon Day at #251), there couldn't be any changes to the top 75. There would be some movement toward the bottom, I'm sure, but it's not clear that the rankings would improve enough to justify the subjective calls required. Part of my underlying rationale is that it says something extra about a prospect if all of the sites included him on their lists.
It's totally arbitrary, but I just don't like giving any one site complete veto power. So an arbitrary low rank like that would mitigate the damage. Your method is clearer, I admit, but also less accurate IMO. It doesn't have to be subjective - just give everyone the same number for being unranked...
But yeah, for the top 50-75 guys it's probably not going to have much effect.
"For example, what if a site believes that a prospect isn't worthy of the top 1000 but that's not clear from the site? We probably wouldn't want to drop that information."
You're already dropping that information. Or, to put it another way, you're not attempting to make any judgement one way or the other about whether he's in the top 1000 or barely missing the cut.
Regardless, I think it's OK, even if the site thinks he's not worth being ranked in the top 1000, to consider that as an outlier if the other 3 sites say he is top 150. Basically, you'd be capping one site's influence so it can't completely override the others. I think that's very relevant.
By using a number like 350 you'd just be making a consistent assumption. The bigger issue is an example where a guy just missed. Say he's 250th on 247's list. 350 would underrank him, but it's still an improvement over pulling him out completely.
and plugged in some numbers. For the 4 guys mentioned as glaring omissions from 1 of 4, I gave them the next ranking (so Day would be #251 to Rivals, Leggett = Scout's 301, Young = 247's #248, and Pike is ESPN's #301). Here's where they would shake out:
- Young - average 104.5 - overall rank #77
- Pike - average 106 - overall rank #78 (79 I guess if you include Young, but I'll ignore that)
- Day - average 135 - overall rank #106
- Leggett - average 143.75 - overall rank #113
So, no big changes at the top, just 2 in the top 100.
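The four averages above can be reproduced from the ranks quoted in the original post. This is just a sketch of the arithmetic; the dictionary layout and names are made up.

```python
# Next rank just past each site's published list, as used in the comment above.
NEXT_RANK = {"Rivals": 251, "Scout": 301, "247": 248, "ESPN": 301}

# Ranks quoted in the original post; the site that omitted each player is left out.
KNOWN = {
    "Avery Young": {"Rivals": 38, "Scout": 13, "ESPN": 119},
    "Zeke Pike": {"Rivals": 72, "Scout": 33, "247": 18},
    "Sheldon Day": {"Scout": 65, "247": 80, "ESPN": 144},
    "Amos Leggett": {"Rivals": 104, "247": 75, "ESPN": 95},
}

def filled_average(ranks):
    """Average over all four sites, using NEXT_RANK where a site has no rank."""
    filled = [ranks.get(site, fallback) for site, fallback in NEXT_RANK.items()]
    return sum(filled) / len(filled)

averages = {player: filled_average(ranks) for player, ranks in KNOWN.items()}
for player, avg in averages.items():
    print(player, avg)
```

This prints 104.5, 106.0, 135.0 and 143.75, matching the numbers in the list.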
Thanks for checking this. Even this overstates the impact of excluding those guys, since these sites probably didn't have each of them just outside of their top X. With Pike, for example, ESPN has him at #422 by my count. (The other sites don't let you come up with such a precise ranking.)
Mat's points are reasonable, though. I still don't agree with imputing a 350 for everyone, but there's some additional information on those sites that would allow for more precision. Rivals uses its up-to-6.1 rating, Scout has a relatively small number of 4 stars outside of its top 300, 247 has its up-to-100 rating, and ESPN just flat ranks everyone. A better way to impute data would be to find a guy's secondary rating (e.g., his 1-to-100 rating on 247) and then give him the average ranking for guys with that same rating. For example, if the guys rated as 90 on 247 are ranked between #250 and 300, I'd impute a ranking of 275. (Does that make sense? Lots of numbers in there.) This would be time-consuming, so I'm not sure that I'll get to it, but I might take a look later this week/weekend.
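In case the numbers got confusing, here's roughly what I mean as a Python sketch. The data is hypothetical (a site where everyone ranked #250 through #300 carries a 90 rating) and the function name is made up.

```python
from collections import defaultdict
from statistics import mean

def average_rank_by_rating(ranked_players):
    """ranked_players: (rank, rating) pairs for the players a site does rank."""
    ranks_for = defaultdict(list)
    for rank, rating in ranked_players:
        ranks_for[rating].append(rank)
    # For each secondary rating, the average rank of the guys who hold it.
    return {rating: mean(ranks) for rating, ranks in ranks_for.items()}

# Hypothetical site: every player ranked #250 through #300 is rated 90.
hypothetical_site = [(rank, 90) for rank in range(250, 301)]
lookup = average_rank_by_rating(hypothetical_site)

# An unranked player rated 90 would be imputed this rank.
imputed = lookup[90]
print(imputed)  # 275
```

An unranked guy then borrows the average rank of the ranked guys who share his rating, instead of a flat 350.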
In order to make this a manageable process, you still need to define whom to rank and whom to omit. I'm open to suggestions but thinking of ranking guys who show up in at least three of the following:
-- Rivals' top 250
-- 247's top 247
-- ESPN's top 300
-- Scout's top 300
My semi-blind guess is that you'd get about 250 guys on that list.
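The three-of-four cutoff is easy to sketch. The list memberships below follow the four omissions named in the original post, "Player X" is a made-up player who shows up on only two lists, and the function name is hypothetical.

```python
from collections import Counter

def players_on_at_least(site_lists, minimum=3):
    """Return players who appear on at least `minimum` of the sites' lists."""
    counts = Counter(p for players in site_lists.values() for p in set(players))
    return sorted(p for p, n in counts.items() if n >= minimum)

site_lists = {
    "Rivals": {"Avery Young", "Zeke Pike", "Amos Leggett", "Player X"},
    "Scout": {"Avery Young", "Zeke Pike", "Sheldon Day", "Player X"},
    "247": {"Zeke Pike", "Sheldon Day", "Amos Leggett"},
    "ESPN": {"Avery Young", "Sheldon Day", "Amos Leggett"},
}
eligible = players_on_at_least(site_lists)
print(eligible)
```

All four of the "glaring omissions" make the cut here, while Player X does not.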
Pretty damn sweet. Michigan isn't doing too bad by this metric.
I also like the idea of giving a player unranked on one site or another a ranking of 350 for that site. This would give a clearer picture of the apparent value of a team's class as a whole.
Let's just call this the official top 150 recruits for 2012. Thanks for taking the time to make this list, good stuff. GO BLUE
from the computer scores in the BCS. Choose a certain threshold for how many players get ranked and counted (in BCS, it's the top 25 per poll) - for this, perhaps top 150, or if you can order the ESPN data, top 200 maybe. Then assign values such that the #1 person in each service gets max points (150 or 200 or whatever you pick), #2 gets max minus 1, etc. Then average the 4 scores and divide by max points (this gives a value from 0 to 1, a percent). Highest average value is at the top of the list. Sort by average, then rank from 1 to however many unique players there are across the services. This way, all players can be included, they'll just have a value of 0 from one or more services, which will drag down their average but not keep them out of the list entirely.
As noted, don't copy the BCS where the lowest and highest are dropped out as there aren't enough data sources to do that.
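A minimal sketch of that points scheme, assuming a cutoff of 150 and None for a service that doesn't rank the player (the function name is made up):

```python
def points_share(ranks, max_points=150):
    """BCS-style score: #1 earns max_points, #2 earns one less, and so on.

    A service that leaves the player unranked (None), or ranks him past the
    cutoff, contributes 0 points.
    """
    points = [
        max_points - (r - 1) if r is not None and r <= max_points else 0
        for r in ranks
    ]
    return sum(points) / len(ranks) / max_points

# Zeke Pike per the original post: Rivals 72, Scout 33, 247 18, ESPN unranked.
print(round(points_share([72, 33, 18, None]), 4))  # 0.55
```

For a player ranked inside the cutoff by every service, the score is just (max + 1 - average rank) / max, so the order matches the plain average. The two methods only differ for guys missing from a list.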
BTW, a bit OT, but here's where the computers and BCS put Michigan last year (took a lot of manual data entry to dig up these lower-than-top-25 places):
8 (after Iowa loss): 28(1 pt, dropped, so 0) 31
9 (after bye week): 28(0 pts) 26
10 (after Penn State loss): 28(0 pts) 36
11 (after Illinois win): 29(0 pts) 32
12 (after Purdue win): 29(1 pt, dropped, so 0) 27
13 (after Wisconsin loss): 29(0 pts) 45
14 (after OSU loss): gone from BCS ranking
15 (bye): still gone (duh)
At risk of sounding dumb.........won't that method give the exact same ranks though just in percentage form?
I guess a percentage could be pleasing to the senses for some but personally I like it better seeing the average rank between the four sites.
Edit: I forgot to say thank you turd ferguson for your delightful post and your hilarious name.
Weight recruits that aren't ranked by one aspect with a low weighting, removing the "veto" aspect. Not sure that's a good thing.
As someone who is very skeptical of ESPN's rankings, I'd still like to have the original list that didn't even use it.
Great job though... I actually think this could be used as a universal metric (for this site, at least).
somebody did this, we've all been chasing recruit numbers from the 4 sites for months now. Whether you drop the outlier high score or outlier low score I'm glad you did this. If you have the time try running both sets vetoing high and low to see if it makes much of a difference. This is something I'd think ESPN would be doing if they weren't too busy screwing up the averages even more with their own ratings.