Interesting to see how it all stacks up. Maybe for future graphs their positions could be included?
I think you will get your wish.
I’m sorry to post again, but I think the improvements are significant enough – thanks to some intelligent feedback – to warrant a new posting.
Below is my attempt to aggregate the Rivals, Scout, 247, and ESPN rankings into a universal list. The goal is to draw from all of the data available to create a single list that eliminates the need to juggle rankings, ratings, and stars from four different sites when comparing prospects.
First, though, I’ll describe the logic and process.
There are countless ways to do this, and none of them is perfect. Importantly, even though I’m a Michigan fan, I never considered how this would look for Michigan before deciding how to do it. I’m trying to make this as objective and sensible as possible given time and data constraints.
The first decision one has to make is whom to include. In my first draft, I included only those who appeared in the top X lists for all four sites. Others thought that requirement was too rigid, so I’ve relaxed it here. The players appearing on this list appear in at least three of the four following lists: Rivals’ top 250, Scout’s top 300, 247’s top 247, and ESPN’s top 300. This eliminates the “veto power” nature of the first rankings (and the related outlier worries), since two sites would have to leave out a prospect for him to be excluded.
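For readers who want to see that inclusion rule spelled out, here is a minimal Python sketch. It assumes each site's list is a dict mapping player names to ranks; the player names, the ranks, and the `eligible_players` helper are placeholders for illustration, not the data or code actually used for this post.

```python
# Hypothetical top-N lists: player name -> rank on that site.
# Names and ranks are placeholders, not real data.
site_lists = {
    "Rivals": {"Player A": 10, "Player B": 40, "Player C": 200},
    "Scout": {"Player A": 12, "Player B": 35},
    "247": {"Player A": 8, "Player C": 150},
    "ESPN": {"Player A": 15, "Player B": 60},
}


def eligible_players(site_lists, min_lists=3):
    """Return players who appear on at least `min_lists` of the site lists."""
    counts = {}
    for ranks in site_lists.values():
        for player in ranks:
            counts[player] = counts.get(player, 0) + 1
    return sorted(p for p, n in counts.items() if n >= min_lists)


print(eligible_players(site_lists))
```

Setting `min_lists=4` would reproduce the stricter "all four sites" rule from the first draft, where a single site leaving a prospect out excludes him.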
The next decision is how to rank those who make it. The most straightforward way to do this is to take the average ranking for each prospect across the four sites. In an ideal world, each site would rank every prospect so there would be no missing data. That isn’t reality. Therefore, I imputed rankings where they were missing. Here’s how I did it for each site (this is boring if you aren't interested):
I hope that makes sense, and I’m happy to answer questions in the comments. Please feel free to share feedback or point out errors.
Also, if one of these sites significantly changes its rankings in the next few days I’m going to kill someone.
|Rank||Player||Pos||Avg||Rivals||Scout||247||ESPN||Commitment|
|3||Mario Edwards||DE||6.25||2||8||14||1||Florida State|
|28||Dante Fowler||DE||41.75||11||39||43||74||Florida State|
|39||Jarron Jones||DT||49.75||67||14||21||97||Penn State|
|40||Trey Williams||RB||50.5||24||20||24||134||Texas A&M|
|44||Ronald Darby||CB||57.5||64||32||64||70||Notre Dame|
|49||Chris Casher||DE||61.5||83||57||96||10||Florida State|
|60||Tee Shepard||CB||83.5||51||49||145||89||Notre Dame|
|67||Brock Stadnik||OT||91.25||165||69||72||59||South Carolina|
|72||Mario Pender||RB||98||53||88||204||47||Florida State|
|80||Se'von Pittman||DE||109.25||61||79||196||101||Michigan State|
|87||Kendall Sanders||CB||115||54||94||142||170||Oklahoma State|
|91||Michael Starts||OG||116.5||148||70||119||129||Texas Tech|
|94||Angelo Jean-Louis||WR||119||113||223||88||52||Miami (FL)|
|103||Matt Davis||QB||131.25||144||38||187||156||Texas A&M|
|104||Brionte Dunn||RB||131.75||124||28||154||221||Ohio State-ish|
|107||P.J. Williams||S||134.5||173||127||124||114||Florida State|
|111||Bralon Addison||WR||137||155||119||137||137||Texas A&M|
|117||Jelani Hamilton||DE||145||79||62||92||347*||Miami (FL)|
|134||Reginald Davis||WR||169.25||214||120||261*||82||Texas Tech|
|135||Amos Leggett||CB||169.5||104||404*||75||95||Miami (FL)|
|142||J.J. Denman||OT||173.5||242||111||181||160||Penn State|
|144||John Michael McGee||C||173.75||82||169||210||234|
|153||Camren Williams||OLB||185.25||243||138||116||244||Penn State|
|161||Joshua Perry||OLB||194.75||131||231||235||182||Ohio State|
|164||Raphael Kirby||OLB||199||126||233||287*||150||Miami (FL)|
|169||Dalvon Stuckey||DT||207.25||207||171||184||267||Florida State|
|186||Warren Ball||RB||230||212||52||206||450*||Ohio State|
|199||Trevor Knight||QB||264.75||228||274||261*||296||Texas A&M|
|216||Deontay Greenberry||WR||331.5||115||244||209||758*||Notre Dame|
|217||Michael Richardson||DE||371.25||756*||294||226||209||Texas A&M|
A final note about ESPN
Several commenters in my previous diary expressed that they’d like to see these rankings without ESPN. I don’t think there’s enough reason or evidence to dismiss ESPN entirely. However, for those who are interested, here’s how some recruits would rank among the above prospects if ESPN were excluded: Kalis (23), Washington (55), Magnuson (57), Ross (76), Dunn (79), Diamond (82), Richardson (104), Jenkins-Stone (106), Stanford (123), Burbridge (131), Pipkins (169), Strobel (181), Wormley (183), Bolden (201). Of course, the list of prospects included would change if ESPN were ignored altogether.
Interesting to see how it all stacks up. Maybe for future graphs their positions could be included?
ask and thou shall receive...
Good work, Turd. I would greatly appreciate that breakdown as well. If I were choosing a favorite poo, you'd be in my top three, behind Mr. Hankey and in front of Bono (he can't be #2 again).
Winnie is an obvious OSU cooler pooper, what with his scarlet shirts and redneck temperament, and him & all his autograph-for-honey schemes will come out soon enough.
Really nice. Only recommendation I might have which I've seen others use when doing this is to not use the actual overall rank for kids outside the top whatever for that site. Reasoning behind that is just that you've already ensured these guys are in 3 of the 4 rankings so only one of the sites is going to be an outlier, you could chalk that up to just poor scouting by that site.
For example, Kwontie Moore is 218th but has overall ranks of 97, 116, 266 and 1054(!). He's getting dragged way down because of that last number but since that one seems to be the outlier, it could be discarded.
Since Scout ranks the most with 300, I'd recommend that anyone with a rank higher than that just be scored at 301 for that site. It would largely eliminate that outlying score, which isn't really being considered anyway since they need to appear on 3 of the 4 lists.
Just my 2 cents. Love this though. Eliminates most of the biases across different sites.
This is one of the many ways to impute data. In truth, though, I don't like it as much.
With a small number of data points (four), I think you have to be careful about dumping data. That's especially true if some of these sites are looking to the others for guidance (meaning that we really wouldn't have four independent data points).
More generally, I think it's important information that Site X really dislikes a certain recruit. If you give everyone a 301 or 350 or something, you lose all of that information. My preference is to incorporate everything you have, doing the best job you can with available data to pinpoint a ranking. Kwontie Moore is particularly unusual because Scout thinks so little of him that he didn't make their MLB rankings at all. My view is that this is relevant information (especially b/c Scout feels so strongly about that that it's willing to be an outlier). Other views might be reasonable, too.
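To make the trade-off concrete, here is a small Python sketch comparing the plain average with the capped average suggested above, using Kwontie Moore's four ranks as quoted in this thread. The `mean_rank` helper and its `cap` parameter are illustrative names only, not part of how the rankings were actually built.

```python
def mean_rank(ranks, cap=None):
    """Average a player's ranks, optionally capping each rank at `cap`."""
    if cap is not None:
        ranks = [min(r, cap) for r in ranks]
    return sum(ranks) / len(ranks)


# Kwontie Moore's four ranks as quoted in the comments.
moore = [97, 116, 266, 1054]

print(mean_rank(moore))           # uncapped average
print(mean_rank(moore, cap=301))  # every rank above 300 scored as 301
```

The capped version treats a 1054 exactly like a 301, which is the information loss described above; whether that is a feature or a bug is the point of disagreement.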
Fair enough. I figured it might be worthwhile since you have that 3 out of 4 qualifier. Other people just throw everyone in a jumble even if they're 300th on Scout and unranked by everyone else.
hammered by ESPN. Reeves would be well within the top 150 but ESPN gave him a 700 something which is near twice his other 3 scores combined. Like the Soviet judge at the Olympics.
Edit: Keith Marshall has the 2nd biggest outlier after Zeke Pike. The Rivals 55 rating more than triples his average, from 6 to 18.25, whereas Zeke's ESPN outlier makes his average 3.262 times greater than the other 3 services' consensus.
It's strange to see where they actually agree on anybody. It feels like Scout and 247 rate guys as Sophomores and leave it that way for a while, and Rivals and ESPN rate kids as Juniors, but they all change down the road. Some of the outliers are just funny. Magnuson 34th to Rivals, 169th to ESPN. TRich 31st to 247, 195th to Rivals. The top 15 are funny to see just how much they disagree. Reminds me of the Keystone Cops in their clumsy diligence. Pipkins' numbers are going to skyrocket his senior year, lol.
Nice work. This is probably the best way to get a sense of how a recruiting class stacks up.
One note, as much as I wish it weren't true, Pittman did commit to MSU.
Appreciate the time that you must have put in to do this. This gives me lots of new stuff to ponder. Heaven forbid that there is a hair that goes unsplit or a pebble that remains unturned . . .
PS - love the Bri'onte Dunn commitment area - "Ohio Statish" . . . .
Question - how does Marshall have rankings of 5, 6, 7, and 55?! That's one serious outlier.
Also, it might be nice to bold or italicize recruits of interest. I'm lazy, and I was only really interested in finding recruits that we're going after or that we have.
I like this a lot though, nice job.
EDIT: I realize you did bold the commits, but I'm talking about targets, too.
One thing I've noticed is that we've offered a ton of these guys.
We've already got 6 guys on the list and it's not a stretch to say we probably lead for 6 and are in play for another dozen.
Given 219 players, each team from a BCS conference, assuming all teams are on equal ground (they aren't, I know) should average 3.42 players from this list. Texas has 12 and doesn't have to leave the state to recruit. Must be nice.
For the most part you see a general consensus from Rivals, Scout, and 247. ESPN almost has a monopoly on outliers. Running down the list you see ESPN differ greatly from the other 3; for years I thought I was just imagining it.
With outliers, median might be a better choice than mean.
Interesting idea, but my own view is that this, too, drops too much information. I made a similar point above, but we're working with limited data points for each observation, and I tend to think that the outliers tell us something important. Basically, these are the prospects for which a site feels so strongly that a recruit is over/underrated that it's willing to go on a limb.
Zeke Pike offers an interesting example. His rankings are 72, 33, 18, and 412. The 412 comes from ESPN and is its actual ranking (i.e. not really imputed). If we were to take the median, Pike would have a 52.5, which should easily place him in the top 50. Maybe ESPN's crazy and that's the right thing to do, but the fact that they dislike him that much seems relevant to me. His current ranking of #105 feels more appropriate.
No information is "dropped" when using median; the data still exists. The idea is to use the method that best estimates a true average. When outliers are rampant (e.g. salaries, home values) median is typically a better average than mean. The Pike example was cherry-picked, as CW has his stock dropping. Pipkins makes an easy in-kind counterpoint. Regardless, I applaud your effort and hope you provide us with future updates.
Technically, I guess you're right that medians don't "drop" data, but they definitely lose some of the nuance. Mat articulated this well below, but I'll use a real player. I'm not cherry-picking this based on a trend; the numbers just illustrate the point well.
Take Devin Fuller. He's ranked #17, #37, #39, and #150 by the recruiting services. Scout has him at #150, which clearly reflects some uncertainty about him as a prospect (since they're willing to publicly be much more down on him).
The median of Fuller's rankings would be 38. It wouldn't change at all whether Scout ranked him at #39, #150, or as a one-star player. My view is that it's a mistake not to incorporate that information. If Fuller committed to Michigan tomorrow, I would note Scout's reservations in my mind, not just write them off as an outlier.
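A quick Python sketch of that point, using Fuller's quoted ranks. The Scout values of 39 and 1000 are hypothetical what-ifs, and `summarize` is just an illustrative helper.

```python
from statistics import mean, median

# Devin Fuller's three non-Scout ranks as quoted above.
other_sites = [17, 37, 39]


def summarize(scout_rank):
    """Return (mean, median) of the four ranks for a given Scout rank."""
    ranks = other_sites + [scout_rank]
    return mean(ranks), median(ranks)


# 150 is the quoted Scout rank; 39 and 1000 are hypothetical alternatives.
for scout_rank in (39, 150, 1000):
    avg, med = summarize(scout_rank)
    print(scout_rank, avg, med)
```

The median stays put for any Scout rank of 39 or worse, while the mean moves with it. That is the sense in which the median ignores how strongly the dissenting site feels.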
If there were 100 recruiting services here, I would completely agree. (This, by the way, would bring us closer to your salary & home value situations.) Hell, maybe one of them would rank a top 10 recruit at #80,000 because the kid slept with the guy's mom. We don't have enough data points to make that a good idea, though.
Thanks for posting this. I hope this ranking system actually catches on around this blog.
While we're talking about great user created content, does anyone know what became of the recruiting map that was posted about a month ago? I thought that was also a great idea and took a bit of work. Would love to see an updated version of that.
From what I can tell, it hasn't been updated since at least when Pittman chose MSU. I believe it is open source content though, so a user should be able to add Pittman to MSU and also add/subtract anyone from Michigan.
I know there's still some quibbling about how much the outliers should affect the rankings, but removing the veto thing is the biggest fix you could make in that regard.
I agree with you about the median/mean issue - cutting out information with so few sources is highly debatable.
Here's a few ideas for taking these rankings up a notch if you want to take it further:
1. Add an average star ranking. Star rankings are something everyone's familiar with so if a guy is 3.25 or 3.75 stars, we'll all know what that means.
2. Run the same (or similar) process for position specific rankings.
3. Quantify unanimity of the services. It would be pretty simple to calculate a number and then convert it into something digestible for non-technical folks (e.g., "high level of agreement, moderate, low").
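As a rough illustration of idea 3, here is a Python sketch that turns the standard deviation of a player's four ranks into a label. The `agreement_label` function and both cutoffs are made up for the example; sensible thresholds would need to be chosen by looking at the full list.

```python
from statistics import pstdev


def agreement_label(ranks, high_cutoff=15, moderate_cutoff=60):
    """Map the spread of a player's ranks to a plain-language label.

    The cutoffs are arbitrary placeholders, not values from the post.
    """
    spread = pstdev(ranks)
    if spread <= high_cutoff:
        return "high"
    if spread <= moderate_cutoff:
        return "moderate"
    return "low"


print(agreement_label([2, 8, 14, 1]))        # Mario Edwards
print(agreement_label([11, 39, 43, 74]))     # Dante Fowler
print(agreement_label([212, 52, 206, 450]))  # Warren Ball (450 is asterisked in the table)
```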
If you do all this I think you could have a pretty popular link and maybe even a blogpage that might get enough hits to give you some pocket change. The key would be to make it easily digestible. Just a thought...
All good ideas. Thanks for the suggestions in the previous thread and this one. If time permits, I might jump on some of those.
Like I said on your other diary, I really like what you've done, but as a few others mentioned, those outlier rankings really bother me and affect the standings too much.
When you see one recruiting site disagreeing on a player by 400-900 ranks something is going wrong.
What do you define as an outlier? On the small scale looking at someone like Keith Marshall at #14 the 4 sites have him at 55, 5, 7, 6 which if you call the 55 an outlier would bring him into the top 3.
But then that one rank doesn't look as off when you have Amara Darboh at 194, 161, 148, 572 (outlier of 378 spots), the already mentioned Kwontie Moore at 116, 1054, 97, 266 (outlier of 788 spots), or Leonte Caroo at 211, 86, 245, 763 (outlier of 552 spots).
This is why a median would work well as was suggested above.
If one guy gets ranks of 6, 8, 10, and 170 why would you rank him the same as a guy who gets ranks of 7, 7, 9, 9?
The median is the same. But they shouldn't be viewed as equivalent.
By tossing out the 170 you're ignoring the fact that one site has serious doubts about a guy, while there is unanimity in opinion of the other.
There are many, many examples where throwing out the top and bottom values (half the information you have) leads to misleading results.
First, your calculation isn't correct. Second, the aim is to develop a reasonable estimate for the next credible data point. Assume this next data point is provided by Lemming (I know, I know), using your made up player above, which is a better guess at Lemming's ranking of that player, 48.5 (mean) or 9 (median)? Given the type of data, most statisticians would argue the latter.
You're not trying to guess what another recruiting analyst would do, you're trying to aggregate what various sites think about someone. Throwing out opinions doesn't do that well.
I realize that median is generally superior to mean, but in this case (when you want to include all opinions, including the outliers) it's not.
At the very least, if you're going to rate by the median you'd want to add an uncertainty value.
And yes, you're right that my math would be different, but you got my point.
I really like your analyses. I'd like to see these rankings account for the ranking of a player within his position group (complicated by the fact that some players aren't ranked the same in all positions). There has to be a factor that could correct for situations where some positions are just naturally ranked lower, but where having the top player at a given position should reflect favorably on the school the player goes to.
Thanks, this is some great info. I really like what you did and agree that a median would be less informative. I am majoring in engineering, and one major thing I have learned is that when you have fewer than 10 independent data points you should not exclude any or rely on anything but the mean. With that said, I think you could get extremely creative by considering a way to include the median as 1/5 of the data points. To do this, you would calculate the median for each player, make a separate column for it like you have for Rivals, Scout, 247, and ESPN, and include that number when computing your mean. This would make a player ranked 7, 7, 9, 9 ranked much better than a 7, 7, 9, 200 but would make a major outlier, like Marshall's and those of the players mentioned above, less of a factor. I also like the star rankings average mentioned above (that would be cool) and the idea for individual positions, but I know how much time this takes. What you have is excellent, and everyone will always have an opinion about what is wrong with your way of combining stats, so don't worry too much about those who don't appreciate everything you did.
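Here is how that "median as a fifth data point" idea might look in Python. This is only a sketch of the suggestion in this comment, not the method used in the post, and `blended_average` is a made-up name.

```python
from statistics import median


def blended_average(ranks):
    """Average the four site ranks plus their median as a fifth value."""
    values = list(ranks) + [median(ranks)]
    return sum(values) / len(values)


print(blended_average([7, 7, 9, 9]))
print(blended_average([7, 7, 9, 200]))
print(blended_average([55, 5, 7, 6]))  # Keith Marshall's quoted ranks
```

The outlier still counts, but it carries one-fifth of the weight instead of one-quarter, so the 7, 7, 9, 200 player still ends up well behind the 7, 7, 9, 9 player.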
With all of this said I would like to post a diary of some information I created last Friday but I am having trouble inserting an excel spreadsheet like this from my computer. Can someone tell me how to upload a table like this to mgoblog?
and I can certainly understand your plea for no quick changes in the sites' ratings. Tried pulling the data to run some scenarios myself, got pretty decent formulae for extracting info from the webpages, but then ran into the fact that for too many of the guys, they do not use standard spellings of their names, so there's a ton of manual work involved to build up this kind of calculation. Bravo!
Agree on your position of using mean vs. median; for a player with rankings from all 4 sites, the median is going to be mathematically equivalent to dropping the highest and lowest and averaging the remaining two (and thus eliminates too much useful data).
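That equivalence is easy to check in Python. The sketch below assumes exactly four ranks per player and uses Zeke Pike's ranks as quoted earlier in the thread; `middle_two_mean` is an illustrative helper.

```python
from statistics import median


def middle_two_mean(ranks):
    """Drop the best and worst of four ranks, then average the remaining two."""
    ordered = sorted(ranks)
    return (ordered[1] + ordered[2]) / 2


pike = [72, 33, 18, 412]  # Zeke Pike's ranks as quoted earlier in the thread
assert median(pike) == middle_two_mean(pike)
print(median(pike))
```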
This is good stuff. Well done. Something to go over as we wait for the next commit.
very interesting. For those of us who are poor numbercrunchers, this is a treat.
Funny how each site has the rankings flipped for Ross & Stone which makes them each top prospects even though you may not know it by looking at only one site.
I don't think that much granularity really will make the difference. What does it mean to be the #251 overall or the #291 overall? More precisely, since recruits are ranked by position, does it matter that much if you're the 40th TE or the 45th TE? I would say no.
As an easy rule of thumb, I figure that anybody that makes a 150/250/300 list is heavily scouted and most sites assign them the 5/4 stars. Beyond that, it's hard to tell within the sea of 3 stars who is there on ability and who is there because they didn't camp/late bloomer/play for a small terrible team etc. There is a consensus at the top with the 5 star/high 4 star level (as in, he's really good!), followed by a steep level of variation. This is to be expected, as obviously the rankings are subjective. You don't know why the #20 CB is ranked 200th but the #21 CB is ranked 240th. Is that a sign of a huge gap in talent? Or is it because the guy who was in charge of scouting the 21st CB didn't speak up, or didn't have enough influence? You're trying to get very granular on a flawed set of data.
To look at the variability, just take your matrix and look at standard deviations.
Taking a quick look at those numbers, there seems to be a consensus among the top 50 or so athletes. I don't feel like running the numbers, but I bet you can find a significant difference within the deviations of players between 1-50 and any other set of 50 players. Once you move past the top 50, the recruiting sites start disagreeing much more on individual placement.
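For anyone who does feel like running the numbers, a sketch along these lines would do it in Python. The six rows are copied from the table in the post purely as a demonstration; six players are far too few to support any conclusion, and `mean_spread` is a hypothetical helper.

```python
from statistics import mean, pstdev

# (aggregate rank, four site ranks) for a few rows from the table in the post.
rows = [
    (3, [2, 8, 14, 1]),
    (28, [11, 39, 43, 74]),
    (40, [24, 20, 24, 134]),
    (104, [124, 28, 154, 221]),
    (161, [131, 231, 235, 182]),
    (186, [212, 52, 206, 450]),
]


def mean_spread(rows, lo, hi):
    """Average per-player standard deviation for aggregate ranks in [lo, hi]."""
    spreads = [pstdev(ranks) for rank, ranks in rows if lo <= rank <= hi]
    return mean(spreads)


print(mean_spread(rows, 1, 50))    # players ranked 1-50 overall
print(mean_spread(rows, 51, 250))  # everyone else in the sample
```

Running the same comparison over the full list, in buckets of 50, would show whether the sites really do diverge more outside the top 50.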
So again, I think we're trying to reinvent the wheel here. Without each scout scouting all 1500 3*+ players, I don't think you're ever going to get a great ranking system beyond player #50 or so.
Some other fun things:
Lots of fun stuff in here. Thanks for the response.
The one point on which we most clearly agree: these rankings aren't great past #200 (for those last 20 guys or so). Initially, I dropped them and called it a top 200, but I figured that someone would get irritated by that and want to decide for him/herself what to make of the last 20.
More generally, I also agree that there's a decent amount of variation across the sites. I disagree, however, that this is reason to give up on aggregated rankings. The fact that there's variation across the sites is exactly why it's useful to aggregate like this. If there were general agreement, then it wouldn't be hard to figure out who's where. In general, too, we should believe that a recruit who's ranked #251 is more highly regarded than one who's ranked #291. It's true that those are difficult judgments for the sites to make, but on average, I see no reason not to assume that the recruiting service generally likes a player who's ranked a little higher more than one who's ranked a little lower.
On another note, I really like those points at the bottom of your post, and you hit on something that I think is important. You said that 15 recruits ranked better in aggregate than in any individual ranking. Part of the reason that I think this type of aggregation is useful is a trick that numbers play on people. When I look at a recruit who's ranked, say, #95, #100, #100, and #105, I might be tempted to say that he's roughly the 100th most highly regarded recruit. That's not true. He's actually #75. This is hard to see without aggregated rankings.
(I just noticed that you stuck this in the original post, too -- thanks for that -- so I'll post my response there as well.)