First diary, here goes.
Given the surprising attrition we've faced in the past few days, I decided to look into how distance from home has affected the outcomes of our players over the years. Frozemangos made a board topic and looked into Harbaugh's recruits over the past couple of seasons. Seth suggested using his database down in the thread, so I did.
I downloaded the data and cut out players who never made it here for grade reasons (think Demar Dorsey). Then I made some assumptions. Seth's database is coarse by location; i.e., it only gives the recruit's home state, and I was too lazy to look up specifically where within each player's home state he came from, so I assumed they came from the state capital. This was the easiest since I found a table of GPS coordinates for each state capital.
Then I used a Python package called geopy to calculate the distance between any two (latitude, longitude) points on Earth. It's actually a step better than using spherical trig: it includes the asphericity of the Earth, so it isn't assuming the two points lie along a great circle on a perfect sphere. For those of you who aren't familiar with this language, the shortest path between two points on a sphere runs along a great circle. This is why plane trajectories don't look straight when projected onto a flat atlas (and why you go over Greenland when flying to Europe).
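For anyone who wants to see the idea without installing anything, here is a minimal sketch of the simpler spherical version (the haversine formula). To be clear, this is not the ellipsoidal calculation described above, just the great-circle approximation it improves on. The function name, the mean Earth radius, and the hand-typed approximate coordinates for Ann Arbor and Lansing are my own assumptions for illustration. As far as I know, the corresponding geopy call is `geopy.distance.geodesic(point_a, point_b).km`, but treat that as a pointer rather than a record of exactly what was run.

```python
import math

# Mean Earth radius in km. Treating the Earth as a sphere is the
# simplifying assumption of this sketch.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Guard against rounding pushing `a` slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Approximate coordinates, typed in by hand for illustration only.
ANN_ARBOR = (42.28, -83.74)
LANSING = (42.73, -84.56)

print(f"Ann Arbor to Lansing: {haversine_km(*ANN_ARBOR, *LANSING):.0f} km")
```

The spherical and ellipsoidal answers typically differ by well under a percent, which is tiny compared with the error from pinning every recruit to his state capital.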
With the distances in hand, I munged the data a bit. I assigned a "1" to each player who finished playing at Michigan and never played at another college location: entered the draft early, played out their eligibility, got a firm handshake and retired, etc. I assigned a "2" to any player who left due to disciplinary reasons, and I assigned a "3" to any player who transferred before their eligibility was up. I then looked at the distances the players have traveled:
The player from furthest away is Julius Welschof, and he's still on the team, so I cut him out to allow for more efficient binning of the data. I worked with the following:
Using bins of 300 km mostly smooths out the assumption of assigning the state capital for each player since most players probably live within 300 km of their state's capital. (I realize that I used Lansing for Michigan's location, and I feel bad about it, but I didn't feel bad enough to fix it.)
Within each of these bins, I computed the fraction of "1s", "2s", and "3s" for the data, and I also generated 1000 bootstrap samples (resampling the players with replacement, a statistical technique for estimating the uncertainty). You'll notice the uncertainty is larger where the bins have fewer players, as expected.
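The binning and bootstrap steps can be sketched roughly as follows. This is a hedged reconstruction using only the standard library, not the script behind the plots: the function names, the use of a standard deviation as the uncertainty, the fixed random seed, and the handful of made-up players are all assumptions for illustration.

```python
import random
import statistics
from collections import Counter, defaultdict

BIN_WIDTH_KM = 300
OUTCOMES = (1, 2, 3)  # 1 = finished here, 2 = disciplinary, 3 = transferred


def bin_players(players, width=BIN_WIDTH_KM):
    """Group outcome codes by distance bin; players is a list of (km, code)."""
    bins = defaultdict(list)
    for distance_km, code in players:
        bins[int(distance_km // width)].append(code)
    return dict(bins)


def outcome_fractions(codes):
    """Fraction of each outcome code within one bin."""
    counts = Counter(codes)
    return {k: counts[k] / len(codes) for k in OUTCOMES}


def bootstrap_spread(codes, n_boot=1000, seed=0):
    """Std. dev. of each fraction over n_boot resamples with replacement."""
    rng = random.Random(seed)
    draws = {k: [] for k in OUTCOMES}
    for _ in range(n_boot):
        resample = rng.choices(codes, k=len(codes))
        for k, frac in outcome_fractions(resample).items():
            draws[k].append(frac)
    return {k: statistics.pstdev(v) for k, v in draws.items()}


# Made-up (distance_km, outcome) pairs, NOT real roster data.
players = [(85, 1), (120, 1), (240, 3), (410, 1), (560, 2), (1900, 1), (2050, 3)]

for idx, codes in sorted(bin_players(players).items()):
    lo, hi = idx * BIN_WIDTH_KM, (idx + 1) * BIN_WIDTH_KM
    print(lo, hi, outcome_fractions(codes), bootstrap_spread(codes))
```

Because each resample draws as many players as the bin actually holds, sparsely populated bins naturally come out with wider spreads.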
First, here is the breakdown of players in each group as a function of distance.
And here are the fractions of each group as a function of distance.
I used blue to represent the players who finished their college career here; I made the transfers red, and the disciplined green (because obviously). The data do not support a strong trend that players who come from further away transfer more. There are insufficient data, especially above 2000 km, and a more national dataset would be useful to make a more conclusive determination.
In conclusion, I am better at working when the objective has nothing to do with what I am paid to do.
I, for one, am not that concerned about the transfers. I think everyone should calm down. These are young people, and a lot of money is involved if they can hack these four years together properly.
Thanks to Seth for compiling an amazing dataset to work with. This was fun.