amid all of the encomia for Beilein it's important to note that regression has likelier beginnings when you start as a two-seed.
Hokepoints States the Principle of B
Tourney face. [Fuller]
Beilein teams go further in the tournament than their seeds. This is known. We've repeated it so often that smart bracketeers even calculate it into their expectations. I've saved the "why" and "wherefore" of this effect for a roundtable question since that gets into the basketball strategy stuff that I'm weak in.
What I can do is build a pivot table out of multiple bits of data; in this case it was lots of schmearing and pasting, column breaks, and vlookups from sports-reference.com's bracket history and annual coaches' records. The important lesson here is you're supposed to know it was hard.
UPDATE: Here's the raw data.
The first thing I tried was straight-up expectations by seed: top seeds are expected to get to the Final Four, 2-seeds to the Elite Eight; 3- and 4-seeds to the Sweet Sixteen; 5-, 6-, 7- and 8-seeds to the round of 32. The results had Beilein #5 after Brad Stevens of Butler, Sean Miller, and some Mizzou coaches who often had 9 seeds. That suggested there's a problem with my figuring:
I'm expecting 9 and 10 seeds to never advance so they're always in the positive; every time an 8 loses to a 9 it's a hit. The actual distribution is, unsurprisingly, progressive:
With over 1300 teams in my study there's very little deviation from the logarithm. It suggests, for all our complaining, that the committee does a pretty good job.
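To see why the straight-up expectations flatter the 9- and 10-seeds, here is a minimal Python sketch of that first scoring scheme. The function names, and the zero-win expectation for seeds 9 through 16, are my reading of the description above rather than the spreadsheet's actual formulas.

```python
def naive_expected_wins(seed: int) -> int:
    """Wins needed to reach the round each seed is 'supposed' to reach."""
    if seed == 1:
        return 4  # Final Four
    if seed == 2:
        return 3  # Elite Eight
    if seed in (3, 4):
        return 2  # Sweet Sixteen
    if 5 <= seed <= 8:
        return 1  # round of 32
    return 0      # assumption: seeds 9-16 are expected to lose their first game


def naive_wins_over_seed(seed: int, wins: int) -> int:
    """Actual wins minus the flat, round-based expectation."""
    return wins - naive_expected_wins(seed)


# The 8/9 game: the 9-seed can only break even or gain, the 8-seed can take a hit.
print(naive_wins_over_seed(9, 1))  # 9 beats 8 -> 1
print(naive_wins_over_seed(8, 0))  # 8 loses to 9 -> -1
print(naive_wins_over_seed(9, 0))  # 9 loses to 8 -> 0
```

Under this scheme a coach who lives on the 9 line can never post a negative number, which is how the coaches with lots of 9-seeds ended up near the top.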
|Seed||Exp Wins||Seed||Exp Wins|
Since I'm a history major who had to re-teach himself exponential functions this morning (if predicting basketball games required encyclopedic knowledge of Plantagenets I'd have Ken Pomeroy's job) please go easy on me if I dispense with the other stuff and just use the values Excel returned as a base expectation of tournament victories for each seed (at right). The formula according to Excel:
y = -1.1634 ln(x) + 3.2127
With an expectation for victories I can now get a reasonable comparison. For example, a 2-seed that advances to the Sweet 16 has 2 victories minus 2.41 expected = 0.41 fewer wins than it should have. The last thing was to remove coaches who've been to fewer than five tournaments. We're ready to rename March after a coach. But which one?
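For anyone who wants to check the arithmetic, here is a rough Python sketch of the expectation curve and the wins-over-expectation figure. It assumes the log coefficient is negative, since that is the only sign that yields about 2.41 expected wins for a 2-seed; the function names are mine, not anything from the spreadsheet.

```python
import math

# Coefficients from the Excel trendline quoted in the post.
# Assumption: the log term is negative, which is what gives a 2-seed ~2.41 wins.
SLOPE = -1.1634
INTERCEPT = 3.2127


def expected_wins(seed: int) -> float:
    """Expected tournament wins for a seed under the logarithmic fit."""
    return SLOPE * math.log(seed) + INTERCEPT


def wins_over_expectation(seed: int, wins: int) -> float:
    """Actual wins minus the fitted expectation for that seed."""
    return wins - expected_wins(seed)


for seed in range(1, 17):
    print(f"{seed:>2}-seed: {expected_wins(seed):5.2f} expected wins")

# A 2-seed that goes out in the Sweet 16 has 2 wins.
print(round(wins_over_expectation(2, 2), 2))  # -0.41
```

A coach's score is then just the average of `wins_over_expectation` across all of his tournament appearances.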
[Don't act all surprised; you knew I'd make you jump for it.]
|Best Coaches at Advancing Past Seed Since '93|
|Rk||Coach||Schools (tourneys)||Avg Wins
|2||John Beilein||Canisius (1), Richmond (1), WVU (2), Mich (4)
|3||Tom Izzo||MSU (16)||0.714|
|4||Sean Miller||Xavier (4), Zona (2)||0.699|
|5||Billy Donovan||Florida (13)||0.687|
|6||Nolan Richardson||Arkansas (8)||0.669|
|7||Rick Pitino||Kentucky (5), L'ville (10)||0.609|
|8||Dean Smith||UNC (5)||0.565|
|9||Jim Calhoun||UConn (15)||0.522|
|10||Jim Larrañaga||George Mason (5),
Translation: in eight tourney appearances, Beilein has been good for about 3/4 of a win beyond his seed.
One deep run for a low-seeded team can make a big difference. Stevens deserves the crown of the March King for taking his 5- and 8-seeded Butler teams to consecutive national championship games. But remove George Mason's surprising trip to the Final Four in '06 and Larrañaga drops into the negatives (-0.162). Remove 2013 Michigan and Beilein is at 0.374.
|Rk||Coach||Schools (tourneys)||Avg Ws
|62||Mike Krzyzewski||Duke (20)||-0.168|
To correct for outliers I'll recalculate among the tourney regulars by removing their best and worst runs, and increasing the minimum number of appearances to seven (sorry Stevens/Miller/Smith):
|Best Tourney Regulars at Advancing Past Seed Since '93|
|Rk||Coach||Schools (tourneys)||Avg Wins
|2||John Beilein||Canisius, Richmond,
|3||Rick Pitino||Kentucky, L'ville||0.611|
|7||Roy Williams||Kansas, UNC||0.383|
|8||Steve Lavin||UCLA, St. John's||0.321|
|10||John Calipari||UMass, Memphis, Ky.||0.318|
Lol Teams from Indiana. Jim Boeheim just misses this list. Other coaches of interest out of the 90 who qualified: Mike Davis (16th, +0.331); Jim O'Brien (17th, +0.300); Thad Matta (31st, +0.162); Bruce Pearl (38th, +0.095); Bo Ryan (45th, +0.012); Gene Keady (52nd, –0.060); Tom Crean (70th, –0.280); Bob Knight (73rd, –0.327); Kelvin Sampson (80th, –0.407); and Mike Brey (83rd, –0.496).
Oliver Purnell, formerly of Dayton and Clemson, [EDIT: now at DePaul] had the worst performance, but that should not overshadow the regularity of Jamie Dixon's Pittsburgh teams, which have managed to consistently underperform their seed virtually every year since 2004.
When we look at this by team instead of coaches—still keeping it to seven minimum tourney runs and removing best and worst runs—Michigan ends up 20th at +0.140. The top tourney teams are West Virginia (+0.684), Tulsa (+0.666), Butler (who had a couple of Sweet 16 runs before Stevens), Kentucky, Florida, MSU, UNC, UConn, UCLA, and Louisville. The worst: Clemson (-0.802), Notre Dame (-0.632), New Mexico, UNLV, Wake, Vandy, Pitt, Cincy, Oklahoma and Charlotte.
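The trimming step used for the regulars lists (drop the best and worst run, require at least seven appearances) can be sketched like this in Python. The function name and the sample numbers are hypothetical, and counting the seven appearances before trimming rather than after is my assumption.

```python
from statistics import mean

MIN_APPEARANCES = 7  # threshold used for the "tourney regulars" lists


def trimmed_average(diffs, min_appearances=MIN_APPEARANCES):
    """Average wins over expectation after dropping the best and worst run.

    Returns None when there are too few appearances to qualify.
    Assumption: the appearance count is checked before trimming.
    """
    if len(diffs) < min_appearances:
        return None
    trimmed = sorted(diffs)[1:-1]  # drop one minimum and one maximum
    return mean(trimmed)


# Hypothetical coach: one huge run, one early flameout, six ordinary years.
runs = [-0.41, 0.59, -1.0, 3.6, 0.2, -0.3, 0.8, 0.1]
print(round(mean(runs), 3))             # plain average, inflated by the 3.6
print(round(trimmed_average(runs), 3))  # average with best and worst removed
print(trimmed_average(runs[:5]))        # None: fewer than seven appearances
```

Because the worst run goes out along with the best, a coach with one great year and one dud moves less than he would if only the great year were removed.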
Other things. I still have some work to do to attach the stats (since '03 at least) to Kenpom—I'd like to see if there are any correlations to offensive/defensive squads or other key stats. The one thing I learned so far is there is zero correlation to either the coach's tourney experience or how many years the coach has been with the team.
That's why I used a logarithmic progression. A 2-seed is expected to get 2.4 wins; Michigan just needs to beat Tennessee and they'll be outperforming expectations this year by 0.60. If they lose, they'll underperform by just 0.40.
Here's how seeds performed versus my log expectations:
|Seed||Wins over Expectation|
That's pretty damn close to zero. Two-seeds are expected to perform almost exactly as well as 2-seeds have performed historically.
my knowledge, but the seeding analysis--verifying its basic soundness--is very cool. The NCAA gets something right!
These findings also provide statistical support for the 5-12 and 7-10 "upsets" every year.
Any idea why that is? I mean, why the committee keeps underrating the 10s and 12s and overrating the 5s and 7s?
Also, any idea why there is asymmetry in the overperformance of 11-seeds (0.041) and the underperformance of 8-seeds (-0.116)? The other two matchups--7/10 and 5/12 are pretty symmetric w/r/t over- and under-performance.
So, first, a caveat: I'm recalling this from memory and have little data on hand to back it up. However, I remember some radio show or another covering this exact topic several years ago.
One of the stronger theories, if I recall, was that when you start to get into the 5-7 seeds, you are usually looking at teams that finished in the top half of power conferences but didn't win it (2014: OSU, Duke) and are usually inherently flawed while in the 10-12 seeds you can end up with conference winners of non-power conferences (2014: Mercer) and a team that wins its conference outright can be more balanced and capable of exploiting the flaws of their opponent.
Obviously, Dayton is an outlier to this theory as they did not win their conference, but I suppose one could point to the dreaded "hot team" or "matchups" explanations, which can be bandied about without needing any data to support them :-)
I think this is basically correct, although Duke was a #3 seed and Mercer a #14.
I noticed that. I have a former co-worker with whom I have an annual $0.25 bet on our brackets who always talks about his 12-5 picks.
This provides evidence that it's a more common upset. However it's only 0.23 over the norm, so if you're picking all 12s over 5s you're probably going to get 3 wrong.
The statistics that I found a couple of years ago showed that since 1985, the 5 seed has a record of 72-36 against the 12-seed. That is a 66.667% win rate for the 5 seed.
I'm no statistician but I think it's probably because there is a compressed differentiation in quality among teams between 5 and 12. The difference is not in reality always "7", but is actually more like "3." Or, parity.
the 12 seeds want it more.
but if JB drops to .375 ish when you remove 2013, how can he stay at .700 ish when you remove the coach's best and worst scenario ? Just wondering ....
And thanks for adding the LOL classification !
Because taking out a bad performance helps offset taking out the good performance.
On a side note - How is that bracket working out with VCU?
har har hardy har. I believe I did state my caveat that I think VCU will have as tough a time with SF Austin as the top seeds.
I also took OSU and Duke over Michigan. I have never been so happy to have my bracket so annihilated.
& UConn in my Sweet 16...which looks great now...but were pretty crazy picks at the time.
Couldn't it be better to look at the median performance for each coach?
First of all, this is really cool analysis. Two items:
(1) What's Coach K's rank in the final (best and worst removed) metric?
(2) Oliver Purnell is now at DePaul.
This is an excellent analysis. Can you please post the full results?
I had a client call at 11 and was having trouble getting it uploaded to google for some reason. In a bit.
I don't get it. Aren't we your "clients"?
Great analysis. One thing that seems hard to account for is top programs receiving higher seeds than what they have earned. To me it makes the Coach K stat line a bit more understandable (as Duke is overrated, if anything) and the figures from Izzo, Pitino, Donovan, etc. more impressive.
I think seeding bias is allayed by the fact that top seeds are given an appreciably easier road. Say a Duke team that ought to be the worst 2 seed is overrated to a one-seed. That hypothetical team gets a 16 seed instead of a possibly decent 15 seed in the first round, and the 9 instead of the 7 to get to the Sweet 16, and a 4 instead of a 3 to get to the Elite 8. They'll also most likely get to play a few miles from home.
I don't know how to separate the value of that higher seed from how good the team is.
but, out of a strange curiosity, I'd like to see how the Avg. Tourney Wins by Seed vs. Expectation graph looks in women's college bball. One knock I always assume is true is that there is too little drama in it b/c teams play to seed too often (ie there are too few upsets). Just curious if my assumption is accurate or if I'm a foo...
Is that there's too little drama because there are only a few teams capable of winning the tourney. UConn, Stanford, Tennessee and friends have had teams well above the competition and have gone on to crush the tourney considerably more often than dominant teams show up on the men's side.
Log transform, then linear comparison? A top seed can earn~ [-3.8,2.2] wins above expectation. A bottom seed [0,6] wins. There is a bias for lower seeds that is not accounted for.
True, but you have to factor in , which is the result of a history major who got up at 5 this morning with a poopy crying 3-week-old and once that was returned to a normative state went about googling how to work with logs since the last time he'd done so was in the mid-'90s with a TI-86.
The data are linked if you wanna make a go at it.
Spambot Fail! Ugh.