Hokepoints States the Principle of B

Submitted by Seth on March 25th, 2014 at 10:09 AM


Tourney face. [Fuller]

Beilein teams go further in the tournament than their seeds. This is known. We've repeated it so often that smart bracketeers even calculate it into their expectations. I've saved the "why" and "wherefore" of this effect for a roundtable question since that gets into the basketball strategy stuff that I'm weak in.

What I can do is build a pivot table out of multiple bits of data; in this case it was lots of schmearing and pasting, column breaks, and vlookups from sports-reference.com's bracket history and annual coaches records. The important lesson here is you're supposed to know it was hard.

UPDATE: Here's the raw data.

The first thing I tried was straight-up expectations by seed: top seeds are expected to get to the Final Four, 2-seeds to the Elite Eight; 3- and 4-seeds to the Sweet Sixteen; 5-, 6-, 7- and 8-seeds to the round of 32. The results had Beilein #5 after Brad Stevens of Butler, Sean Miller, and some Mizzou coaches who often had 9 seeds. That suggested there's a problem with my figuring:

[Chart: wins over expectation]
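The straight-up expectation by seed described above can be sketched as a simple lookup. This is a minimal illustration, not the spreadsheet actually used; the function names are hypothetical, and it assumes seeds 9 through 16 are expected to win zero games and that play-in games are ignored.

```python
def naive_expected_wins(seed: int) -> int:
    """Wins needed to reach the round a seed is 'supposed' to reach."""
    if seed == 1:
        return 4  # Final Four
    if seed == 2:
        return 3  # Elite Eight
    if seed in (3, 4):
        return 2  # Sweet Sixteen
    if 5 <= seed <= 8:
        return 1  # round of 32
    if 9 <= seed <= 16:
        return 0  # assumption: never expected to advance
    raise ValueError(f"seed must be 1-16, got {seed}")


def naive_wins_over_expectation(seed: int, wins: int) -> int:
    """Actual tournament wins minus the naive expectation for that seed."""
    return wins - naive_expected_wins(seed)


# A 9-seed that beats an 8-seed and then loses, and the 8-seed it beat.
print(naive_wins_over_expectation(9, 1))  # 1
print(naive_wins_over_expectation(8, 0))  # -1
```

Under this scheme a 9-seed can never go negative and an 8-seed takes a full-win hit for losing what is close to a coin flip, which is the distortion the step-function approach bakes in.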

I'm expecting 9 and 10 seeds to never advance so they're always in the positive; every time an 8 loses to a 9 it's a hit. The actual distribution is, unsurprisingly, progressive:

[Chart: seed distribution]

With over 1300 teams in my study there's very little deviation from the logarithm. It suggests, for all our complaining, that the committee does a pretty good job.

Seed Exp Wins Seed Exp Wins
1 3.21 9 0.66
2 2.41 10 0.53
3 1.94 11 0.42
4 1.60 12 0.32
5 1.34 13 0.23
6 1.13 14 0.14
7 0.95 15 0.06
8 0.79 16 0.00

Since I'm a history major who had to re-teach himself exponential functions this morning (if predicting basketball games required encyclopedic knowledge of Plantagenets I'd have Ken Pomeroy's job) please go easy on me if I dispense with the other stuff and just use the values Excel returned as a base expectation of tournament victories for each seed (at right). The formula according to Excel:

y = -1.1634 Ln(x) + 3.2127
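As a sketch of how this curve turns into numbers: the snippet below evaluates the fit for each seed and subtracts it from actual wins. The slope is taken as negative, since the tabulated expectations fall as the seed number rises. The function names are hypothetical, and flooring the result at zero (so a 16-seed shows 0.00 rather than a slightly negative value) is an assumption.

```python
import math

SLOPE = -1.1634      # coefficient on ln(seed), as rounded by Excel
INTERCEPT = 3.2127   # expected wins for a 1-seed, since ln(1) = 0


def expected_wins(seed: int) -> float:
    """Expected tournament wins for a seed under the log fit, floored at 0."""
    if not 1 <= seed <= 16:
        raise ValueError(f"seed must be 1-16, got {seed}")
    return max(0.0, SLOPE * math.log(seed) + INTERCEPT)


def wins_over_expectation(seed: int, wins: int) -> float:
    """Actual wins minus the log-fit expectation for that seed."""
    return wins - expected_wins(seed)


for seed in range(1, 17):
    print(f"{seed:>2}  {expected_wins(seed):.2f}")

# A 2-seed that goes out in the Sweet 16 has 2 wins.
print(round(wins_over_expectation(2, 2), 2))  # -0.41
```

With the rounded coefficients this reproduces the expected-wins table to within 0.01 (the 3-seed comes out at 1.93 rather than 1.94).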

With an expectation for victories, I can now make a reasonable comparison against it. For example, a 2-seed that advances to the Sweet 16 has 2 victories minus 2.41 expected, or 0.41 fewer wins than it should have. The last step was to remove coaches who've been to fewer than five tournaments. We're ready to rename March after a coach. But which one?

[Don't act all surprised; you knew I'd make you jump for it.]

Best Coaches at Advancing Past Seed Since '93
Rk Coach Schools (tourneys) Avg Wins Over Exp
1 Brad Stevens Butler (5) 1.426
2 John Beilein Canisius (1), Richmond (1), WVU (2), Mich (4)
3 Tom Izzo MSU (16) 0.714
4 Sean Miller Xavier (4), Zona (2) 0.699
5 Billy Donovan Florida (13) 0.687
6 Nolan Richardson Arkansas (8) 0.669
7 Rick Pitino Kentucky (5), L'ville (10) 0.609
8 Dean Smith UNC (5) 0.565
9 Jim Calhoun UConn (15) 0.522
10 Jim Larrañaga George Mason (5), Miami (YTM) (1)

Translation: in eight tourney appearances, Beilein has been good for about 3/4 of a win beyond his seed.

One deep run for a low-seeded team can make a big difference. Stevens deserves the crown of the March King for taking his 5- and 8-seeded Butler teams to consecutive national championship games. But remove George Mason's surprising trip to the Final Four in '06 and Larrañaga drops into the negatives (-0.162). Remove 2013 Michigan and Beilein is at 0.374.


Lol Duke
Rk Coach Schools (tourneys) Avg Wins Over Exp
62 Mike Krzyzewski Duke (20) -0.168

To correct for outliers I'll recalculate among the tourney regulars by removing their best and worst runs, and increasing the minimum number of appearances to seven (sorry Stevens/Miller/Smith):

Best Tourney Regulars at Advancing Past Seed Since '93
(best & worst runs removed)

Rk Coach Schools (tourneys) Avg Wins Over Exp
1 Tom Izzo MSU 0.704
2 John Beilein Canisius, Richmond, WVU, Mich
3 Rick Pitino Kentucky, L'ville 0.611
4 Nolan Richardson Arkansas 0.586
5 Billy Donovan Florida 0.570
6 Jim Calhoun UConn 0.413
7 Roy Williams Kansas, UNC 0.383
8 Steve Lavin UCLA, St. John's 0.321
9 John Chaney Temple 0.319
10 John Calipari UMass, Memphis, Ky. 0.318
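The outlier correction behind this table can be sketched as a trimmed average per coach. This is a minimal illustration under stated assumptions: the function names and the `(coach, seed, wins)` row layout are hypothetical, "best" and "worst" are taken to mean the runs with the highest and lowest wins over expectation, and the sample rows are placeholders rather than real tournament results.

```python
import math
from collections import defaultdict


def expected_wins(seed):
    # Log curve from the post, floored at zero (the floor is an assumption).
    return max(0.0, -1.1634 * math.log(seed) + 3.2127)


def trimmed_avg_over_exp(runs, min_appearances=7):
    """runs: iterable of (coach, seed, wins), one tuple per tourney appearance."""
    by_coach = defaultdict(list)
    for coach, seed, wins in runs:
        by_coach[coach].append(wins - expected_wins(seed))

    result = {}
    for coach, residuals in by_coach.items():
        if len(residuals) < min_appearances:
            continue  # not a tourney regular
        kept = sorted(residuals)[1:-1]  # drop the single worst and single best run
        result[coach] = sum(kept) / len(kept)
    return result


# Placeholder rows only, NOT real tournament results.
sample_runs = [("Coach A", seed, wins) for seed, wins in
               [(3, 2), (5, 0), (2, 4), (7, 1), (4, 3), (6, 1), (1, 2)]]
sample_runs += [("Coach B", 8, 1)] * 3  # too few appearances, filtered out

ranked = sorted(trimmed_avg_over_exp(sample_runs).items(), key=lambda kv: -kv[1])
for coach, avg in ranked:
    print(f"{coach}: {avg:+.3f}")
```

Swapping the coach name for the school name as the grouping key gives the by-team version of the same calculation.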

Lol Teams from Indiana. Jim Boeheim just misses this list. Other coaches of interest out of the 90 who qualified: Mike Davis (16th, +0.331); Jim O'Brien (17th, +0.300); Thad Matta (31st, +0.162); Bruce Pearl (38th, +0.095); Bo Ryan (45th, +0.012); Gene Keady (52nd, –0.060); Tom Crean (70th, –0.280); Bob Knight (73rd, –0.327); Kelvin Sampson (80th, –0.407); and Mike Brey (83rd, –0.496).

Oliver Purnell, formerly of Dayton and Clemson, [EDIT: now at DePaul] had the worst performance, but that should not overshadow the regularity of Jamie Dixon's Pittsburgh teams, which have managed to consistently underperform their seed virtually every year since 2004.

When we look at this by team instead of coaches—still keeping it to seven minimum tourney runs and removing best and worst runs—Michigan ends up 20th at +0.140. The top tourney teams are West Virginia (+0.684), Tulsa (+0.666), Butler (who had a couple of Sweet 16 runs before Stevens), Kentucky, Florida, MSU, UNC, UConn, UCLA, and Louisville. The worst: Clemson (-0.802), Notre Dame (-0.632), New Mexico, UNLV, Wake, Vandy, Pitt, Cincy, Oklahoma and Charlotte.

Other things. I still have some work to do to attach the stats (since '03 at least) to Kenpom—I'd like to see if there are any correlations to offensive/defensive squads or other key stats. The one thing I learned so far is there is zero correlation to either the coach's tourney experience or how many years the coach has been with the team.



March 25th, 2014 at 11:21 AM ^

That's why I used a logarithmic progression. A 2-seed is expected to get 2.4 wins; Michigan just needs to beat Tennessee and they'll be outperforming expectations this year by 0.60. If they lose, they'll underperform by just 0.40.

Here's how seeds performed versus my log expectations:

Seed Wins over Expectation
12 0.235
1 0.168
10 0.156
11 0.041
13 0.041
15 0.020
3 0.005
16 0.000
2 -0.014
14 -0.026
6 -0.034
4 -0.053
9 -0.062
8 -0.116
7 -0.164
5 -0.222

That's pretty damn close to zero. Two-seeds are expected to perform almost exactly as well as 2-seeds have performed historically.
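One way to sanity-check the curve: adding these residuals back to the rounded expected-wins table from the post recovers an approximate average win total per seed, and an ordinary least-squares fit of that against ln(seed) should land close to the coefficients Excel reported. This is a rough reconstruction from rounded, published numbers, not a rerun on the raw data, and the variable names are hypothetical.

```python
import math

# Expected wins by seed, as tabulated in the post (rounded to 2 places).
expected = {1: 3.21, 2: 2.41, 3: 1.94, 4: 1.60, 5: 1.34, 6: 1.13, 7: 0.95,
            8: 0.79, 9: 0.66, 10: 0.53, 11: 0.42, 12: 0.32, 13: 0.23,
            14: 0.14, 15: 0.06, 16: 0.00}

# Wins over expectation by seed, from the table above.
over = {1: 0.168, 2: -0.014, 3: 0.005, 4: -0.053, 5: -0.222, 6: -0.034,
        7: -0.164, 8: -0.116, 9: -0.062, 10: 0.156, 11: 0.041, 12: 0.235,
        13: 0.041, 14: -0.026, 15: 0.020, 16: 0.000}

# Reconstructed average wins per seed (assumption: residual = actual - expected).
seeds = sorted(expected)
xs = [math.log(s) for s in seeds]
ys = [expected[s] + over[s] for s in seeds]

# Closed-form simple linear regression of ys on ln(seed).
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(f"y = {slope:.4f} ln(x) + {intercept:.4f}")
```

Because every seed line contributes the same number of teams each year, fitting the 16 per-seed averages is equivalent to fitting all of the individual team results at once.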

El Jeffe

March 25th, 2014 at 11:54 AM ^

These findings also provide statistical support for the 5-12 and 7-10 "upsets" every year.

Any idea why that is? I mean, why the committee keeps underrating the 10s and 12s and overrating the 5s and 7s?

Also, any idea why there is asymmetry in the overperformance of 11-seeds (0.041) and the underperformance of 8-seeds (-0.116)? The other two matchups--7/10 and 5/12 are pretty symmetric w/r/t over- and under-performance.


March 25th, 2014 at 12:37 PM ^

So, first, a caveat: I'm recalling this from memory and have little data on hand to back it up. However, I remember some radio show or another covering this exact topic several years ago.

One of the stronger theories, if I recall, was that when you start to get into the 5-7 seeds, you are usually looking at teams that finished in the top half of power conferences but didn't win them (2014: OSU, Duke) and are usually inherently flawed. In the 10-12 seeds, meanwhile, you can end up with winners of non-power conferences (2014: Mercer), and a team that wins its conference outright can be more balanced and capable of exploiting the flaws of its opponent.

Obviously, Dayton is an outlier to this theory as they did not win their conference, but I suppose one could point to the dreaded "hot team" or "matchups" explanations, which can be bandied about without needing any data to support them :-)



March 25th, 2014 at 12:48 PM ^

I noticed that. I have a former co-worker with whom I have an annual $0.25 bet on our brackets who always talks about his 12-5 picks.

This provides evidence that it's a more common upset. However it's only 0.23 over the norm, so if you're picking all 12s over 5s you're probably going to get 3 wrong.


March 25th, 2014 at 10:56 AM ^

First of all, this is really cool analysis. Two items:

(1) What's Coach K's rank in the final (best and worst removed) metric?

(2) Oliver Purnell is now at DePaul.

Artichokes Anonymous

March 25th, 2014 at 11:48 AM ^

Great analysis. One thing that seems hard to account for is top programs receiving higher seeds than what they have earned. To me it makes the Coach K stat line a bit more understandable (as Duke is overrated, if anything) and the figures from Izzo, Pitino, Donovan, etc. more impressive.


March 25th, 2014 at 12:04 PM ^

I think seeding bias is allayed by the fact that top seeds are given an appreciably easier road. Say a Duke team that ought to be the worst 2 seed is overrated to a one-seed. That hypothetical team gets a 16 seed instead of a possibly decent 15 seed in the first round, and the 9 instead of the 7 to get to the Sweet 16, and a 4 instead of a 3 to get to the Elite 8. They'll also most likely get to play a few miles from home.

I don't know how to separate the value of that higher seed from how good the team is.


March 25th, 2014 at 12:21 PM ^

but, out of a strange curiosity, I'd like to see how the Avg. Tourney Wins by Seed vs. Expectation graph looks in women's college bball. One knock I always assume is true is that there is too little drama in it b/c teams play to seed too often (ie there are too few upsets). Just curious if my assumption is accurate or if I'm a foo...

Va Azul

March 25th, 2014 at 12:39 PM ^

Log transform, then linear comparison? A top seed can earn~ [-3.8,2.2] wins above expectation. A bottom seed [0,6] wins. There is a bias for lower seeds that is not accounted for.


March 25th, 2014 at 12:44 PM ^

True, but you have to factor in [image: Phoenician mem], which is the result of a history major who got up at 5 this morning with a poopy crying 3-week-old and once that was returned to a normative state went about googling how to work with logs since the last time he'd done so was in the mid-'90s with a TI-86.

The data are linked if you wanna make a go at it.