Hokepoints States the Principle of B

Seth March 25th, 2014 at 10:09 AM

Tourney face. [Fuller]

Beilein teams go further in the tournament than their seeds. This is known. We've repeated it so often that smart bracketeers even calculate it into their expectations. I've saved the "why" and "wherefore" of this effect for a roundtable question since that gets into the basketball strategy stuff that I'm weak in.

What I can do is build a pivot table out of multiple bits of data; in this case it was lots of schmearing and pasting, column breaks, and vlookups from sports-reference.com's bracket history and annual coaches' records. The important lesson here is you're supposed to know it was hard.

UPDATE: Here's the raw data.

The first thing I tried was straight-up expectations by seed: top seeds are expected to get to the Final Four; 2-seeds to the Elite Eight; 3- and 4-seeds to the Sweet Sixteen; and 5-, 6-, 7- and 8-seeds to the round of 32. The results had Beilein #5, behind Brad Stevens of Butler, Sean Miller, and some Mizzou coaches who often had 9-seeds. That suggested there's a problem with my figuring:

I'm expecting 9 and 10 seeds to never advance so they're always in the positive; every time an 8 loses to a 9 it's a hit. The actual distribution is, unsurprisingly, progressive:

With over 1300 teams in my study there's very little deviation from a logarithmic curve. It suggests, for all our complaining, that the committee does a pretty good job.

Seed	Exp Wins	Seed	Exp Wins
1	3.21	9	0.66
2	2.41	10	0.53
3	1.94	11	0.42
4	1.60	12	0.32
5	1.34	13	0.23
6	1.13	14	0.14
7	0.95	15	0.06
8	0.79	16	0.00

Since I'm a history major who had to re-teach himself exponential functions this morning (if predicting basketball games required encyclopedic knowledge of Plantagenets I'd have Ken Pomeroy's job), please go easy on me. I'm dispensing with the other stuff and just using the values Excel returned as a base expectation of tournament victories for each seed (table above). The formula according to Excel:

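$y = -1.1634\,\ln(x) + 3.2127$, where x is the seed and y is the expected number of tournament wins (the slope is negative because expected wins fall as the seed number rises).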
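Excel's logarithmic trendline is an ordinary least-squares fit of y against ln(x). Here's a minimal Python sketch of that fit and the per-seed baseline it produces, taking the slope as negative since expected wins fall as the seed number rises. The helper names are hypothetical. The zero floor is an assumption: the raw formula gives about -0.01 for a 16-seed while the table shows 0.00. The sanity-check data are generated from the formula itself, not from the real per-seed averages.

```python
import math

def fit_log_trend(xs, ys):
    """Least-squares fit of y = a*ln(x) + b (the model behind Excel's log trendline)."""
    lx = [math.log(x) for x in xs]
    n = len(xs)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (y - mean_y) for u, y in zip(lx, ys))
    a = sxy / sxx
    b = mean_y - a * mean_lx
    return a, b

# Coefficients from the post; negative slope because higher seed numbers win less.
A, B = -1.1634, 3.2127

def expected_wins(seed):
    """Baseline tournament wins for a seed, floored at 0 (assumption for 16-seeds)."""
    return max(0.0, A * math.log(seed) + B)

# Sanity check: points lying exactly on the curve recover its coefficients.
seeds = list(range(1, 17))
a, b = fit_log_trend(seeds, [A * math.log(s) + B for s in seeds])
print(round(a, 4), round(b, 4))  # -1.1634 3.2127
print([round(expected_wins(s), 2) for s in (1, 2, 8, 16)])  # [3.21, 2.41, 0.79, 0.0]
```

With the real data you would pass the sixteen historical per-seed averages as `ys`; the fit then returns Excel's coefficients up to rounding.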

With an expectation for victories, I can now compare each team against it. For example, a 2-seed that advances to the Sweet 16 has 2 victories against 2.41 expected, or 0.41 fewer wins than it should have. The last step was to remove coaches who've been to fewer than five tournaments. We're ready to rename March after a coach. But which one?
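The bookkeeping described above might look like this in Python. The `(coach, seed, wins)` tuple layout, the helper names, and the two coaches' records are all hypothetical, made up for illustration. The actual spreadsheet may have handled details such as First Four games differently.

```python
import math
from collections import defaultdict

def expected_wins(seed):
    """Log baseline from the post's Excel fit, floored at zero."""
    return max(0.0, -1.1634 * math.log(seed) + 3.2127)

def wins_over_expectation(seed, wins):
    """Actual tournament wins minus the baseline for that seed."""
    return wins - expected_wins(seed)

def coach_averages(seasons, min_tourneys=5):
    """Average wins over expectation per coach, dropping coaches below the cutoff.

    seasons: iterable of (coach, seed, wins) tuples, one per tournament appearance.
    """
    by_coach = defaultdict(list)
    for coach, seed, wins in seasons:
        by_coach[coach].append(wins_over_expectation(seed, wins))
    return {
        coach: sum(vals) / len(vals)
        for coach, vals in by_coach.items()
        if len(vals) >= min_tourneys
    }

# The example from the text: a 2-seed that reaches the Sweet 16 (2 wins).
print(round(wins_over_expectation(2, 2), 2))  # -0.41

# Invented records: only "Coach A" has the five appearances needed to qualify.
seasons = [("Coach A", 4, 2)] * 5 + [("Coach B", 1, 4)] * 3
print(coach_averages(seasons))
```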

[Don't act all surprised; you knew I'd make you jump for it.]

Best Coaches at Advancing Past Seed Since '93
Rk	Coach	Schools (tourneys)	Avg Wins Over Exp
1	Brad Stevens	Butler (5)	1.426
2	John Beilein	Canisius (1), Richmond (1), WVU (2), Mich (4)	0.752
3	Tom Izzo	MSU (16)	0.714
4	Sean Miller	Xavier (4), Zona (2)	0.699
5	Billy Donovan	Florida (13)	0.687
6	Nolan Richardson	Arkansas (8)	0.669
7	Rick Pitino	Kentucky (5), L'ville (10)	0.609
8	Dean Smith	UNC (5)	0.565
9	Jim Calhoun	UConn (15)	0.522
10	Jim Larrañaga	George Mason (5), Miami (YTM) (1)	0.461

Translation: across eight tourney appearances, Beilein has averaged about 3/4 of a win per tournament beyond what his seed predicts.

One deep run by a low-seeded team can make a big difference. Stevens deserves the crown of the March King for taking his 5- and 8-seeded Butler teams to consecutive national championship games. But remove George Mason's surprising trip to the Final Four in '06 and Larrañaga drops into the negatives (-0.162). Remove 2013 Michigan and Beilein is at 0.374.
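The "remove one run" figures can be checked from the published averages alone. The sketch below assumes each average is a plain mean over the listed number of tournaments. It uses the seeds and win totals of the two runs: George Mason was an 11-seed that won 4 games in 2006, and Michigan a 4-seed that won 5 in 2013. The helper name is hypothetical.

```python
import math

def expected_wins(seed):
    """Log baseline from the post's Excel fit, floored at zero."""
    return max(0.0, -1.1634 * math.log(seed) + 3.2127)

def average_without_run(avg, n_tourneys, seed, wins):
    """Recompute a coach's average after dropping one tournament run."""
    total = avg * n_tourneys
    dropped = wins - expected_wins(seed)
    return (total - dropped) / (n_tourneys - 1)

# Larrañaga: 0.461 over 6 tourneys; drop George Mason 2006 (11-seed, 4 wins).
print(round(average_without_run(0.461, 6, 11, 4), 3))  # -0.162
# Beilein: 0.752 over 8 tourneys; drop Michigan 2013 (4-seed, 5 wins).
print(round(average_without_run(0.752, 8, 4, 5), 3))   # 0.374
```

Both results match the numbers in the text, which suggests the table averages are simple per-tournament means.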

Also:

Lol Duke
Rk	Coach	Schools (tourneys)	Avg Wins Over Exp
62	Mike Krzyzewski	Duke (20)	-0.168

To correct for outliers I'll recalculate among the tourney regulars by removing their best and worst runs, and increasing the minimum number of appearances to seven (sorry Stevens/Miller/Smith):

Best Tourney Regulars at Advancing Past Seed Since '93 (best & worst runs removed)
Rk	Coach	Schools	Avg Wins Over Exp
1	Tom Izzo	MSU	0.704
2	John Beilein	Canisius, Richmond, WVU, Mich	0.703
3	Rick Pitino	Kentucky, L'ville	0.611
4	Nolan Richardson	Arkansas	0.586
5	Billy Donovan	Florida	0.570
6	Jim Calhoun	UConn	0.413
7	Roy Williams	Kansas, UNC	0.383
8	Steve Lavin	UCLA, St. John's	0.321
9	John Chaney	Temple	0.319
10	John Calipari	UMass, Memphis, Ky.	0.318
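Dropping each coach's best and worst run is a trimmed mean. The sketch below implements it and runs a rough consistency check on Beilein's two numbers. Working backward from the published 0.752 (eight runs) and 0.703 (six runs), after removing the 2013 run, gives an implied worst run of about -1.6 wins. That is what the log baseline charges a 4-seed that loses its first game. The helper name and the sample list are hypothetical, and the check assumes both averages are plain means.

```python
import math

def expected_wins(seed):
    """Log baseline from the post's Excel fit, floored at zero."""
    return max(0.0, -1.1634 * math.log(seed) + 3.2127)

def trimmed_average(values):
    """Mean after dropping one best and one worst value (needs at least 3 values)."""
    if len(values) < 3:
        raise ValueError("need at least three tournament runs")
    kept = sorted(values)[1:-1]
    return sum(kept) / len(kept)

print(trimmed_average([3.0, -2.0, 1.0, 0.5]))  # 0.75 (made-up runs)

# Back-of-the-envelope check on Beilein's published averages.
total = 0.752 * 8                      # sum over all eight runs
best = 5 - expected_wins(4)            # 2013: 4-seed, 5 wins
without_best = total - best            # sum of the other seven runs
implied_worst = without_best - 0.703 * 6
print(round(implied_worst, 2))         # -1.6
print(round(0 - expected_wins(4), 2))  # -1.6: a 4-seed one-and-done
```

Removing a large negative run alongside the big positive one is why the trimmed figure stays near 0.70 instead of falling to 0.37.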

Lol Teams from Indiana. Jim Boeheim just misses this list. Other coaches of interest out of the 90 who qualified: Mike Davis (16th, +0.331); Jim O'Brien (17th, +0.300); Thad Matta (31st, +0.162); Bruce Pearl (38th, +0.095); Bo Ryan (45th, +0.012); Gene Keady (52nd, -0.060); Tom Crean (70th, -0.280); Bob Knight (73rd, -0.327); Kelvin Sampson (80th, -0.407); and Mike Brey (83rd, -0.496).

Oliver Purnell, formerly of Dayton and Clemson, [EDIT: now at DePaul] had the worst performance, but that should not overshadow the regularity of Jamie Dixon's Pittsburgh teams, which have managed to consistently underperform their seed virtually every year since 2004.

When we look at this by team instead of by coach (still keeping it to seven minimum tourney runs and removing best and worst runs), Michigan ends up 20th at +0.140. The top tourney teams are West Virginia (+0.684), Tulsa (+0.666), Butler (who had a couple of Sweet 16 runs before Stevens), Kentucky, Florida, MSU, UNC, UConn, UCLA, and Louisville. The worst: Clemson (-0.802), Notre Dame (-0.632), New Mexico, UNLV, Wake, Vandy, Pitt, Cincy, Oklahoma and Charlotte.

Other things: I still have some work to do to attach KenPom stats (since '03 at least), since I'd like to see if there are any correlations with offensive/defensive squads or other key stats. The one thing I've learned so far is that there is zero correlation with either the coach's tourney experience or how many years the coach has been with the team.
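A correlation check like the one described can be done with Pearson's r, the same statistic Excel's CORREL returns. The sketch below uses invented numbers chosen so the result is exactly zero. With real data, "zero correlation" would mean r close to 0, not exactly 0. The helper name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented: years a coach has been at a school vs. that team's wins over expectation.
years_with_team = [1, 2, 3, 4, 5]
wins_over_exp = [1.0, -1.0, 0.0, -1.0, 1.0]
print(pearson_r(years_with_team, wins_over_exp))  # 0.0
```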

Comments

MGlobules

March 25th, 2014 at 10:45 AM

Amid all of the encomia for Beilein, it's important to note that regression has likelier beginnings when you start as a two-seed.

Seth

March 25th, 2014 at 11:21 AM

That's why I used a logarithmic progression. A 2-seed is expected to get 2.4 wins; Michigan just needs to beat Tennessee and they'll be outperforming expectations this year by 0.60. If they lose, they'll underperform by just 0.40.

Here's how seeds performed versus my log expectations:

Seed	Wins over Expectation
12	0.235
1	0.168
10	0.156
11	0.041
13	0.041
15	0.02
3	0.005
16	0.000
2	-0.014
14	-0.026
6	-0.034
4	-0.053
9	-0.062
8	-0.116
7	-0.164
5	-0.222

That's pretty damn close to zero. Two-seeds are expected to perform almost exactly as well as 2-seeds have performed historically.

MGlobules

March 25th, 2014 at 11:35 AM

my knowledge, but the seeding analysis--verifying its basic soundness--is very cool. The NCAA gets something right!

El Jeffe

March 25th, 2014 at 11:54 AM

These findings also provide statistical support for the 5-12 and 7-10 "upsets" every year.

Any idea why that is? I mean, why the committee keeps underrating the 10s and 12s and overrating the 5s and 7s?

Also, any idea why there is asymmetry in the overperformance of 11-seeds (0.041) and the underperformance of 8-seeds (-0.116)? The other two matchups--7/10 and 5/12 are pretty symmetric w/r/t over- and under-performance.

WolverineRage

March 25th, 2014 at 12:37 PM

So, first, a caveat: I'm recalling this from memory and have little data on hand to back it up. However, I remember some radio show or another covering this exact topic several years ago.

One of the stronger theories, if I recall, was that when you start to get into the 5-7 seeds, you are usually looking at teams that finished in the top half of power conferences but didn't win them (2014: OSU, Duke) and are usually inherently flawed, while in the 10-12 seeds you can end up with conference winners from non-power conferences (2014: Mercer), and a team that wins its conference outright can be more balanced and capable of exploiting the flaws of its opponent.

Obviously, Dayton is an outlier to this theory as they did not win their conference, but I suppose one could point to the dreaded "hot team" or "matchups" explanations, which can be bandied about without needing any data to support them :-)

jmblue

March 25th, 2014 at 4:02 PM

I think this is basically correct, although Duke was a #3 seed and Mercer a #14.

Seth

March 25th, 2014 at 12:48 PM

I noticed that. I have a former co-worker with whom I have an annual $0.25 bet on our brackets who always talks about his 12-5 picks.

This provides evidence that it's a more common upset. However, it's only 0.23 over the norm, so if you're picking all 12s over 5s, you're probably going to get 3 wrong.

mgoblue98

March 25th, 2014 at 5:28 PM

The statistics that I found a couple of years ago showed that since 1985, the 5 seed has a record of 72-36 against the 12-seed. That is a 66.667% win rate for the 5 seed.
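Taking that 72-36 record at face value, a quick Python sketch shows why picking every 12 over every 5 usually costs about three picks. It assumes the four 5-12 games in a year are independent and each goes to the 12 at the historical rate, which is a simplification.

```python
# Record quoted above: 5-seeds are 72-36 against 12-seeds since 1985.
five_wins, twelve_wins = 72, 36
p_twelve = twelve_wins / (five_wins + twelve_wins)  # share of games won by the 12

# Picking all four 12-seeds in one year, treating the games as independent.
games = 4
expected_wrong = games * (1 - p_twelve)
p_all_four_right = p_twelve ** games

print(round(p_twelve, 3))          # 0.333
print(round(expected_wrong, 2))    # 2.67
print(round(p_all_four_right, 4))  # 0.0123
```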

yossarians tree

March 25th, 2014 at 1:18 PM

I'm no statistician, but I think it's probably because there is a compressed differentiation in quality among teams between 5 and 12. The difference is not in reality always "7" but is actually more like "3." Or, parity.

MGozer

March 25th, 2014 at 6:44 PM

the 12 seeds want it more.

Indiana Blue

March 25th, 2014 at 12:00 PM

but if JB drops to .375-ish when you remove 2013, how can he stay at .700-ish when you remove the coach's best and worst scenarios? Just wondering...

And thanks for adding the LOL classification !

Go Blue!

RobSk

March 25th, 2014 at 12:52 PM

because taking out a bad performance helps offset taking out the good performance.

Rob

HAIL 2 VICTORS

March 25th, 2014 at 12:14 PM

Eh-

They did something similar over at RCMB.

Cali Wolverine

March 25th, 2014 at 12:32 PM

On side note - How is that bracket working out with VCU?

Seth

March 25th, 2014 at 12:41 PM

har har hardy har. I believe I did state my caveat that I think VCU will have as tough a time with SF Austin as the top seeds.

I also took OSU and Duke over Michigan. I have never been so happy to have my bracket so annihilated.

Cali Wolverine

March 25th, 2014 at 1:04 PM

& UConn in my Sweet 16...which look great now...but were pretty crazy picks at the time.

taistreetsmyhero

March 25th, 2014 at 10:47 AM

Couldn't it be better to look at the median performance for each coach?

Jivas

March 25th, 2014 at 10:56 AM

First of all, this is really cool analysis. Two items: (1) What's Coach K's rank in the final (best and worst removed) metric? (2) Oliver Purnell is now at DePaul.

brax

March 25th, 2014 at 11:18 AM

This is an excellent analysis. Can you please post the full results?

Seth

March 25th, 2014 at 11:22 AM

I had a client call at 11 and was having trouble getting it uploaded to google for some reason. In a bit.

El Jeffe

March 25th, 2014 at 11:50 AM

I don't get it. Aren't we your "clients"?

Seth

March 25th, 2014 at 12:05 PM

Okay I'll put it another way. I was negotiating another 40k tourney and liveblog for you (successfully)

El Jeffe

March 25th, 2014 at 12:09 PM

fistpump.jpg

Artichokes Anonymous

March 25th, 2014 at 11:48 AM

Great analysis. One thing that seems hard to account for is top programs receiving higher seeds than what they have earned. To me it makes the Coach K stat line a bit more understandable (as Duke is overrated, if anything) and the figures from Izzo, Pitino, Donovan, etc. more impressive.

Seth

March 25th, 2014 at 12:04 PM

I think seeding bias is allayed by the fact that top seeds are given an appreciably easier road. Say a Duke team that ought to be the worst 2 seed is overrated to a one-seed. That hypothetical team gets a 16 seed instead of a possibly decent 15 seed in the first round, and the 9 instead of the 7 to get to the Sweet 16, and a 4 instead of a 3 to get to the Elite 8. They'll also most likely get to play a few miles from home.

I don't know how to separate the value of that higher seed from how good the team is.

taistreetsmyhero

March 25th, 2014 at 12:21 PM

but, out of a strange curiosity, I'd like to see how the Avg. Tourney Wins by Seed vs. Expectation graph looks in women's college bball. One knock I always assume is true is that there is too little drama in it b/c teams play to seed too often (ie there are too few upsets). Just curious if my assumption is accurate or if I'm a foo...

ca_prophet

March 25th, 2014 at 2:14 PM

Is that there's too little drama because there are only a few teams capable of winning the tourney. UConn, Stanford, Tennessee and friends have had teams well above the competition and have gone on to crush the tourney considerably more often than dominant teams show up on the men's side.

Va Azul

March 25th, 2014 at 12:39 PM

Log transform, then linear comparison? A top seed can earn ~[-3.8, 2.2] wins above expectation. A bottom seed [0, 6] wins. There is a bias for lower seeds that is not accounted for.
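The asymmetry raised here can be made concrete with the post's own baseline. The sketch below assumes six wins is the maximum, ignoring the First Four, and uses the zero floor for 16-seeds. Every seed's range spans six wins. For a 1-seed it leans below zero; for a 16-seed it sits entirely at or above zero. The helper name is hypothetical.

```python
import math

def expected_wins(seed):
    """Log baseline from the post's Excel fit, floored at zero."""
    return max(0.0, -1.1634 * math.log(seed) + 3.2127)

MAX_WINS = 6  # six wins takes the title in a 64-team bracket (First Four ignored)

def wins_over_exp_range(seed):
    """Lowest and highest possible wins over expectation for one tournament run."""
    base = expected_wins(seed)
    return (0 - base, MAX_WINS - base)

for seed in (1, 8, 16):
    lo, hi = wins_over_exp_range(seed)
    print(seed, round(lo, 2), round(hi, 2))
# 1 -3.21 2.79
# 8 -0.79 5.21
# 16 0.0 6.0
```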

Seth

March 25th, 2014 at 12:44 PM

True, but you have to factor in , which is the result of a history major who got up at 5 this morning with a poopy crying 3-week-old and once that was returned to a normative state went about googling how to work with logs since the last time he'd done so was in the mid-'90s with a TI-86.