John Gasaway—AKA Big Ten Wonk—likes crusades. His last one was to obliterate rebound margin and seems to be going well. Not many use plain rebounds as a metric anymore, which is good because it makes no sense at all to do so.
Gasaway's latest horde of European knights with fuzzy ideas about salvation is aimed at the tournament seeding process:
I’m on the record as thinking that the mere distribution of wins — with due consideration for opponent, time, and place — can yield sufficient information to draw a line across the top quintile of D-I and tell the teams above this line, “You’re in!” But trying to do something as precise as sequencing an entire tournament field on an S-curve armed only with wins is a little like playing the piano while wearing oven mitts. It can be done, but the music would sound better if we freed up our fingers.
A few years ago I had a back-and-forth with Dan Steinberg of the DC Sports Bog about something similar: I was purveying a resume-based results-only college football poll at the same time he was publishing a top 25 from Vegas oddsmakers that claimed it was more accurate. Those are two diametrically opposed methods. The BlogPoll is descriptive: We have this data and this is our best guess at which teams have the most impressive resumes. Vegas is predictive: we have this data and this is our best guess as to who the best teams are.
So do you want your national title picked based on an assessment of the season or the team? I had a viscerally negative reaction to seeing things like LSU at #5 six weeks into the 2006 season when they'd lost to Auburn and Florida and beaten ULL, Tulane, Mississippi State, and Arizona. They proceeded to win the rest of their games. So Vegas was right, except if LSU was a little better and options at the top a little worse you can imagine a scenario where Vegas takes a team like LSU over some luckbox like 2002 Ohio State. Right or not, that ain't right.
The Vegas poll is answering a different question than I want the people deciding who should play for the national championship asking. If there are two major conference undefeated teams and a one-loss team that's so clearly better than the two undefeated teams but has an inexplicable turnover-filled loss in a driving sleet-storm that happened because their quarterback got injured, picking the obviously better team obliterates college football. It's not about some ineffable combination of NFL draft picks and victory margin, it's about wins. If that has embarrassingly dumbed down nonconference schedules at least it's provided a reason to play the games, and a reason to have your heart in your throat when the other team is driving for the win no matter what your MOV is.
No one is going to claim that loosening the dominion of wins over a sport that lets various .500 major conference teams compete for its title "obliterates" anything, but I'm still leery of a world where Michigan's overtime against Iowa is mostly important because it can push Michigan's Kenpom rating up a spot. Gasaway explicitly states he's fine with using wins for tourney selection but that only mitigates the problem; any solid at-large team sees that effect since they're just worried about seeding, not getting over the line.
It would be pretty dumb to have some guy from Wisconsin at the line shooting two to win against Ohio State and have those free throws hardly matter at all. Would it be fair? Yes. Would it result in better seedings for the occasional very good minor conference team that gets thrust into a tough first round matchup and can't show their stuff? Yes. But I think it would make the season much less vital. Sometimes a little unfairness is the lesser evil.
Now, if Gasaway's just talking about alerting the committee to performance-aware metrics when they attempt to evaluate the case of Utah State, a team that's obliterating the WAC but has only played three games against teams with a Kenpom rating higher than 90(!)* and gone 1-2 against them, sure. The way in which the Aggies have acquired their record should be able to influence the committee to bump them a little bit. His endorsement of Bilas's tweet calling RPI a "joke" suggests he's more militant than that.
Once you start talking about tossing a 17-7, 7-7 Big Ten team probably headed for 19-9 and 9-9 (this is Illinois—their finish: @OSU, Iowa, @Purdue, Indiana) onto a line where a Sweet Sixteen bid would only be a mild surprise you lose me**. The Illini's strong nonconference performance should easily see them into the tournament but while I love Kenpom I'd take eighteen games of .500 basketball over his rating when evaluating seeds.
Maybe I've read him wrong.
*[Iowa, the worst team in the Big Ten, is 82nd.]
**[To be clear, I'm not picking on Illinois because Gasaway is an Illinois grad. It's just that they're the Big Ten team with the goofiest-looking Kenpom rating given their record. Playing Texas, UNC, Maryland, Missouri, and Gonzaga in the nonconference will do that.]
[Also, think of the advantage lost in NCAA pools if people were fairly seeded based on Kenpom type metrics. Horror!]
LSU really beat itself up in '06.
The thing is, I really don't have a major problem with how the NCAA tournament is selected. By and large, the teams that play best throughout the year are rewarded with higher seeds and more convenient venues, and historically constitute the bulk of Final Four and championship clubs. Sure, maybe Illinois being put as an 8th seed instead of a 10th is due to a great OOC schedule, but rarely does it make a difference. Same goes for the Utah States of the world - yes they only played 3 "decent" teams, but 25 wins is impressive no matter who you beat. While Butler and George Mason give the impression that the mid-majors are just as dangerous as anyone else when it comes to the title game, the vast majority of teams win 1-2 games and then are out of the tournament. Getting too worked up over where they start off seems unnecessary when the end is basically the same.
If there is one issue I have with the seeding, it has to do with the overrating of conference results for teams from "name" conferences like the BE. Teams like Cincy and WVU are being talked up as near-tourney locks because they have .500 records in the BE despite not beating anyone of note outside of it and, frankly, accumulating most of their in-conference wins against other bubble teams and the dregs. Yet you have a team like UM, playing really good basketball, on the outside looking in despite having virtually the same resume. Heck, teams like St. John's, which have a couple of big-name wins but also some horrible losses (Fordham? St. Bonaventure?), are already being seeded as high as #4/#5 despite the fact they could easily finish with 11-12 losses heading into the tournament. Again, these teams won't ever make a difference in the grand scheme of things, but this over-reliance on a conference shorthand for figuring out who gets in and who doesn't is the one area I wouldn't mind seeing some change.
partly (mostly?) because of where they start. The difference between a #7 and a #8 is the difference between playing a #1 in the second round and playing one in the regional finals. (Maybe there's only one spot of difference on the S-curve, but maybe there's seven spots ... or more, depending on whether or not teams have to be flipped.)
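For what it's worth, the bracket arithmetic behind that claim can be checked with a few lines of Python. This is only a minimal sketch: it assumes the standard 16-team regional layout (1 vs 16, 8 vs 9, and so on) and that every favorite keeps advancing, and the names `BRACKET_ORDER` and `earliest_meeting_round` are made up for illustration.

```python
# Standard 16-team regional layout: adjacent pairs meet in round 1,
# adjacent groups of four meet in round 2, and so on.
# (Assumed layout; the names here are illustrative only.)
BRACKET_ORDER = [1, 16, 8, 9, 5, 12, 4, 13, 6, 11, 3, 14, 7, 10, 2, 15]


def earliest_meeting_round(seed_a: int, seed_b: int) -> int:
    """Return the regional round (1 to 4) in which two seeds could first meet.

    Round 1 is the opening game, round 2 is the second round,
    round 3 is the regional semifinal, round 4 is the regional final.
    """
    if seed_a == seed_b:
        raise ValueError("a seed cannot play itself")
    slot_a = BRACKET_ORDER.index(seed_a)
    slot_b = BRACKET_ORDER.index(seed_b)
    round_number = 0
    # Halving the slot index merges each pair of slots into the slot of
    # their winner; the seeds meet in the round where the slots collide.
    while slot_a != slot_b:
        slot_a //= 2
        slot_b //= 2
        round_number += 1
    return round_number
```

Under those assumptions `earliest_meeting_round(8, 1)` gives 2 and `earliest_meeting_round(7, 1)` gives 4: the #8 runs into the #1 in the second round, while the #7 can't see one until the regional final.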
The thing about using that as an argument for change, though, is that basically all evidence is anecdotal. I mean, what's the difference between 5th-seed Butler in 2007 and 5th-seed Butler in 2010? One lost to a 1 seed in the regional semis and one lost to a 1 seed in the finals, but it's not like I could say "see, Butler deserved better than a 5 seed last year" and be sure I was right.
So even if the mid-majors are getting shafted, it's not like we could ever be sure, and it certainly hasn't stopped them from getting to the finals (see also Indiana State). Besides, overseeding a mid-major could end up giving a power-conference team an easier path to the regionals, and that leads into your other point ...
Teams with double-digit losses certainly can have an impact on the tournament, although it is rare (1985 Villanova was 19-10 heading into the tournament, Cheatin' Larry Brown's Jayhawks were 21-11 in 1988) ... but if you look down a few notches from the 5s and 6s, you end up back where you started. Someone's got to fill out the tournament, and after you get through the 40 to 50 teams that clearly earned a bid (conference champs plus best of the rest), there's a pile of teams that are either mediocre power-conference teams or teams from weaker conferences with questionable resumes.
As little as I like the idea of taking 11 teams from the Big East (Marquette is 32nd on kenpom) or 8 from the Big Ten (Penn State is 51st) or whatever, the alternative is to take 4 or 5 from Conference USA, or 3 from the Colonial, or 2 from the Missouri Valley. Someone's got to fill out the 11s, 12s, and 13s, and no matter who those schools are, there will be similar schools on the outside looking in. (At least with a 68-team tournament, the 13s aren't likely to have made any kind of run ... obviously in a 32-team field or even a 48-team field, there were some really good teams that didn't get at-large bids.)
No, I totally agree - I get that they are just filling out the bracket at some point. But my issue is that teams from the BE that are by all measures mediocre get the conference "bump" for pulling off the upset against Pitt or UConn while teams with better resumes get overlooked because they lack those top wins. At least with the Utah States and Daytons of the world, you have teams that are consistent winners on a smaller scale; with teams like Cincy, you have mediocre squads and I'm not sure they would be that much better in weaker conferences. Give me a team that has played well this year and could pull off the upset or two versus a meh team that will be chewed up by a 5 seed by 15.
I don't think anybody is seriously talking about them as a 4/5. They've been a 6/7 wherever I've looked, which is about right. Saying they have a "couple" of big-name wins is laughably dismissive. Georgetown, Notre Dame, Duke, UConn, Pitt? That's 5 teams that have spent much (or all) of the season in the top 10. Add to that road wins over bubble teams Cincinnati, West Virginia and Marquette and the fact that the committee weights how a team is playing later in the season, and it's not hard to see them on a 6/7 line.
Also, while the Fordham/St. Bonaventure losses happened, a lot can be attributed to the growing pains of a coaching change with 10 seniors. It's telling that their only freshman, Dwayne Polee, who was one of their best players in the non-conference schedule, has disappeared as the seniors have adjusted.
The reason why the Big East gets so much love every year is because even though a team like West Virginia only played Vanderbilt, Minnesota, Duquesne, Cleveland St and Purdue...well that's a bad example I guess, because that's a legit schedule, tougher than Michigan's Kansas-Clemson-Syracuse and nobody else.
The Big East gets love because its good teams play ridiculously tough OOC schedules, which gives the middle-of-the-pack teams (which play tougher schedules than their counterparts in other conferences) an extra boost when they beat those teams.
As for the larger point, the advanced metrics fail to take into account the human element. Sure, a team that wins a lot of close games is not as statistically dominant, but they still had to win those games. There are always teams that consistently find a way to win the close ones, and there is something to be said for having the resolve to play through adversity that can't be quantified.
What I'm more curious about is what % of the people have any idea conceptually what the hell you are talking about? This is a great example of why blogs like this are great. Could you even think of posting a thought-provoking article like this in a paper or Mlive? Huh?
I think Brian is 100% correct that you have to reward actual wins over the substance of the team and not allow Vegas's perception to dictate winners. It is great to have the data to peel apart resumes built on weak schedules and fortunate events for teams that are close, but in the end you need to reward the wins even if they were sometimes fortunate.
Fans are so emotional that wins and losses taint everything that happens in a game, and the press feeds their anger or joy. I'm always amazed at how people react in the aftermath of a close game. The winners are often characterized as gamers, winners and clutch. They often "wanted it more" or were "more experienced". The losers often get called chokers and losers. They are told they don't have that "killer instinct". Sometimes this appears to be true, but many more times it is just randomness.
Take the Mich-Ill game from this past year in football. Tate made a terrible pass that tipped up in the air, one that Hemingway would probably come up with 1 out of 100 times. Think about the fan reaction after that game. While we were upset with the defensive performance, people fawned over the effort and being able to pull it out in the end, and the defensive stop on the 2 pt conversion was talked about like we were Alabama stopping Penn St. in the Sugar Bowl.
Now go back to the MSU game in 09. The team fought back at the end and Tate throws another pass that gets banged in the air and this time it is intercepted. If that pass gets deflected a different way maybe we pull that game out and we beat MSU and we're the team playing in the Alamo Bowl and maybe MSU is left out of the bowl. Quite a different outcome for 2 teams that were actually pretty similar. Now go back and try to tell an MSU fan they only had the better team 2 of the last 3 years? It won't compute.
Our whole sports world is based on wins and losses; to change that is just too radical. You think arguments are heated now? If someone tried using this method it would look like Egypt outside the NCAA offices.
I think Brian touches on the fundamental conflict of any advanced metrics or methods. Are they about predicting or rewarding? Those can be two very different ideals and by definition will on occasion be at odds. If the goal is predicting then the free throws at the end of the game are relatively meaningless. They are two potential points in a pool of 120 for the game (this is Wisconsin basketball we're talking about). If we are talking about rewarding then the free throws make all of the difference. We are taught that a team that wins by a point is clutch and the team that loses by a point can't take care of business when it counts. In reality most teams fall on both sides of the equation at different times.
Although the work I have done on college football is about predicting, I am with Brian that the NCAA tournament should be all about rewarding. The wins and losses do matter, because when you look back on a season, it is all about the wins and the losses, not about the PAN or the Kenpom rating. 4 points on Jan 27 in East Lansing made all the difference.
Where I think college basketball can use some help is on a more wonkish look at rewarding/ranking. The predicting and grading performance is well covered by Kenpom and the like but there is a serious gap in the selection process that needs to be acknowledged. Currently the tools used to reward/rank are very loose and at best proxies for accomplishment. Record vs. Top 50 RPI is good, but games versus teams in the 40s are very different than games versus teams in the top 10. RPI is directionally good, but for a tournament caliber team, playing a team ranked 100 should be an easy win, just like a win versus team #300. For RPI's sake they are very different. The easy win vs. #100 keeps your RPI afloat whereas the blowout versus #300 drops you.
I have put together a work-in-progress solution to this that does an initial RPI pass and then awards points for each win versus teams that are of a certain threshold. The better the team or the bigger the margin the more points awarded. Game location and time of year matter. Losses work the same way but with no threshold. The worse the team the bigger the loss of points. I think there is a lot of work to do on the weights (Georgetown is my #2 team) but I think this is directionally where things need to go.
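To make the shape of that idea concrete, here is a rough Python sketch of a points system along those lines. It is not the actual model: every constant (the top-100 win threshold, the location weights, the margin cap, the recency bump, the assumed 345-team field) and every name (`Game`, `game_points`, `resume_score`) is a placeholder invented for illustration, and the opponent ranks are assumed to come from some earlier RPI-style pass.

```python
from dataclasses import dataclass


@dataclass
class Game:
    opp_rank: int        # opponent's rank from an initial RPI-style pass (1 = best)
    margin: int          # points scored minus points allowed; negative for a loss
    location: str        # "home", "neutral", or "away"
    season_frac: float   # 0.0 = season opener, 1.0 = last game played


# Every constant below is an illustrative assumption, not a tuned weight.
WIN_THRESHOLD = 100      # wins over teams ranked worse than this earn nothing
N_TEAMS = 345            # assumed size of the ranked field
MARGIN_CAP = 20          # margins beyond this earn no extra credit
LOCATION_WEIGHT = {"home": 0.8, "neutral": 1.0, "away": 1.2}


def game_points(g: Game) -> float:
    """Points for one game: positive for a qualifying win, negative for any loss."""
    recency = 0.75 + 0.5 * g.season_frac  # later games count a bit more
    margin_factor = 1.0 + min(abs(g.margin), MARGIN_CAP) / MARGIN_CAP  # 1.0 to 2.0
    if g.margin > 0:
        if g.opp_rank > WIN_THRESHOLD:
            return 0.0  # below the threshold: the win earns no points
        quality = (WIN_THRESHOLD - g.opp_rank + 1) / WIN_THRESHOLD
        return quality * margin_factor * LOCATION_WEIGHT[g.location] * recency
    # Losses have no threshold: the worse the opponent, the bigger the penalty.
    # Dividing by the location weight makes a home loss cost more than a road loss.
    badness = g.opp_rank / N_TEAMS
    return -badness * margin_factor / LOCATION_WEIGHT[g.location] * recency


def resume_score(games: list[Game]) -> float:
    """Sum the per-game points into a single resume number."""
    return sum(game_points(g) for g in games)
```

With these made-up weights, a road win over a top-10 team is worth far more than a home win over #90, a blowout of #300 is worth nothing, and a home loss to #200 costs much more than a road loss to #10. Whether the weights are any good is exactly the open question.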
In general I'm in favor of rewarding wins over predicting a pseudo-best team, but this can certainly be taken too far. As I tend to point out in any college football playoff discussion, if pure wins and losses were all it took, then the optimal path should be to abandon the Big Ten, join the Sun Belt, beat up on everyone and never play tough games, and then complain that "we did all we could" when it came time to figure out a national champion.
That said, metrics like KenPom should be largely ignored when it comes to tournament seeding. Metrics like Sagarin and the RPI should be preferred, because they account for wins and losses and, most importantly, strength of schedule.
Also: I don't think the rebound battle is won at all. At least not on a national level. The tempo-affected numbers are still dominating the discussion. UVA is never seen as a good rebounding team - believe me, this comes up all the time and commentators and opposing blogs constantly paint UVA as a team that rebounds badly - yet guess who leads the ACC in defensive rebounding percentage? That stat is still far more underground than it looks from a Gasaway-acolyte perspective.
I barely follow basketball before the tourney but year after year I've won various pools thanks to a strict adherence to KenPom's rankings. If that sort of metric were used to seed teams, it would cost me beer money.
I was wondering if the NCAA does value 8-11 seeds as their rankings show. 8's are better than 9's, which are better than 10's.
If I had my druthers I would rather be a 10 seed than the 8 or 9, realizing the difference between them is very small. I would have to beat a 7 seed but would avoid the automatic 1 seed, who is probably an elite team, while some #2 teams are pretty flawed.
If the selection process is about rewarding seasons, then I would rather be rewarded with a 10 seed than the 8/9. And depending on the year I might want the 11, realizing that the difference between a 6 seed and 7 seed is minimal and that it leaves me facing the 3 seed, which usually isn't very formidable when compared with the 1's and 2's.
I have all this data on my home computer from 1993; I'll have to check the percentages of 10's winning compared to 8/9's and their ability to get to the Sweet 16. That said, I don't think data from pre-2000 is very useful; with the early entries the game changed in a huge manner.
I like reading Gasaway's stuff, but he's just out to lunch here. Illinois will be a marginal seed and deservedly so. It's not just an RPI thing. This is a team that has lost nine of its last 16 games. Fine, they played well in November, but they've looked thoroughly mediocre from December onward. That KenPom could take a 17-10 (7-7) team that really hasn't played well for three-fourths of the season and rank them as one of the 20 best in the country makes his ranking system look seriously flawed. The selection committee is supposed to seed the teams based on where they are at season's end. Yes, those early-season games are part of the résumé, but they should not outweigh later struggles. From December 1 onward, Illinois is 11-9. I don't care if they made some late 3-pointers to make their losses close. You've got to win a lot more games than that to justify a good seed.