If I Was Designing a Poll...

Submitted by Seth on
This is partially a response to Black Heart Gold Pants, but I have enough SB Nation blogs already and don't feel like signing up for another rival's, nor waiting the requisite number of days to post, so if someone with a BHGP account wants to give him a heads up...

Also, warning: it's long. For those who like their baseball games in ESPN highlights, and their Melville in Cliff Notes, I put bullet points under each heading.

I wanted to generate a discussion on different polling strategies, and come to a consensus on what we expect from NCAA polls.

First, assumptions:
  1. Polls are not and will never be exact, even at the end of the season. There is no "right answer." Comparing over 100 teams with hideously unbalanced schedules with absolute accuracy is nigh impossible.
  2. We want polls anyway.
  3. A higher-ranked team is considered better than one ranked below it.
  4. Even if we produced that theoretical "perfect poll" there would be plenty of people who disagree on it.
  5. To a degree, there is an unstated general consensus that some teams are better than others, i.e. the masses can agree on certain things, like Florida is in the Top 3, and Michigan isn't.
  6. We will know more as the season progresses.
  7. The perfect poll would be the exact same in the preseason and at the end of the season, and still be entirely justifiable.
  8. Consensus is the ultimate goal -- corollary: fewer polls is better.

Resume Voting

  • Best At: Being a ranking on this year's performance that actually has its basis in this year's performance
  • Worst At: Providing a non-laughable poll before November
  • Primary Gripe: Small sample = useless

I have respect for resume voters because they have the same standard throughout the season. The downside is their polls take a while to come together. Resumes grow more demonstrative only after there's experience on there. If I showed you the resumes of two 16-year-olds and you had to pick which one will end up making the most money by the time they are 50, you would be clueless.

At least it's a metric that makes some sense. But the wild variance defeats the purpose of having these polls in the first place: it's not to generate discussion, it's to provide a frame of reference for assessing the difficulty of beating one team or another. If Cincy loses next week, nobody's going to believe it if you say "oh wow, they beat the No. 1 team in the country."

When it's used in concert with other voting metrics, it also has the unintended effect of compounding things like an overrated conference. A great example is the Big East a few years ago, when South Florida, Rutgers, Louisville and West Virginia took advantage of some early season flukes and an incredibly soft middle of the schedule to leap-frog each other to the top of the polls. This was the primary culprit in the short-lived appearance of USF at No. 2 in the BCS poll -- any ranking that has South Florida second in the nation in anything besides STDs is a travesty.

The upside of resume voting is that every week it gets more and more feasible. The BCS poll has been, in many of its incarnations, essentially a resume poll, which had the good sense to begin releasing data late in the season. Ultimately, resume voting is a justifiable system so long as it remains pure, but isn't very useful early in the year at providing a poll's primary objective: to provide a plausible ranking of NCAA's best teams.

Suggestion for improvement: Stay out of it until near the end. I want resume to determine who plays for the National Championship, but I'd rather not have half-finished resumes affecting the mid-season polls. In other words: I'm with you if you wanna put '03 LSU and '03 Oklahoma in the Championship, but let's call '03 USC No. 1 right up until the end of the Rose Bowl, just so we're clear that Michigan is facing the hardest team in the country. Make sense?

Roster Voting

  • Best At: Pre-Season Poll that passes credulity test, Mid-season difficulty rankings
  • Worst At: End-of-Season Poll that passes credulity test
  • Primary Gripe: Not enough data, plays down this year's performance, which, like, isn't that what the poll is about?

Early in the season, this is most polls, including the AP and Coaches. Since no games have been played, it's a vote based primarily on how good the team was last year, with plusses for returning players, minuses for departing players.

Also incorporated in Roster Voting: "Barwis factor."

This does a much better job of placating the masses in the pre-season. As the season progresses, however, as opposed to resume voting, this metric tends to disappear almost entirely, which I think is a major disservice to these polls.

Essentially, they fall victim early on to resume voting, rather than stick to their guns. This means big drops for teams as they lose. The downside, of course, is that if there's a consensus No. 1 team that loses its only two games early in the year, you'll see a major shift in that team's ranking -- big drop, steady incline, etc. This hurts the usefulness of the poll, since it changes its base metric mid-way through, essentially calling out its own initial justification.

A roster-based poll shouldn't be oblivious to the unfolding season, but it also shouldn't abandon its basis. Updates would be based on roster shifts, such as Oregon losing Dixon, Pat White losing a finger, or Michigan discovering one of its 4-star freshman recruits is already a more-than-serviceable and perhaps awesome college QB. This does not seem to generate much shift, but revelations abound in college football -- if someone pays close attention, we could end up with a fairly decent poll insofar as showing how much of a challenge each team should present.

Like resume polling, a roster poll is justifiable -- last year's performance, injuries, player statistics: these are all available metrics.

However, as the year progresses, such a poll would require A TON of input to remain accurate. Barring a UFR for every team, a roster poll seems unfeasible.

I can't think of a poll that keeps this metric throughout the season. I'd like to see one in the blog poll. It would rack up a lot of Mr. Stubborns, and a few other outliers as other voters respond to season upsets, etc. And more importantly, while it's very useful at showing which team is the hardest to beat talent-wise early in the year, the more the season progresses, the more you'll have major incongruities, like a highly talented 4-loss team in the Top 5 while a lucky, scrappy, undefeated mid-Major team lingers at the bottom of the Top 25.

After about 8 weeks, a roster-voted poll would get lapped by the resume voters in placating the general populace, and take a lot of flak along the way. And at the end of the year, it would be totally useless.

Suggestion for improvement: This needs statistics, or it's as bupkis as pre-season polls. One day (I'm already looking into it) there will be UFR-like statistics kept for every player on every team. This will facilitate player and position rankings. And coaching ratings, too. And team rankings (offensive/defensive efficiency, etc.) The more info compiled and thrown in, the more this type of polling becomes feasible. Never going to be useful for who belongs in a championship, but I, for one, would find such a stat very interesting when having one team go up against another.

Predictive Voting

  • Best At: Pre-Season Polling
  • Worst At: BCS Selection, Precision
  • Primary Gripe: Factors are compounded

This is a straight-up attempt to get the final poll right in Week 1. A lot of AP voters fall into this trap, as evidenced by the justification they give for their preseason ballots.

e.g.

  • "I ranked Ohio State 1st because the lolBigTen is so weak the Buckeyes can knock off a freshman-quarterbacked USC, then tapdance to the BCS championship again."
    In this example, does this hypothetical assclown voter call Ohio State the best team in the country? No. But isn't the best team in the country supposed to be ranked No. 1? Umm....yes?

    [Image caption: Things tend to get untied]

    Predictive voting does have a strategy for keeping itself in line, which makes it somewhat useful, if still inaccurate, for mid-season and late-season polling. Essentially, teams are not down-rated at all when they lose something they were expected to lose in the fashion in which they were expected to lose it. They play against their expectations.

    Predictive voting is often used in concert with another metric, most often as a correction to Roster Voting ballots that generally have mid-Majors and giants in weak BCS conferences underrated. It generally has a lot of opportunity to look stupid as the season progresses, since the swings after unexpected wins and losses, in practice, are never truly in line with expectations. It also doesn't account for surprises, like Notre Dame losing to Michigan (not expected) but demonstrating that its offense is for real (i.e. they're not worthy of a major fall).

    Predictive voting is, however, not a bad way, conceptually, to achieve the goal of a preseason ballot that bears some resemblance to the end of the season. Of course, it's hideous at providing an accurate ranking of teams' actual ability. But it does a fair job of passing the eyeball test, and remains a well-used tool for college polling.

    Suggestion for improvement: Accuracy is the problem, because all changes are totally subjective. So use computers. Run 10,000 simulations of every game left in the season. This becomes the base prediction for each team, and should provide a solid framework for an initial season. Deviation from expectation down-ranks them or up-ranks them as the season progresses. Easier way: use the spread -- gamblers know what they're doing.
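
    For the curious, here is a rough sketch of what "run 10,000 simulations" could look like. Everything in it is an assumption made for illustration: the power ratings are invented numbers, the logistic win-probability curve (a gap of `scale` points makes a team a 10-to-1 favorite) is just one plausible choice, and the function names are hypothetical. It produces each team's expected wins over the remaining schedule, plus a helper that measures how far a team has strayed from that expectation.

```python
import random
from collections import defaultdict


def win_probability(rating_a, rating_b, scale=10.0):
    """Chance that team A beats team B under an assumed logistic curve.

    A rating gap of `scale` points gives the favorite 10-to-1 odds.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def expected_wins(ratings, remaining_games, n_sims=10_000, seed=0):
    """Simulate every remaining game n_sims times; return average wins per team."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    totals = defaultdict(int)
    for _ in range(n_sims):
        for team_a, team_b in remaining_games:
            p = win_probability(ratings[team_a], ratings[team_b])
            winner = team_a if rng.random() < p else team_b
            totals[winner] += 1
    return {team: totals[team] / n_sims for team in ratings}


def deviation(actual_wins, baseline):
    """Actual wins minus expected wins: positive up-ranks, negative down-ranks."""
    return {team: actual_wins.get(team, 0) - baseline[team] for team in baseline}


# Invented ratings and schedule, purely to show the call pattern.
ratings = {"Florida": 30.0, "Michigan": 18.0, "Notre Dame": 17.0}
games = [("Florida", "Michigan"), ("Michigan", "Notre Dame"), ("Florida", "Notre Dame")]
baseline = expected_wins(ratings, games)
```

    Swapping the invented ratings for something derived from the point spread would be the "easier way" version of the same idea.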

    Hype Voting

    • Best At: Wooooo!!! Tate Forcier is a god!!! I'm gonna go online now and see if the national consensus agrees! Woooo!!! They agree! We Rock!!!!
    • Worst At: NCAA Polling
    • Primary Gripe: Loose grip on reality
    OMG UNIVERSITY OF COLLEGE DEFEATED COLLEGE UNIVERSITY BY THREE TOUCHDOWNS -- BUMP BUMP BUMP, DROP DROP DROP.

    Accordingly.

    [Image caption: Don't worry, Domer, you'll be in the Top 5 again next August]

    This metric is among the least justifiable of the non-biased metrics, but is also rampant. Except it's also the easiest way to create a poll that readers generally agree with mid-season. It's basically rearranging teams each week based on carrots like "so-and-so deserves a 10-slot bump" or "Team X defeated Team Y so team X should go above Team Y."

    It passes the eyeball test, which is the whole point of hype voting. But it also generates a goodly chunk of the eyeball rolling from other pollsters who want something more concrete behind their polls.

    Suggestion for improvement: This basically comes down to faking it to get the results you wanted when solid metrics fail. I'm of a mind to either improve metrics or believe them before turning to pre-conceived notions out of convenience.

    Bias Voting

    • Best At: No. 3 Notre Dame @ No. 1 USC. TONIGHT on NBC!!!
    • Worst At: Honesty
    • Primary Gripe: Subversion of polling for selfish gain
    This is included because it happens. It's justifiable because it basically follows the suppositions of the masses. Bias serves a purpose beyond actual strength of teams, be it a coach who wants his opponents overrated to get into a BCS bowl game, or a rival underrated to keep him out, etc. It also includes sports journalists and networks well-served by rating a major national program just over sliced bread. And bloggers who want some recognition for their beloved team, and the conference it plays in, etc.

    Brian uses the Coulter/Kos Award to keep the bloggers honest about their own teams, but I don't know how much he's watching what they do to their rivals and opponents. Just because you wear your bias on your sleeve, that doesn't mean you're immune from it (e.g. Coulter, Kos).

    Suggestion for improvement: Not that Brian hasn't said it 1,000 times, but this bears repetition upon repetition: MAKE ALL VOTES PUBLIC AND HOLD VOTERS ACCOUNTABLE.

    What's Best?

    Obviously, aside from a few resume polls, most polls are a combination of many of these metrics, all of which have major holes in them that strain credulity, over/under-reward scheduling and biases and notoriety, etc. At any given point during the season, and depending on the function a poll is meant to serve at that point in the season, there are better metrics than others.

    So let's go back to our suppositions, and pick out what it is we want from a poll at any given time:
    • Preseason: As close as possible to the final poll, plus something that passes the eye test, i.e. readers can generally agree with it. For this, I suggest a combination of Roster and Predictive polling. Both are in dire need of better statistics, but the stats are out there already, and currently being employed to good effect by oddsmakers, who have a stake in getting it right (although they move their bets based on hype). We know who's on what team, and who will most likely be playing X amount of time at each position. We have a record of play for every year prior for every player on every team. We know the recruiting value of incoming freshmen, and we know the base value of freshmen to keep the recruiting value in perspective. As the season progresses, we have more records of play, which should make us more accurate. Transcribing this to a statistical value is not impossible, just very time-consuming.
    • Early Season: Still, I would stick to exclusively Roster and Predictive polls, for reasons shown above. I think one consensus poll would be best for this period.
    • Week 8 to Bowls: Start publishing a second poll, sort of like the BCS numbers, but not really, because it would be entirely Resume based (note: would also be used to determine playoff spots). This poll would show teams ranked by their resume if they were to win every game left on their schedule. It seems counter-intuitive, since, yeah, a lot of them play each other. But actually, that keeps it cleaner -- those that play each other get credit for doing so based on where each is at before the inevitable down-ranking of each other.*
    • End of Season: Publish a final Resume-based poll.
    * This system is kind of radical. No, I didn't say re-publish this poll. The idea is that by Week 8, we have a pecking order. For you to get into a 2-team (i.e. BCS) or 4-team, 8-team, or 16-team playoff, the teams ahead of you would have to lose, maybe twice if you're far enough down. This would radically change the college football season: you'd spend the first half trying to earn a ranking, and the second half would basically be a playoff (it would be a playoff in that fans would know the probable outcomes off-hand before each game).

    It would be awesome for fans, as major programs try to schedule each other early to build a high resume before Week 8. Then, as injuries deplete rosters and cold sets in, each team is in do-or-die mode every week, or else risk losing their place in line.

    Okay, I've said my piece. As with everything else I write, I ask you to please find as many holes in it as you can (except typos, which I plan to go back and fix when time allows).
  • Comments

    Seth9

    September 16th, 2009 at 5:27 PM ^

    First of all, predictive polling unfairly awards two groups of teams:

    1. Teams that are predicted to do well gain an advantage during the season and are given preliminary values that are adjusted, rather than simply evaluated.
    2. Teams that beat other teams predicted to do well.

    It is impossible for someone making a predictive poll to transition to a resume poll without throwing out all previous rankings and starting fresh.

    Second of all, predictive polling fails to rank teams in order of skill. I don't care if Notre Dame is going to go 10-2 with 10 wins over cupcakes and two losses to competent opponents. That simply does not justify a top-10 ranking at any point before or during the season.

    Thirdly and most importantly, predictive polling is only good during the preseason. Preseason polls are stupid. They should not be used ever. Personally, I feel that no polls should be released until week 4, when most BCS leagues start conference play. That way, at least, teams are awarded spots based on something other than pure speculation.

    Noah

    September 16th, 2009 at 10:18 PM ^

    "It is impossible for someone making a predictive poll to transition to a resume poll without throwing out all previous rankings and starting fresh."

    Absolutely true. The problem with predictive polls (as I think you noted) is that they reward or downgrade teams based on preconceived notions. "Well, Oklahoma lost, but I think that's a fluke, so I'm going to only drop them one spot." Or, "South Florida lost - I knew they were really awful! Out they go."

    Blind ranking would be an interesting experiment. Hard to do with ~120 teams, though. Give somebody a bunch of scores and team numbers or fake names and see what they come up with, and then see how that compares to existing polls.

    Noah

    September 16th, 2009 at 10:15 PM ^

    Here's an idea: no polls for the first 6-8 weeks of the season. After X weeks, start publishing a pure resume poll*. Early season polls get silly things like OkSU at #5, then crashing down because they weren't actually that good. The only real problem I foresee here is losing the hype from "OMG #1 OSU at #3 Texas!" I personally don't care about this, but there is probably a lot of money that would be lost somewhere.

    *Yeah, resume ranking has its biases too, but I think it's more rooted in reality (as Misopogon noted) than other polling methods. This idea solves its problems by simply not requiring a poll before there is a reasonable sample size (i.e. midway or so through conference play). The BCS poll does it, so why can't all polls?

    Ali G Bomaye

    September 17th, 2009 at 10:58 AM ^

    But it's unrealistic to expect that no polls will exist for the first 6-8 weeks of the season. I think the best we can do is to say that no polls which factor into the BCS are allowed to poll for the first 6-8 weeks. Even if that happens, though, you know ESPN or somebody else is going to come out with their own "unofficial" poll to keep the hype train on the tracks, and the first BCS poll will be based in some part on this ESPN poll (especially the coaches' poll part, where it's unrealistic that any coach would evaluate a hundred other teams that they won't play).

    However, if you combined the delay with requirements that the votes be open and public and that the voters are able to put minimum standards of time and effort into their poll, you might minimize the effects of preseason bias.

    Noah

    September 17th, 2009 at 11:51 PM ^

    Yeah, the hype train is the big problem here. There's a lot of money coming out of those early-season rankings, and they're not just going to disappear that easily.

    The coaches' poll is a totally different bucket of fish - it's broken and needs to disappear. Conflict of interest like whoa. Sure, they're theoretically qualified, but from assistants filling out polls to joke votes for Duke, it needs to go.

    Njia

    September 16th, 2009 at 11:57 PM ^

    What you've proposed is basically the system that exists in NCAA Basketball (okay, it's not an exact match, but it's a reasonable one). Pre-season polls are interesting, but it's not until there is a resume of work about mid-season that the polls start to mean anything. By the time the NCAA Tournament rolls around, the 64-team bracket is decided (mostly) by a combination of wins and losses, roster and prediction for success in the tournament.

    I think you pointed it out, but bias will be in every human poll. A system like you've proposed would at least produce something like a consensus and minimize hype.

    dr eng1ish

    September 17th, 2009 at 12:38 AM ^

    "Other words: I'm with you if you wanna put '04 LSU and '04 Oklahoma in the Championship, but let's call '04 USC No. 1 right up until the end of the Rose Bowl, just so we're clear that Michigan is facing the hardest team in the country. Make sense?"

    Not really. Is USC the #1 team or not? If they are then they should be in the championship game. If not then how can we call them #1?

    Seth

    September 17th, 2009 at 8:00 AM ^

    They're the "toughest" team.

    Understand, I'm talking about a ranking of which teams are the hardest to play, i.e. on any given day, which team would beat all the rest.

    However, the harder schedule, the greater victories, etc., is a second category altogether. Few people think LSU or Okla would have beaten the Trojans, so in our Roster/Expected poll, they are No. 1. On the other hand, our Resume poll has LSU and Oklahoma as the top two because they came to the same record through a harder schedule.

    To answer your question directly, no, USC is NOT #1, except in the "hardest team to beat" index, which, by the last week of the season, is understood to be at its least useful point anyway.

    Essentially, what I have done is create a two-poll system, which we should be used to. But instead of both polls being a miasma, each has a specific function. The Roster/Predictive poll provides a good pre- and in-season list of which team would beat which team. That's all it's good for. It has no bearing whatsoever on the final poll that determines playoff competition. The second poll is entirely Resume-based and isn't printed until the 8th week. That poll says which teams are having the best season. The team that has the best season in the end is the champion (note: which may not be the "most-talented" or "hardest to beat" team).

    GoBlueBalls

    September 17th, 2009 at 2:40 AM ^

    "3. A higher-ranked team is considered better than one ranked below it."

    This assumption I agree with, but with major upsets, the voting types have vastly different results. Take, for example, the annual USC-lost-to-a-team-like-Oregon State-or-Stanford-OMG-LOL.

    Under resume voting, they "settled it on the field," and USC drops below the upset winner. Under a roster vote, despite the snafu, USC still remains the "better" team, and neither Stanford nor Oregon State will jump the Trojans in any poll.

    Upsets happen, but with resume voting, especially towards the end of the season and post-season, it is too strongly biased towards results and not the actual play on the field in determining the "better" team.

    Seth

    September 17th, 2009 at 8:10 AM ^

    Here's why I included pre-season polling in my alternate-universe poll of polls:

    People like them.

    You can say "pre-season polls mean diddly squat." Well, yes, and no. Choose 25 teams at random, and then take any pre-season poll, and ask yourself which would have a greater correlation to the final end-of-season poll? See, there is some value.
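
    If anyone wants to actually run that test, here is a minimal sketch. It rests on assumptions I'm making up for the purpose: every unranked team is treated as sitting in slot 26, "correlation" means a plain Pearson correlation of slots across all teams, and the function names are hypothetical. Feed it a real pre-season poll, the real final poll, and the full list of teams, and compare the pre-season number against the random-pick average.

```python
import random


def positions(poll, universe, unranked=26):
    """Slot of every team in `universe`; teams outside the poll all get slot 26 (an assumption)."""
    slot = {team: i + 1 for i, team in enumerate(poll)}
    return [slot.get(team, unranked) for team in universe]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def poll_correlation(poll, final_poll, universe):
    """How closely a poll's slots track the final poll's slots."""
    return pearson(positions(poll, universe), positions(final_poll, universe))


def random_poll_baseline(final_poll, universe, trials=1000, seed=0):
    """Average correlation with the final poll for 25 teams drawn at random, in random order."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += poll_correlation(rng.sample(universe, 25), final_poll, universe)
    return total / trials
```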

    And that value has demonstrated itself again and again in the public's voracious appetite for pre-season polls when they are released. "Everyone" may be saying "I don't pay any attention to pre-season polls" but someone's out there reading them. There is a market. If we abandoned this market, someone else would fill it.

    I have no problem with pre-season polls. I just think they should have nothing to do with the final end-of-season poll. My system accomplishes that by using two separate polls. The one that begins in the pre-season is purely for entertainment and perspective. It has nothing to do with who plays in the end-of-year playoffs.