Interesting to say the least. Temple in the top 25 made me laugh, but it's still fairly decent as an overall rating system outside of the top/bottom 10-15%
KRACH applied to Division I-A college football
A few years ago I applied Ken's Rating for American College Hockey (KRACH), or Bradley-Terry statistics, to ACHA club hockey teams. At the time, participants for the national tournament were determined by an opinion poll, and there wasn't enough interplay for that to be meaningful (sound familiar?).
In an earlier post here, I intimated that I'd like to see someone crunch the numbers as a mechanism for rating Division I-A college football teams. It's something I've been thinking about for quite some time, just to see what would happen. So tonight I threw something together.
"The KRACH rating system is an attempt to combine the performance of each team with the strength of the opposition against which that performance was achieved, and to summarize the result as one number, a "rating", for each team. The higher the rating, the better the team."
"Interpreting the ratings
The ratings are given on an "odds scale": that is, if team A is rated at 400 and team B at 200, team A is reckoned to have odds of 2 to 1 of defeating team B when they meet (since 400 is twice 200). Equivalently, team A is reckoned to have probability 2/3 of defeating team B (since 400/(400+200) is 2/3)."
"There are two things we need to check, to make sure that the rating system is sensible:
- If you win more against the same opposition as another team, your rating will be higher.
- If you have the same record as another team, but against tougher opposition, your rating will be higher."
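For anyone who wants to play with the odds-scale arithmetic from the quote, here is a minimal Python sketch. The function names are made up for illustration and are not part of any KRACH write-up.

```python
def odds(rating_a: float, rating_b: float) -> float:
    """Odds (as a ratio) that team A beats team B on the KRACH odds scale."""
    return rating_a / rating_b


def win_probability(rating_a: float, rating_b: float) -> float:
    """Probability that team A beats team B: r_a / (r_a + r_b)."""
    return rating_a / (rating_a + rating_b)


# The example from the quoted description: A rated 400, B rated 200.
print(odds(400, 200))             # 2 to 1
print(win_probability(400, 200))  # 2/3, about 0.667
```

Only the ratio of the two ratings matters, so multiplying every rating by the same constant leaves all the probabilities unchanged.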
So, I took the season results from the official NCAA page. I excluded results against FCS competition, as a matter of principle.
One caveat here - I haven't yet worked out what to do exactly with undefeated and winless teams. This will become meaningful at the end of the season if there are multiple undefeated teams (I'm not sure I really care about the winless teams). While I sort that out, I've done the following:
- Verified my calculated ratings by calculating the predicted number of wins for each team;
- Determined the percentage difference between the predicted and actual number of wins.
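A rough sketch of what that check could look like in Python, assuming the ratings are already computed. The team names, ratings, and schedules below are invented for illustration, and returning `None` for a winless team is my own choice, not necessarily how the numbers in the table were produced.

```python
def predicted_wins(team, ratings, schedule):
    """Sum of the team's win probabilities over every game it played."""
    r = ratings[team]
    return sum(r / (r + ratings[opp]) for opp in schedule[team])


def percent_difference(predicted, actual):
    """Percentage gap between predicted and actual wins.

    Undefined for a winless team (division by zero), so return None there.
    """
    if actual == 0:
        return None
    return abs(predicted - actual) / actual * 100.0


# Hypothetical three-team example (not real data).
ratings = {"A": 400.0, "B": 200.0, "C": 100.0}
schedule = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}

for team in ratings:
    print(team, round(predicted_wins(team, ratings, schedule), 3))
```

When the ratings have fully converged, each team's predicted wins should equal its actual wins, which is what makes this a useful sanity check.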
Without further ado, the first KRACH rating for Division I-A college football:
|Rank||Team||KRACH rating||Predicted wins||Actual wins||% difference|
|64||North Carolina State||0.561||2.000||2||0.017|
|73||Middle Tennessee State||0.363||5.999||6||0.022|
|88||San Diego State||0.147||2.999||3||0.019|
|111||New Mexico State||0.009||2.000||2||0.017|
|119||San Jose State||0.001||0.006||0||0.000|
So we have odds of 26 to 1 against OSU, and 20 to 1 against Wisconsin? That doesn't sound good...
I'd like to place a wager.
Outside of the previously admitted need to finesse the undefeated-team numbers, I like this quite a lot. The top 25 is surprisingly defensible.
And it loves the Pac-10, which makes me feel warm and happy inside.
What I'd really like to see is college football bloggers get together and instigate a true open-source computer rankings project. It's crazy that the computer programs used for the BCS are not open-source.
I think the reason that they aren't is that there would be too much criticism on weighting. They would tweak it year to year and it would cause more controversy because someone would go back to the previous year with the new formula and show that team #3 should have actually been playing for the championship.
Some things need to be a black box.
project is you could have different versions, all such that people (meaning people who know math and can read a computer program) can understand how they work. Or, better yet, you make it so the user can set the parameters, the "weighting" as you put it. Sure, there would be a lot of arguments and disagreement about what to do, but that's the beauty of the open source movement -- anyone can splinter off and build a better mousetrap, as long as they keep it open so everyone can benefit.
The concept of the BCS is flawed for a variety of reasons, but one of the most problematic is that it is not transparent. Do any of the computer programmers "tweak" their weighting from year to year, or even mid-season? Surely they do. But we know nothing about it. Some are more transparent than others in explaining their approach, but I don't think any of them have actually opened their code -- it's all proprietary.
Actually, one of the six BCS computer rating systems is open-source: Wes Colley's system.
I used the iterative version of the ratings described in his paper to re-create his ratings one year, and they were an exact match. Colley's website also allows you to add and remove games to see the effect those changes would have on the ratings.
As far as KRACH is concerned, there is one large flaw if you do not add in the imaginary team with 1 tie against each of the other teams--any undefeated team, no matter how easy their schedule is, will be rated ahead of any team with one loss. If you add in the imaginary tie, this problem goes away.
Let's say Team A is 10-0 against the bottom 10 teams in the nation, and Team B is 9-1 against the top 10 teams in the nation. We all agree that, in the absence of any other information, Team B should be rated ahead of Team A, right? If you don't add in that imaginary tie game, then KRACH would put Team A ahead of Team B.
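To make the imaginary-tie idea concrete, here is a minimal Python sketch of an iterative Bradley-Terry fit where every team also gets one tie (half a win) against a fictional team. It is a sketch under assumptions, not anyone's actual KRACH program: the `krach` function, the `TIE_TEAM` label, the `tie_weight` parameter, and pinning the fictional team at a rating of 100 are all choices made for illustration.

```python
from collections import defaultdict

TIE_TEAM = "(fictional tie team)"  # hypothetical label for the imaginary opponent


def krach(games, tie_weight=1.0, max_iter=10000, tol=1e-12):
    """Fit ratings from a list of (winner, loser) pairs.

    Every real team is given a tie of weight `tie_weight` against TIE_TEAM,
    so nobody ends up with zero wins or zero losses.
    """
    if tie_weight <= 0:
        raise ValueError("tie_weight must be positive")

    wins = defaultdict(float)
    played = defaultdict(float)  # (team, opponent) -> games played
    teams = set()
    for winner, loser in games:
        teams.update((winner, loser))
        wins[winner] += 1.0
        played[(winner, loser)] += 1.0
        played[(loser, winner)] += 1.0

    # Add the fictional tie: half a win for each side, per real team.
    for team in teams:
        wins[team] += 0.5 * tie_weight
        wins[TIE_TEAM] += 0.5 * tie_weight
        played[(team, TIE_TEAM)] += tie_weight
        played[(TIE_TEAM, team)] += tie_weight
    teams.add(TIE_TEAM)

    opponents = defaultdict(list)
    for (team, opp), n in played.items():
        opponents[team].append((opp, n))

    ratings = {t: 100.0 for t in teams}
    for _ in range(max_iter):
        # Update: rating = wins / sum over opponents of games / (r_team + r_opp)
        new = {
            t: wins[t] / sum(n / (ratings[t] + ratings[o]) for o, n in opponents[t])
            for t in teams
        }
        # The scale is arbitrary, so pin the fictional team at 100.
        scale = 100.0 / new[TIE_TEAM]
        new = {t: r * scale for t, r in new.items()}
        change = max(abs(new[t] - ratings[t]) / ratings[t] for t in teams)
        ratings = new
        if change < tol:
            break
    return ratings


# Tiny example: A is undefeated, B is winless, yet both ratings stay finite.
example = krach([("A", "B")])
print({team: round(r, 1) for team, r in example.items()})
```

A smaller `tie_weight` reduces how much the fictional game pulls everyone toward the middle while still keeping every rating finite.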
When given the ultra-reliable eyeball test, Alabama looks stronger than Florida or Texas. I know Texas has a great defense, but Alabama just seems to manhandle people without flashiness, but with all of the consistency I'd like to see out of a true #1.
There is one problem with this. An undefeated team automatically has the highest possible score, so Hawaii ends up in the BCS Championship game.
This also leads to easier and easier schedules instead of harder and harder schedules for elite teams (teams not expecting to go undefeated would actually benefit from playing tougher schedules as shown by Temple). To apply this to football, it should probably try to evaluate how you played in wins as well.
Not to mention the top 6 teams all have the same score. How do you pick which is the best?
This is not meant to knock the work put in by the OP, but more as a point of constructive criticism to improve the ranking.
While it looks so in this analysis, it's a direct result of my failure to properly handle undefeated teams and a lack of a wins and losses comparison between the top teams.
Next week I'll either add the fictional tie team (which I'm loath to do from a purist perspective) or calculate the round-robin winning percentage (RRWP) and strength of schedule (SOS) to go along with the infinite rating for undefeated teams.
One caveat here - I haven't yet worked out what to do exactly with undefeated and winless teams. This will become meaningful at the end of the season if there are multiple undefeated teams (I'm not sure I really care about the winless teams).
I wrote a KRACH program a few years ago, and what I did for the undefeated/winless teams was create a fake "tie team", and then have every single team play that team and have a tie. I had built my program off the description at USCHO, so it already handled ties. This had a bit of a fudge factor, but you need to make sure everyone has greater than zero wins and losses unless you have figured out how to divide by zero. I also made just one "I-AA" team to use for whenever someone played against a team from the other division because I didn't want to have to keep track of all those teams too.
It was nice in the middle of 2006 when I could use it to confirm that Michigan was better than their ranking in the polls, but in the following years keeping up the spreadsheet was too big a pain just to further my misery.
neutral field ~21 point spread for us against OSU. Other estimators have us nearer to a 50% chance at 6 wins, where this method makes it more like 10%.
I think it would work better if point margin were used instead of a win/loss binary.
Also, the Krach Dude mentions in a link on his site what his solution for the perfect and perfectly futile teams is, which is to average in a tie (.5 wins) against a fictional perfect team for every team.
Your ranking for the unbeatens is actually backward - if you give them a fixed rating, the ones with the greatest discrepancies between predicted wins and actual are the ones that have played the toughest schedules.
As others have mentioned, the "usual" fudge factor is to throw in a tie against an average team, which gets rid of all the connectedness issues. The other option is to calculate RRWP (round-robin win percentage) by using the KRACH method within groups of teams that are fully connected in both directions (that is, every team in the group has a win chain to every other team) and assigning win probabilities when connections exist in only one direction or in neither (if one team has a chain to the other but not vice versa, that team is assumed to have a win probability of 100%; if they are not connected in either direction, such as two unbeaten teams, they are arbitrarily assigned 50-50 win probabilities). This is messy because you have to go ahead and determine victory chain groups, though. The fictional tie against an average team is probably much easier. (It may even work to give that game a lower weight, so it has less impact on the standings while still normalizing everything to get rid of ugly zeros and infinities.)
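The win-chain bookkeeping described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a tested ranking tool: the function names are invented, the games and ratings are made up, and the ratings passed in are assumed to have been fitted separately within each fully connected group.

```python
from collections import defaultdict


def win_chains(games):
    """Map each team to the set of teams it can reach through a chain of wins."""
    beat = defaultdict(set)
    teams = set()
    for winner, loser in games:
        beat[winner].add(loser)
        teams.update((winner, loser))

    reach = {}
    for team in teams:
        seen, stack = set(), [team]
        while stack:  # depth-first walk along "beat" edges
            for loser in beat[stack.pop()]:
                if loser not in seen:
                    seen.add(loser)
                    stack.append(loser)
        reach[team] = seen
    return reach


def pair_probability(a, b, reach, ratings):
    """Win probability for a over b using the connection rules above."""
    a_over_b = b in reach[a]
    b_over_a = a in reach[b]
    if a_over_b and b_over_a:  # same fully connected group: use the ratings
        return ratings[a] / (ratings[a] + ratings[b])
    if a_over_b:               # chain in one direction only
        return 1.0
    if b_over_a:
        return 0.0
    return 0.5                 # not connected either way: arbitrary 50-50


def rrwp(team, reach, ratings):
    """Average win probability against every other team (round robin)."""
    others = [t for t in reach if t != team]
    return sum(pair_probability(team, o, reach, ratings) for o in others) / len(others)


# Made-up results: A and D are unbeaten and never connected to each other.
games = [("A", "B"), ("B", "C"), ("C", "B"), ("D", "C")]
ratings = {"B": 100.0, "C": 100.0}  # only the mutually connected group needs ratings
reach = win_chains(games)
for team in sorted(reach):
    print(team, round(rrwp(team, reach, ratings), 3))
```

The per-team walk is fine for a hundred-odd teams; a strongly connected components algorithm would do the same grouping more efficiently on a larger league.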
College hockey has about three times the number of games as college football and half the teams. There simply isn't enough information in the common opponent data set to produce meaningful numbers.
It is interesting, to be sure. The system just isn't designed for College Football.
But if you read what KRACH was designed to do, it seems to be exactly that - relate teams with disconnected results.
The system wasn't 'designed for' college hockey; it was applied to college hockey because it fixes some of the problems with traditional 'rating' systems.
KRACH is designed for a league with approximately the number of games and teams as college hockey. It is not functional for college football because there aren't enough non-conference games and there are too many teams for KRACH to be successful. KRACH was designed with the assumption that being undefeated at the end of the season would be a miraculous event and is reliant on a larger sample size than exists in college football to assign probabilities.