Graph Theory Division I Ranking

Submitted by joeyb on

First of all, I should thank Coach Schiano for the idea for this ranking system. I've taken the concept from his diary, made a few tweaks, and applied it to all FBS and FCS teams. If you need an explanation of what is happening, I suggest reading there first.

First, I loaded all of the game results so far (excluding last night's and tonight's games) into a database. Then I wrote a script that builds the lowest-value path between every pair of teams. For example, MSU beat Wisconsin and Wisconsin beat OSU. The first step is to create paths of length 1 for those two games. From those, I can calculate the value of the path from MSU to OSU and create a direct path of length 2 between them. Iowa's path to OSU is 3, and so on.
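A minimal sketch of this path-building step, assuming Python and a Floyd-Warshall pass (the post does not say which algorithm the script used). The function name `shortest_paths` and the tuple layout are hypothetical, and the Iowa over MSU result is filled in only so that Iowa's path to OSU comes out to 3:

```python
from itertools import product

INF = float("inf")

def shortest_paths(games):
    """Return {(winner, loser): minimum path value} for every reachable pair.

    `games` is a list of (winner, loser, value) tuples; each win is a
    directed edge from the winner to the loser.
    """
    teams = {team for winner, loser, _ in games for team in (winner, loser)}
    dist = {}
    for winner, loser, value in games:
        dist[(winner, loser)] = min(value, dist.get((winner, loser), INF))
    # Floyd-Warshall: allow each team k in turn as an intermediate stop.
    for k, i, j in product(teams, repeat=3):
        if i == j:
            continue
        via_k = dist.get((i, k), INF) + dist.get((k, j), INF)
        if via_k < dist.get((i, j), INF):
            dist[(i, j)] = via_k
    return dist

games = [
    ("Iowa", "Michigan State", 1),   # assumed, to match the path of 3
    ("Michigan State", "Wisconsin", 1),
    ("Wisconsin", "Ohio State", 1),
]
paths = shortest_paths(games)
print(paths[("Michigan State", "Ohio State")])  # 2
print(paths[("Iowa", "Ohio State")])            # 3
```

Pairs with no chain of wins between them (OSU to Iowa here) simply have no entry in the result.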

Once I had calculated the minimum value for all possible paths, I averaged the values for each team and had a very rudimentary ranking. This is the exact system that was used in the original diary to rank the teams of the Big Ten, but now we have expanded the set of teams being compared.
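The averaging step might look like the sketch below. The post does not say how pairs with no connecting path are treated, so this version assumes a penalty equal to the number of teams (longer than any real path); that choice, like the name `rank_by_average_path`, is an assumption for illustration only.

```python
def rank_by_average_path(paths, teams, unreachable=None):
    """Rank teams by their mean minimum path value (lower is better).

    `paths` maps (from_team, to_team) to a minimum path value. Pairs that
    are missing from `paths` get the `unreachable` penalty (an assumption).
    """
    if unreachable is None:
        unreachable = len(teams)
    averages = {}
    for team in teams:
        others = [t for t in teams if t != team]
        total = sum(paths.get((team, other), unreachable) for other in others)
        averages[team] = total / len(others)
    return sorted(averages.items(), key=lambda item: item[1])

teams = ["Iowa", "Michigan State", "Wisconsin", "Ohio State"]
paths = {
    ("Iowa", "Michigan State"): 1,
    ("Iowa", "Wisconsin"): 2,
    ("Iowa", "Ohio State"): 3,
    ("Michigan State", "Wisconsin"): 1,
    ("Michigan State", "Ohio State"): 2,
    ("Wisconsin", "Ohio State"): 1,
}
ranking = rank_by_average_path(paths, teams)
for team, average in ranking:
    print(f"{team}: {average:.2f}")
```

With only these four teams, Iowa averages 2.00 and ranks first, while Ohio State, with no paths at all, averages the full penalty of 4.00.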

Original, Unweighted, Without Record
Alabama
Auburn
LSU
Missouri
Oklahoma
Arkansas
TCU
Texas A&M
South Carolina
Oregon
Navy
Nebraska
Notre Dame
Michigan State
Ohio State
Stanford
Wisconsin
Oklahoma State
Boise State
Florida State
Michigan
North Carolina State
North Carolina
Utah
Arizona

As you can see, this isn't a very accurate reflection of the best teams in college football. It took me a few minutes to figure out what was going on, but it became clear when I realized that Michigan was on there. The problem is that this system strongly favors a diversified schedule. The more teams you beat that don't play each other, the better your chance of getting paths of length 2 or 3 to the rest of the field.

The first thing that I tried to do to fix this was to weight the games. Instead of automatically giving a team a path of length 1 for a win, I started dividing that by the number of scores (8 points each) that the team won by. So, winning by 24 points, or 3 scores, would give you a path length of .333 over that team. This works out really well because it rewards teams that win by 2 or 3 scores, but it doesn't reward them much more for going beyond that.
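As a sketch of that weighting, assuming partial scores round up (a 10-point win counts as 2 scores, which matches the MSU over Wisconsin example given in the comments). The names `game_weight` and `POINTS_PER_SCORE` are hypothetical:

```python
import math

POINTS_PER_SCORE = 8  # one "score" is 8 points, as in the post

def game_weight(margin):
    """Path length for a win by `margin` points: 1 / (number of scores)."""
    if margin <= 0:
        raise ValueError("margin must be positive for a win")
    scores = math.ceil(margin / POINTS_PER_SCORE)  # assumed: round up
    return 1 / scores

print(round(game_weight(24), 3))  # 0.333
print(game_weight(10))            # 0.5
```

The diminishing returns are easy to see: going from 1 score to 2 halves the path length, while going from 3 scores to 4 only moves it from .333 to .250.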

Weighted, Without Record
Oregon
Stanford
Missouri
Oklahoma
Nebraska
South Carolina
TCU
Alabama
Texas A&M
Ohio State
Boise State
Wisconsin
Oklahoma State
Auburn
Florida State
California
Notre Dame
Iowa
Arkansas
Michigan State
Navy
Arizona
Virginia Tech
USC
Nevada

This seemed to get me a lot closer to where we want to be with a poll, but there are still some issues. How can Auburn be ranked 14th and Missouri be ranked 3rd? Well, now there is too much weight on winning big. But wait, if that's the case, then why isn't Wisconsin ranked in the top 10? That's because all of their blowouts came in the Big Ten. Wisconsin looked like a pretty bad team at the beginning of the year because they won some very close games against lesser opponents.

The only way that I could figure to solve the problems with overweighting is to add another weighting component, which is actually pretty obvious. I decided to add a winning percentage multiplier at the end. Originally, I figured that the concept of graph theory would account for winning. What it really does is account for beating the right teams, i.e. it is the strength of schedule calculation. I went with winning percentage because I don't want to give an advantage to teams playing Hawaii or in championship games, so raw wins were out of the question. I also need a way to penalize a team that loses a 13th game, and winning percentage is the only way to do both. 13-0 and 12-0 are now both 1.000. 12-1 and 11-1 are only .006 apart. This gives a slight benefit to teams playing an extra game, but also makes sure to penalize properly for losses.
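The arithmetic is quick to check. In the sketch below, `win_pct` follows the text directly, while `adjusted_score` is only a guess at how the multiplier could be applied (dividing, so that a lower score stays better); the post does not spell out the exact formula.

```python
def win_pct(wins, losses):
    """Winning percentage; 0.0 for a team with no games played."""
    games = wins + losses
    return wins / games if games else 0.0

def adjusted_score(avg_path, wins, losses):
    """Hypothetical combination: divide so that lower scores stay better."""
    pct = win_pct(wins, losses)
    return avg_path / pct if pct else float("inf")

gap = win_pct(12, 1) - win_pct(11, 1)
print(win_pct(13, 0), win_pct(12, 0))  # 1.0 1.0
print(round(gap, 3))                   # 0.006
```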

Weighted, With Record
1 Oregon
2 TCU
3 Stanford
4 Auburn
5 Ohio State
6 Boise State
7 Wisconsin
8 Oklahoma
9 Missouri
10 Nebraska
11 Michigan State
12 Nevada
13 Oklahoma State
14 Arkansas
15 South Carolina
16 Virginia Tech
17 Alabama
18 LSU
19 Texas A&M
20 Utah
21 Florida State
22 Northern Illinois
23 Navy
24 West Virginia
25 Hawaii
29 Delaware (Highest rated FCS Team)
31 Jacksonville State (Beat Ole Miss)
36 Notre Dame
39 Appalachian State
44 Florida
45 Connecticut
52 Michigan
85 James Madison
110 Massachusetts

As you can see, this looks like a real ranking now and it won't automatically place an undefeated team ahead of a 1-loss team. I'm pretty excited about the outcome, because this is actually pretty comparable to the polls that are out there. How comparable?

Team               Mine  BCS   D  AP   D  Coaches   D  Harris   D  Computer Avg   D
Oregon                1    2   1   1   0        1   0       1   0             2   1
TCU                   2    3   1   3   1        3   1       3   1             3   1
Stanford              3    4   1   5   2        5   2       5   2             4   1
Auburn                4    1   3   2   2        2   2       2   2             1   3
Ohio State            5    6   1   6   1        6   1       6   1             9   4
Boise State           6   11   5   9   3       10   4      10   4            14   8
Wisconsin             7    5   2   4   3        4   3       4   3             7   0
Oklahoma              8    9   1  10   2        9   1       9   1             6   2
Missouri              9   12   3  15   6       14   5      14   5            10   1
Nebraska             10   13   3  13   3       13   3      13   3            15   5
Michigan State       11    8   3   7   4        7   4       7   4            11   0
Nevada               12   17   5  14   2       17   5      15   3            17   5
Oklahoma State       13   14   1  16   3       15   2      16   3            12   1
Arkansas             14    7   7   8   6        8   6       8   6             5   9
South Carolina       15   19   4  18   3       16   1      17   2            18   3
Virginia Tech        16   15   1  12   4       11   5      12   4            20   4
Alabama              17   16   1  17   0       19   2      18   1            13   4
LSU                  18   10   8  11   7       12   6      11   7             8  10
Texas A&M            19   18   1  19   0       18   1      19   0            16   3
Utah                 20   20   0  21   1       21   1      21   1            19   1
Florida State        21   21   0  20   1       20   1      20   1            22   1
Northern Illinois    22   25   3  24   2       23   1      24   2            25   3
Navy                 23    -   -  30   7       31   8      29   6             -   -
West Virginia        24   24   0  23   1       24   0      23   1            24   0
Hawaii               25    -   -  25   0       27   2      28   3             -   -
UCF                  28    -   -  31   3       25   3      27   1             -   -
Arizona              30   23   7  26   4       26   4      25   5            23   7
Mississippi State    33   22  11  22  11       22  11      22  11            21  12

Average difference: BCS 2.92, AP 2.93, Coaches 3.04, Harris 2.96, Computer Avg 3.56
Mode of difference: BCS 1, AP 3, Coaches 1, Harris 1, Computer Avg 1

What you have here is my ranking alongside some of the more common rankings you will see. I didn't think all of the computer rankings would fit in this chart, so I just took the average of them. For each poll, the chart shows the team's ranking in that poll and then the difference (D) between that ranking and mine. The BCS standings do not go beyond the top 25, so I can't compare those teams to mine. At the bottom, you will see the average difference and the most common difference for each poll. The team that caused me the most trouble is Mississippi State. They alone account for a little less than half a rank in each of the averages.
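The summary rows can be reproduced from the chart. The sketch below uses the BCS column only (the 25 teams that have a BCS ranking) with Python's standard `statistics` module; the variable names are arbitrary.

```python
from statistics import mean, mode

# team: (my rank, BCS rank), copied from the comparison chart
ranks = {
    "Oregon": (1, 2), "TCU": (2, 3), "Stanford": (3, 4), "Auburn": (4, 1),
    "Ohio State": (5, 6), "Boise State": (6, 11), "Wisconsin": (7, 5),
    "Oklahoma": (8, 9), "Missouri": (9, 12), "Nebraska": (10, 13),
    "Michigan State": (11, 8), "Nevada": (12, 17), "Oklahoma State": (13, 14),
    "Arkansas": (14, 7), "South Carolina": (15, 19), "Virginia Tech": (16, 15),
    "Alabama": (17, 16), "LSU": (18, 10), "Texas A&M": (19, 18),
    "Utah": (20, 20), "Florida State": (21, 21), "Northern Illinois": (22, 25),
    "West Virginia": (24, 24), "Arizona": (30, 23), "Mississippi State": (33, 22),
}

diffs = {team: abs(mine - bcs) for team, (mine, bcs) in ranks.items()}
average = mean(diffs.values())
most_common = mode(diffs.values())
msu_share = diffs["Mississippi State"] / len(diffs)

print(round(average, 2), most_common)  # 2.92 1
print(round(msu_share, 2))             # 0.44
```

Mississippi State's 11-place gap spread over 25 teams is 0.44, which is the "little less than half a rank" mentioned above.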

So, I'm planning on doing another ranking next week after all the games have been played and another after all the bowls have been played. I'd also like to do conference rankings and go back to previous years and "resolve" controversies. By next week, I will have tweaked this a bit more to add in homefield advantage and hopefully perfected the formula.

I have one last ranking for you. This is the algorithm without the margin of victory used to weight the wins. This is essentially what would be submitted to the BCS because they don't allow points into the calculations. It's interesting that it places Auburn in first now just like the rest of the computers.

No Weights, With Record
Auburn
TCU
Oregon
Michigan State
Ohio State
Stanford
Wisconsin
Boise State
LSU
Nevada
Missouri
Oklahoma
Arkansas
Nebraska
Oklahoma State
Utah
Virginia Tech
Alabama
Texas A&M
South Carolina
Northern Illinois
Florida State
Navy
Jacksonville State
Tulsa
(47) Michigan

Comments

Mitch Cumstein

December 4th, 2010 at 8:42 AM

Big playoff game today in my backyard (literally, I live across the street from the stadium).  I'm not going though, b/c I don't really care about their team at all, but I thought it was interesting that they're ranked the highest out of FCS teams.

Also, very cool post. I look forward to reading about the rankings after the season and about past years.

SmithersJoe

December 4th, 2010 at 10:05 AM

Question - did you choose the weighting factors in order to get your ranking closer to "eyeball-worthy", or was there a mathematical reason why you chose the numerical weights you did (i.e. 24 points, 0.333, etc.)?

This isn't actually a criticism - it's perfectly valid to optimize the result by varying the inputs, but you would probably have to do this over multiple seasons and fine-tune the precise weights (similar to using a "survival" mechanism to find the best algorithms in a mathematical model).

A lot of work, I know.  Thanks for doing this much already - it's interesting to see.

joeyb

December 4th, 2010 at 10:28 AM

There are no handpicked weights. 24 points means you lead your opponent by 3 scores of 8 points. So, 1/3 = .333. MSU beat Wisconsin by 10 points, which is 2 scores, so the weight in that game is 1/2, or .500.

However, my next step is to account for homefield advantage by removing points for winning at home or adding points for winning on the road, which could be handpicked point values. I'm not sure if I'm going to use the standard 3 points, use 4 points (half a score), or try to calculate the difference between the average margins in home and away games and use that. In any case, I would expect homefield advantage to take MSU's win down below 8 points, which would give them a 1 score lead and a weight of 1 for the game instead of .500.
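A rough sketch of how that adjustment could work, assuming the standard 3-point allowance and assuming a win always counts as at least one score (so a narrow home win does not produce a zero or negative score count). The function name and parameters are hypothetical:

```python
import math

def adjusted_weight(margin, winner_at_home, home_edge=3):
    """Weight for a win after shifting the margin for home-field advantage."""
    if winner_at_home:
        adjusted = margin - home_edge  # home wins are discounted
    else:
        adjusted = margin + home_edge  # road wins get a bonus
    scores = max(1, math.ceil(adjusted / 8))  # assumed floor of one score
    return 1 / scores

print(adjusted_weight(10, winner_at_home=True))   # 1.0 (MSU over Wisconsin)
print(adjusted_weight(10, winner_at_home=False))  # 0.5
```

Either the 3-point or the 4-point value gives the same result for the MSU game, since 7 and 6 are both under 8.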

So, yes, I did add parts of the algorithm to meet the eyeball test, but everything worked out as I expected it to, so I did not tweak things to get the values to be what I wanted them to be.

Waters Demos

December 4th, 2010 at 11:34 AM

Great work.  Also, great question by smithers. 

I also mean no criticism by this comment.  I'm not sure how much value the comparison to the actual polls adds to your work.  I mean it - I'm really not sure.  It could add a lot, or very little, or perhaps even detract from it some.  I just don't know.

What I mean is that you've, in a sense, run a statistical analysis that, intentionally or not, justifies common opinion on the topic.  Is that a good thing?  It reminds me of one prominent criticism leveled at the work of G.W.F. Hegel, i.e., that his work only went so far as to justify the Prussian state in which he lived.  Is it reason discovering worthy grounds for common opinion after the fact?  Or reason that is derivative of common opinion?

Perhaps this issue I raise is speculative because we only have so many inputs, and based on these alone, it's impossible to know, for example, whether MSU is a "better" team than the likes of Alabama or Nebraska (doubt it!) as the rankings (and your analysis) suggest. 

I really don't know (and I'm not a statistician), but if you (or anyone else) have any views on that issue, I'd be interested.

treetown

December 4th, 2010 at 1:47 PM

Great work and thank you and Coach Schiano for sharing this idea with us.

Again with all respect to the amount of effort put in, it would be interesting to see the original path lengths and see how various modifiers would affect the results. I'm not in the math or stats field but it might be worth seeing how tweaking your modifier (the 2 or 3 score margin) affects the overall results. A general solution identifying the key inputs may be of value not just to college football but perhaps to other sports - I'm thinking specifically of college basketball where there is always a huge debate at selection time about whether one conference's 22-5 team is worse or better than another conference's 18-12 team with a stronger strength of schedule.

Finally, as you noted this system works best in a closed system (e.g. league play) where the participants have played against the majority of the same opponents - it may be a superior form of tie breaker outside of direct head to head (e.g. this year with the 3 way jam with OSU, MSU and Wisconsin).

Again, great work. Any chance you might have an image of the graph and nodes? It would probably be a pretty neat image.

joeyb

December 4th, 2010 at 3:35 PM

I was also thinking about doing it for basketball, but I'm not sure what numbers I would use to determine weighting. I'll probably attempt it after I get the algorithm nailed down. The graph would be pretty hard to do, but it is possible. I'll look into doing it in the future, though.

Waters Demos

December 4th, 2010 at 6:33 PM

Aw hell, I was reading your comparison graph wrong when I posted the comment above.  I now see, e.g., that there is significantly more deviation in your analysis from the polls than I first saw (and that MSU is behind Neb; chalk it up to hangover?  Idk). 

In light of a more proper reading, and in retrospect, most of what I wrote before is unwarranted. 

Apologies - again, great work.