Predictive Win Model for Week 9

Submitted by kb on

With some spare time before the NCAA tournament this year, I developed a predictive model to pick basketball games for my NCAA bracket pool (figured it was better than me picking) using a descriptive discriminant analysis, which essentially identifies the variables that best discriminate between categories (in this case, wins and losses).  The basketball model worked well (it predicted 80-85% of the NCAA tournament games correctly), so I thought I would see how well the approach carries over to college football.  For the last few weeks I have been validating the model week to week against the Sagarin ratings, and it has matched Sagarin's predictive accuracy (65-70% of winners picked correctly...not as good as it could be, but I'm in the process of improving the model).  I figured it's a good time to share with fellow MGoBloggers, and I'll try to keep this as concise and readable as possible.  Apologies ahead of time if some of the tables don't show up right; I'm not as sure as some other diarists about how to embed tables within a diary.

Model Metrics:

After assessing a variety of team statistics from the past few weeks (SOS, win percentage, turnover margin, offensive yards per play, defensive yards per play, having a home game, and so on…you name it, I have it and have looked at it) on a national level (Division I-A/FBS only), the team statistics that best predict weekly winners and losers are, in order of importance:

  • Point Differential (avg points scored – avg points given up)
  • Offensive Yards Per Play
  • Defensive Yards Per Play
  • Win Percentage
  • Turnover Margin

Notable variables that were not important in determining weekly winners are 1) having a home game and 2) strength of schedule (probably too fluid of a variable right now, but could be predictive for bowl game winners at the end of the season).  

Big Ten Rankings:

The Big Ten rankings for Week 9 are below.  All of the variables in my model are presented as z-scores (roughly -3 to 3) computed on a national level.  For the variables where more is better (offensive yards per play, win percentage, turnover margin, and point differential), a higher score is better; for the lone variable that is inversely related to winning (defensive yards per play), a lower value is better.  PREDSCOR is the output of the model, and the predicted winner of each game is simply the team with the higher score.
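To make that concrete, here is a rough sketch of how a predictor score comes together. This is illustrative only; the weights below are placeholders, not the actual discriminant coefficients from my model.

```python
# Illustrative only: placeholder weights, not the model's actual
# discriminant coefficients.  Each team's nationally standardized
# (z-scored) statistics are multiplied by a weight and summed.
weights = {
    "point_diff": 0.40,   # most important predictor
    "off_ypp":    0.25,   # offensive yards per play
    "def_ypp":   -0.20,   # negative weight: lower is better
    "win_pct":    0.10,
    "to_margin":  0.05,
}

def pred_score(team_z):
    """Weighted sum of a team's z-scored statistics."""
    return sum(weights[k] * team_z[k] for k in weights)

# Michigan's z-scores from the table below (the output will not match the
# table's PREDSCOR, because the weights here are made up).
michigan = {"point_diff": 0.35396, "off_ypp": 2.15958, "def_ypp": 0.99363,
            "win_pct": 0.69799, "to_margin": -0.58711}
print(round(pred_score(michigan), 2))
```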

Big Ten Predictor Scores with Variables

TEAM             PREDSCOR   Sagarin   OFFYRDS/Play   DEFYRDS/Play   POINTDIFF   TOMARGIN    WINPERC
Ohio State           3.97     88.73         .88946       -1.92245     1.74208    1.98148    1.33865
Michigan State       3.40     86.74        1.58746        -.95043     1.00894    1.61454    1.83693
Iowa                 2.65     85.01         .88946       -1.04904     1.10881    1.61454     .69799
Wisconsin            2.02     85.22        1.00389        -.18971      .91920     .14678    1.33865
Michigan             1.21     76.49        2.15958         .99363      .35396    -.58711     .69799
Northwestern          .43     69.53        -.11748         .27517      .30257     .69719     .69799
Illinois              .26     80.67        -.55229        -.72503      .29244    -.40364     .12853
Penn State           -.05     72.87        -.24334        -.24606     -.05929    -.22016     .12853
Purdue               -.88     66.60        -.95278        -.27423     -.56590    -.22016     .12853
Indiana              -.97     63.14         .06560        1.54303     -.16279    -.22016     .12853
Minnesota           -2.46     57.24        -.20902        2.10653     -.73671     .51372   -1.65106

Notes:

  • My model has us ranked a little higher, and Penn State a little lower, than Sagarin does.  Sagarin indicates this game should be closer, while my model sees more separation between Michigan and Penn State.
  • Illinois is ranked lower in my model than in the Sagarin ratings.
  • We're in the middle of the pack in the Big Ten (about where we expected we might be).

Predictive Model Results for Week 9:

Michigan (1.21) at Penn State (-.05) = Michigan

Michigan State (3.40) at Iowa (2.65) = Michigan State

Northwestern (.43) at Indiana (-.97) = Northwestern

Purdue (-.88) at Illinois (.26) = Illinois

Ohio State (3.97) at Minnesota (-2.46) = Ohio State

 

Enjoy!

Comments

kb

October 24th, 2010 at 1:45 PM

Having a home game wasn't important according to the model for predicting weekly winners and losers.  I thought it would be related to winning, but it didn't turn out that way. If you think of all the home teams that lose home games on a weekly basis (e.g., Texas, a highly ranked team, losing at home to weaker opponents in UCLA and Iowa State), it makes sense.

tasnyder01

October 24th, 2010 at 2:18 PM

I remember from stats last year (yes, I'm a sophomore in College.  Don't laugh because my life is better than yours.....<---that's humor.  I had 4 midterms last week.) that z-scores run from -3 to 3.  Why does your PREDSCOR have 2 teams with scores above 3?

kb

October 24th, 2010 at 2:22 PM

PREDSCOR isn't a z-score, so it's not bounded to -3 to 3.  It's a score that the model outputs given the weights on the predictor variables. Hope your midterms went well - I don't miss exams for a minute.
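To put some made-up numbers on it: if a team is near the top nationally on several variables at once, the weighted sum can land above 3 even though every individual z-score is inside the -3 to 3 range.

```python
# Made-up numbers: each input z-score is within (-3, 3), but the
# weighted combination the model outputs is not bound to that range.
z = {"point_diff": 1.8, "off_ypp": 1.6, "def_ypp": -1.9, "win_pct": 1.3, "to_margin": 2.0}
w = {"point_diff": 0.9, "off_ypp": 0.6, "def_ypp": -0.5, "win_pct": 0.4, "to_margin": 0.3}
print(sum(w[k] * z[k] for k in w))  # ~4.65
```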

ebv

October 24th, 2010 at 2:56 PM

Nice analysis. Can you give a few more details on how you assessed the team statistics?  Sounds like it's some sort of linear classifier?

kb

October 24th, 2010 at 3:20 PM

It is a classifier.  Essentially I've been taking team statistics from the NCAA football stats page: http://statistics.ncaafootball.com/merge/tsnform.aspx?c=ncaa-football&p…

I took the team statistics from the end of the previous week (e.g., week 7) and recorded the actual win/loss results for the following week (e.g., week 8).  I dumped the data into SPSS and ran a discriminant function analysis, which analyzes how continuous variables (e.g., yards per play) predict a dichotomous outcome (e.g., win vs. loss).  The analysis gives you weights indicating how important each variable is in predicting that outcome, and I then applied those weights to the team statistics at the end of the most recent week (e.g., week 8) to get a predictor score for the upcoming week (week 9).  Along the way I had to standardize the variables to put them on the same scale (e.g., win percentage and yards per play are on different scales).
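For anyone who wants to try this outside of SPSS, here's a rough sketch of the same workflow in Python with scikit-learn's LinearDiscriminantAnalysis. The file names and column names are placeholders for however you pull the stats down; this isn't my actual script, just the general idea.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

FEATURES = ["point_diff", "off_ypp", "def_ypp", "win_pct", "to_margin"]

# Placeholder files: one row per team.  The week-7 file holds season-to-date
# stats plus a 0/1 column for whether the team won its week-8 game; the
# week-8 file holds the updated stats used to predict week 9.
train = pd.read_csv("team_stats_week7.csv")
current = pd.read_csv("team_stats_week8.csv")

# Standardize so variables on different scales (win %, yards per play, etc.)
# are comparable.
scaler = StandardScaler().fit(train[FEATURES])

# Fit the discriminant function on the previous week's actual outcomes...
lda = LinearDiscriminantAnalysis()
lda.fit(scaler.transform(train[FEATURES]), train["won_next_game"])

# ...then apply the fitted weights to the most recent stats to get each
# team's predictor score for the upcoming week.
current["pred_score"] = lda.decision_function(scaler.transform(current[FEATURES]))
print(current[["team", "pred_score"]].sort_values("pred_score", ascending=False))
```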