By The Numbers - How it Works
For every down, distance and yard line I have calculated an expected value. The expected value is the average number of points an average team scores from that situation.
*Example: 1st and 10 at your own 20. No situation has more data points than this one. Last year, this situation yielded an average of 1.57 points every time it occurred. Obviously, you can't score 1.57 points in a football game, but if you had the ball in this situation 100 times, you would score about 157 points. That could be a TD every 4-5 possessions, a FG every other possession or, more likely, some mix.
Each play changes that expected value, and the change is then attributed to the player or players who were recorded on the play. Over the course of games and seasons these points add up, some positive, some negative, and we begin to see a clearer picture of what value was added by which players and units.
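To make the averaging concrete, here is a minimal Python sketch of how an expected value table like this could be built. It is not the actual spreadsheet: the function name, the tuple layout, the toy numbers and the convention that yard line 20 means "own 20" are all assumptions for illustration, and `points` is assumed to be whatever the offense went on to score from that situation.

```python
from collections import defaultdict

def build_expected_points(observations):
    """Average the points scored from each (down, distance, yardline) situation.

    observations: iterable of (down, distance, yardline, points) tuples.
    The tuple layout is hypothetical, not the post's real data format.
    """
    totals = defaultdict(lambda: [0.0, 0])  # situation -> [points sum, count]
    for down, distance, yardline, points in observations:
        entry = totals[(down, distance, yardline)]
        entry[0] += points
        entry[1] += 1
    return {situation: total / count for situation, (total, count) in totals.items()}

# Toy data (made up): four trips starting 1st and 10 at the own 20.
observations = [
    (1, 10, 20, 7),  # touchdown
    (1, 10, 20, 0),  # no score
    (1, 10, 20, 3),  # field goal
    (1, 10, 20, 0),  # no score
]
expected_points = build_expected_points(observations)
print(expected_points[(1, 10, 20)])  # 2.5
```

With a full season of plays instead of four made-up rows, the same averaging is what would land on a figure like 1.57 for 1st and 10 at your own 20.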
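As a rough sketch of that bookkeeping in Python: the value of a play is taken as the expected value after the play minus the expected value before it, and that difference is added to each recorded player's running total. Giving every recorded player the full value (rather than splitting it) is an assumption, as are the function name, the player labels and every number except the 1.57 quoted above. Scoring plays and changes of possession are left out to keep it short.

```python
from collections import defaultdict

def credit_plays(plays, expected_points):
    """Sum the change in expected value for each player recorded on a play.

    plays: iterable of (situation_before, situation_after, players) tuples,
    where each situation is a (down, distance, yardline) key.
    Assumption: every recorded player is credited with the full change.
    """
    ratings = defaultdict(float)
    for before, after, players in plays:
        value = expected_points[after] - expected_points[before]
        for player in players:
            ratings[player] += value
    return dict(ratings)

# Only the 1.57 comes from the post; the other two values are made up.
expected_points = {
    (1, 10, 20): 1.57,
    (2, 4, 26): 1.80,   # hypothetical value after a 6-yard gain
    (2, 12, 18): 1.10,  # hypothetical value after a 2-yard loss
}
plays = [
    ((1, 10, 20), (2, 4, 26), ["RB A"]),           # +0.23 for the back
    ((1, 10, 20), (2, 12, 18), ["QB B", "RB A"]),  # -0.47 for both players
]
ratings = credit_plays(plays, expected_points)
print({player: round(value, 2) for player, value in ratings.items()})
```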
But adding value isn't the same against all opponents. A total of +10 is a very impressive number, but it's more impressive against a good team than a bad team. After all of the data is collected, every team's unit is rated on a per play basis. That per play rating is then backed out of every play that occurs against the unit.
*Example: a good rush defense averages -0.1 against it every time the opponent runs. They are playing a decent run offense that averages +0.04 every play. If the net result for the game is a -5 on 40 carries, the adjusted results would be a -1 rating for the offense (-5 + 0.1*40 = -1) and a +6.6 rating for the defense (-[-5 - 0.04*40] = +6.6). In my write-ups, positive is always above average and negative is below average.
So the essence of the metric is: how many scoreboard points did the player/unit contribute versus average, after accounting for competition?
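The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the adjustment as described here; the function and parameter names are made up for illustration.

```python
def adjust_for_opponent(net_value, plays, offense_avg, defense_avg):
    """Back the opponent's per play average out of a game's net result.

    net_value:   net expected value for the offense over the game
    plays:       number of plays (carries, in the example)
    offense_avg: the offense's season average value per play
    defense_avg: the average value per play allowed by the defense
    Returns (offense_rating, defense_rating); positive is above average.
    """
    offense_rating = net_value - defense_avg * plays
    defense_rating = -(net_value - offense_avg * plays)
    return offense_rating, defense_rating

offense_rating, defense_rating = adjust_for_opponent(
    net_value=-5.0, plays=40, offense_avg=0.04, defense_avg=-0.1
)
print(round(offense_rating, 1), round(defense_rating, 1))  # -1.0 6.6
```

The offense is graded against what this defense usually allows, and the defense is graded against what this offense usually produces, which is why the two ratings are not simply mirror images of each other.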
Exceptions and Notes
- Plays with lost fumbles are removed from all numbers because fumbles are considered random and greatly skew ratings.
- QB sacks are included in team passing metrics but not in individual player ratings.
- Garbage time is not included in stats. If a team is up by 4 TDs in the 3rd quarter or 3 TDs in the 4th, it is considered garbage time and no plays are recorded.
- Wide receivers have 2 ratings: a rating on balls caught (Value) and a rating on all balls targeted at them, caught or not (Value+). The two metrics tell two different things and I haven't figured out how to combine them. WR values typically run higher because of the lack of negative plays assigned directly to a WR.
- Performing on third down is huge. On third down you either make a first down and gain big points, or your drive is over and you lose any points expected for the drive (unless in FG range). This is one of the big advantages of this system: it can reward or punish plays made on big downs appropriately.
- Only games against 1A competition count. Games against 1AA teams are basically scrimmages with nothing good or bad counting.
- All data is pulled directly from the play by play data hosted on the NCAA website. I load all the data into a spreadsheet, run a bunch of fancy formulas and then dump it into a database where I can run queries till I pass out or the boss shows up.
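Several of the exceptions above amount to a filter over the play by play data. Here is a minimal Python sketch of such a filter, not the actual spreadsheet logic. The `Play` fields and function names are hypothetical, and treating a TD as 7 points (so leads of 28 and 21) is an assumption, since the notes above only say "4 TDs" and "3 TDs".

```python
from dataclasses import dataclass

TD_POINTS = 7  # assumption: one TD counted as 7 points

@dataclass
class Play:
    quarter: int                   # 1-4
    score_margin: int              # size of the current lead, in points
    lost_fumble: bool = False
    opponent_is_1aa: bool = False

def is_garbage_time(play):
    """Up by 4 TDs in the 3rd quarter or 3 TDs in the 4th."""
    if play.quarter == 3:
        return play.score_margin >= 4 * TD_POINTS
    if play.quarter == 4:
        return play.score_margin >= 3 * TD_POINTS
    return False

def counts_toward_ratings(play):
    """Drop lost fumbles, games against 1AA teams and garbage time."""
    return not (play.lost_fumble or play.opponent_is_1aa or is_garbage_time(play))

plays = [
    Play(quarter=2, score_margin=35),                   # kept: first half
    Play(quarter=3, score_margin=28),                   # dropped: garbage time
    Play(quarter=4, score_margin=20),                   # kept: lead under 3 TDs
    Play(quarter=1, score_margin=0, lost_fumble=True),  # dropped: lost fumble
]
kept = [play for play in plays if counts_toward_ratings(play)]
print(len(kept))  # 2
```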
It is scary to put this in writing, but here are my goals.
Monday - Game Review
Tuesday - Big 10 Player Rankings
Wednesday - Big 10 Team Rankings
Thursday - Flex/Catch up if I missed a deadline
Friday - Game Preview
During the offseason I am looking for questions I can answer from my DB of plays to validate or refute conventional wisdom. For example: is momentum real on quick change plays? Does 4th down convention hold up? Again, I am looking for ideas.
Ideas going forward
I am very open to any ideas on how to improve what I pull, how it's calculated or what I do with it. Also, I am working on moving from expected points to a win percentage calculator so that there is no need for the garbage time gray area. It won't happen this year, but hopefully next year I will have that added.