it's a major award
(tl;dr? Skip to the Conclusion at the bottom)
After a lot of discussion on this site about how random turnovers are, I decided to look at them in more detail. My hypothesis was that, while turnovers as a whole may appear very random, individual components of turnovers might be much less random. For example, as has been discussed before, once a fumble is on the ground it appears to be very random who recovers it. But what if causing a fumble is not random at all? The randomness of recovering the fumble might still obscure that fact if you only look at turnover margin.
I decided to look at five components of turnover margin: interceptions gained, interceptions lost, fumbles when on offense, fumbles when on defense, and fumble recovery rate.
I used whole-year statistics and compared each season with the previous one, using a total of six seasons’ worth of data. In college football there are, of course, many factors that change from year to year, but if there’s very little luck involved, I would still expect to see a decent correlation from one year to the next. For simplicity, I assumed a linear relationship between stats from one year and the following year, so the analysis used linear regression, a simple but reasonably robust model.
The R-Squared statistic (simply the correlation squared) tells us how much of the variability is accounted for by our model. In simpler terms: how much of the success in a stat one year is accounted for by the success from the year before?
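As a minimal sketch of the calculation described above, the following pairs each team's value in one season with its value in the next season and squares the Pearson correlation of those pairs. The team names and numbers are invented for illustration and are not the teamrankings.com data; the helper names `pearson_r` and `year_over_year_pairs` are also just assumptions for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def year_over_year_pairs(stat_by_team):
    """Turn {team: [season1, season2, ...]} into (previous, next) value lists."""
    prev, nxt = [], []
    for seasons in stat_by_team.values():
        for earlier, later in zip(seasons, seasons[1:]):
            prev.append(earlier)
            nxt.append(later)
    return prev, nxt

# Hypothetical interceptions gained per game (made-up numbers).
interceptions_gained = {
    "Team A": [1.2, 0.9, 1.1, 1.0],
    "Team B": [0.7, 0.8, 0.6, 0.9],
    "Team C": [1.0, 1.3, 0.8, 1.1],
}

prev, nxt = year_over_year_pairs(interceptions_gained)
r = pearson_r(prev, nxt)
r_squared = r ** 2  # with one predictor, R-Squared is the correlation squared
print(f"pairs={len(prev)}  r={r:.3f}  R^2={r_squared:.3f}")
```

With a single predictor, the R-Squared from a linear regression equals this squared correlation, so no separate regression fit is needed for the numbers reported here.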
All data was obtained from www.teamrankings.com. Data was always rounded to the nearest tenth by the source. Because interception and fumble frequencies are fairly low, this rounding may have a larger-than-ideal impact on the results; if anything, I would expect it to make the results appear more random.
I next looked at those five components of turnover margin: interceptions gained, interceptions lost, fumbles when on offense, fumbles when on defense, and fumble recovery rate. The R-Squared values are:
Interceptions Gained: 0.057
Interceptions Lost: 0.049
Fumbles on Offense: 0.016
Fumbles on Defense: 0.001
Fumble Recovery Rate: 0.003 (the correlation is actually negative)
The first result that jumps out at us is that interceptions appear much more repeatable/less random than fumbles. Interceptions gained and lost in the previous year account for about 5% of the success the following year, compared to under 2% for the number of fumbles on offense. Fumbles on defense and fumble recovery rate appear almost completely random.
(For those still unconvinced that fumble recovery rate is almost completely random, the best team from each year did decently at best the following year--Michigan was 1st in 2006 but 47th in 2007, which tied for the best performance by a returning #1.)
The previous analysis tells us a lot about how repeatable results from one year are, but it doesn’t really tell us about how much is skill vs luck: after all, stats in a college sport ought to vary a lot from year-to-year: there is player development, new players, potential coaching changes, different strengths of schedule, and many other factors.
To provide an optimal baseline, I also looked at offensive and defensive yards per game. Intuitively, that’s a statistic that should be very greatly influenced by skill (though there are certainly ample sources of other influence, including luck). This will provide context by helping us understand what kind of change we should expect to see due to year-to-year variance (player or coaching changes, player development, changes in strength of schedule, etc.) instead of due to randomness.
For offensive yards per game, the R-Squared value is 0.243, while for defensive yards given up per game, the R-Squared value is 0.275.
Roughly 25% of success being accounted for by the previous year’s success is not very high, but that’s not a surprise: again, there is a lot of change from one year to another. This is helpful for context: even if turnovers are very skill-based, we would still only expect an R-Squared of about 0.25.
There are two ways to view the turnover margin numbers: the first is viewing them in isolation. Even the best component of turnover margin, interceptions gained per game, is not very repeatable: success one year accounts for under 6% of success the following year.
The second way is to view them in comparison to the yards-per-game stats. With this perspective, interception rates on both sides of the ball are a little under one-quarter as repeatable as yardage. If we assume that yards-per-game is heavily impacted by skill, interception rates are likely fairly impacted by skill as well. I would hypothesize that within a given season, teams that are good with interceptions on either side of the ball will be likely to continue being good.
Fumbles seem to be much less skill-oriented. Fumbles on offense are less than 1/15th as repeatable as yards per game. Fumbles forced on defense and the fumble recovery rate are almost completely random. (What this really says is that almost all teams are roughly equally good, not actually that there’s no skill in forcing or recovering fumbles.)
With total yardage, since 75% of success remains unaccounted for by the previous year’s success, we’d expect that it’s made up of two different things: randomness and other factors. If offensive yards per game is, indeed, not very random, that means outside factors (returning starters, returning coaches/schemes, different strengths of schedule) will have a large influence. This is important—we may be able to take some of these into account to improve the prediction we’d get based on just the previous year’s results.
Likewise, while interceptions are not very repeatable overall, they’re still about one-fourth as repeatable as our optimum. In a very rough estimate, we might then guess that outside factors also have one-fourth the strength with interceptions. Thus, if up to 75% of yardage success is outside factors, then up to 18% of success in interceptions is accounted for by those factors (this is very rough, since the factors may be quite different, or at least have different impact). That would leave roughly 76% of even the most skill-based category as random (100% - 6% based on previous year – 18% based on outside influences). The same rough calculation gives 5% of offensive fumbles based on outside factors, and 93% based on chance.
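The rough arithmetic above can be sketched as follows. This only restates the back-of-the-envelope split, assuming a baseline R-Squared of 0.25 for yardage and that up to 75% of yardage success comes from outside factors; the function name `rough_split` is made up for this example.

```python
def rough_split(r2_stat, r2_baseline=0.25, baseline_outside=0.75):
    """Split a stat into (ratio to baseline, outside-factor share, chance share).

    r2_stat          -- year-to-year R-Squared of the turnover stat
    r2_baseline      -- assumed R-Squared of the skill-based baseline (yardage)
    baseline_outside -- assumed upper bound on outside factors for the baseline
    """
    ratio = r2_stat / r2_baseline        # how repeatable relative to yardage
    outside = ratio * baseline_outside   # scale outside factors by the same ratio
    chance = 1.0 - r2_stat - outside     # whatever is left is treated as chance
    return ratio, outside, chance

for name, r2 in [("interceptions gained", 0.057), ("fumbles on offense", 0.016)]:
    ratio, outside, chance = rough_split(r2)
    print(f"{name}: ratio={ratio:.2f}  outside={outside:.1%}  chance={chance:.1%}")
```

Using the unrounded ratio gives about 17% outside factors and 77% chance for interceptions gained, slightly different from the 18% and 76% above, which round the ratio to one-fourth first. Either way, the estimate is only as good as the two assumptions behind it.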
In summary, there is definitely some repeatability in three of the five turnover factors, but even the best of those still has under 6% repeatability, and by a very rough estimate, is still 76% random.
In 2011, Michigan was 34th overall in turnover margin per game, with +0.4. That’s good, but not amazing, despite Michigan’s stellar fumble recovery rate.
There are three factors I’ve identified and tried to account for: repeatability based on the previous year, outside factors, and chance.
Statistical repeatability bodes poorly for the Wolverines, unfortunately: the two most repeatable categories were the two at which Michigan did worst: Michigan was 82nd and 89th in interceptions gained and interceptions lost, respectively. Michigan’s best two categories, fumbles on defense (28th) and fumble recovery rate (1st) are basically random. In the fifth category, fumbles on offense, Michigan was a decent 42nd.
The second factor is really a category: outside factors, which probably impact interceptions the most. This seems positive for the Wolverines: they return Denard and the defensive backs, plus the coaching staff (the coaching change being an outside factor that would have pulled last year’s number down). Michigan’s biggest loss is Junior Hemingway, who certainly bailed Michigan out a few times last year.
The last category is randomness, which appears to have a very large impact on even the most skilled category, and complete control over a couple of them, meaning any real prediction is fairly foolish. To be a little foolish, then, I’d guess that interception categories improve to above average (say low 40s), but overall turnover margin gets worse, dropping to the 50s. However, I have only slightly more confidence than I do when calling a coin toss.
Your R-squared values are very low, as you indicated. Out of curiosity, what are your P-Values? (Can we have confidence in even this low level of correlation?)
Good question. The three non-random turnover correlations and the two yardage correlations are significant (p < 0.01) even with a Bonferroni correction for the multiple tests.
The two random correlations are not significant (as expected).
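For anyone unfamiliar with the Bonferroni correction, here is a minimal sketch: with m tests, each p-value is compared against alpha / m rather than alpha. The p-values below are hypothetical placeholders (only "p < 0.01" after correction was reported, not exact values), and treating all seven correlations as one family of tests is an assumption.

```python
def bonferroni(p_values, alpha=0.01):
    """Return the per-test threshold and which tests fall below it."""
    threshold = alpha / len(p_values)  # stricter cutoff for multiple tests
    passed = {name: p < threshold for name, p in p_values.items()}
    return threshold, passed

# Hypothetical p-values, invented for illustration only.
p_values = {
    "interceptions gained": 0.0001,
    "interceptions lost": 0.0002,
    "fumbles on offense": 0.0010,
    "fumbles on defense": 0.45,
    "fumble recovery rate": 0.30,
    "offensive yards per game": 0.000001,
    "defensive yards per game": 0.000001,
}

threshold, passed = bonferroni(p_values)
print(f"per-test threshold: {threshold:.5f}")
for name, ok in passed.items():
    print(f"{name}: {'significant' if ok else 'not significant'}")
```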
Taking your approach a step further, you might try separating sloppily played games from well-played ones. Turnover margins could be more random in sloppy games and more skill-based and repeatable in others.
For example, you might try looking at a sum-difference plot: graph the turnover margin (G-L) against (G+L).* The latter may be a crude proxy for the sloppiness of games. On this graph, instead of a plain dot for each point, use a color or symbol that depends on the magnitude of the year-to-year repeatability of a statistic (e.g. (G-L), or even some composite like (G-L)/(G+L)).
From these plots, you may get a better idea about whether teams that play sloppy games (high G+L) have less repeatable (G-L) stats than teams that play conservative and skillful games.** Sometimes, this may be a function not merely of team skill but of the percentage of games played in sloppy weather conditions. Controlling for these conditions, if at all possible with existing data, could further refine the results. You might then see a much higher year-to-year correlation in the turnover margins in games with more ideal weather conditions. As a proxy for such conditions, you could even look at games in more or less temperate or rainy/snowy regions.
*where G=a particular type of turnover gained, L=lost.
**As you do such an exploratory analysis, of course, statistical significance testing becomes more problematic (even though it would already be somewhat problematic with your current analysis, based on a search for correlations and multiple, increasingly detailed hypothesis tests).
Have you tried using autocorrelations to look at year-to-year independence? That might be a better way to test for randomness within teams over time.
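One way to read this suggestion is a lag-1 autocorrelation computed separately for each team's season-by-season series and then averaged. The sketch below uses that reading with invented numbers; the function name is hypothetical, and with only six seasons per team each individual estimate would be very noisy (and, as far as I know, tends to lean slightly negative for short series even when the data is pure noise).

```python
def lag1_autocorrelation(series):
    """Lag-1 sample autocorrelation of one team's season-by-season values."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series carries no information about dependence
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / denom

# Hypothetical fumble recovery rates (percent) over six seasons.
recovery_rate = {
    "Team A": [52.0, 47.0, 55.0, 44.0, 50.0, 49.0],
    "Team B": [48.0, 51.0, 46.0, 53.0, 50.0, 45.0],
    "Team C": [50.0, 50.0, 58.0, 41.0, 49.0, 52.0],
}

per_team = {team: lag1_autocorrelation(s) for team, s in recovery_rate.items()}
average = sum(per_team.values()) / len(per_team)
for team, value in per_team.items():
    print(f"{team}: lag-1 autocorrelation = {value:.3f}")
print(f"average across teams = {average:.3f}")
```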
EDIT: After realizing that there are three comments, all criticizing/asking for further analysis, I just want to point out that I like your post and am just genuinely interested in the topic. The idea of TO being random is something I've argued against on the board more than once, but admit I've never looked at the numbers myself.
Rather than looking for year-to-year correlations, which even your skill-based comparators suggest are weak, why not look within seasons? Take the turnover data for games 1, 3, 5, 7, 9, 11 and see how it tracks against turnover data for games 2, 4, 6, 8, 10, 12 (or some other pairing that lessens the influence of conference vs non-conference games). Then, do the same thing for total yards, etc. This should go a long way toward reducing the impact of some of your outside factors.
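A minimal sketch of that odd/even split, assuming per-game turnover margins were available for each team. The game-by-game numbers are invented and the helper names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_half_totals(games):
    """Sum a per-game stat over odd-numbered and even-numbered games."""
    odd = sum(games[0::2])   # games 1, 3, 5, ...
    even = sum(games[1::2])  # games 2, 4, 6, ...
    return odd, even

# Hypothetical per-game turnover margins for one 12-game season.
margin_by_team = {
    "Team A": [1, 0, -1, 2, 0, 1, 1, -1, 0, 2, 1, 0],
    "Team B": [-1, 0, 0, -2, 1, -1, 0, 0, -1, 1, 0, -1],
    "Team C": [0, 1, 1, 0, -1, 0, 2, 1, 0, -1, 1, 1],
    "Team D": [0, -1, 1, 0, 0, 1, -2, 0, 1, 0, -1, -1],
}

halves = [split_half_totals(games) for games in margin_by_team.values()]
odd_totals = [odd for odd, _ in halves]
even_totals = [even for _, even in halves]
r = pearson_r(odd_totals, even_totals)
print(f"split-half r={r:.3f}  R^2={r ** 2:.3f}")
```

Because both halves come from the same roster, coaches, and roughly the same schedule, a low correlation here would point more directly at chance than the year-to-year numbers do.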
That said, thanks for the detailed analysis. This is the first time I've seen someone try to separate out different contributions to "turnovers" and I think it adds a lot to the analysis to do so.
If anyone has a source of stats to do this, please let me know--roughly what you're suggesting would have been my ideal plan, but I didn't have the numbers.
Also, I would love to hear if anyone has suggestions for a better skill-based optimum than yardage gained/allowed or concerns that the stat is more random/less skill-based than I thought.
If I am wrong about yardage being a good optimum, the net effect would be to make turnovers more random: not only would the year-to-year repeatability be less in comparison to a better optimum, but it would also shrink my rough guess for the impact of other factors on fumbles. Both of those changes would increase the importance of randomness.
I'm no statistician, but I'm impressed. This may be a stupid idea, but would TFL be a better optimum? Seems like forcing turnovers could be related to "aggressiveness" of the D, which could be reflected in things like sacks and TFL as well as turnovers.
as total yards vary quite a bit depending on offensive time of possession, etc.
Thank you for this. It's nice to see some math back on the board.
If the game-to-game numbers aren't available, I'd be interested in seeing the same analysis, but for pro teams. Roster turnover has a much lower impact in the NFL, so it would be interesting to see how much skill plays into year-to-year interception stats as a comparison to the college teams.
there are a variety of additional ways to break it down further, I would imagine. Fumble is the ultimate result of a variety of different actions.
If you followed teams over a number of years, would that tell you something? I'm thinking of comparing Saban's teams versus--say--a whole pile of MOR teams, or LSU and AL against the rest of the SEC.
Of particular interest to me, though I don't know how you control for it, is the way some defenses swarm to the ball carrier or work to strip them of the ball. It looked to me like we were becoming very good at that come, say, Illinois last year, and I would imagine this can contribute to + TO%.
Sorry if any of this is obvious or already gone over; I know this has been a discussion here, but I haven't followed closely.
I question the usefulness of examining turnovers at this macro a level. I should think it would be fairly obvious that with different players carrying/throwing/defending the ball every one to four years, any examination of turnovers from year to year is going to LOOK random. How is that useful?
I would hypothesize that there is certainly a correlation of turnovers from year to year when we look at each player INDIVIDUALLY. As a logic check, do we really suppose that if Scott Mitchell had stuck around another year, he wouldn't have thrown a ridiculous amount of interceptions...because turnovers are random? Good luck in YOUR fantasy league dude.
Charles Woodson finishes top 5 in interceptions every stinking year. How random is that?
I would further theorize that synergy between the center and quarterback plays a large, non-random role in turnovers. I submit as evidence the fact that one-legged Molk was our best option against Virginia Tech.
I would further theorize that if a player's offensive turnovers do not go down as a player's career progresses, there is a position coach who is not doing his job.
The exceptions of course would be turnovers to end the half, and turnovers when trailing by a significant margin. Obviously teams will commit fewer turnovers when leading, and more when they shift philosophies to higher risk/reward when trailing significantly.
But then you have Denard. He plays like he is down 2 scores with his Dreads on fire on every single snap of the ball. Sometimes that works out spectacularly. Sometimes not so much.
I guess I question how examining turnovers on such a macro level is useful. It seems mostly academic when you consider how much more predictive a study of the particular players playing would be.
This is what I was thinking when I read the diary. You shouldn't look at year-to-year changes at the team level, but rather at the player level. Some players are better than others at forcing turnovers (remember Derrick Johnson at U.Texas?)
This is real easy. Divide the pointspread on the game by 23. That is your projected turnover margin on the game. Is Michigan favored by 23 over Directional U? Then on average the good guys will be +1 in turnover margin on the game.
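Written out, that rule of thumb is a one-liner. The divisor of 23 is the commenter's figure and is not checked against any data here; the function name is made up.

```python
def projected_turnover_margin(point_spread, points_per_turnover=23.0):
    """Projected per-game turnover margin for the favored team.

    point_spread is positive when the team is favored; the default divisor
    of 23 is the rule of thumb quoted above, not a fitted value.
    """
    return point_spread / points_per_turnover

for spread in (23.0, 7.0, -11.5):
    print(f"spread {spread:+.1f} -> projected margin {projected_turnover_margin(spread):+.2f}")
```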
I did something very similar, looking at year-to-year dependence for fumble recoveries in 120 teams over a 10 year period. I ran a simple linear model, regressing fumble recoveries in year T on recoveries in year T-1. I also tried including "fixed-effects" (i.e. team dummies) to control for unobserved heterogeneity between teams. The results are pretty clear:
Last year's fumble recovery rate explains only about 2% of the variance in this year's numbers (R^2). Also, the coefficient on the lagged dependent variable appears to be *negative*.
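To illustrate the difference between the pooled and fixed-effects ("within") models, here is a small sketch in plain Python rather than R. The within estimate subtracts each team's own averages before fitting, which is the same idea as including team dummies. The data is invented and chosen to make the contrast obvious; it is not the recovery-rate data, and the helper names are hypothetical.

```python
def ols_slope(xs, ys):
    """Slope from a simple linear regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def pooled_and_within_slopes(rate_by_team):
    """Regress year T on year T-1, pooled and with team means removed."""
    pooled_x, pooled_y, within_x, within_y = [], [], [], []
    for series in rate_by_team.values():
        xs = list(series[:-1])  # year T-1
        ys = list(series[1:])   # year T
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        pooled_x += xs
        pooled_y += ys
        # Within transformation: measure each value relative to the team's mean.
        within_x += [x - mean_x for x in xs]
        within_y += [y - mean_y for y in ys]
    return ols_slope(pooled_x, pooled_y), ols_slope(within_x, within_y)

# Made-up data: two teams at different levels, each bouncing around its own mean.
rate_by_team = {
    "Team X": [10, 20, 10, 20, 10],
    "Team Y": [50, 60, 50, 60, 50],
}

pooled, within = pooled_and_within_slopes(rate_by_team)
print(f"pooled slope={pooled:.3f}  within slope={within:.3f}")
```

In this toy data the pooled slope is positive because the two teams sit at different levels, while the within slope is negative because each team alternates around its own average. The two models answer slightly different questions, which may be worth keeping in mind when reading the results.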
I also drew a neat picture that Brian used on the front page at some point.
The data and R code I used can be found here:
Here's the content of my email to Brian:
And the basic results:
> # Manual FE
> mod_fe = plm(recovery_rate ~ lag(recovery_rate, 1), model='within', data=dat)
> mod_pool = plm(recovery_rate ~ lag(recovery_rate, 1), model='pooling', data=dat)
> summary(mod_fe)
Oneway (individual) effect Within Model

Call:
plm(formula = recovery_rate ~ lag(recovery_rate, 1), data = dat, model = "within")

Unbalanced Panel: n=120, T=4-9, N=1069

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max.
-113.000   -5.590    0.113    5.620   88.600

Coefficients :
                       Estimate Std. Error t-value  Pr(>|t|)
lag(recovery_rate, 1) -0.148795   0.032253 -4.6134 4.505e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares: 130790
Residual Sum of Squares: 127920
R-Squared : 0.021958
Adj. R-Squared : 0.019472
F-statistic: 21.2833 on 1 and 948 DF, p-value: 4.5054e-06

> summary(mod_pool)
Oneway (individual) effect Pooling Model

Call:
plm(formula = recovery_rate ~ lag(recovery_rate, 1), data = dat, model = "pooling")

Unbalanced Panel: n=120, T=4-9, N=1069

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max.
-126.000   -5.840    0.208    5.880  107.000

Coefficients :
                       Estimate Std. Error t-value Pr(>|t|)
(Intercept)           51.308810   1.581618 32.4407   <2e-16 ***
lag(recovery_rate, 1) -0.026893   0.030823 -0.8725   0.3831
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares: 147510
Residual Sum of Squares: 147410
R-Squared : 0.00071295
Adj. R-Squared : 0.00071161
F-statistic: 0.761256 on 1 and 1067 DF, p-value: 0.38313