I don’t practice Santeria. I ain’t got no crystal ball. I had a million dollars but I … I “spent” it all.
In an obscure part of Jim Mora's famous playoffs(?!?) presser, he gave the sports world the skinny on turnovers: "I don't care who you play--whether it be a high school team, a junior college team, a college team, much less an NFL team --when you turn the ball over 5 times...you ain't gon' beat anybody I just talked about. Anybody." We all understand this via basic football intuition (ahem), but stick around if you care to see whether we can put a number on that intuition.
Plenty of previous work on the subject has been done by many folks including myself. Football Study Hall recently conducted a study in similar fashion to how I’ve done it in the Blue Moon stuff and estimated the effect of per game turnover margin on season win percentage. FSH’s look lines up with the BMM, both suggesting that the gain on Season Win Percentage for per game Net TOM is about 100 basis points. The effect on overall record is useful but when watching a singular football game we’re not thinking about the whole season; we’re only thinking about the next few hours or so. How do the turnovers within a game affect the outcome of that specific game? To answer the question we’ll have to use math skills that go beyond grouping, counting, and arithmetic.
Soulja Boy Huey Lewis MC Hammer. Wha?
To answer the question at hand you need special math. In this situation you need to estimate probabilities because the outcome of a single football game is categorical (specifically binary) rather than a count as in the case of full season wins. Herm Edwards gets it: “This is what’s great about sports …you play to win the game. [/Pitch Perfect Cumong, Man Glare]. Hellooo? You play, to win, The Game.” The point of sports is to beat Ohio State. Herm gets it. /Michigan orthodoxy
The special math is called Logistic Regression. It’s still a kind of linear regression but that regression is run through what is known as a link function to deal with the binary nature of the thing being modeled. This is done in all kinds of technical fields but for sports, um, investors this is a particularly nifty trick to have stashed next to your rabbit’s foot. The data for the model comes from NCAA.org as always. Sorry, no coefficients this time but I’ll show you a—
Here’s a useful way to think about this chart: suppose we were to play a Sunday morning game where I told you a team’s Final Turnover Margin and you had to tell me if they won the game or not—what would the payout odds need to look like for you to break even? This chart is the first step in answering that question.
Several features on this chart stand out to lend intuitive validity to the model. First, there is neutral win probability at neutral TOM. Second, negative TOM hurts your odds, positive TOM helps them. Third, there are diminishing returns. By the time you get to +/- 3 in final TOM, the next turnover for/against you doesn’t affect win probability that much.
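For anyone who wants to poke at those three features, here is a minimal sketch of a one-variable logistic curve. The `intercept` and `slope` values are placeholders picked for illustration only; they are assumptions, not the fitted coefficients (which aren't published here), and the function name is made up.

```python
import math

def win_probability(tom, intercept=0.0, slope=0.7):
    """Logistic link: squash a linear score into a probability between 0 and 1.

    intercept and slope are hypothetical placeholders, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * tom)))

for tom in range(-4, 5):
    p = win_probability(tom)
    # Value of the next turnover going your way from this margin
    gain = win_probability(tom + 1) - p
    print(f"TOM {tom:+d}: win prob {p:.3f}, next turnover adds {gain:.3f}")
```

With a zero intercept the curve sits at a coin flip when TOM is zero, rises with positive TOM, and the gain from each additional turnover shrinks as you move away from neutral.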
*DO NOT MISS THAT LINK. Grab a drink because it’s MC Hammer’s 15 minute (yezzir!) extended length 2 Legit 2 Quit video. The word epic gets tossed around a lot these days but it’s the only appropriate word to use here. BiSB, you’d dig it the most. It’s like a mockumentary / old school kung fu movie / ridiculous dance video. The hairdos, man. And the cameos: Marky Mark, EAZY E(!!!), Queen Latifah, Milli Vanilli, James M--F--in Brown in full regalia with full on wizard abilities, Hammer's Wang, Jose Canseco, Isiah Thomas, Kirby Puckett, Jerry Rice, Rickey Henderson, Deion Sanders, Andre Rison, Roger Clemens, Roger Craig, Ronnie Lott, and Jerry Glanville. And that’s not all of them. Epic, man. Epic.
What’s Wrong, McFly?
Here’s the rub though, actually there are two rubs. First, that curve represents a generic team facing a generic opponent and neither of these things actually exists. I’ve used this example before but it’s worth a reprise: the generic US household has something like 2.4 children in it, but show me a household with 2.4 kids in it and I’ll show you a crime scene. Real football games are played by real football teams and they’re not all created equal. That curve shifts and bends according to the strengths of the teams in the contest. For reference, the math says “Nick Saban’s Alabama” can survive a –3 TOM against the nameless faceless generic team before it’s a coin flip situation. Let that sink in for a minute. Personally, I think that might be an underestimate.
So what’s the second rub? It’s related to the first one, actually. Here’s where our man Marty McFly comes in. I broke a major rule of predictive analytics to create this chart: I gave the model knowledge of the future. That’s a no-no for models that are supposed to be predictive because, duh. Don't give me that look, I told you I was a sinner last time. Deal with it. In addition to Final TOM and Game Outcome, I fed the model an end of season strength rating as well.
That disclosure may spawn some skeptics and I welcome thoughtful discourse, but allow me to explain myself before you tar and feather me. I think we’re OK to do this for the specific goals at hand. Remember, the goal here isn’t to create a predictive model, it is to estimate as closely as possible the impact of Final Turnover Margin on Win probability. On the chart shown previously, you can’t make an evenly matched game a toss-up unless you know for certain that the teams are evenly matched, right? The final strength ratings serve as a discount mechanism to let the computer know “look man, we’re talking about Oregon vs. Colorado here…the Buffs are going to need a lot of help to have ANY shot.”
Here’s another and more specific example from the past but closer to home: Michigan vs. Toledo 2008. Going into the game, Michigan was a 17 point favorite. “This is Michigan vs. Toledo, fergodsakes.” Um, no, Biff, it was Michigan **2008** vs. Toledo. If you had read the almanac you would know that Michigan 2008 couldn’t lay points on anybody. Why the hell did you risk your existence in space-time if you weren’t even going to read the damn thing?
(I can’t do Jim and Herm and not do Denny. It’s the rules).
So, now that we know what that chart is and what it cannot be, what does it tell us? Well, it says that turnovers are kind of a big deal, bro. How big a deal? The first extra possession is worth 16% in Win Probability. Basically, you’d need 2:1 odds in our little game to bet against the team with +1 TOM at the end of their game. In fact, in the generic case, it’s a simple equation: y = 2^x.
Sans The KNOWLEDGE, we would significantly underestimate the required odds by an increasing amount with each step away from neutral. Yes, I did the math the right way too, don’t worry about it, it’s irrelevant.
News Flash: we lack The KNOWLEDGE at several junctures. First the curve needs to be adjusted according to the true strength of both teams. You won’t know how good each team actually is until they are done with their schedule, and maybe not even then, so you’ll always have an error in your estimation for one or both teams. That error is lethal over the long run.
Second, and this is a biggie, you can’t consistently predict Final TOM. Both teams are in active competition to cause and avoid turnovers. Sure, if there’s a significant mismatch between the two teams, then you might be able to get a good guess in. But then, the end effect of turnovers goes down as the rating gap increases so…well, let it suffice to say that there’s an error which is convoluted within an error.
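As a rough check on the arithmetic, the sketch below takes the y = 2^x rule at face value, treating y as the odds in favor of a generic team that finishes with turnover margin x, and converts those odds into a win probability. The function names are made up for this illustration; this is a back-of-envelope reading of the rule of thumb, not the fitted model itself.

```python
def generic_odds(tom):
    """Odds in favor (as odds : 1) of a generic team with final turnover
    margin `tom`, taking the y = 2^x rule of thumb at face value."""
    return 2.0 ** tom

def odds_to_probability(odds):
    """Convert odds in favor (odds : 1) to a win probability."""
    return odds / (1.0 + odds)

for tom in range(-3, 4):
    odds = generic_odds(tom)
    prob = odds_to_probability(odds)
    print(f"TOM {tom:+d}: odds {odds:.3f}:1, win prob {prob:.3f}")
```

At +1 this gives 2:1 odds and a win probability of two thirds, which is roughly 16 to 17 points above a coin flip and lines up with the figure quoted above.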
Taking Destiny by the Bit
[Author note: this bit requires further discussion, please share your thoughts.]
Before I wrap this up, I need to talk about one more thing. TOM is one of those things, man. It’s out of your control. Try as they may, the defense cannot expect to get turnovers. They can try to provide the conditions necessary for turnovers to occur but they cannot make them happen. If a QB makes good decisions, no interceptions. If the ball carriers are Mike Harty, no fumble opportunities. Even if they aren't Mike Harty, you *might* be able to force fumble opportunities but you can’t guarantee a fumble recovery. You can try as hard as you can and still come up empty.
The offense however…seems like the offense can expect to not ever turn the ball over. Don't throw a pick, don’t drop a live ball, out scrap a guy for a loose live ball if you do lose your mind and drop it. You have agency in those things even if your opponent is trying as hard as they can.
So, screw TOM. Put the onus on the offense to not turn the ball over and then see what happens…Let me show you another—
I think this chart is astounding. Basically, it says that a generic team can cough the ball up twice to a par competitor and not hurt its win probability in any significant way. Eliminate turnovers completely (again, generic on generic) and you can lay 3:1. Cough it up once and lay 3:2 (ish). Actually, what this really says is that the typical team gives up two turnovers in a game against an equally matched opponent.
Interceptions are the Worst
This is bogue to QBs but the data don't lie:
Again this curve shifts and flexes depending on several factors but that’s the generic shape right there. If you’re up against a par opponent, your QB is “allowed” 1 mistake before he puts the team in a bad spot. Generic-vs-generic, the team that throws no INTs wins 75% of the time. Which team will do that? What if they both do that?
Absent from this analysis is the timing of the turnover which is of course critical to its specific effect on the outcome of a game (Anthony Thomas fumble v. Northwestern). If that’s what you’re interested in, The Mathlete is your man.
I write this often because it’s important to remember: football is not a math test. Your game thesis could be dead to rights down to the weather forecast and you’ll still feel the break, feel the break, feeeel the break (/Santeria) very often. Often the decision comes down to believing in things you don't understand and/or can’t necessarily prove—not guilty and innocent are different things. Failure to reject the null hypothesis is not rejection of the alternate hypothesis. The rooting interest often defies logic and reasoning but that's what makes it so damn entertaining to have.
Welcome back to football.
THE DIFFERENCE IN YARDS
Building a little on something Seth put in the last Dear Diary, I went back through the 2011 and 2012 seasons last night and took a look at yards per play and point differential, and as it turns out, they do correspond to each other rather well.
To be fair, 26 games is a somewhat limited sample, but it is telling enough that I think we can show statistically something that you probably would have guessed - if you’re typically gaining more yards per offensive snap than your opponent, the chances are that you’ll win. Further, the bigger that differential between offensive yards per play, the larger the point margin will typically be – good or bad, of course.
How good is the correlation? It is pretty good for football statistics actually. Indeed, the R-value for the correlation between point differential and the yards per play differential turns out to be R=0.85 in this sample. Actually, in these last two seasons, we are averaging 64 offensive snaps per game, which probably ranks towards the bottom of Division I, I would believe, but then again, we do get quite a bit out of them at an average of 6.20 yards per play overall.
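For reference, an R-value like the one quoted above is a Pearson correlation coefficient. Here is a minimal sketch of how it can be computed by hand; the function name is made up, and the five-game input is toy data invented purely to show the call, not the actual 26-game sample.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)

# Toy numbers, made up for illustration only
ypp_differential = [1.5, -0.8, 2.3, 0.2, -1.9]
point_differential = [14, -7, 28, 3, -21]
print(round(pearson_r(ypp_differential, point_differential), 3))
```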
A LITTLE SUMMARY ANALYSIS:
Here are a few summary statistics from the two seasons:
YARDS PER PLAY
Of minor note here is that the minimum value for total plays represents the rain-shortened Western Michigan game, and the minimum value for yards per play for Michigan actually is the 2012 Nebraska game, which…well, anyway. We somehow managed to run 82 plays against South Carolina in our latest bowling adventure, and the 9.04 yards per play is the 2011 ND game.
In those 26 games, the median YPP turns out to be 6.59 yards, and performance above and below this line is night and day really. Indeed, in two seasons, we are 5-7 when we fall below the median for yards per play. We have a perfect record when we have achieved better than 6.59 yards per play.
THE RELATIONSHIP CHARTED:
Here is the relationship between yards per play and point differential in graphic form –
So, it is indeed correct that yards per play is a fairly effective indicator of overall success on a game-by-game basis. It will be interesting to see what this season will look like in graph form. I will have them ready at some point.
Just a note: Writing your picks in the comment section is NOT a valid entry. You must enter your picks using the form below or at this link. Feel free to discuss your picks here, but you must submit the form in order to enter the contest.
After a great first year of Pick Six on MGoBlog, we had a bit of a sophomore slump. I was too busy to write the weekly recaps and never got a system going for the great volunteers who were willing to take over that part of the job. I did update all the standings (you can find them at this link) but most people probably didn’t see them.
User BaldBill was leading for the last few weeks of the season with a great entry. In 5 of the 6 groups he picked the highest ranked team. His only downfall was picking Wisconsin (who ended up unranked) instead of Clemson or Texas in Group C. One mistake was all it took and he lost by one point to NYC Blue. NYC Blue picked the second best team from 4 groups and the best team (Georgia and Ohio State) from the other 2. Interestingly enough, he was the only one in the top 11 to not pick Alabama.
As usual, the unranked team you pick is very important. The top 50 is dominated by people who picked Notre Dame and a few Texas A&M picks. Remember that this year.
With a fresh new football season, we get a fresh start and this year will be better than ever. There will be weekly recaps and lots of fun discussion about the AP poll. So enter this totally free contest.
Here's how it works.
1. We divide the top 25 into 5 groups of 5 based on the preseason AP Poll: 1-5, 6-10, 11-15, etc. For this year's poll, the groups are:
- A: Alabama, Ohio State, Oregon, Stanford, Georgia
- B: South Carolina, Texas A&M, Clemson, Louisville, Florida
- C: Florida State, LSU, Oklahoma State, Notre Dame, Texas
- D: Oklahoma, Michigan, Nebraska, Boise State, TCU
- E: UCLA, Northwestern, Wisconsin, USC, Oregon State
2. Before the season starts you pick one team from each group, plus one unranked team. You're trying to pick the teams you think will finish highest in the final AP poll (after the bowl games).
3. Each week we'll try to update and publish the standings in a spreadsheet so you can track the progress of your teams. You get 25 points for having the #1 team, 24 points for the #2 team, on down to 1 point for the #25 team. Unranked teams get zero points.
4. The winner is the person with the most points (i.e. the highest ranked teams) after the bowl season. The midseason standings are only for entertainment purposes. Only the final AP poll counts.
5. And the grand prize? I will personally give the winner 10 meaningless (and now non-existent) upvotes.
That's it for the Pick Six: short, sweet and simple. The entry form closes before the first kickoff, so get your picks in now. Good luck!
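The scoring rules above are simple enough to sketch in a few lines. This is only an illustration: the function names and the dictionary layout for the final poll are assumptions, not how the standings spreadsheet is actually built.

```python
def poll_points(rank):
    """Points for a team's final AP rank: 25 for #1 down to 1 for #25.

    Unranked teams (rank of None or outside 1-25) score zero.
    """
    if rank is None or not 1 <= rank <= 25:
        return 0
    return 26 - rank

def entry_score(picks, final_poll):
    """Total points for an entry.

    picks: the six team names on the entry.
    final_poll: dict mapping team name -> final AP rank (ranked teams only).
    """
    return sum(poll_points(final_poll.get(team)) for team in picks)

# Hypothetical final poll and entry, with placeholder team names
final_poll = {"Team A": 1, "Team B": 7, "Team C": 25}
picks = ["Team A", "Team B", "Team C", "Team D", "Team E", "Team F"]
print(entry_score(picks, final_poll))
```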
Pretty much what I predicted last year
Last year I published my first stock watch based on my preseason team ratings and schedules and compared them to the Vegas preseason projections to identify teams that I thought would be outliers from the consensus opinion on preseason predictions. While I had some mixed results, all three of Michigan’s main rivals were flagged as potential outliers and my numbers differed from most preseason projections.
Here are some quotes I wrote prior to last year’s season:
It pains me to admit it but this Buckeye team could be very dangerous…The Buckeyes are set up for Urban to get credit for an upswing they probably would have had anyway, but it will probably take some significant first year growing pains to keep Ohio from a great theoretical bowl game.
For once in their football life the Domers could actually be underrated heading into this season…If the bounces go Notre Dame’s way this season they have a shot to be a top-10 team. Their biggest hurdle is going to be a schedule that entering the season looks to be far and away the nation’s toughest…there are plenty of other reasons to be optimistic on the Irish roster.
I have no doubts the Spartan defense is going to be good. I just don’t think they are going to be great and I have major questions about the offense. With a new quarterback and nearly 90% of their receiving production gone, there is little history on their side that they can have a productive offense. Breaking in that many new players at skill positions has Sparty projected to be one of the worst offenses in the country this year, Le’Veon Bell or not. Their defense will keep them afloat but unless Michigan St breaks in a new crew on offense at an unprecedented rate, the offense will be this team’s limiting reagent.
I took the most heat for the MSU pick as several Spartans caught wind of this and told me how a first year quarterback and new wide receivers were nothing to be that concerned about and were highly offended about my prediction as one of the worst offenses in the country. 20 points per game later, I stand by my prediction.
After the big three, it wasn’t all sunshine and roses. I did pick Texas to contend for a national championship, Missouri and Tennessee to be mid-level SEC teams and Kansas State to fall back to the middle of the Big 12.
On the other plus side, I pegged LSU, West Virginia and Arkansas all dead on.
Overall the success was mixed but for Michigan’s three main rivals, I would put my preseason prognostication on them up against anyone’s.
This year my predictions for Michigan and its three rivals are dead on with Vegas heading into the season. Notre Dame should settle in to an average of the unlucky 2011 and the lucky 2012 (8.7 predicted versus 8.5 Vegas). Michigan State should see the offense get better and the defense get worse and compete with Michigan and Nebraska for the Legends Division title (8.2 versus 8.5). Ohio State should ride an easy schedule to double digit wins (10.7 versus 11) and Michigan is projected for another year of holding serve before the recruits start flooding into the starting lineup (8.6 versus 8.5). There isn’t any significant room between my predictions on these four teams and the Vegas preseason lines.
So who do I disagree on? Let’s look at the five major conferences.
The only title contender where I have a major difference with the oddsmakers is Wisconsin. My numbers are a big fan of new coach Gary Andersen and I have the Badgers nearly a whole game (9.8 versus 9) ahead of the Vegas number. Nebraska will have the inside track for the top record in the Legends Division thanks to an easy schedule. Michigan is rated as the best team, but consider the Huskers frontrunners thanks to a slate that avoids both Wisconsin and Ohio State.
Northwestern, Purdue and Penn State are the three teams I have the most disagreement on and I think they are all three overrated by at least 2 games.
No major differences for SEC teams. Alabama is obviously the favorite with Georgia and Texas A&M as my two leading contenders. Like last season, I still think South Carolina is a good but not great team.
2013 should be a fulcrum year for Texas and Mack Brown. After an amazing run in the 2000’s, the 2010’s have not been the brightest lines on Brown’s resume. If he has the capacity to turn it around, 2013 should be the season to do it. Texas’s roster is rated the highest of any team since the 2011 Alabama squad (based on recruiting rankings with upperclassmen weighted heavily). Several groups are also high on Texas; I have them projected at 10.6 wins, a full game clear of Vegas and everyone else in the Big 12.
My biggest sell team of the year is TCU. Vegas has them predicted at 8 wins and I don’t see them making it to bowl eligibility. Kansas and Charlie Weis could exceed expectations; I have the Jayhawks with an outside shot at bowl eligibility.
Like the SEC and the Big Ten, I think the Pac-12 has a clear-cut frontrunner. Even with the loss of Chip Kelly, I think Oregon is in line for 11 wins on average. I see USC as the biggest threat; I have them a game ahead of Vegas at 10.4 wins (in 13 games) versus Vegas’s 9.5. Stanford is projected at the same 9.5 but I have them as a distant third in the Pac-12 with only 8.3 projected wins.
Of the seven teams in the new ACC projected to win at least 7 games by Vegas, I am within a half game in my projections for all of them except Virginia Tech. Like TCU from the Big 12, I think Virginia Tech is vastly overrated this year and am only projecting them at 5.2 wins for the season.
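The projected and Vegas win totals quoted in this post can be lined up side by side to see where the gaps are. A small sketch follows; it includes only the teams for which both numbers are stated above, and the helper name is made up for illustration.

```python
# (projected wins, Vegas win total) as quoted in this post
projections = {
    "Notre Dame": (8.7, 8.5),
    "Michigan State": (8.2, 8.5),
    "Ohio State": (10.7, 11.0),
    "Michigan": (8.6, 8.5),
    "Wisconsin": (9.8, 9.0),
    "USC": (10.4, 9.5),
    "Stanford": (8.3, 9.5),
}

def gaps_vs_vegas(table):
    """Return (team, projection minus Vegas) pairs, largest disagreement first."""
    gaps = [(team, mine - vegas) for team, (mine, vegas) in table.items()]
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)

for team, gap in gaps_vs_vegas(projections):
    print(f"{team:15s} {gap:+.1f}")
```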
As noted above, I am mostly in line with the Vegas line of 8.5 wins for this Michigan season, but barring a Gardner injury, there is definitely more upside than downside. Michigan has 8 games where they should be a solid favorite. At their projected level, odds are that one of them finds a way to get away, but if they can win all 8, that leaves coin flip games against Nebraska, Michigan State, Notre Dame and Ohio State. If Michigan is better than expected at all, those 8 games should move to virtual locks and make a double digit win season a very real possibility. Without a major change event, there is a very solid downside firewall in place for this season, at least if you think, like me, that Penn State and Northwestern are generally overrated entering the season.
I will start this diary by saying that, with the right data, it is possible to get the actual mean yards per play on offense (actually, if you really wanted to spend the time, it is right in the box score), but as most people generally don't do more than skim when they look at box scores, it dawned on me that it might be possible to approximate yards per play from other information in the box score, and indeed, I believe it is.
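The "8 solid favorites plus 4 coin flips" framing can be turned into a rough win-total distribution. This is a sketch under loud assumptions: games are treated as independent, the coin flips are exactly 50/50, and the 0.825 win probability for the favorite games is back-solved so that 8 * p + 4 * 0.5 equals the 8.6 projected wins quoted earlier. None of these are the actual per-game ratings.

```python
def win_total_distribution(win_probs):
    """Probability of each possible win total, given independent
    per-game win probabilities."""
    dist = [1.0]  # dist[k] = probability of exactly k wins so far
    for p in win_probs:
        updated = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            updated[wins] += prob * (1.0 - p)  # lose this game
            updated[wins + 1] += prob * p      # win this game
        dist = updated
    return dist

# Assumption: back-solved from 8 * p + 4 * 0.5 = 8.6, not a real rating
FAVORITE_P = 0.825
game_probs = [FAVORITE_P] * 8 + [0.5] * 4

dist = win_total_distribution(game_probs)
expected_wins = sum(wins * prob for wins, prob in enumerate(dist))
print(f"expected wins: {expected_wins:.2f}")
print(f"P(10 or more wins): {sum(dist[10:]):.3f}")
print(f"P(7 or fewer wins): {sum(dist[:8]):.3f}")
```

Nudging `FAVORITE_P` toward 1.0 is one way to see how much the double digit win scenario depends on those 8 games becoming virtual locks.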
So, what is the minimum information that you might need? Again, if you wanted to spend the time, you could glean everything you need right from the detailed stats, but what if you wanted a quick calculation?
I share this because it is something that I honestly had not thought of before when looking at relationships within the statistics, so if it is well-known (there are people that know far more about this than I), then I will apologize for the redundancy.
I did one of my standard data dumps for this one – I took 10 seasons of Division I offensive data by season and then began searching for a few things that correspond well to yards per play. I wanted to try and keep this simple, so I was hoping to see perhaps two other stats that might be good candidates. As it turns out, there are a couple – yards per carry and yards per pass attempt. Respectively, the R-values for each are 0.720 and 0.835. This is over a fairly large sample (n=1,189).
The next challenge was simply charting these and seeing how well they did in fact trend with one another. Below are some examples from the Big Ten:
So, they track each other fairly well. Indeed, they track each other well enough that it dawned on me that yards per play could be approximated by the average of the two statistics mentioned above, so this became the next step. Could it be that offensive yards per play can be approximated by the average of yards per carry and yards per pass attempt? It would make total sense if this were true, of course, as these cover most offensive plays, right? It is important to note that the sum of yards per carry and yards per pass attempt correlates very well with yards per play – the R-value here is 0.930.
It seems you could get pretty close just based on those two numbers, at least typically. The average difference between estimated and actual yards per play on offense for all of the Division I data turned out to be only 0.12 yards with the mean error being all of 1.94%, so even though it discounts certain things which do happen in the course of a game, you can get a decent handle on how effectively the offense is advancing based on these numbers. It is important to note that the estimate tends to run high, but this is not necessarily accounting for things such as plays for zero yards, plays for negative yards (TFLs, sacks, etc.), the marked imbalance of some teams when it comes to rush vs. pass plays, as well as some other things.
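The approximation described above is easy to sketch next to the exact calculation. The function names and the toy box-score lines below are invented for illustration, and the sketch assumes every offensive play is either a rush or a pass attempt.

```python
def actual_ypp(rush_yards, rush_att, pass_yards, pass_att):
    """Exact yards per play: total yards over total plays."""
    return (rush_yards + pass_yards) / (rush_att + pass_att)

def estimated_ypp(rush_yards, rush_att, pass_yards, pass_att):
    """Approximate yards per play as the simple average of yards per carry
    and yards per pass attempt, ignoring the run/pass split."""
    yards_per_carry = rush_yards / rush_att
    yards_per_attempt = pass_yards / pass_att
    return (yards_per_carry + yards_per_attempt) / 2

# Toy box-score lines, made up for illustration:
# (rush yards, rush attempts, pass yards, pass attempts)
balanced = (160, 40, 280, 40)
run_heavy = (240, 60, 140, 20)

for label, line in (("balanced", balanced), ("run-heavy", run_heavy)):
    print(f"{label}: actual {actual_ypp(*line):.2f}, "
          f"estimated {estimated_ypp(*line):.2f}")
```

With a balanced split the simple average matches the exact figure. With a run-heavy split and a higher yards per pass attempt, the average overshoots, which is one way the imbalance mentioned above shows up.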
Some comparisons from the Big Ten data specifically:
Again, this is something I never really thought of looking for within the stats, but it is nice to see that you can get a reasonable approximation of offensive yards per play from two other numbers in the event you didn’t know the exact number of plays run on offense for a team (or just weren't interested in a lot of math or a minimal amount of searching).
In any case, this data comes from historic season statistics, so the next step here is to test it at the game level, which I plan to do this year. It would also be interesting to see if a similar approximation could be constructed from yards allowed on defense, but I shall save that for perhaps the next diary.
Have I found anything particularly profound? Probably not. It’s a way to estimate a number that you could simply calculate if you carefully studied the box score, which helps if you don't have all the information at your disposal. This is more about an interesting manner in which different statistics correlate and can be applied, but those relationships are always interesting to discover even if they are not necessarily profound or novel.
OBLIGATORY (in honor of a fate my own cats will suffer in a few short weeks):
Heading into the 2012 season there was much concern over the lack of size of the interior DL. Spring Practice had us convinced that undersized Jibreel Black and hopefully breakout player Will Campbell would be manning those spots and we would survive.
Cut to 2012 Fall Practice and seemingly out of nowhere Quinton Washington emerged as the surprise starter. He went on to have a very good season and hopefully will continue this trend for 2013.
So I pose the question: who will be the surprise breakout starter/contributor for the 2013 season? Here are the candidates:
RB Drake Johnson
Spring Practice was the time for another RB to emerge as Toussaint was still on the shelf and all of the practice reps and carries were there for the taking. Aside from Justice Hayes grabbing the 3rd Down RB spot it didn't seem like anyone else had separated themselves. I was expecting to see more "OR" in the RB depth chart but, surprise, we see Johnson has used Fall Practice to clearly pass Rawls and the freshmen. Speaking of the freshmen, I think this spot on the depth chart doesn't represent contribution ability but rather the best utilization of eligibility years. Toussaint is the clear starter and Hayes should take the majority of 3rd Down plays, leaving only "breather" and "garbage" carries for the other RBs. Better to use Johnson than Green or Smith until and unless either of them shows they are ready to take primetime carries.
LG Graham Glasgow
Glasgow had been in a battle with Miller for the Center spot through Spring Practice and, it seemed, most of Fall Practice. Only recently did we start hearing Chris Bryant was going to start and almost just as quickly that he had been passed by Glasgow. Hopefully Bryant's still-troublesome knee was not the reason Glasgow took the spot, but rather that once Glasgow got consistent snaps at LG he beat the guy who could have started in 2012. Hopefully this is not similar to 2011 when Barnum was the starter at LG but was injured, making us believe he would be good in 2012.
SLB Brennen Beyer
We have continuously heard praise for Cam Gordon's athleticism and playmaking ability while the only thing I remember reading about Beyer is that he is a good run defender. Yet he started the Spring Game and remains the co-starter. Here's hoping he has shown significant improvement and all his "hype" has been swallowed up by Fort Schembechler.
My choice for the 2013 Surprise Fall Contributor is Graham Glasgow. Really the only knock against him was his underwhelming recruiting profile, which history has shown means little when dealing with linemen. He is a third year lineman with the size and strength to be a very good player at LG.