Zen And The Science Of Third Down Conversions
You may remember a previous post wherein I showed off some shmancy graphs of third down conversions, though they lacked context and a suitable smoothing method that didn't make Baby Jesus get pissed off about data distortion. Well, more bit hammering has produced something I think is worth showing people. Without further ado...
Third Down And What-What
A third down conversion rate is a conflation of two pieces of data: the distance you have to go and how frequently you make that distance. The varying distributions of said distance impact the end result so heavily that the stat often presented on television broadcasts may as well be called "third and second and first down conversion rate." It's not totally useless, but it can be improved upon in a logical and fairly simple way. And perhaps you can learn something about the team you follow or football in general by taking a deeper look into the situation. If this sounds terribly dull to you, I invite you to forget all about this and check out these creepy drawings of NBA players by bizarre little Japanese girls.
Anyone left is undoubtedly hardcore like Quick Draw McGraw, so here we go. When we last left our blogging superhero, he was casting about for a logical way to smooth the wonky data points in individual teams' third down rates. The problem: out towards the third-and-long boonies the disproportionate number of third-and-tens caused an unnatural flattening effect where any smoothing would drag surrounding distances towards the third-and-ten percentage.
The solution: pool neighboring distances into a set, take the attempt-weighted average yardage of that set, and place the set's combined conversion percentage at that mark. So if you had 10 third and tens and 5 third and elevens, whatever conversion percentage you got from those 15 attempts would be placed at 10.33 yards out. Assuming the conversion rates are linear--a good assumption that far out--this provides non-distorting data smoothing.
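For the SQL-and-spreadsheet crowd, here's a minimal sketch of that pooling step in Python. It is not the code behind these graphs: the function name, the (distance, attempts, conversions) tuple layout, and the conversion counts in the example are all made up for illustration. Only the 10 third-and-tens and 5 third-and-elevens come from the paragraph above.

```python
def pooled_point(buckets):
    """Collapse several distance buckets into one plotted point.

    buckets: list of (distance, attempts, conversions) tuples.
    This layout is a hypothetical one chosen for the example.
    Returns (attempt-weighted mean distance, pooled conversion rate).
    """
    attempts = sum(a for _, a, _ in buckets)
    if attempts == 0:
        raise ValueError("cannot pool buckets with no attempts")
    conversions = sum(c for _, _, c in buckets)
    # Weight each distance by how often it was actually faced, so a pile
    # of third-and-tens pulls the x-position toward 10 instead of
    # pulling the neighbors' percentages toward the third-and-ten rate.
    mean_distance = sum(d * a for d, a, _ in buckets) / attempts
    return mean_distance, conversions / attempts


# 10 third-and-tens and 5 third-and-elevens, as in the text.
# The conversion counts (3 and 1) are invented for the example.
x, rate = pooled_point([(10, 10, 3), (11, 5, 1)])
print(round(x, 2), round(rate, 3))
```

How you decide which distances get pooled together is left to the caller here; the post doesn't pin that down, so the sketch doesn't either.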
So, viola(!), the Michigan defense's third down efficiency:
How To Read This Graph: the thick line in the center is the NCAA average, which is not smoothed. Jutting out from the average are the various icebergs that compose an individual team's deviation from the norm. In general you want your defense to be below the line and your offense above it. Red is bad. Green is good. I checked data out to third and 25, but it gets very thin and useless out there. 15 is about the sanity limit.
This is extremely similar to the graph produced by the first iteration of this process, but I feel much better about it with the theoretically non-distorting smoothing. It stands as clear evidence that last year's Michigan defense was in fact subpar when put in third and long situations. Message board yahoos (and cantankerous bloggers), revel in your victory. Perhaps more disturbing, however, was Michigan's far below average performance on third and short despite employing one Gabe Watson.
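If you want the red/green rule from the conversion graphs spelled out, here's a rough sketch in Python. The function name, the "offense"/"defense" strings, and the sample rates are assumptions for illustration, not real Michigan or NCAA numbers and not the plotting code itself.

```python
def classify_deviation(team_rate, ncaa_rate, side):
    """Return (deviation, color) for one distance on a conversion graph.

    team_rate, ncaa_rate: conversion rates as fractions (0.40 == 40%).
    side: "offense" or "defense" (hypothetical labels for this sketch).
    A defense wants to allow fewer conversions than average, so below
    the line is green; an offense wants more, so above is green.
    """
    if side not in ("offense", "defense"):
        raise ValueError("side must be 'offense' or 'defense'")
    deviation = team_rate - ncaa_rate
    if deviation == 0:
        return deviation, "none"  # sits right on the average line
    above = deviation > 0
    good = above if side == "offense" else not above
    return deviation, "green" if good else "red"


# Invented numbers: a defense allowing 45% where the average is 40%.
print(classify_deviation(0.45, 0.40, "defense")[1])
```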
The second useful piece of information is the average distance faced on third down. Look nyah:
How To Read This Graph: It's the same thing. Thick line == NCAA average. Red and green == individual team's deviation. There's not really a good way to determine what's "good" or "bad" without taking the actual distance into consideration, so the green and red do not flip. Green == above. Red == below.
Note that no attempt has been made to smooth this data, since there's no line to fit it to. You gets what you gets.
Tomorrow I'll highlight some of the more interesting graphs produced by this method; Thursday you'll get a teeny app that you can use to see this data for any team in D-I. (Caveat: due to some member schools not reporting data to the NCAA, the database I've cobbled together is incomplete. This includes four games USC played, including something of minor importance called "The Rose Bowl." Shame on USC. Some negligent person in the athletic department should be fired for ignoring this task... probably updating his blog.)
Anyway, I'll leave you with the offensive graphs:
I actually thought they'd be considerably uglier.
Also: I'm taking requests. I have a database of most plays/drives that happened in D-I last year. If you've got any ideas as to how to use it, I'm listening. If you know SQL and would like to try something out yourself (Bueller? Bueller?), I can give you a public login.