stats
OT: Extracting play-by-play data
Due to all of the debate regarding the Wisconsin game and the quality of the 2010 offense in general, I've been thinking about stats a fair bit. I went looking for more detail on how FEI is calculated, didn't find what I needed, and emailed Brian Fremeau to see if he can provide some illumination (although I believe the actual formula he uses is proprietary, so I don't expect to learn too much).
The functional end result is that I've become curious about how people such as Brian Fremeau and others that create advanced stats based on play-by-play or drive-by-drive data are able to collect their data.
The NCAA team reports have game-by-game play-by-play data, but extracting the necessary information from them seems difficult since it's all text based. I'm guessing that it just looks complicated to me since I'm not a CS or CE person. But, I'm still interested in how the data is extracted.
So, is there a better site than the NCAA team reports for getting play-by-play data to extract and distill down into the necessary components (pass, rush, yards, player(s), etc.), or is the NCAA site the best and it just takes some coding to make it work efficiently?
I wonder what kind of advanced stats the MGoCommunity could come up with given access to years' worth of distilled data from every team in the country...
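For what it's worth, the "some coding" part usually comes down to pattern matching on each line of text. Here is a minimal Python sketch of the idea. The line format it expects (for example "D. Robinson rush for 12 yards") is made up for illustration and is not the actual NCAA report format, and `PLAY_RE` and `parse_play` are hypothetical names. Real pages would need their own patterns and a lot more cases (penalties, kicks, fumbles, and so on).

```python
import re

# Hypothetical line format, e.g. "D. Robinson rush for 12 yards".
# Real play-by-play pages will differ and need their own patterns.
PLAY_RE = re.compile(
    r"^(?P<player>[A-Z][\w.'\- ]*?) "
    r"(?P<kind>rush|pass complete|pass incomplete)"
    r"(?: to (?P<target>[A-Z][\w.'\- ]*?))?"
    r"(?: for (?P<yards>-?\d+) yards?| for no gain)?"
    r"\.?$"
)


def parse_play(line):
    """Return a dict of play components, or None if the line is not recognized."""
    match = PLAY_RE.match(line.strip())
    if match is None:
        return None
    kind = match.group("kind")
    return {
        "player": match.group("player"),
        "type": "rush" if kind == "rush" else "pass",
        # False for rushes and incompletions, True only for completed passes.
        "complete": kind == "pass complete",
        "target": match.group("target"),
        # Incompletions and "no gain" carry no yardage figure, so count them as 0.
        "yards": int(match.group("yards") or 0),
    }


if __name__ == "__main__":
    print(parse_play("D. Robinson pass complete to R. Roundtree for 23 yards"))
    print(parse_play("M. Shaw rush for no gain"))
```

Lines that don't match come back as `None`, so you can log them and extend the pattern as you find new play types.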
Thanks.
Finding good football statistics
I need some help finding stats that are better than simple raw yardage totals, since we all know how flawed those are. Any good suggestions? I want tempo-free stats, efficiency numbers, and all that. I know of Kenpom for basketball. There has to be something awesome for football that I don't know about yet.
The Michigan Difference
We spend an awful lot of time griping about the defense here and what it is doing for opposing offense's stats. I thought I would instead look at what our offense is doing to our opponents' defensive stats. So brace yourself for lots of charts.
Here is how our opponents' defenses have looked thus far:
| Opponent | Games | Yards | Yds/gm | NCAA Rank |
|---|---|---|---|---|
| Connecticut | 8 | 2914 | 364.25 | 60 |
| Notre Dame | 9 | 3538 | 393.11 | 77 |
| Bowling Green | 9 | 3828 | 425.33 | 98 |
| Indiana | 9 | 3518 | 390.89 | 76 |
| MSU | 10 | 3279 | 327.90 | 28 |
| Iowa | 9 | 2651 | 294.56 | 8 |
| Penn State | 9 | 3114 | 346.00 | 45 |
| Illinois | 9 | 3087 | 343.00 | 43 |
Well, how did Michigan's offense do compared to these teams' season averages?
| Opponent | Avg Yds/gm | Michigan | M % of Avg | M % w/o M |
|---|---|---|---|---|
| Connecticut | 364.25 | 473 | 130% | 136% |
| Notre Dame | 393.11 | 532 | 135% | 142% |
| Bowling Green | 425.33 | 721 | 170% | 186% |
| Indiana | 390.89 | 574 | 147% | 156% |
| MSU | 327.90 | 377 | 115% | 117% |
| Iowa | 294.56 | 522 | 177% | 196% |
| Penn State | 346.00 | 423 | 122% | 126% |
| Illinois | 343.00 | 676 | 197% | 224% |
So our offense has gained more yards than every one of our opponents' defenses yields per game.
What would their statistics look like if they hadn't played us? I calculated what each team's Total Defense season average would be without the Michigan game, and where that average would rank among the FBS statistics:
| Opponent | Average Yds/game | Rank | Without M Yds/game | Rank | Difference |
|---|---|---|---|---|---|
| Connecticut | 364.25 | 60 | 348.71 | 50 | -10 |
| Notre Dame | 393.11 | 77 | 375.75 | 70 | -7 |
| Bowling Green | 425.33 | 98 | 388.38 | 75 | -23 |
| Indiana | 390.89 | 76 | 368.00 | 62 | -14 |
| MSU | 327.90 | 28 | 322.44 | 27 | -1 |
| Iowa | 294.56 | 8 | 266.13 | 5 | -3 |
| Penn State | 346.00 | 45 | 336.38 | 34 | -11 |
| Illinois | 343.00 | 43 | 301.38 | 14 | -29 |
Average difference: -12.25 places.
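For anyone who wants to reproduce the "without Michigan" columns, here is a small Python sketch using the games, total yards, and Michigan yardage from the tables above. The name `without_michigan` is just for illustration. The re-ranking is not reproduced because it needs the full FBS list, which isn't in this post.

```python
# (opponent, games played, total yards allowed, yards allowed to Michigan),
# copied from the tables in this post.
OPPONENTS = [
    ("Connecticut", 8, 2914, 473),
    ("Notre Dame", 9, 3538, 532),
    ("Bowling Green", 9, 3828, 721),
    ("Indiana", 9, 3518, 574),
    ("MSU", 10, 3279, 377),
    ("Iowa", 9, 2651, 522),
    ("Penn State", 9, 3114, 423),
    ("Illinois", 9, 3087, 676),
]


def without_michigan(games, total_yards, michigan_yards):
    """Yards allowed per game with the Michigan game removed."""
    return (total_yards - michigan_yards) / (games - 1)


if __name__ == "__main__":
    for name, games, total, vs_m in OPPONENTS:
        rest = without_michigan(games, total, vs_m)
        # Columns: season average, average without M, Michigan as % of the latter.
        print(f"{name:14s} {total / games:7.2f} {rest:7.2f} {vs_m / rest:5.0%}")
```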
So there we have the Michigan Difference. Playing Michigan so far this year has cost our opponents an average of 12 places in their NCAA Total Defense ranking. I suspect an analysis of rushing offense, passing offense, or scoring offense would yield similar results.
The Leaders, and Best. Let's Go Blue!
Edit: As per suggestions in comments, added % over average w/o Michigan, above.
Added another chart for the defense:
| Opponent | Gms | TO | TO/gm | vs. M | TO/gm - M | Rank w/ M | Rank w/o M | Difference |
|---|---|---|---|---|---|---|---|---|
| Connecticut | 8 | 2708 | 338.50 | 343 | 337.86 | 86 | 88 | +2 |
| Notre Dame | 9 | 3618 | 402.00 | 535 | 385.38 | 49 | 59 | +10 |
| Bowling Green | 9 | 2663 | 295.89 | 283 | 297.50 | 111 | 110 | -1 |
| Indiana | 9 | 3599 | 399.89 | 568 | 378.88 | 50 | 64 | +14 |
| MSU | 10 | 4168 | 416.80 | 536 | 403.56 | 36 | 48 | +12 |
| Iowa | 9 | 3688 | 409.78 | 383 | 413.13 | 42 | 41 | -1 |
| Penn State | 9 | 3325 | 369.44 | 435 | 361.25 | 68 | 72 | +4 |
| Illinois | 9 | 3261 | 362.33 | 561 | 337.5 | 71 | 88 | +17 |
Average difference: +7.13 places
Overall conclusion: Our offense is doing more damage to our opponents' defensive stats than our defense is helping our opponents' offensive stats. And WTF was up with the MSU game? We whiffed on both sides of the ball on that one.
Denard's Stats
Thought you guys may want to see these before the sites update them. I didn't triple check, so if I'm off a yard or two, please let me know...
PASSING
| | Compl. | Att. | Yards | Yd/G | Comp% | TD | INT |
|---|---|---|---|---|---|---|---|
| First 5 Games | 67 | 96 | 1008 | 201.6 | 69.8 | 7 | 1 |
| Season Projected | 161 | 230 | 2419 | 201.6 | 69.8 | 16.8 | 2.4 |
RUSHING
| | Att. | Yards | Ave. | Yd/G | TD | Fum. Lost |
|---|---|---|---|---|---|---|
| First 5 Games | 97 | 906 | 9.3 | 181.2 | 8 | 1 |
| Season Projected | 233 | 2174 | 9.3 | 181.2 | 19.2 | 2.4 |
He's thrown 96 times and run 97. How do I hear people call him one-dimensional? That seems pretty balanced. He may see stouter defenses, but you can bet he won't play only 2/3rds of a quarter in any B10 game, barring injury.
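The "Season Projected" rows are just the five-game totals scaled up. A quick Python sketch, assuming a 12-game regular season (that is what the projected numbers above imply, e.g. 201.6 yards per game times 12 is about 2419). The function name `project` is only for illustration.

```python
def project(value, games_played=5, season_games=12):
    """Scale a counting stat from games played so far to a full season.

    season_games=12 is an assumption inferred from the projected totals above.
    """
    return value * season_games / games_played


if __name__ == "__main__":
    first_five = {
        "completions": 67,
        "pass attempts": 96,
        "pass yards": 1008,
        "rush attempts": 97,
        "rush yards": 906,
    }
    for stat, value in first_five.items():
        print(f"{stat:14s} {value:5d} -> {project(value):7.1f}")
```

Rate stats such as completion percentage and yards per game don't change under this kind of straight-line projection, which is why those columns match in both rows.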
Splitting a White Rainbow
In a previous diary I used passer rating as a well known and objective grade for the relative value of a quarterback’s stat line in order to determine if there were any trends in player development, and if so, how strong those trends were. However, in the diary I noted that passer rating is not without its issues and pointed those interested in finding out toward other people’s work and went on with it.
Most Declarations of Grievance attack the adequacy of the formula, saying that the scale is unintuitive, some of the components are not orthogonal (total yards, completion %), some components are irrelevant (touchdowns), and other components are omitted (rushing stats and sacks). These are valid arguments, but the alternatives presented are unfamiliar, come with their own set of complexities, and are often difficult for fans to calculate on their own.
In this diary I don’t want to generate a new formula; that has been done. Rather, I want to accept the current formula for what it is and develop new benchmarks for what it shows us in a modern context. The two problems I have with it are that it’s clearly outdated and that it obliterates information.
Problem 1: It’s Old and Busted
The current NCAA passing efficiency formula (shown below) was developed in 1979 and was generated using passing data since the beginning of the modern two-platoon era, which began in 1965. At the time, the rating was calibrated to yield a rating of 100 for the average passer. If a QB had average values for all 5 components (attempts, completions, total yards, touchdowns, interceptions) his passer rating would have been 100.
Passer rating = (8.4 × yards + 330 × touchdowns + 100 × completions − 200 × interceptions) / attempts
Here’s the rub: major rule changes have been implemented in favor of the passing game since two-platoon football started, and so the majority of the data set used to calibrate the formula was skewed toward weak passing numbers by today’s standards. The major rule changes are:
- 1976: Offensive blocking changed to permit half extension of arms to assist pass blocking.
- 1980: Retreat blocking added with full arm extension to assist pass blocking, and illegal use of hands reduced to 5 yd. penalty.
- 1985: Retreat block deleted and open hands and extended arms permitted anywhere on field.
And these aren’t even all of them. Behold, further evidence of Anthony Carter’s ridiculousness: he thrived in an era where the rules were stacked against the pass. Before these rules were implemented, offensive linemen could not really be aggressive in pass blocking. They were forced to be either turnstiles (before 1976) or turnstiles with their elbows sticking out. Before ‘85, linemen could not have their palms facing the opponent. Back in the day, illegal use of hands and holding penalties were 15 yards assessed from the spot of the foul. Cloud-of-dust football was so popular back then for a reason. For a modern taste of what this might have looked like, check out Michigan v. Notre Dame 2007. The mismatch between Michigan’s D-Line and Notre Dame’s O-Line in that game was obscene. Despite that, Jimmy Clausen’s freshman-year performance at Notre Dame, on that terrible offense, was slightly above average by 1979 standards.
Due to the rules changes, passing stats have inflated but the formula has not adapted along with them. That is not to say that it has no value, just that our understanding of that value is outdated.
Problem 2: It’s A White Rainbow
Imagine if a rainbow were a brilliant white arc in the sky; still interesting, but less so than what we usually see. If the water droplets in the air can not produce a prismatic effect, they just diffract the light and we can’t see the individual colors. BTW, white rainbows are real.
Getting back to football, the passer rating formula looks at yards per attempt, completion %, touchdown rate, and interception rate, then assigns weights to those values and blends them together to give football fans a single number for comparing QBs against themselves and each other. All in all that is a useful tool, but the blending process obliterates some very interesting information. Passer rating is a great coarse filter but it’s inadequate for picking up subtle differences. Not all 130’s are created equal.
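To make the blending point concrete, here is a short Python sketch of the standard NCAA passing efficiency formula applied to two stat lines. Both stat lines are made up for illustration and belong to no real QB. One is a downfield passer with more picks, the other a high-percentage, low-risk passer.

```python
def passer_rating(attempts, completions, yards, touchdowns, interceptions):
    """NCAA passing efficiency rating computed from raw counting stats."""
    return (
        8.4 * yards
        + 330 * touchdowns
        + 100 * completions
        - 200 * interceptions
    ) / attempts


if __name__ == "__main__":
    # Hypothetical stat lines, chosen only to show two different profiles.
    downfield = passer_rating(200, 120, 1500, 10, 6)  # 60% comp, 7.5 YPA
    safe = passer_rating(200, 140, 1300, 6, 1)        # 70% comp, 6.5 YPA
    print(round(downfield, 1), round(safe, 1))
```

Both lines come out to the same rating even though the completion percentage, yards per attempt, touchdown rate, and interception rate are all different, which is exactly the information the single number hides.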
Re-Calibration
In order to address the first problem, it is necessary to decompose the formula into its base components to see what the new definition of average is for each category. For college players, I think it is also useful to split the data by recruiting ranking (Rivals.com Star Rating) and Experience (Years as Starter) to really understand how well a kid is performing relative to history.
For this project I’ve taken only players who played on teams in BCS conferences and who were rated as Rivals.com 3-star recruits or higher. The data plotted is the average for all players within a given category (ex: all 3-star players in their 1st year as Starter is a group, and so on).
One thing I should note up front is that there are fewer and fewer players in each category as the number of years as starter increases; only about 10% of QB recruits in each group start for four years. For the 3-star and 4-star groups this isn’t a huge problem because they survive the attrition fairly well and still have 7 or 8 players to use for averaging purposes; not great by any means but workable. The 5 star group ends up with 2 players in my data set that have started 4 years (Chad Henne and Trent Edwards). A sample size of 2 is not workable and has therefore been omitted.
For completion percentage we see that the average QB gradually improves his accuracy and approaches 61% in the long term. The higher a player is rated coming out of high school, the sooner he is likely to achieve steady state.
With yards per attempt we see a more subtle upward trend and also more separation between rating groups. I think this separation makes some sense. For one, Rivals explicitly accounts for the player’s physical assets; it stands to reason that 5-star players are more likely to develop NFL-level arm strength and will therefore be able to push the ball up field without sacrificing accuracy significantly. Another potential factor is that a high level QB recruit is likely to attract high level WR recruits that help improve YPA significantly. I think the long term standard that should be applied for this category is 7.6 yards per attempt.
Touchdown rate is a factor that many people argue against including in the passer rating formula. The argument goes that a TD is as much a result of the WR’s ability as it is the QB’s. The Roundtree hawk down at Illinois is an example: Edwards, Manningham, Breaston, Odoms, and a bunch of other guys would have taken that ball to the house. I think this chart shows this effect pretty dramatically. The 5-star recruits tend to go to high level programs and are surrounded by high level offensive lines, running games, and receiving corps, thus making it easier for them to throw touchdowns. Oh yeah, and they’re more likely to have the skill to exploit their advantages. Long term target: 6.0%.
Interception rate is the only negative factor in the formula, so a lower number is better (duh). Again we see 3-star recruits significantly lagging the other two groups. I suspect that not only is there the experience issue, but 3-star recruits are likely to need more time to develop proper mechanics. By year 3, all groups are about as good as they’re going to get. Long term target: 2.7%.
The New Hotness
Cherry picking the long term values for these parameters allows us to assemble a passer rating that is a true indication of good passing efficiency in college, not just objectively but also subjectively; that value is 139.2. This is a stout target to hit and the player needs help from his teammates to get there, but it is achievable for all BCS level recruits by their 3rd year as starter. In 2009, 33 QBs put up this level of performance or better with another 10 or so within reasonable striking distance.
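As a check on the arithmetic, the four long term targets from this diary (61% completions, 7.6 yards per attempt, 6.0% touchdown rate, 2.7% interception rate) can be plugged into the per-attempt form of the NCAA formula. This is a minimal Python sketch, and the function name `rating_from_rates` is only for illustration. Dividing the standard formula through by attempts gives weights of 8.4 on yards per attempt, 3.3 on TD%, 1 on completion %, and 2 on INT%.

```python
def rating_from_rates(comp_pct, yards_per_attempt, td_pct, int_pct):
    """NCAA passing efficiency from per-attempt rates (percentages as 0-100)."""
    return comp_pct + 8.4 * yards_per_attempt + 3.3 * td_pct - 2.0 * int_pct


if __name__ == "__main__":
    # Long term targets proposed in this diary.
    target = rating_from_rates(
        comp_pct=61.0, yards_per_attempt=7.6, td_pct=6.0, int_pct=2.7
    )
    print(round(target, 1))  # 139.2
```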
Third Down Numbers and Other Stats
There's been plenty said on just how bad the third down defense has been, and I thought I'd chronicle that for you. For starters, our NCAA rank is currently #66 out of 120 FBS teams when it comes to overall defensive 3rd down conversion percentage (how often the opposing offense succeeds). We are listed at 38.78% with 38 conversions in 98 attempts.
From going over box scores, I found only 97, so note that discrepancy now. I'm not worried about one missing right now. Also worth noting, I used ESPN's box scores, not Brian's UFRs. So that may cause discrepancy if you go back and check plays there.
I'm not going to offer much more than interesting stats in this. I'll let you guys draw your own conclusions and leave them in the comments. Any thoughts or explanations are welcome.
So let's take a look at the different third down plays the defense has gone up against by yardage:
| Yards To Go | Conversions | Attempts | Percentage |
|---|---|---|---|
| 1 | 8 | 13 | 61.54% |
| 2 | 3 | 10 | 30.00% |
| 3 | 4 | 9 | 44.44% |
| 4 | 2 | 5 | 40.00% |
| 5 | 3 | 12 | 25.00% |
| 6 | 8 | 12 | 66.67% |
| 7 | 2 | 2 | 100.00% |
| 8 | 2 | 8 | 25.00% |
| 10 | 3 | 12 | 25.00% |
| 11 | 0 | 3 | 0.00% |
| 12 | 1 | 1 | 100.00% |
| 13 | 0 | 1 | 0.00% |
| 15 | 0 | 3 | 0.00% |
| 16 | 0 | 1 | 0.00% |
| 18 | 1 | 1 | 100.00% |
| 19 | 0 | 1 | 0.00% |
| 23 | 0 | 2 | 0.00% |
| 24 | 1 | 1 | 100.00% |
| TOTAL | 38 | 97 | 39.18% |
There are obviously a couple of outliers out there. The 3rd and 18 and 3rd and 24 conversions, against MSU and Iowa respectively, definitely throw a wrench in the numbers. The number that is the most disturbing, though, has to be the 3rd and 6 metric. Let's take a slightly closer look at that:
| Opponent | Down | Distance | Pass/Run | Yards | Note |
|---|---|---|---|---|---|
| WMU | 3 | 6 | pass | 23 | Fly play where a blanketing Warren dives and WR comes up with it |
| EMU | 3 | 6 | rush | 13 | Brown misreads zone read with running qb |
| EMU | 3 | 6 | pass | 12 | Umbrella coverage, missed tackle |
| EMU | 3 | 6 | rush | -4 | 2nd team scrubs were in |
| Indiana | 3 | 6 | rush | 0 | Rollout pass turned scramble for no gain. |
| Indiana | 3 | 6 | pass | 18 | 3-man rush, as hit, throws skinny post against Mouton for 15 yards |
| MSU | 3 | 6 | pass | 0 | Stevie Brown Interception |
| MSU | 3 | 6 | pass | 9 | Crossing under routes confuses our LBs |
| MSU | 3 | 6 | pass | 15 | Woolfolk stares down QB in man coverage instead of WR. Misses route. Misses tackle to allow 1st |
| MSU | 3 | 6 | pass | 0 | Blitz house, man open but thrown wide |
| IOWA | 3 | 6 | pass | 10 | Curl short of the two guys we have deep on that side. Warren backed off presnap. |
| IOWA | 3 | 6 | pass | 33 | Pumpfake by Stanzi to a laid out Stross on a fly-ish route. |
Other than that pick and the four yard TFL against EMU by the scrubs, that's horrid. It doesn't seem to be laid squarely on blitzing too many, umbrella coverage, or anything in particular.
When you throw in those really long conversions, it looks pretty ugly. So what do you have to compare these numbers to? I've got two things. Brian did some extensive DIY Third Down Efficiency studies during the first few years of his blog, something he hopes to return to in the future, IIRC. There you can see that the normal conversion rate on a 3rd and 1 is ~68% (2007 statistics I believe). Michigan is outdoing that by about 7% on defense.
As you move down that trend line, however, you can see Michigan starts to approximate that line really quickly, then the extremely long conversions start to skew the results.
Also, we can look at how Michigan has done against opposing defenses.
| Yards To Go | Conversions | Attempts | Percentages |
|---|---|---|---|
| 1 | 7 | 8 | 87.50% |
| 2 | 6 | 9 | 66.67% |
| 3 | 4 | 8 | 50.00% |
| 4 | 5 | 7 | 71.43% |
| 5 | 1 | 6 | 16.67% |
| 6 | 1 | 3 | 33.33% |
| 7 | 2 | 4 | 50.00% |
| 8 | 3 | 7 | 42.86% |
| 9 | 2 | 5 | 40.00% |
| 10 | 1 | 7 | 14.29% |
| 11 | 1 | 6 | 16.67% |
| 12 | 1 | 4 | 25.00% |
| 13 | 0 | 3 | 0.00% |
| 14 | 0 | 2 | 0.00% |
| 15 | 0 | 3 | 0.00% |
| 16 | 0 | 1 | 0.00% |
| 18 | 0 | 1 | 0.00% |
| 21 | 0 | 1 | 0.00% |
| TOTAL | 34 | 85 | 40.00% |
As you can see, Michigan's offense converts third downs at nearly the same overall rate (40.00%) as the defense allows (39.18%), but the shape is different: the offense does much worse once the distance gets to 5 yards or more, and much better in short yardage. When we get within 4 yards, we've got a very high chance of converting.
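One way to compare the two third-down tables is to bucket them into short yardage (1 to 4 yards) and everything longer. Here is a small Python sketch using the conversions and attempts from the tables above. The 4-yard cutoff comes from the "within 4 yards" remark, and the helper name `rate` is just for illustration.

```python
# yards to go -> (conversions, attempts), copied from the two tables above.
DEFENSE = {  # opponents converting against Michigan's defense
    1: (8, 13), 2: (3, 10), 3: (4, 9), 4: (2, 5), 5: (3, 12), 6: (8, 12),
    7: (2, 2), 8: (2, 8), 10: (3, 12), 11: (0, 3), 12: (1, 1), 13: (0, 1),
    15: (0, 3), 16: (0, 1), 18: (1, 1), 19: (0, 1), 23: (0, 2), 24: (1, 1),
}
OFFENSE = {  # Michigan's offense converting
    1: (7, 8), 2: (6, 9), 3: (4, 8), 4: (5, 7), 5: (1, 6), 6: (1, 3),
    7: (2, 4), 8: (3, 7), 9: (2, 5), 10: (1, 7), 11: (1, 6), 12: (1, 4),
    13: (0, 3), 14: (0, 2), 15: (0, 3), 16: (0, 1), 18: (0, 1), 21: (0, 1),
}


def rate(table, shortest, longest):
    """Conversion rate for all distances from shortest to longest, inclusive."""
    rows = [row for dist, row in table.items() if shortest <= dist <= longest]
    conversions = sum(conv for conv, _ in rows)
    attempts = sum(att for _, att in rows)
    return conversions / attempts


if __name__ == "__main__":
    for label, table in (("Defense allows", DEFENSE), ("Offense converts", OFFENSE)):
        print(f"{label:17s} short {rate(table, 1, 4):.1%}  long {rate(table, 5, 99):.1%}")
```

With so few attempts at each individual distance, the buckets are a lot less noisy than the row-by-row percentages.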
Going back to the D for a minute, one of the other problems I'm noticing is how much worse we are on 1st and 2nd down. I'm not sure of too many metrics to gauge this, so I thought about a way to get a decent metric on this. While the standard 3 yards per play average will be fairly successful, it's probably not the best way to describe how successful you are. I decided to go with an arbitrary metric of half the distance needed instead. So, for example, if it's 1st and 10, 5 yards would be considered a successful pick up. So on a 2nd and 5, a 2.5 yard pick up would leave you with 3rd and 2 or 3. I would argue if you're able to do this, you'd probably be slightly more successful than just averaging three yards per snap.
I'll admit this metric is just my opinion, and I welcome ideas for a better way to measure success on 1st and 2nd down.
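To spell the metric out, here is a minimal Python sketch of the "half the distance" rule described above. The function names and the sample plays are hypothetical and are not taken from the box scores.

```python
def is_successful(yards_to_go, yards_gained):
    """A 1st or 2nd down play succeeds if it gains at least half the distance needed."""
    return yards_gained >= yards_to_go / 2


def success_rate(plays):
    """Share of (yards_to_go, yards_gained) plays that meet the half-distance rule."""
    return sum(is_successful(to_go, gained) for to_go, gained in plays) / len(plays)


if __name__ == "__main__":
    # Hypothetical plays: (yards to go, yards gained).
    sample = [(10, 5), (10, 4), (5, 3), (8, 2)]
    for to_go, gained in sample:
        print(f"and {to_go}, gained {gained}: {is_successful(to_go, gained)}")
    print(f"success rate: {success_rate(sample):.1%}")
```

Sacks and incompletions would simply enter as plays with zero or negative yards gained.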
So with my metric in mind, here's the type of stats I'm seeing.
| | 1st Down Attempts | 1st Down Successful | 2nd Down Attempts | 2nd Down Successful |
|---|---|---|---|---|
| Passing | 75 | 36 | 71 | 37 |
| Rushing | 102 | 38 | 67 | 38 |
| Sacks | 4 | - | 2 | - |
| Total | 181 | 74 | 140 | 75 |
Michigan does a decent job on 1st down, allowing a successful play about 40.9% of the time, but is quite a bit worse on 2nd down, around 53.6% (75 of 140). This is understandable, as offenses generally need less yardage on 2nd down while still gaining about the same number of yards. To explain, Michigan faces an average of 1st and 10.38 and gives up an average of 5.807 yards. Meanwhile, on second down, Michigan faces an average of 2nd and 8.41 and gives up an average of 5.629. The opposing team gains between 5 and 6 yards per play [ed. -cringe] on both first and second downs, while in my metric, they should need less.
I guess, if anything is good news, on third down we face an average of 3rd and 6.56 and hold opponents to an average of 5.18 yards per play, about half a yard less per play than on 1st or 2nd down.
I'll probably be playing with these stats a bit more in the next few days. Unfortunately, most of my stats don't involve personnel, so that complicates things.
