Mathlete, you set the standard for informative diaries on the blog. I like the idea of seeing how players progressed over the years. I would also love for someone to take my simple idea, expand on it, and figure out how much experience (especially in the two-deep) actually matters to on-field performance.  GO BLUE!

I would have thought picks and sacks to be extremely correlated with really, really BAD offensive production, because teams that are behind are more likely to get desperate and drop the QB back, who will in turn be more likely to take a sack or throw a pick.

Great analysis, as always!

but is a correlation value of 0.0296 really indicative of a meaningful relationship between your variables?  I have a hard time believing that a plot with that much scatter can generate a meaningful trend line.  How big a confidence interval on your trend line slopes would you need for said confidence interval to contain zero?  In other words, given this sample data, how likely is it that there really does exist a positive correlation between your x and y variables?
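For anyone who wants to poke at that question, here is a rough sketch of one common way to put an interval around a correlation (Fisher's z-transform). It assumes the 0.0296 is a Pearson r; if it is actually an R-squared, r would be about 0.17 instead. The sample sizes are made up, since the post does not say how many team-seasons went into each chart.

```python
import math
from statistics import NormalDist


def correlation_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z-transform."""
    z = math.atanh(r)                    # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)          # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


r = 0.0296  # value quoted in the comment, treated here as a Pearson r (assumption)
for n in (120, 1200, 12000):  # hypothetical sample sizes, not taken from the post
    low, high = correlation_interval(r, n)
    verdict = "contains zero" if low <= 0.0 <= high else "excludes zero"
    print(n, round(low, 4), round(high, 4), verdict)
```

The point of the loop is that the same r can be indistinguishable from zero or clearly positive depending only on how many points sit behind it.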

None of the correlations displayed are "statistically significant" which takes into account not only the relationship between the two variables but also how many data points there are. In this case there are tons but it is still not statistically significant.  However, we can assume that a MGoRelationship does exist, even though grumpy statisticians may disagree.

I'll begin with the now obligatory "well done sir." I've always been interested in redzone statistics. Maybe you could look at whether or not there are certain defensive schemes that fare better in the redzone, obviously excluding goal line plays. Perhaps this has more to do with the difference between good, mediocre and bad defenses? Maybe you can look at TD production according to distance in the redzone. Is there a redzone "deadzone" that offenses tend to rarely score TDs from? Just some thoughts...hope they help!

I would LOVE to see some data of Big 10 teams vs SEC teams... perhaps after the inclusion of PSU?

As always, great diary! You're one of the main reasons I love this blog.

1) Maybe there's something interesting about size vs speed for offense (high scoring ninja football vs 3 yds and a cloud of dust? more upsets/mistakes for speedy higher scoring teams?) or defense (smaller speedier defenses giving up more time consuming scoring drives? bigger slower defenses doing better against running offenses?).

2) I've always thought that a big part of being a big-time athlete is shaking off mistakes and performing on the next play. My anecdotal example is that Ryan Leaf is a big crybaby and Tom Brady is a playmaking machine. I don't have a specific question, but maybe there's something testable about "momentum" or teams recovering after a turnover or a lost lead, or certain teams (coaches?) being "better" at this than others?

Keep up the great work.

are there really 4th quarter teams? And if there are, is it just luck or are there any common threads?

Not positive I'm reading this right, but isn't it possible that pass offenses that give a lot of interceptions are also just bad pass offenses in general, making the interceptions not seem as damaging to the whole?

the "secret dead zones" and the Carr vs. RR.  Would there be enough data to do a GERG vs. past three D coordinators after next year?

I don't know as much about stats as this guy does.  However, an anecdote from recent memory: Texas-Nebraska, Big 12 Championship Game 2009.  MVP:  Ndamukong Suh, DT, Nebraska.  According to Suh's Wiki, he had 12 tackles, 7 for loss, and 4.5 sacks against Texas that day.  However, I would argue that this was not a sign that Texas had a bad offense, or that McCoy was a bad QB.  Rather, Suh was simply an unstoppable force of nature that year, and his personal stats and the defense's stats reflect that fact.

I'm not exactly sure what conclusions we can draw from this, but I believe this article lends credence to the idea that having a great offense is better than having a great defense, on average.  However, getting back to my B12 example, a great offense will, on average, get you better field position for extra-long-range field goals in emergency situations, as well as setting up your playmakers on offense for many more scoring opportunities.

I think the real effect of spread offense mania in college football has been to reduce the importance of plays run for short gains.  Sure, a quick throw for 5 out to the flat is all well and good, but statistically, an offense that is capable (at least on paper) of 20+ yards per play is in a better position to take advantage of bad secondary play, especially at the college level.  Not to mention making much greater use of the pass is easier on the running backs' bodies; they also get more chances to utilize their blocking and misdirecting abilities.

I would like to see Carr vs. RR, but as part of a broader 'Coaching Legends of Michigan' series.  Find out how to weight each coach's performance as a function of quality of schedule, recruiting, and hard-to-define things like that--i.e. the mitigating factors that make it so hard to compare coaches from different eras.  Recruiting was a lot different in Yost's era than it was in Moeller's, and it's more different still in RR's.

Can you do the same thing but with fumbles to see if Brian's "randomness" theory holds true or if fumbles are a symptom of a good defense/bad offense?

I have always had the gut feeling that 1st down success is very important for a defense, but absolutely critical for an offense.  A defense that isn't successful on 1st down (I'll define "success" as 3 yds. or less) isn't necessarily a bad defense if it can create TOs and such, but an offense that isn't successful on 1st down is simply a bad offense.

Care to investigate?

I think someone did a post a while back on "successful downs" by the offense.  This person determined that a "successful down" was one that gained half or more of the yards needed on the play, i.e. a successful 1st & 10 would be one that gained 5 or more yards.  Maybe something more statistically significant than this, but I'm just throwing out ideas.

This is for NFL, FWIW.

FO baseline for offensive success is 45% of needed yards on 1st down, 60% on 2nd, and 100% on 3rd or 4th.

Defense gets a stop if it prevents an offensive success.
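A minimal sketch of that rule in code, using only the thresholds quoted above. The function names and the tuple layout for a play are made up for illustration, and the sample plays are invented.

```python
# Percent of needed yards required for an offensive "success", by down,
# using the thresholds quoted in the comment above.
SUCCESS_PCT = {1: 45, 2: 60, 3: 100, 4: 100}


def is_offensive_success(down, yards_to_go, yards_gained):
    """True if the play gained at least the required share of the needed yards."""
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return yards_gained * 100 >= SUCCESS_PCT[down] * yards_to_go


def is_defensive_stop(down, yards_to_go, yards_gained):
    """The defense gets a stop whenever the offense does not get a success."""
    return not is_offensive_success(down, yards_to_go, yards_gained)


def success_rate(plays):
    """Share of plays that count as offensive successes."""
    hits = sum(is_offensive_success(d, togo, gain) for d, togo, gain in plays)
    return hits / len(plays)


# Invented plays as (down, yards_to_go, yards_gained).
sample = [(1, 10, 5), (2, 5, 2), (3, 3, 3), (1, 10, 4)]
print(success_rate(sample))
```

Swapping the 45 for 50 on first down would give the "half the needed yards" version mentioned a couple of comments up.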

I've always thought part of getting a lot of interceptions and sacks is having a good overall team.  Obviously you need to have a good defense to get those two items, but I've always thought a good offense helps too.  If you have a good offense and good defense, chances are you are going to be winning a decent amount of games.  That in turn is going to force teams late in the game to throw the ball more thus increasing the chances for sacks and int's.

An interesting stat would be to see how sacks and int's of really good teams (BCS title game participants) are spread over the 4 quarters.  I wouldn't be surprised to see a spike in sacks and int's in the 4th quarter.

I have no idea how you could do this, but it would be awesome if you could somehow quantify how important certain positions are to the outcome of the games.  It'd be interesting to see, somehow, if poor safety play has more of an effect than poor cornerback play.  I can think of a few ways of trying it, but none of them are that great because they all rely on how people rank players and it doesn't take into account the effects that one player being bad can directly and/or indirectly have on others.  It'd be awesome if you came up with anything, really.

but stadium/crowd noise as it correlates—or not—with game outcome.

Great job as always, Mathlete!! Not sure if the data is available, but a comparison of how the offense and/or defense in RichRod's first 3 yrs at WVU compares with Michigan's would be interesting (if I remember correctly, WVU was a top 10 defense when they barely missed the national championship game in 2007). I would also like to see a progression of great players.

I always enjoy reading your articles.  This one was yet another surprising one.  It's very interesting that QBs that throw more interceptions than average are probably undervalued.

There was a post in the last few days (or week) which looked at how much experience UM has compared to other teams.  The breakdown was upperclassmen and underclassmen.

Obviously breaking that down further into number of non-redshirt years would be nice, but I think it would be really interesting to see how games played (and possibly games started) of all of your players, or ideally your starting lineup, correlates to performance (wins or maybe yards per game).  Keeping the universe to just Big Ten teams would be ok.

Thanks again.  I look forward to reading your next article.

I enjoy these diaries and like the way you think.  I must admit to having a hard time following the jargon in them (particularly with respect to the variables you are measuring, but occasionally with the stat methods you use as well).  Could you either link to a glossary, or include a paragraph or two of explanatory template?

Thanks.

I have 2 questions that maybe you can help me understand.

"the average defensive unit produced 2.3 ppg worth of sacks and 2.0 ppg worth of interceptions.  Sacks have a slightly higher direct value than interceptions "

This appears to suggest the defense is better off getting a sack than an INT. Do you mean to say that because there are often more sacks than INTs that the sacks end up having a greater effect? Or are you saying that the defense is better off getting a sack than an INT? The second possibility is so counterintuitive that I think it requires more analysis to be understood.

"For every point per game that a defense generates due to sacks, the overall pass rush generates 1.2 ppg of additional value.  Interceptions are also powerful, but not as much so.  Each ppg of value a defense generates through interceptions is worth 0.9 ppg of additional value."

I am confused how you can measure the effect of a sack or INT on plays when there is no sack or INT. I presume when you say pass rush you mean sack because I cannot think of any way of quantitatively measuring the effect of a pass rush that does not result in a sack. I must be missing the point. Again, we see this counterintuitive idea that sacking the QB is better than getting the ball back. Interesting.

I would get higher R-Squares if I regressed predator drones against meconium.

I mean, any conclusion could be correct assuming a 2% confidence interval.

Judging by the number of infants that have seemingly been killed by predator drones, it seems that they may be attracted by meconium?

math used to take all the fun out of skewl for me

and now you are making it take all the fun out of footbaw for me

boo math

You say that sacks and INT's aren't a good estimator for offenses, which makes sense. An offense which passes a lot is more likely to give up more sacks and INT's, but also has the potential to put up a lot more yards than a team that keeps it on the ground all the time.

My question is this: what do the numbers show about the rushing game for teams that surrender a lot of sacks? I imagine the numbers have to be lower, because that would indicate the QB was holding the ball more often (unless it's just that they have NO pass protection, at all). However, does the surrendering of more sacks indicate a generally weak offensive line that also leads to reduced Yards Per Carry, and more stuffs, or no?

interceptions returns and fumble returns on sacks are not included

Well, then, no offense, what's the point of this exercise? Fumbles and interceptions ARE returned. What's the point of measuring a reality in which they're not?

any sacks that resulted in fumbles and any INT's that had any return yardage. I believe those plays were simply not included as data points.

I know that's what he did. I'm wondering what the use of measuring interceptions and fumbles is when you excise return yardage.

So, fine - in a world where INT's are not returned, sacks have more value. But - what does that tell us about, you know, football, where interceptions ARE returned?

So, fine, for the 10% of INT's (made up number) that aren't returned, sacks have more value. But what, really, do we learn from that?

INT yardage doesn't matter on non-INT plays.  Sack-fumble return yardage doesn't matter on non-sack plays.  The idea is to measure whether a defense that gets lots of sacks is more effective when they aren't getting sacks, and likewise for interceptions.  From what Mathlete shows us, in terms of expected points, the answer is yes.

Peace

Ty

The idea is to measure whether a defense that gets lots of sacks is more effective when they aren't getting sacks, and likewise for interceptions.

Is that what he's saying? I don't see anything like that in his post. Maybe I'm dense?

Anyway - why would a defense that gets sacks be more effective when they don't get sacks? This doesn't make sense.

than a defense that does not record a lot of sacks.

Specifically: Is a defense that gets a lot of sacks good because they get a lot of sacks, or do they get a lot of sacks because they're good?

Edit: Is this discussion making anyone else feel kinda gay?

I just suddenly realized I've never talked about sacks so much before in my entire life.

Maybe it's just my inner 14 year old.

"but does either of these correlate to a better defensive performance overall[?]"

"Not entirely surprisingly, the better a defense is at producing sacks and interceptions, the better it is on downs where neither occur.

"For every point per game that a defense generates due to sacks, the overall pass rush generates 1.2 ppg of additional value. Interceptions are also powerful, but not as much so. Each ppg of value a defense generates through interceptions is worth 0.9 ppg of additional value.

My own research started with 20 years of leaguewide NFL data, and looked at yards-per-attempt rather than expected points.  My findings were similar: there was a weak inverse correlation between sacks and passing effectiveness on non-sack plays, and a slightly stronger-but-nothing-amazing correlation between INTs and passing effectiveness on non-sack plays.

However, I also looked at passes defensed, and there was a very strong inverse correlation between passes defensed and passing effectiveness on passes that weren't broken up--that is, in seasons where "coverage" was really good, passing was severely depressed.

Double however, the passes defensed, and passing effectiveness, trends closely followed the Ty Law Rule.  Over the late 90s and early aughts, there'd been a growing trend of more physical press coverage, culminating in the '03 playoffs, where Ty Law physically assaulted Marvin Harrison en route to a title.  That summer, they changed enforcement of PI and illegal-contact rules--and YPA has shot up while PDs per attempt have been dropping.

So, my commenters (and the Mathlete, and I) have concluded that measuring leaguewide data is just too high of a view, and I'll be following up with team-by-team data over the past few seasons . . .

Peace

Ty

I mean he didn't include any INTs where there was a return or sacks where there was a fumble.

Only INTs that were ruled down at the point of interception (i.e., tackled at the time of the INT by the WR), and sacks that did not cause a fumble, or that did, but the fumble was in a pile and was not returned at all, were counted. So those yards haven't been excised. They simply don't exist.

It's a matter of ensuring you're not skewing your results by the fact that offenses probably aren't very good at defense and vice versa. A long return after an INT is more likely than long yards after a catch, I would think.

I know.

But isn't that a small sub-set of INT's? Isn't he excising a large number of data points?

Math = sexy !

But shouldn't you do some sort of pre-processing to make the data look like something other than the pieces of baby food on my 5 month old's face before drawing a line through it and attempting to make some sort of conclusion?  Paraphrasing CG and WolverBean, the idea that you can draw a line through that mess and make a reasonable conclusion is a bit laughable.

Here's an idea.  Bin your vpg from sacks (or ints) and then plot fewer points (like 10 points at most).  You can calculate a bin average and standard deviation, then plot that using the std. dev. as an error bar.  If you are so inclined, you can also do a weighted linear fit using methods detailed in (for example) Bevington's book.  Also there is software that you can buy that does minimum chi-squared type fits like Slidewrite (which costs $150 or so) or write your own code to do the analysis using something like MATLAB.  But at the very least you can see if extreme bins have a statistically significant separation.
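A rough sketch of the binning idea in plain Python, in case it helps. The data here is synthetic noise with a small built-in slope, not the Mathlete's numbers, and the function names are made up. It reports each bin's mean, standard deviation, and standard error of the mean, then compares the two extreme bins.

```python
import math
import random
import statistics


def bin_summary(xs, ys, n_bins=10):
    """Group points into equal-width x bins; return (center, count, mean, sd, sem) per bin."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("x values must not all be identical")
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        idx = min(int((x - lo) / width), n_bins - 1)  # the max x falls in the last bin
        bins[idx].append(y)
    rows = []
    for i, members in enumerate(bins):
        if len(members) < 2:
            continue  # a standard deviation needs at least two points
        sd = statistics.stdev(members)
        rows.append((lo + (i + 0.5) * width, len(members),
                     statistics.mean(members), sd, sd / math.sqrt(len(members))))
    return rows


def separation(first, last):
    """Difference of two bin means, in units of the combined standard error."""
    return (last[2] - first[2]) / math.hypot(first[4], last[4])


# Synthetic stand-in data: x is "ppg from sacks", y is a noisy outcome (both invented).
random.seed(0)
xs = [random.uniform(0.0, 5.0) for _ in range(500)]
ys = [0.2 * x + random.gauss(0.0, 3.0) for x in xs]

rows = bin_summary(xs, ys)
for center, count, mean, sd, sem in rows:
    print(f"{center:5.2f}  n={count:3d}  mean={mean:6.2f}  sd={sd:5.2f}  sem={sem:5.2f}")
print("extreme-bin separation:", round(separation(rows[0], rows[-1]), 2))
```

For the error bars, the standard deviation shows the scatter within a bin, while the standard error is the one to use when asking whether two bin averages really differ.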

"Will Michigan rue the loss of Brandon Graham? - Tim"

Without reading the post... um, yes?

The implications for offense, and for grooming quarterbacks in the NFL, intrigue me.

It reminds me of a thought that ran through my head last season, watching a replay of the 1992 NFC Championship Game ('92 season, game in '93), between the Cowboys and 49ers.  Steve Young, who I remember as a very intelligent decision-maker, threw kind of an ugly pick towards the end of the game as the 49ers tried to come back.  I remember thinking, "Wow, did Steve Young just throw a Brett Favre pick?"

As the game has evolved since 1992, there's been increasing pressure on young quarterbacks to start right away--and a correlating pressure to 'just be a caretaker', and never make the big mistake.  These numbers make that seem like folly: teach the QB to make plays within the offense, and deal with the inevitable mistakes!  Don't 'Nerf' your offense, and be left with no positive plays, either . . .

Case in point last year: Brady Quinn, with zero weapons around him, was asked to dink and dunk and nibble all season long.  They weren't going to win throwing WR screens to terrible WRs, so why bother?  Why not let Quinn play football, and learn from his mistakes?  Why play a low-variance game when you know you've got no chance of winning on execution?

Peace

Ty

PS: thanks for the shout-out!

As a Devil's Advocate, Roethlisberger, Flacco, and Sanchez have all been brought up in the 'caretaker don't make the big mistake' mold.

Those are some good teams and good young QBs.

Sanchez's "goodness" is relative.  On a team with the best rushing offense in football, and the best scoring defense in football, he was statistically terrible: 12 TDs to 20 picks, and a 63.0 passer rating.  If he was being coached to not make mistakes, he was doing an awful job.  He did show some real promise in certain games, and I do expect him to take a step forward in 2010--but in 2009, you can't call his absolute on-field performance anything but rotten.

Roethlisberger had a similar supporting cast, but he threw 17 TDs to just 11 INTs, and had a 98.0 passer rating to go with that excellent ratio.  They didn't ask him to win games for them, just make wise decisions.  Not only did he make very wise decisions, he made a few great plays anyway, and his championship-caliber team won a championship.

Flacco had a lesser, but still very good, supporting cast his rookie year.  He posted a 14/12 TD/INT ratio and 80.3 rating . . .

What I'm getting at here is that the 'caretaker' thing can work when you have a winning team forced to play an inexperienced quarterback.  But as a blanket philosophy of quarterback grooming?  If you ask physically gifted prospects like Alex Smith and Jamarcus Russell and Tim Tebow to do nothing but make very wise decisions on terrible teams, they'll fail.

Even if they could pull it off, we can see above that eliminating mistakes won't make a poor or mediocre offense better, it'll only eliminate mistakes.

Peace

Ty

that I have no idea what any of this means.  Having said that, if the purpose is to say that sacks are more important to a game than INTs, I don't agree...

First of all, it's folly to compare sacks and INTs in terms of impact.

One play is defined as a turnover, one is not.  One play creates negative yards for an offense, one literally creates an end of a possession.

Statistically, INTs show up differently than sacks.  Sacks create negative yards that have to be made up.  An INT ends possessions...  No coach in America would choose "oh man, if we just sack them over and over again our defense will give up fewer yards" over "hey, let's pick one off and get the ball".

Conversely, no coach in America would say "it's better to throw a pick than to take a sack", regardless of what the offensive production stat says.  I-N-T: those three letters matter a whole lot more than yards...

Teams that have a good pass rush tend to get a lot of sacks, and probably have a good D-line, blitzing LBs, coverage guys, etc. You would expect that sack-leading teams probably have better defensive stats than a team that gets a lot of INTs.

So if the point of the exercise is to show that statistically the number of sacks a team gets is a better indicator of how good the team is defensively, I would agree.  INTs are more hit or miss, depending on off days by QBs, wrong reads, a confused game plan, wind, and maybe one player, whereas getting a lot of sacks indicates a lot of good defensive players...

Just my opinion, I'll see myself back out..

I think gs makes an excellent point. Good defenses generally get sacks while there is data to suggest that INTs are more a function of the offense than the defense.

In other words, getting sacks might correlate more with good defense than INTs. The sacks are not preferable to INTs but have a stronger correlation with better defense than INTs.

But, when all is said and done getting an INT is better than a sack, in most cases.

I don't know what any of this means

Really, I don't. Maybe I'm too old.

Regardless, I want my defense to get sacks and picks,  and I don't want my offense giving up sacks and picks.

Hopefully, these charts prove that and we're on the same page

Actually I enjoy this article/post quite a bit, but agree with some of the commentary that it's not making a huge splash on my senses right now.

That said, I generally just accept that Mathlete is using his statistical analysis correctly and I've had enough statistics classes to know that other people who understand statistics say that those plots are the correct way to do things, and that the trend lines are a valid result for this data.

I like to focus on the conclusion and how the conclusion could be used by the head coach to make either strategic (what players do I need to recruit, and which coaches do I need to hire to train them) or tactical (what play do I call now in this game) decisions.

And what the conclusion tells me is not very valuable for this goal.  To restate the conclusion: the number of sacks and interceptions a defense compiles on the stat sheet is indicative of how good a defense they are overall, and thus how well they will perform on any play over the course of a game, and games over the season.  So there is nothing valuable for tactics (it doesn't tell me which formations against which opposing offensive plays were most effective at producing sacks and interceptions), and it has only slight strategic value ("hey guys, we should recruit awesome defensive talent and train them with defensive skills so they are really good").

BUT, when you form a hypothesis, you very frequently don't know the outcome. You have an expectation maybe, but the whole reason to do the analysis is to prove or disprove the hypothesis.  And as was stated, for defense, conventional wisdom holds: good defenses will produce higher numbers of sacks and interceptions than poor defenses.

The offensive conclusion is a little less clear to me, at least to the point of my goal, i.e. what do I do tactically or strategically?

To gsimmons85's point, interceptions end the possession, and on a scale of 1-10 that's "bad".  But the hypothesis wasn't set up to measure how an interception affected a specific drive, but to determine if it correlates to the average performance of an offense, i.e. do bad offenses always have more interceptions against them than good offenses?  The result is "no; in general, all offenses have more interceptions and sacks when facing a good defense than they do facing a bad defense"

which is a more obscure proof of the same conventional wisdom, i.e. it's harder to perform against someone who is good at stopping you than against someone not so good at stopping you.

And to extend the conclusion to something I think is meaningful to a head coach, "But just because you are facing a good defense does not mean you start calling safer plays, because that defense will be just as effective stopping the safe plays as the risky plays"

and finally I'll bring it home to an area of focus I would like to see.  First, if you are facing a good defense, you will put your team in a very bad position if you cannot establish a point lead early in the game, or maybe more importantly prevent getting far behind, because as you start to call risky plays to catch up, the good defense will just increase your failure rate.

The flip side of this is of course the good old Bo/Lloyd theme of hey we've got a large lead, let's sit on that by reducing the amount of risky plays on offense and thus let their offense continue to struggle against us.

Unfortunately this doesn't work in a team sport where your offensive and defensive teams are made up of different players.  Consider the time when players were on offense AND defense. Under these conditions your offense and defense are both just as tired at the end of the game.  BUT Bo/Lloyd did not have this.  What they had was an offensive team that got more and more rest during the game as the risk level of offensive plays was reduced, and a defense (albeit always good as dictated by tradition *cough*, The Horror aside *cough*) that was getting more and more tired.

So Mathlete, please test this hypothesis and please include some data from 1997.  Defenses perform better on average when their offensive counterparts are performing better true or false?

Qualitatively and from a subjective memory, the offense in 1997 was not only conservative, but overall young and below average (I think that while the offensive line contained the most talent ever, in 1997 it was almost entirely redshirt freshmen, and of course had gritty Griese as the QB).  If any team won their games solely because of defense, this was the team.

But considering how well that team trashed the Big Ten, it seems to me that Lloyd kind of squandered a 20 point lead on Ohio State, not to the point of losing, but certainly he for some reason decided to stop trying to score in the second half, and succeeded.

And the game against Washington State required some stunning long pass plays to put Michigan on top, as the Michigan defense got behind on what was the best passing offense they saw all season by far.

The first win against Colorado was impressive and dominating, but didn't Colorado end up sucking that year?

And for the Penn State game, I think someone must have put some rum in Lloyd's Gatorade based on the kind of offensive tricks he started pulling out of his hat, or maybe Penn State's coaching just chose to ignore the fact that Lloyd had been using Woodson on offense, 'cause it sure didn't look like the Penn State defense felt they needed to cover the fastest guy on Michigan's team.  I mean, wasn't the Penn State game where the "no scoring in the 4th quarter" streak ended? *gasp* another Michigan streak ending.

Sorry for the memorabilia toward the end here, but the interview of Woodson at the end of the game is burned into my memory.  They of course asked, "what do you think about the 4th quarter no scoring streak ending?" and Woodson's reply of, "the result was already decided by then",

ahh, I can't wait to see everyone talking about how surprisingly good the 2010 defense turns out and ignore the fact that it will come because RR is NOT Bo/Lloyd and isn't going to take his foot off the Offensive Gas.

This young defense is not going to be good, but they will be good enough.

That said, I generally just accept that Mathlete is using his statistical analysis correctly

Not to beat a dead horse here, but the Mathlete's conclusions are completely unjustified in this situation. The correlation values presented in the first two charts are way, way below accepted statistical norms. Normally I'm a huge fan of the Mathlete's posts (heck the data here is still very interesting), and the conclusions may in fact be correct, but the data shown does not back up the claim that sacks are more important than interceptions or that the offense is not affected by either.

Mathlete, I'd like to see you include statistical significance testing in all of your articles.  Also, you could include ranges, i.e., the benefit is 1.2 +/- X ppg for each ppg of sack value.  In this instance X would be large, which means sacks are not a strong predictor of score.
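For what it's worth, the "+/- X" on a trend line slope is cheap to compute. Below is a minimal sketch of an ordinary least-squares slope with its standard error; the function name and the four data points are invented for illustration and have nothing to do with the actual charts. With a large sample, roughly two standard errors either side of the slope gives a 95% range.

```python
import math


def slope_with_error(xs, ys):
    """Ordinary least-squares slope and its standard error for a simple y-on-x fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 degrees of freedom (slope and intercept were fitted).
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, std_err


# Invented toy data, only to show the call.
slope, err = slope_with_error([0, 1, 2, 3], [1, 3, 2, 4])
print(f"slope = {slope:.2f} +/- {err:.2f} (one standard error)")
```

If the range of slope minus two errors to slope plus two errors straddles zero, the chart cannot really distinguish the trend from no trend at all.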