landing spot. will be interesting to see how he does.
Things We Know
This is obvious territory: the Spread's "Score whenever possible" mentality renders T.O.P. moot as a way to tell which team was playing better at the end of the game. Thing is, T.O.P. was never meant to be an end-of-game metric, or shouldn't have been. It's an IN-GAME metric. The idea isn't to show who's dominating the game, but what shape the defense is in. Its continued popularity on networks is likely due to the ease with which it's calculated. I think we can come up with a much better metric for that, and retire T.O.P.
Good guesses:
- Offenses tire less quickly than defenses. Giving blocks is better than receiving them. Reacting to a play that you didn't call puts you at a disadvantage. Pushing past a lineman to the one place he doesn't want you to go is more tiring than shoving a lineman back in the one direction you know he wants to go. There's a lot of chasing involved.
- Players recover from being tired in real time (not Game Time)
- Fatigue is generated during plays, not between them
- Greater fatigue reduces the effectiveness of a defense because a) tired players can't react as well, and b) substitutions are inherently a reduction of the talent put on the field.
- While fatigue can be recovered from during the game, the more that is drained, the lower the maximum recoverable energy.
Things We'd Like to Know
I want a metric that:
- Gives an approximate likelihood of the offense scoring based on defensive fatigue.
- Since the above would be very difficult, the metric should at least standardize defensive fatigue, to be used as a reference point
- Is fairly easy to calculate with widely available stats
Pure guesses (opportunities for me to look stupid):
- Energy is recovered at an exponential (logistic? Math majors help! -- I mean a curve that slows as it goes, like y=x^[fraction]) rate.
- More plays depletes a defense's performance
- More plays in progression depletes a defense's performance faster
- Available statistics allow us to create a metric for a defense's performance based off of these fatigue factors
Let's Talk Variables
It's hard to count actual time during plays, at least for us laymen. However, number of plays per drive is easy to calculate. I would like to count plays that are replayed due to penalties, unless the play is blown dead. I'd also like to count overall time elapsed since the last defensive play.
However, actual time is hard to come by. We have the time the game took to play. We have the in-game time. But short of having a DVR with a timer, I haven't been able to find any real time metric. If someone can find me a place where that is kept and freely accessible, I will use it. Otherwise, we're going to have to ignore regeneration based on real time.
The atom for all of this is going to be plays run from scrimmage.
Defensive plays from scrimmage increase defensive fatigue. Offensive plays from scrimmage decrease defensive fatigue. Since they use so many backups, special teams plays do not count.
The test for it will be yards given up, since scoring equates too much with field position. Why yards? Because we know that yards gained and winning are correlated. A defense that gives up more yards is more likely to be scored on.
Needs a name. For now: SCHWING.
Defensive SCHWING: How it Works
What we will create is basically a running play counter:
- Higher number indicates higher level of defensive fatigue
- Defensive plays count for +3 for the defensive team
- Offensive plays count for -8% for the team on offense
- No team can go into negative.
- Commercial Breaks, Time Outs and Reviews count for -15% for both teams
- Half Time reduces all fatigue by 80 percent (rounded to nearest integer)
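For anyone who would rather script this than use a spreadsheet, the rules above can be sketched roughly as follows. This is only a sketch under a few assumptions of mine: the function and constant names are made up, the -8% is applied to the current fatigue level of the team that has the ball, and values are only rounded at halftime (the one place rounding is mentioned).

```python
PLAY_COST = 3           # added to the defending team on every play from scrimmage
OFFENSE_RELIEF = 0.08   # share of its fatigue the team on offense sheds per play
STOPPAGE_RELIEF = 0.15  # commercial breaks, timeouts and reviews (both teams)
HALFTIME_RELIEF = 0.80  # halftime (both teams)


def scrimmage_play(levels, offense, defense):
    """Apply one play from scrimmage to both teams' SCHWING levels."""
    levels[defense] += PLAY_COST
    # Percentage relief can never push a level below zero, but the floor is
    # kept explicit to mirror the "no team can go into negative" rule.
    levels[offense] = max(0.0, levels[offense] * (1 - OFFENSE_RELIEF))


def stoppage(levels):
    """Commercial break, timeout or review: both teams recover 15%."""
    for team in levels:
        levels[team] = max(0.0, levels[team] * (1 - STOPPAGE_RELIEF))


def halftime(levels):
    """Halftime: both teams shed 80%, rounded to the nearest integer."""
    for team in levels:
        # Assumption: Python's round() (ties go to the even integer) is close enough.
        levels[team] = float(round(levels[team] * (1 - HALFTIME_RELIEF)))


# Hypothetical sequence: three WMU snaps, one Michigan snap, then a timeout.
levels = {"MICH": 0.0, "WMU": 0.0}
for _ in range(3):
    scrimmage_play(levels, offense="WMU", defense="MICH")
scrimmage_play(levels, offense="MICH", defense="WMU")
stoppage(levels)
print(levels)
```

Keeping the levels as floats between halftimes is a design choice; rounding after every play would drift a little from what is shown here.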
The Spreadsheet is here. Click on each image for full size
Michigan vs. Western Michigan:
Averages: Michigan 21, Notre Dame 17
Michigan vs. Eastern Michigan:
Averages: Michigan 21, EMU 14
Remember, higher is bad. It means that Eastern Michigan, over the course of the game, faced a Michigan defense operating, maybe at like 79 percent of its capacity, because of fatigue, while Michigan faced EMU's at, say, 86 percent capacity.
Keep in mind, it's impossible to be 100 percent the whole time. But notice how much better Michigan's defense was against Western, who's not much more talented than Eastern Michigan. There's a big difference in how well the Wolverines let the defense rest in Game 1, whereas they were considerably harder on the D in Games 2 and 3, whether by turnovers or quick scores.
So....Correlation?
If Michigan's defense gives up more yards when its SCHWING level is high, that would indicate the metric works, right?
Notre Dame de South Bend:
The yellow lines are offensive plays. The ones sticking out below were negatives (or holding penalties).
Michigan gave up 236 yards (5.02 yards per play) to Eastern when our SCHWING level was 20 or higher. We gave up 61 yards (2.26 yards per play) when it was 19 or lower.
It was actually more drastic than that. A lot of short yardage was given up in the 2nd half against the backups in soft, clock-killing defense. The big plays in the first half were all during high-SCHWING periods. The 3-and-outs were during low ones.
Against Notre Dame, Michigan gave up 188 yards (6 yards per play) with a SCHWING under 20. Not good. We gave up 294 yards (6.125 yards per play) when SCHWING was over 20. Also not good. There wasn't as much SCHWING variance, however, against Notre Dame as there was against EMU. The Wolverine defense played much more of that game tired. If you take out the 27 yards on the last play, our SCHWING under 20 YPP goes down to 5.37 (161 yards). I think that just says ND's offense was pretty good (or held like bitches).
WMU was the opposite. With SCHWING under 20, the Broncos put up 81 yards (2.79 YPP). When SCHWING went over 20, they put up 222 yards (6.17 YPP). If I excise the 73-yard TD, it's still 4.26 YPP. But it shouldn't be excised -- that happened near the peak of Michigan's defensive fatigue during the game.
Here's what yardage against us looked like against WMU as SCHWING went up:
As the season progresses, I'll do more plotting to see if this sticks, but so far this seems a little bit correlative. If I had to guess, I'd say ND and their max-protect-bomb strategy caused the difference.
All told, when Michigan's SCHWING was under 20 this year, our defense gave up 330 yards (3.79 YPP). When it was over 20, we gave up 752 yards (5.74 YPP).
I'm sure we could play around with the factors, but as a very basic statistic, it seems to be fairly predictive. When the defensive fatigue rating for a given team is high, they are likely to give up more yards, in our extremely small sample of course. Feel free to plug in other games from years past.
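If you do plug in other games, the under-20 / 20-and-over split used above is easy to script once each play is logged with the defense's SCHWING level before the snap. A minimal sketch, assuming a play log of (SCHWING, yards) pairs; the function name and the sample plays are invented for illustration and are not from any of the three games.

```python
def split_by_schwing(plays, threshold=20):
    """Total yards, plays and yards per play below and at/above the threshold.

    plays: iterable of (schwing_before_snap, yards_gained) pairs.
    """
    totals = {"low": [0, 0], "high": [0, 0]}  # [yards, plays]
    for schwing, yards in plays:
        side = "high" if schwing >= threshold else "low"
        totals[side][0] += yards
        totals[side][1] += 1
    return {
        side: {"yards": y, "plays": n, "ypp": (y / n if n else None)}
        for side, (y, n) in totals.items()
    }


# Made-up play log, purely to show the call.
sample = [(6, 3), (12, -2), (18, 7), (21, 11), (24, 4), (27, 15)]
for side, row in split_by_schwing(sample).items():
    print(side, row)
```

The threshold is a parameter so that cutoffs other than 20 can be tried without touching the rest.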
Obviously, scores come after drives.
The thing to look at isn't the end of drives, but the start of them: what shape is the defense in as Team X gets the ball. For example, when Michigan put up three quick scores on Western, they got the ball each time with WMU's defensive deficiency rating already well over 20.
Similarly, EMU got the ball down 38-17 and had a magnificent drive (which should have been a TD), but every drive before that in the 2nd half, Michigan's D started under 10. The real backbreaker for them was when the QB buckled and fumbled -- that gave Michigan the ball back with EMU's defensive SCHWING over 20.
Couple things jumped out, though. The quick scores (Brown's long TD run, the kick return for TD against Notre Dame, Denard's existence) were answered with scores against Michigan, or long periods of scoring drought. Interceptions, too, created a fast turnaround. Look at Stonum's return: not only did it put Michigan back on the field after a tough stop (helped by Cheeseburger Charlie's inability to get a few plays called in*), but even more it helped the Domers' defense rest away the effect of that good early drive by Michigan.
Note how different this is from Time of Possession. By basically counting plays back and forth, we can see when one team or another is particularly likely to get scored on.
I think I'm gonna keep using this as the season progresses. It's pretty easy to calculate, especially if you have the spreadsheet handy. If it holds up as a decent indicator of expected defensive performance, maybe an addition to the UFR charting?
UPDATE 9/23: Bad news. I ran all of the plays from all three games (by ND, EMU, WMU and MICH) and there's such a small correlation it's almost not worth it:
Of course, it's not conclusive. Wait until we have at least 1,000 plays from scrimmage to analyze (we're at about 450 right now).
When SCHWING was 20 or over, offenses gained 1363 yards on 251 plays, and had 23 "big" plays (15 yards or more). That's 5.43 YPP, and a 9.16% chance of a big play.
When SCHWING was under 20, offenses gained 984 yards on 175 plays, with 15 big plays. That's 5.62 YPP, and an 8.57% chance of a big play.
Not exactly correlating.
One thing of note: Carlos Brown's 90-yard scamper came at a SCHWING level of 17. In fact, a lot of big plays took place around a SCHWING level of 17 to 25. I don't know what that means exactly, except perhaps that's early in drives but seldom right at the start of them. Or that 17 to 25 is the bell curve. This could simply be because early in drives there's more field to go, thus more space for big yardage.
Situationally, there was a small difference. With SCHWING under 20, 26.55% of plays from scrimmage resulted in a 1st down or touchdown. When SCHWING was over 20, that number rose to a 31.62% conversion rate. The touchdown ratio went way up: 7.11% over 20, and 1.69% under 20. But I can't tell you how much of that is field position -- the likelihood of scoring goes up when you get closer to the end zone, and SCHWING goes up the longer a drive lasts, meaning high SCHWING generally takes place deep in an opponent's zone. So the TD ratio means pretty much nil. Anyway, the average SCHWING level before plays that resulted in 1st downs and touchdowns was about 24; the level before plays that didn't convert was 22. Small difference.
I'm not giving up just yet, though. I'm gonna track a few more games, because I think I'm getting thrown off by big plays late in the WMU and EMU games, when backups and whatnot were in (high SCHWING is supposed to necessitate more backups, so if the backups go in when SCHWING is low, that changes things).
Here are the big plays with low SCHWING this year:
|Play||Game||SCHWING||Defense||Yards||Result||Description|
|40||WMU||17||WMU||43||TD||(1st and 15) Robinson, D. rush for 43 yards to the WMU0, 1ST DOWN MICH, TOUCHDOWN, clock 03:57.|
|3||ND||6||MICH||24||1ST||(2nd and 9) ALLEN rush for 24 yards to the ND45, 1ST DOWN ND (Williams, Mike).|
|6||ND||15||MICH||24||1ST||(3rd and 4) CLAUSEN pass complete to RUDOLPH for 24 yards to the MICH25, 1ST DOWN ND (Williams, Mike).|
|24||ND||19||ND||40||1ST||(3rd and 12) Forcier, Tate pass complete to Mathews, Greg for 40 yards to the ND41, 1ST DOWN MICH (WALLS).|
|37||ND||19||MICH||19||1ST||(2nd and 6) CLAUSEN pass complete to ALLEN for 19 yards to the MICH22, 1ST DOWN ND.|
|86||ND||14||ND||24||1ST||(2nd and 14) Forcier, Tate pass complete to Stonum, Darryl for 24 yards to the 50 yardline, 1ST DOWN MICH (McCARTHY, K.).|
|100||ND||17||ND||16||1ST||(1st and 10) Minor, Brandon rush for 16 yards to the ND33, 1ST DOWN MICH (McCARTHY, K.).|
|129||ND||10||MICH||15||1ST||(1st and 10) PENALTY MICH pass interference (Cissoko, B.) 15 yards to the ND19, 1ST DOWN ND.|
|205||ND||11||MICH||27||1ST||(1st and 10) CLAUSEN pass complete to TATE for 27 yards to the ND47, 1ST DOWN ND (Floyd, J.T.).|
|9||EMU||3||EMU||30||1ST||(1st and 10) Brown, Carlos rush for 30 yards to the EMU21, 1ST DOWN MICH (CARDWELL, Marty).|
|51||EMU||10||EMU||26||1ST||(1st and 10) Forcier, Tate pass complete to Odoms, M. for 26 yards to the EMU43, 1ST DOWN MICH (MAY, Chris).|
|54||EMU||19||EMU||22||1ST||(3rd and 1) Shaw, Michael rush for 22 yards to the EMU12, 1ST DOWN MICH (SEARS, Johnny).|
|63||EMU||17||EMU||90||TD||(1st and 10) Brown, Carlos rush for 90 yards to the EMU0, 1ST DOWN MICH, TOUCHDOWN, clock 07:15.|
|156||EMU||18||EMU||36||TD||(1st and 10) Robinson, D. rush for 36 yards to the EMU0, 1ST DOWN MICH, TOUCHDOWN, clock 07:14.|
|175||EMU||11||EMU||24||1ST||(1st and 10) Cox, Michael rush for 24 yards to the EMU41, 1ST DOWN MICH (PALSROK, Tyler).|
Three of those plays are garbage time (205 ND, 156 and 175 EMU). One is Shoelace's incredible Yakety Sax Moon Run. Another is Carlos Brown's 90-yard run. Three more are big plays against EMU's defense. The rest are plays from the Notre Dame game, which, like, they have a great offense.
This isn't nearly enough to put SCHWING back on the map. But they're certainly opportunities for SCHWING to look stupid.
* Weis: "It's MMFFPHHHI-RIMMMFGHT MMMPHTWINS!"
Jimmah: "What coach?!?"
Weis: "I MMMFFFPHH SAID RUNMMMMPHHH ISO MMPPPHHH RIHMMMMPPHH"
Jimmah: "Coach, I can't hear you! Take the ham sandwich out!"
Weis: "I MMMPPHHFFF RIMMMPPHHHHHHFFF SPLMMMMPHHFFF DAMMIT!"
Jimmah: "Dammit, coach? What? What? Dammit -- TIME OUT"
Today's Focus: Rushing Stats (Rush YPG)(Full NCAA Rankings)
Players of note: Ralph Bolden, Purdue (1st, 178.5ypg); Jahvid Best, Cal (6th, 140.5ypg); Armando Allen, ND (23rd, 105.5ypg); Caulton Ray, MSU (99th, 61ypg)
Why It's Important: Because... it tells you how many rushing yards a player has per game, on average. Pretty self-explanatory here. Generally the more yards a player rushes for per game, the better they are.
Why It's Flawed: It just measures yards. A big, bruising back that gets the ball on third and short situations or inside the ten yard line can be just as valuable as a quick running back who gets big yards but can't break tackles. What would you rather a running back's stat line be -- 6att, 25yds and 3 TDs or 30att, 200yds and no scores? One gets you points, the other gets you valuable field position that can turn into points.
Also, it doesn't take into account the number of rushing attempts. YPC does this, but you'd have to look into two or three different stat lines to really see the effectiveness of a RB.
ALSO, it doesn't take into account fumbles. 200yds in a game is all well and good, but if all that field position is wasted because he fumbled 3-4 times, it doesn't help at all.
So any one stat for a RB will be leaving out a lot of the story.
Applying this to Current Statistics
Ralph Bolden, Purdue: 178.5ypg (#1)
Definitely a great YPG average, good enough to be #1 in the nation after two games, but a look at his YPC tells a different story. Bolden averages 7.14ypc, still a respectable number, but not nearly #1 in the nation. In fact, second through sixth leading rushers in terms of YPG have a higher YPC average than Bolden. His 50 carries are the second highest in the top 10.
It's pretty obvious that between two equally talented rushers that have the same YPC average, whoever gets more carries per game will have the higher YPG average. Hence the flaw.
Robert Turbin, Utah St: 148.0ypg (#4)
Obviously an extremely small sample size here, as Turbin has only played one game so far (Utah), but he's listed here for another reason. That 148yds was garnered on only 13 carries, earning him an 11.38ypc average, the best of anyone in the Top 25 of YPG.
Reggie Arnold, Arkansas St: 104.5ypg (#25)
Arnold, while not dominant in either YPG or YPC (8.04), is extremely efficient in terms of points earned with his carries. He's had 26 carries thus far, and has scored 5 touchdowns. Almost 20% of the time this guy's had his hands on the ball out of the backfield he's been in the endzone.
So three different stat lines, all pretty damn good in their own way.
An Alternative
Along the lines of my Quarterback Efficiency Rating, I've come up with a Rushing Efficiency Rating (RER). It's much more than YPC or YPG: it's a combination of the major aspects of a running back's game that contribute to their overall efficiency.
Here's the first draft of the formula:
RER = [(Yards) + (Touchdowns x 10) + (Fumbles x -10)] / (Rushing Attempts)
So a big bruiser who might not rack up 8-9ypc but is solid with ball control and in the red zone who's usually good for a few scores:
10att, 40yds, 3 TDs (RER: 7.00)
Has an RER that's similar to a speed back who might rack up the yards, but is prone to a mistake here and there and might not always get the ball on the goal line:
28att, 170yds, 2 TDs, 1 Fumble (RER: 6.43)
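For anyone who wants to run their own backs through it, here is a rough Python sketch of the RER. The function name is mine, and dividing by rushing attempts is inferred from the bruiser example, where (40 + 3 x 10) / 10 carries gives the quoted 7.00.

```python
def rushing_efficiency(attempts, yards, touchdowns=0, fumbles=0):
    """First-draft RER: yards, plus 10 per TD, minus 10 per fumble, per carry."""
    if attempts <= 0:
        raise ValueError("a back needs at least one carry to have an RER")
    return (yards + 10 * touchdowns - 10 * fumbles) / attempts


bruiser = rushing_efficiency(attempts=10, yards=40, touchdowns=3)
speed_back = rushing_efficiency(attempts=28, yards=170, touchdowns=2, fumbles=1)
print(f"bruiser: {bruiser:.2f}  speed back: {speed_back:.2f}")
```

Because the fumble penalty is the same size as the touchdown bonus, a single fumble cancels one score in the numerator.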
Applying the RER to Last Season's Backs
|YPG Rank||YPC Rank||Player||Class||TD||YPC||YPG||RER||RER Rank|
|1||7||Donald Brown, Connecticut||JR||18||5.68||160.23||6.17||6|
|2||5||Shonn Greene, Iowa||JR||20||6.03||142.31||6.68||5|
|3||1||Jahvid Best, California||SO||15||8.14||131.67||8.92||1|
|4||10||Javon Ringer, Michigan St.||SR||22||4.20||125.92||4.76||10|
|5||8||MiQuale Lewis, Ball St.||JR||22||5.39||124.00||6.07||8|
|6||6||Chris Wells, Ohio St.||JR||8||5.78||119.70||6.17||6|
|7||2||Kendall Hunter, Oklahoma St.||SO||16||6.45||119.62||7.12||2|
|8||3||Vai Taua, Nevada||SO||15||6.44||117.00||7.08||3|
|9||4||Tyrell Fenroy, La.-Lafayette||SR||19||6.08||114.58||6.92||4|
|10||9||LeSean McCoy, Pittsburgh||SO||21||4.83||114.46||5.51||9|
Quite the shakeup in the YPG rankings when the number of carries is taken into account, as well as the number of touchdowns. The YPC rankings, on the other hand, are nearly identical to the RER rankings. If fumbles were taken into account, this would surely be a bit different, but until I can find those stats this is all we have to go by.
Thoughts? Comments? Fumble statistics? Let me know.
Behind the Numbers will be back soon with another look at a stat from the world of College Football. Any stats you want to be examined a little closer? Or even just a stat you've been interested in for a long time? Let me know in the comments and I'll do my best to get to it in the next few installments of BtN. Thanks for reading!
So quite a bit of discussion has opened up in the original post about the Efficiency Rating only taking into account the passing efficiency, and in today's College Football world, quarterbacks are much, much more than that.
In this post I'll take a closer look at the current efficiency rating and how it turned out last year in terms of ranking the quarterbacks, as well as taking a stab at my own Quarterback Efficiency Rating, which will hopefully take into account the broader, tangible aspects of a quarterback's game.
Last Season's Results
Top Ten in Passing Efficiency - 2008
|Rank||Player||Class||Passing Efficiency||QER||QER Rank|
|1||Sam Bradford, Oklahoma||SO||180.84||10.02||1|
|2||David Johnson, Tulsa||SR||178.69||8.9||2|
|3||Colt McCoy, Texas||JR||173.75||8.07||6|
|4||Tim Tebow, Florida||JR||172.37||8.23||5|
|5||Zac Robinson, Oklahoma St.||JR||166.84||7.43||9|
|6||Mark Sanchez, Southern California||JR||164.64||8.29||4|
|7||Chase Clement, Rice||SR||163.92||8.31||3|
|8||Graham Harrell, Texas Tech||SR||160.04||7.97||7|
|9||Case Keenum, Houston||SO||159.91||7.96||8|
|10||Chase Daniel, Missouri||SR||159.44||7.11||10|
(For comparison's sake, Pat White's QER was a 6.3)
Most of the names on the list are pretty obvious ones. The who's who of College Football Quarterbacks last year. Bradford, McCoy, Tebow, Robinson, Sanchez, Harrell, and Daniel all had phenomenal seasons and were in the spotlight of College Football because of it.
Johnson, Clement, and (to an extent) Keenum, however, weren't mentioned too much. They don't play at high-profile programs, and don't play against Grade-A competition, but you can't argue that they didn't have great seasons. Is David Johnson a better quarterback than McCoy, Tebow, and Sanchez? Almost certainly not. He is a good passing quarterback, however, and his rating shows that.
Key Players Not in Passing Efficiency Top 10
|Rank||Player||Passing Efficiency|
|26||Pat White, West Virginia||142.35|
|39||Michael Desormeaux, La.-Lafayette||135.01|
|75||Julian Edelman, Kent St.||118.83|
Maybe a bit of a stretch to call White, Desormeaux and Edelman "Key Players", but there's a method to my madness. Edelman, Desormeaux, and White had the 11th, 32nd, and 52nd highest rushing yards per game, respectively.
Obviously this didn't suddenly make them some of the best quarterbacks in the nation, as none of the three were invited to the Heisman Ceremony, but it's just an example of the aspects of the game that Passing Efficiency doesn't take into account.
Quarterback Efficiency Rating?
So if Passing Efficiency isn't a great way to evaluate the overall quality of a quarterback, what other ways are there?
Well.. there aren't too many.
So I took a stab at my own Quarterback Efficiency Rating. It has its flaws but it's a more comprehensive, all-encompassing look at what a quarterback does and evaluates them based on a multitude of other statistics, beyond just passing.
Quarterback Efficiency Rating (QER)
QER = [(Completions) + (Passing Yards x 0.5) + (Passing Touchdowns x 50) + (Interceptions x -25) + (Rushing Yards x 0.5) + (Rushing Touchdowns x 50)] / [(Rushing Attempts) + (Passing Attempts)]
In this formula, the best possible rating is just over 100 (no one would ever realistically come close to that), which makes the rating easier to read. More to the point, a pocket passer:
26/32, 280yds, 3 TDs (QB Rating: 9.875)
Has a comparable rating to a dual threat or even a running quarterback:
14/21, 150yds, 1 TD, 10att, 96yds, 2 TDs (QB Rating: 9.26)
There's no arguing that, in this example, the pocket passer had a better game, but at least with the QER they were about on the same level, whereas the Passing Efficiency Rating would have given the pocket passer a 185.7 rating, and the dual threat quarterback a 142.4.
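Here is a quick sketch of the QER in Python, in case anyone wants to plug in their own stat lines. The function and argument names are mine; the weights are the ones listed above, with total plays (pass attempts plus rush attempts) on the bottom, which is what reproduces both example ratings.

```python
def qer(completions, pass_attempts, pass_yards, pass_tds=0, interceptions=0,
        rush_attempts=0, rush_yards=0, rush_tds=0):
    """Quarterback Efficiency Rating: weighted production per play."""
    plays = pass_attempts + rush_attempts
    if plays <= 0:
        raise ValueError("need at least one pass or rush attempt")
    production = (completions
                  + 0.5 * pass_yards + 50 * pass_tds - 25 * interceptions
                  + 0.5 * rush_yards + 50 * rush_tds)
    return production / plays


pocket_passer = qer(26, 32, 280, pass_tds=3)
dual_threat = qer(14, 21, 150, pass_tds=1,
                  rush_attempts=10, rush_yards=96, rush_tds=2)
print(f"pocket passer: {pocket_passer:.3f}  dual threat: {dual_threat:.2f}")
```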
Again, not perfect. But neither is the Passing Efficiency Rating. It might not make it into NCAA Recordkeeping, but it might help us in the bloggosphere rate quarterbacks on more than just their passing ability.
That's all for this installment of Behind the Numbers, please feel free to let me know if you have any constructive advice for the QER. Thanks for reading!
UPDATE: Part Two is here. Includes an alternative to Passing Efficiency.
Today's Stat: Passing Efficiency(Full NCAA Rankings)
Players of note: Ryan Mallett, Arkansas (1st, 210.25); Jimmy Clausen, ND (3rd, 196.31); Kirk Cousins, MSU (6th, 186.71); Tate Forcier, Michigan (21st, 161.69); Terrelle Pryor, OSU (79th, 116.92)
Why it's important: It's pretty much the gold standard for measuring the (wait for it) efficiency of a quarterback. It's not flawless by any means, but overall is a pretty good indication of how good a quarterback is. Once there's a good sample size (at least 100 attempts), it's pretty safe to say that a player in the top 20 of the efficiency ratings is a good quarterback, and a player outside the top 50 isn't quite as high-caliber.
Why it's flawed: Passing Efficiency measures just that -- efficiency. How efficient something or someone is usually boils down to how much of 'x' they can do in 'y' amount of tries. It's no different in the world of college football. The equation for Passing Efficiency in College Football is as follows:
[(Completions x 100) + (Yards x 8.4) + (Touchdowns x 330) - (Interceptions x 200)] / (Attempts)
So while all of that stuff on top is really important, it really boils down to how many passes the quarterback has attempted. For example:
Quarterback A plays basically the whole game and racks up some pretty good numbers, but in the red zone gets bruised up and comes out for a play.
Quarterback B comes in for that one play and throws an eight yard touchdown pass, and is right back on the bench, and remains there for the rest of the game.
Quarterback A's stats: 28/35, 310yds, 3 TDs, 1 INT
Quarterback B's stats: 1/1, 8yds, 1 TD
Go ahead and take a stab at each quarterback's rating. Or just scroll down a bit and look at the actual answers, you cheater.
Quarterback A's Efficiency Rating: 177.0
Quarterback B's Efficiency Rating: 497.2
Quarterback B, the backup who came in for one play, isn't necessarily a better quarterback than Quarterback A.. there's actually a good chance that he's a good deal worse. His efficiency rating, however, is more than twice that of Quarterback A, who had a damn good day throwing the ball. However because that one pass attempt that he did have was a successful one, his Efficiency Rating is about 287 points higher than the current highest rating in Division 1.
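To check stat lines like these yourself, the Passing Efficiency equation can be sketched in a few lines of Python. The function name is mine; the weights are the ones in the equation above, divided by pass attempts.

```python
def passing_efficiency(completions, attempts, yards, touchdowns=0, interceptions=0):
    """College passing efficiency: weighted passing production per attempt."""
    if attempts <= 0:
        raise ValueError("need at least one pass attempt")
    return (100 * completions + 8.4 * yards
            + 330 * touchdowns - 200 * interceptions) / attempts


qb_a = passing_efficiency(28, 35, 310, touchdowns=3, interceptions=1)
qb_b = passing_efficiency(1, 1, 8, touchdowns=1)
print(f"QB A: {qb_a:.1f}  QB B: {qb_b:.1f}")
```

With a single attempt on the bottom, one touchdown pass is worth a full 330 points plus the completion and yardage bonuses, which is why tiny samples swing the rating so hard.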
Applying this to current statistics:
Ryan Mallett: 17/22, 309yds, 1 TD (210.25)
In the one game he's appeared in so far, Mallett has only attempted 22 passes (remember, the smaller the sample size the more skewed the rating), and completed 17 of them. A 77% completion percentage is second only to Sean Canfield (OSU, NTOSU), who has the 14th highest efficiency rating. He only has the one touchdown and has yet to throw a pick (not as important as you'd think, as you'll see later). Not stellar numbers by any means, but he did pretty well against Missouri State.
Jimmy Clausen: 40/60, 651yds, 7 TDs (196.31)
Not too much to say here, the efficiency rating is pretty well deserved so far. Quite the interesting comparison to Mallett's numbers, however. Clausen's numbers are obviously superior in every way but completion percentage. Clausen is clearly the superior quarterback here, yet because of the small sample size in Mallett's case, he has the higher rating.
Tate Forcier: 36/53, 419yds, 5 TDs, 1 INT (161.69)
Tate's numbers compared to his rating are also pretty interesting. He actually has a higher completion percentage (67.9) than Clausen (66.7), has a respectable touchdown percentage (9.43% of his passes are touchdowns, compared to Clausen's 11.7%), and only has the one interception. However even if we take that interception away (it wasn't even his fault!), Forcier's rating doesn't improve too dramatically. If the pass fell harmlessly to the ground, his rating would be a 165.5, good for 17th. If the pass was completed for a 15 yard gain his rating would be a 169.7, putting him in 16th.
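The two what-ifs for Forcier's interception can be rerun with a short sketch. The helper name is mine; the stat lines are the ones quoted above, with the interception swapped for an incompletion and then for a 15-yard completion.

```python
def passing_efficiency(completions, attempts, yards, touchdowns=0, interceptions=0):
    """College passing efficiency: weighted passing production per attempt."""
    return (100 * completions + 8.4 * yards
            + 330 * touchdowns - 200 * interceptions) / attempts


scenarios = {
    "actual (1 INT)": passing_efficiency(36, 53, 419, 5, 1),
    "pick falls incomplete": passing_efficiency(36, 53, 419, 5, 0),
    "pick becomes a 15-yard completion": passing_efficiency(37, 53, 434, 5, 0),
}
for label, rating in scenarios.items():
    print(f"{label}: {rating:.2f}")
```

Over 53 attempts, turning the pick into an incompletion only moves the numerator by 200 points, which is why the rating barely budges.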
The Takeaway
Passing Efficiency is an effective way to rank the overall efficiency of quarterbacks, especially later in the season once there is a decent sample size of attempts to go by. Until then, however, it's a stat that's easily skewed by a few attempts going for big yards and touchdowns. We all know Quarterback A in the example above had a better game than Quarterback B, but the formula for efficiency rating doesn't. Quarterback B did complete 100% of his passes, and 100% of his attempts went for touchdowns.. the thing is there was just the one attempt. Therein lies the flaw.
Just for fun, try to guess which stat line would garner the higher efficiency rating. Answers are at the bottom of the post.
Situation 1
A. 25/30, 250yds, 2 TDs, 2 INTs
B. 15/17, 140yds, 1 TD
Situation 2
A. 30/40, 300yds, 1 TD
B. 10/12, 100yds, 2 TDs
Situation 3
A. 20/24, 200yds, 2 TDs, 4 INTs
B. 20/40, 250yds, 5 TDs
Behind the Numbers will be back soon with another look at a stat from the world of College Football. Any stats you want to be examined a little closer? Or even just a stat you've been interested in for a long time? Let me know in the comments and I'll do my best to get to it in the next few installments of BtN. Thanks for reading!
Situation 1 - A: 162.0, B: 176.8
Situation 2 - A: 146.2, B: 208.3
Situation 3 - A: 147.5, B: 143.75
In 2007 and 2008, Notre Dame played 15 such teams and went 2-13 (13%) against them: 1-8 and 1-5 in each respective year. Contrast this with an 8-2 (80%) record against teams in the bottom third of PED. Free advice, ND: you might consider scheduling fewer "Club Decent" teams in the future.
Almost no victories come against "Club Decent" teams.
The Irish had 10 wins combined the last 2 seasons. In 2007, 2/3 of those wins were against bad PED teams (#32, #101, and #84), and 6/7 were last year (#112, #79, #33, #83, #115, #103, and #87). This means opponents in Irish wins in the past 2 years average out to #83 in PED -- worse than 70% of all teams.
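The averaging is simple enough to check by hand, but here is a small sketch for anyone who wants to rerun it with other seasons. The ranks are the ones listed above; the 119-team total is my assumption, taken from the #119/119 pass defense figure quoted for 2008.

```python
TEAMS_RANKED = 119  # assumption: number of teams ranked in PED

# PED rank of each opponent Notre Dame beat, by season (from the list above).
win_opponent_ped_rank = {
    2007: [32, 101, 84],
    2008: [112, 79, 33, 83, 115, 103, 87],
}

all_ranks = [rank for ranks in win_opponent_ped_rank.values() for rank in ranks]
average_rank = sum(all_ranks) / len(all_ranks)
share_ranked_better = average_rank / TEAMS_RANKED

print(f"average PED rank in wins: {average_rank:.1f}")
print(f"roughly {share_ranked_better:.0%} of teams rank better")
```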
What do you call an exception that's not really an exception?
The only wins against decent pass defenses (#32, #33) in the past two years came against two below average teams: UCLA (6-7) in 2007 and Purdue (4-8) in 2008. Due to injury, UCLA had to play a third-string freshman walk-on QB, and were also without their starting RB. The turnover differential ended up +7 for the Irish. This sounds familiar for Michigan fans. The victory against a very poor, 4-8 Purdue team was in South Bend, where the Boilermakers rarely win (1-15 in their last 16 games). The turnover differential was +1 for the Irish, with the only turnover coming as a pick 6.
What did we learn from Nevada?
We don't yet know where Nevada will stack up in PED for 2009, but we do know that last year they were #85, and an oft-cited #119/119 in pass defense, crushing #118 by 25 yards/game. I wouldn't expect that unit sans 1 free safety to be much better. Statistically, even the 3-9 2007 team would very likely have beaten Nevada.
Will Notre Dame break the trend this year and be able to beat "Club Decent" teams? I'm not sure how optimistic I would be as an Irish fan. While 2008 seemed to be an improvement over 2007, how much of that was the schedule? Their record was no better against the "Club"; they simply played fewer teams. 3 wins (2007) - 4 "Club Decent" opponents = 7 wins (2008). Yes, they have another year of experience, and Jon Tenuta is calling the shots on D now, but is that enough to significantly buck a trend that went seemingly unchanged from year 1 to year 2?
Is Michigan a "Club Decent" team this year?
Michigan was a poor PED team (#79) last year, suggesting that ND would beat them -- and they did. Is there reason to believe the Wolverines will change that this year, thus suggesting a different result? There are many positive signs. New DC Greg Robinson has brought a new, attacking defensive scheme, which is designed to put constant pressure on the QB. This was very effective against a large (very similar in size to ND), veteran line for WMU. Stevie Brown's move to SLB should also help, bringing his speed and athleticism up in pass coverage instead of a larger, slower LB attempting to guard receivers in space.
Update at bottom
Update 2 at bottom
Note: This is a long and complex read. I know that. I'm looking for assistance with a project I'm working on that I know everyone will be interested in. If you wish to skip all of the reading, I have summarized everything in bullet points at the bottom.
I had hoped to keep this my little secret until I was completely done and I could unveil everything at once, but I no longer believe that I could do this project as efficiently without some other input. As an engineer, I require myself to do everything with as high efficiency as possible so I must petition the MGoBlog community for help.
As many (more likely all since you're on a site like this) of you are aware, there have been more and more threads being posted which essentially go down as so:
Poster 1: "We're going after slot-dot X and he's only 3 stars!. Argh! Doesn't RichRod understand he's not at WVU anymore and he needs to get MICHIGAN quality recruits. RichRod=Fail."
Poster 2: "Stars don't matter, obviously RichRod thinks that he's good enough and that's good enough for me."
Poster 3: "Rankings are early, they'll change, just settle down for now"
Poster 2: "He's only 3 stars but look at his offer sheet, I'd take someone that's 3 stars with offers from USC, OSU, UF, 'Bama, etc. over a 5 star with offers from us and the MAC."
Poster 1: "Stars do matter, you need talent!"
Poster 2: "Mike Hart, Braylon Edwards... nuff said"
And so on and so on.
So, I started thinking about rankings and their usefulness at predicting future college and pro success. To that end, I'm going to undertake what I believe will be the largest statistical analysis of recruiting rankings to date. But I need some help.
Let me describe what I'm planning on doing, what I've already done to accomplish that goal, and what I still need to do. Then I'll finally be able to show everyone what I need help with. You'll also be informed enough to offer criticisms, advice, and ask questions if necessary.
1- What I plan on doing
I'm going to take all recruiting data from Scout and Rivals from 2002-2009. As of right now, that includes: name, positional rank, number of stars, HT/WT/40, position, hometown, and home state. I'm then going to also compile data on how many starts each player had in each year of his career, if he redshirted, if he left early for the draft (manifested as number of years of eligibility remaining), the number of All-Conference honors received, and the number of All-American honors received. I will also take information on if they were drafted, what round they were drafted in, what overall number they were drafted as, what position they were drafted for, and what team they went to.
Once I have all of that data, I will first do a top-level analysis to see, independent of everything else, how star rankings alone are at predicting collegiate and pro success as defined by the stats that I will have collected above.
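To make that top-level pass concrete, here is a minimal Python sketch that groups players by star rating and computes a draft rate and average number of starts per group. The records are made-up placeholders (not real recruiting data) and the function name `summarize_by_stars` is hypothetical; the real analysis would read from the collected spreadsheets.

```python
from collections import defaultdict

# Hypothetical toy records: (stars, was_drafted, career_starts). Not real data.
players = [
    (5, True, 30),
    (5, False, 12),
    (4, True, 25),
    (4, False, 8),
    (3, False, 0),
    (3, True, 20),
    (3, False, 4),
]

def summarize_by_stars(records):
    """Return {stars: (count, draft_rate, mean_starts)} for each star rating."""
    groups = defaultdict(list)
    for stars, drafted, starts in records:
        groups[stars].append((drafted, starts))
    summary = {}
    for stars, rows in groups.items():
        n = len(rows)
        draft_rate = sum(1 for drafted, _ in rows if drafted) / n
        mean_starts = sum(starts for _, starts in rows) / n
        summary[stars] = (n, draft_rate, mean_starts)
    return summary

summary = summarize_by_stars(players)
for stars in sorted(summary, reverse=True):
    n, rate, starts = summary[stars]
    print(f"{stars} stars: n={n}, draft rate={rate:.2f}, mean starts={starts:.1f}")
```

The same grouping could be repeated per position or per recruiting service once those columns are available.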
From then on, I will keep trying to dig further to get more and more relevant models and conclusions. This will include but will not be limited to how the average rankings of the other players around another player (independent of that player's rankings) affect collegiate/pro success, the number of blue-chip recruits that completely fail, the number of blue-chip recruits that leave their home state, the average team ranking, success of rankings at predicting success at each individual position, the effect of positional ranking on future success, etc.
I'm going to try to come up with as many ways as possible to analyze the data that either decouple the data or give conclusions that are independent of coupling. Figuring out how to do that will be difficult but fun.
As a side note, this will also let me eventually compare Scout and Rivals to say with some authority, whose [final] rankings are more accurate.
Of course, I will also apply standard statistical analysis procedures to determine whether my conclusions can be deemed statistically significant or not (I don't know with what percent confidence yet, so don't ask).
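As one example of such a procedure, a two-proportion z-test can check whether the draft rate of one star group really differs from another's or whether the gap could be chance. The sketch below is only an illustration under assumptions: the counts are invented, the function name `two_proportion_z_test` is hypothetical, and the normal approximation it relies on needs reasonably large groups.

```python
import math

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value) using the pooled-proportion standard error.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented counts: 60 of 100 five-stars drafted vs. 40 of 100 three-stars.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value would suggest the difference in draft rates is unlikely to be noise; the confidence threshold to use is still an open choice.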
2- What I have done
It's all well and good to have thought all of this out, and I'd be willing to wager that at least one other person currently reading this has thought about it too, but thinking alone won't get any of us anywhere. So, I've started to do a lot of the grunt work as a sign of my commitment, so that people will understand that I'm dedicated enough to make helping me worth their time.
I have already collected all of the information from Rivals for every class and every player.
So, for the classes from 2002-2009, I have every name, positional rank, Rivals Rating (RR), star rating, position (as Rivals breaks it down), and what school they committed to.
I have also created an Excel spreadsheet template that will allow me (once I get all of that data) to merely copy and paste a few things from Rivals and all of the data that I have on every player will be retrieved. With that, I will be able to create a spreadsheet for every BCS team (as Rivals only has complete listings for BCS teams) which will have every class and all of the data for each kid in every class all in one spot. Then I'll be able to do my analyses more easily.
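For anyone who would rather script the per-team consolidation than do it in Excel, a rough Python equivalent might look like the following. It assumes the player data has been exported to CSV; the column names (`name`, `year`, `school`, `stars`) and the rows are invented for illustration and are not the actual layout of my spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV export: column names and rows are invented for illustration.
RAW = """name,year,school,stars
Player A,2008,Michigan,4
Player B,2009,Michigan,3
Player C,2008,Ohio State,5
"""

def group_by_school_and_year(csv_text):
    """Return {school: {year: [row, ...]}} so each team's classes sit together."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in csv.DictReader(io.StringIO(csv_text)):
        row["stars"] = int(row["stars"])
        grouped[row["school"]][int(row["year"])].append(row)
    return grouped

teams = group_by_school_and_year(RAW)
for school, classes in sorted(teams.items()):
    for year, recruits in sorted(classes.items()):
        names = ", ".join(r["name"] for r in recruits)
        print(f"{school} {year}: {names}")
```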
3- What I still need to do
Obviously I'm still not done with the collecting data/grunt work as I still have to take all of Scout's data. It's taking a little while because of the way that they format their data compared to Rivals. Fortunately, I have solved the problem and can now do the usual copy and paste (followed by several other things to make it all work).
I'm considering also grabbing data from ESPN but I'm really not sure if it's even worth it. They only have data from 2007-2009 (I believe), so that doesn't even include a class that has been drafted yet.
More importantly, I need to find a source for the other data that I'm trying to collect. I need to find some place(s) that list all of the following:
- If a player redshirted
- Number of starts each year
- Every All-Conference team (not just first team) for all BCS conferences starting from 2002
- Every All-American team starting from 2002
- Every transfer since 2002
- What position each player was drafted for
- Individual player positional statistics (e.g. completion percentage, interceptions, tackles for loss, etc.)
There is also some other data that I’m going to try and collect but I already have sources for that so it need not be listed here.
4- What I need help with
I need help finding the data that I list above. Pieces of it are available everywhere but I haven’t found a single site that has a repository of all the information implied in even just one of those points above.
Additionally, getting individual statistics is extremely hard, even though it would allow more comparisons than possibly anything else. There are literally tens of thousands of players; there were over 1000 wide receivers in 2009 alone! There are simply too many to go to each player’s individual profile page somewhere and collect the data. I, unfortunately, require lists. That is, unless there is some tool or way to automate the data collection process. I know of no such way myself, but that is one of the reasons I’m asking the MGoBlog community for help: I don’t necessarily know everything that I could do to make this project as easy as possible (at least on the data collection front).
I’d also like to find a way to collect data on all of the schools that have officially offered a kid a scholarship, to see if there is some way to show whether stars or scholarship offers are, statistically speaking, the better measure of a kid’s future ability. Again, I can’t go to every Rivals profile page to try and collect that data. This is one area where I feel that, since the pages are so similar, it might be possible to write some sort of script to do the work for me. Unfortunately, I’m a ChemE and MSE person, not a CSE person (for those of you outside engineering, that’s Chemical Engineering, Materials Science and Engineering, and Computer Science and Engineering, respectively), so I don’t know what tool or utility I would use to accomplish that. I am in Tech. Services, so I’m sure that if someone pointed me to the appropriate tool and maybe some documentation on how to use it, I wouldn’t have any problems.
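One possible shape for such a script, sketched in Python with only the standard library: parse each profile page's HTML and pull out the list of offers. The markup here (`<li class="offer">`) is purely an assumption for illustration; the real profile pages will be structured differently and the tag and class names would need to be adjusted after inspecting them. The names `OfferListParser` and `extract_offers` are likewise hypothetical.

```python
from html.parser import HTMLParser

class OfferListParser(HTMLParser):
    """Collects the text of <li class="offer"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.offers = []
        self._in_offer = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and ("class", "offer") in attrs:
            self._in_offer = True
            self.offers.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_offer = False

    def handle_data(self, data):
        if self._in_offer:
            self.offers[-1] += data

def extract_offers(html):
    """Return the list of offering schools found in one page of HTML."""
    parser = OfferListParser()
    parser.feed(html)
    parser.close()
    return [offer.strip() for offer in parser.offers if offer.strip()]

# Invented sample page standing in for a downloaded profile.
SAMPLE_PAGE = """
<html><body>
<h1>Example Recruit</h1>
<ul>
  <li class="offer">Michigan</li>
  <li class="offer">Ohio State</li>
  <li class="visit">Penn State</li>
</ul>
</body></html>
"""

print(extract_offers(SAMPLE_PAGE))
```

In a full run, the sample string would be replaced by page text downloaded for each player (for instance with `urllib.request.urlopen`), with a pause between requests, and the results written out as one row per player.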
I know that what I wrote above was long so here’s the summary (whether you read everything preceding this or not).
I’m going to perform a statistical analysis on Scout and Rivals to determine how good their final star ratings and positional rankings are at predicting future success both in college and the pros. To do so, I have already collected the data from Rivals and am currently working on collecting data from Scout. I will probably not take data from ESPN although that is not a certainty.
To determine collegiate success I will take data that includes but is not limited to All-Conference honors, All-American honors, and the number of starts. To determine pro success I will take into consideration where a player was drafted and for what position.
I know where to acquire some of the information that I need but I still need help finding useful places to take large amounts of data on:
- All-Conference teams
- What position each player was drafted for
- Number of starts by each player
I would also like to find a way to automate data collection, specifically with an eye towards collecting data on what schools offered each kid a scholarship. Since there are tens of thousands of kids, this cannot be done individually but must somehow be automated. I do not know how to do that and am thus asking for help. The same situation applies to collecting individual, position-specific statistics on each kid.
If anyone would like to help me out with what I have asked, then I would greatly appreciate it. Any criticisms will be well-received (or at least as well-received as I can) and taken into account. Any comments or other thoughts are also welcome and appreciated.
For more information, read the sections above.
Since so many people have responded with helpful ideas, if you wish to contact me with anything that you either don't want to post in the comments, is too long and complicated for the comments, or that you wish to have a more private dialogue about then email me at:
That's not my main email so I won't check it as often (i.e. not every 20 minutes) but I'll try to check it at least once a day. If you want to send me anything, links or other work that you've done that might help me, then send it there.
Thanks for all the great ideas and please keep them coming. I'm still thinking about ways to handicap teams that have a lot or a little talent relative to the average (for reasons that are too long to fully explain in this update, although there are some interesting thoughts on why and how in the comments below). I'm also looking for ways to automate the data collection process. There are a few suggestions below but I'm going to be looking for more, so please tell me.
Again, I prefer using the comments if possible but if not then email me.
Update 2: 3-27-09
Well, it's been pointed out in the comments and confirmed by me that the email address listed above doesn't work. That's because I had a small typo. Of course, small typos in email addresses are big typos.
Anyways, the correct email address is: [email protected]
If you tried emailing me earlier with the previous email address then please try again. I appreciate your patience.