Any interesting sports analytics storylines for the 2018 football season?

Submitted by MGauxBleu on July 8th, 2018 at 9:24 AM

I'm starting to dabble in sports analytics as a new hobby. I'm working on some NBA player analysis for an upcoming fantasy basketball season, but Michigan football is my first love, so I'm trying to come up with some potential projects for this fall. Anything you all can think of that you'd want to tease out? Or any interesting public datasets that you know of that I could analyze? For instance, I was thinking about combing through old UFRs.

Alternatively, does anyone have suggestions for interesting sports analytics blogs or Twitter profiles to follow that cover, or at least lean toward, college football?



July 8th, 2018 at 10:49 AM ^

I've never seen it analytically proven out ---- but I've become more and more convinced (my gut feel) that college teams are not throwing the deep ball (the proverbial "50-50" ball) nearly enough.  I think there is more reward there than risk.

The CFBAnalysis subreddit is a decent source for data.  I personally have drive-by-drive data for most D-1/FBS college football games played since 2005.  Drive-by-drive data is nice but for a question like the above, you really need play-by-play data.  That takes a whole lot longer to compile.  Play-by-play data can also be purchased, but that's on the order of hundreds of dollars.


July 8th, 2018 at 11:11 AM ^

Most of them aren't true 50-50 balls - but that's the fairly common nomenclature.

If you had a rich enough database, you could:

1. Look at the number of 30+ yard downfield passes that were thrown.  Determine that X% of the time it resulted in a TD, Y% of the time it resulted in a 30+ yard gain (no TD), Z% of the time it resulted in an interception, et cetera ....

2. Take a historical team and --- in the context of all the other plays they ran that year --- increase the percentage of times they threw a 30+ yard downfield pass.  Leverage the stats above.  Simulate their drives/seasons thousands of times.  See if their offensive efficiency increases or decreases in the aggregate.

That's not a perfect method.  And you need a hell of a lot of data.  And college football stats are goofier than NFL stats because there are so many mismatches (in step 1 above, you should throw out all the FBS/FCS games and a chunk of the FBS/FBS games too).  But the above method would at least hint at an answer.
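The two steps above can be sketched as a toy Monte Carlo simulation. Everything here is an assumption made for illustration: the outcome probabilities, the yardages, the no-punt/no-field-goal drive rules, and the names `DEEP_OUTCOMES`, `BASE_OUTCOMES`, `simulate_drive`, and `points_per_drive`. With real play-by-play data, the outcome tables would come from step 1.

```python
import random

# All probabilities and yardages below are invented placeholders, not measured values.
DEEP_OUTCOMES = [          # (probability, yards gained, is_turnover)
    (0.30, 35, False),     # completion
    (0.62, 0, False),      # incompletion
    (0.08, 0, True),       # interception
]
BASE_OUTCOMES = [          # every other play call, lumped together
    (0.55, 6, False),      # modest gain
    (0.43, 0, False),      # no gain
    (0.02, 0, True),       # turnover
]

def draw(outcomes, rng):
    """Pick one (yards, turnover) outcome according to its probability."""
    r = rng.random()
    cumulative = 0.0
    for prob, yards, turnover in outcomes:
        cumulative += prob
        if r < cumulative:
            return yards, turnover
    return outcomes[-1][1], outcomes[-1][2]  # guard against float rounding

def simulate_drive(deep_rate, rng, start=25):
    """Return points scored on one simplified drive (7 for a TD, else 0)."""
    yard_line, down, to_go = start, 1, 10   # yard_line = yards from own goal line
    while True:
        outcomes = DEEP_OUTCOMES if rng.random() < deep_rate else BASE_OUTCOMES
        yards, turnover = draw(outcomes, rng)
        if turnover:
            return 0
        yard_line += yards
        if yard_line >= 100:
            return 7
        if yards >= to_go:
            down, to_go = 1, 10
        else:
            down, to_go = down + 1, to_go - yards
        if down > 4:          # toy model: no punts or field goals
            return 0

def points_per_drive(deep_rate, n_drives=20000, seed=0):
    """Average points per drive when deep_rate of all plays are deep shots."""
    rng = random.Random(seed)
    return sum(simulate_drive(deep_rate, rng) for _ in range(n_drives)) / n_drives

if __name__ == "__main__":
    for rate in (0.05, 0.15, 0.30):
        print(rate, round(points_per_drive(rate), 3))
```

Whether the average rises or falls with `deep_rate` depends entirely on the placeholder tables, so the output says nothing about real football until those tables are estimated from data.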

Ghost of Fritz…

July 8th, 2018 at 1:55 PM ^

O.K., sounds interesting.

But how do you control for the fact that teams with better passing games in general will be the ones making more 30+ yard pass attempts (precisely because they have the good QB, receivers, and pass-blocking o-line that make them comparatively better than other teams at that type of play)?

IOW, assume that the numbers show that teams that throw more 30+ yard passes in fact do measure out as having higher offensive efficiency. 

Is that because they are attempting more 30+ yard passes?  Or is that because they had the QB, receiver, o-line combo to make 30+ yard passes a good probability proposition in the first place (which then leads those teams to attempt more 30+ yard passes than average)?

For example, was Michigan's 2017 offense bad (in part) because they did not attempt enough 30+ yard passes? 

Or was it bad because they had a problematic o-line, inexperienced receivers that were not good route runners, and questionable QBs (all of which led the OC to rarely attempt 30+ yard pass plays)?

What is the causal element?  Mediocre offensive players (at least mediocre for long passes)?  Or the lack of 30+ yard pass attempts?
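The worry can be made concrete with simulated data. In the sketch below, deep-pass rate has no causal effect on efficiency by construction; a hypothetical hidden `talent` variable drives both. All coefficients are made up, and in real life talent would not be directly observable, so this only illustrates the trap rather than solving it.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """Residuals of a one-variable least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(42)
n_teams = 5000  # far more "teams" than exist, to keep sampling noise small

# Invented data-generating process: talent raises BOTH numbers;
# deep_rate never appears in the efficiency formula.
talent = [rng.gauss(0, 1) for _ in range(n_teams)]
deep_rate = [0.10 + 0.03 * t + rng.gauss(0, 0.02) for t in talent]
efficiency = [2.0 + 0.5 * t + rng.gauss(0, 0.5) for t in talent]

naive = pearson(deep_rate, efficiency)
controlled = pearson(residuals(deep_rate, talent), residuals(efficiency, talent))
print(f"naive correlation: {naive:.2f}")
print(f"after controlling for talent: {controlled:.2f}")
```

The naive correlation comes out clearly positive even though throwing deep does nothing in this fake world, and it shrinks toward zero once talent is regressed out of both variables.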


July 8th, 2018 at 2:25 PM ^

Agreed ---- there is A LOT that you would have to control for.

I admit that an analysis like I suggest would be answering questions at a "fairly high" and "directional" level.

And even if an analysis said "college football teams as a whole would become more efficient in the aggregate by throwing more deep balls", it may not add value for an individual team.  E.g., an offense that already runs the ball for 7-yards-at-a-chunk and rarely turns it over.  There's no value to adding "more long balls."  Their offense is already very efficient.

Such is data analytics.  Often "fairly high", "directional", and "in the aggregate" is as good an answer as one can get.  Still doesn't mean it's an analysis not worth attempting.


July 8th, 2018 at 3:24 PM ^

An offense that runs the ball for 7 yards a chunk SHOULD ONLY throw deep balls. LBs and safeties cheat up and you have 1-on-1 matchups (and maybe 1 on none with some well designed play action passes). 

What they should NOT do is throw 3-step hitches for 5 yards a pop or complicated 5-step patterns.


July 8th, 2018 at 2:32 PM ^

I admit --- PSU 2016 made me think about this theory more, but I generally thought that well before 2 years ago.

Bill Connelly is a smart guy.  There is a reason he has "explosiveness" as one of his five factors.  "Efficiency" is one of his five factors too.  But the best teams are BOTH explosive and efficient.  

And --- in theory --- one way to increase "explosiveness" is to increase the "number of chances you have to make an explosive play."

You don't win the bets you don't make.


July 8th, 2018 at 3:30 PM ^

I don't want to put my personal e-mail out here on this board.  And unfortunately MGoBlog doesn't have a Direct Message component.

But search for me on CFBAnalytics (hint: search for various forms of "Nittany" amongst user names) and send me a Direct Message there.  I can then share out some of my dataset w/ you (if it would be useful).


July 8th, 2018 at 3:21 PM ^

I agree with you 100%, although what is often forgotten is that to throw 30+ yard passes often, a team has to protect the QB for more than a nanosecond.

The pros definitely don't throw it downfield enough although the protection issue is even more acute.



July 8th, 2018 at 11:15 AM ^

I have always wondered about efficiency based on field location (but obviously not enough to actually evaluate it myself).  For example, a WR or RB who gets a 1-yard touchdown has obviously helped the team and done his job, but when averaged into his stats, that one catch/carry doesn't really help his overall statistics, so his contribution isn't properly captured. What was the Hammering Panda's per-carry average? Yet he had numerous touchdowns. If you instead look at the fact that he covered 100% of his total potential yards on many of his carries, I think his value would be better conveyed than by his lousy yards/carry.  I think the offensive efficiency statistic (5 yards on 1st down, half the yards needed for a first down on 2nd, and all the yards on 3rd down) is close, but does not tell the same story. Unfortunately, I don't have a suggestion for a database to study this (otherwise, I may have actually looked at it myself).
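A rough sketch of the "share of potential yards" idea, next to plain yards per carry. The carry log is hypothetical (it is not Khalid Hill's actual stat line), and the function names are invented for this example.

```python
# Hypothetical carries: (yards_gained, yards_to_goal_line_at_snap).
carries = [(1, 1), (2, 2), (1, 1), (0, 3), (1, 1), (3, 40)]

def yards_per_carry(carries):
    """Traditional average: total yards divided by number of carries."""
    return sum(gained for gained, _ in carries) / len(carries)

def share_of_available_yards(carries):
    """Average fraction of the yards available to the goal line gained per carry."""
    return sum(gained / available for gained, available in carries) / len(carries)

def touchdowns(carries):
    """A carry that gains every available yard reached the end zone."""
    return sum(1 for gained, available in carries if gained >= available)

print(round(yards_per_carry(carries), 2))
print(round(share_of_available_yards(carries), 2))
print(touchdowns(carries))
```

For this made-up goal-line back, yards per carry is tiny while the share of available yards is high, which is the gap the post is pointing at. One caveat with this design: averaging per-carry fractions weights a 1-yard plunge the same as a 40-yard opportunity.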


July 8th, 2018 at 11:38 AM ^

I like that idea.  I think the closest stat to that right now conceptually is EPA (Expected Points Added).  E.g., on 1st-and-goal from the 1, if a college team runs the football, they will typically score 3.4 points on the play.  If Hill scores a TD, that's a +2.6 for him.  If he gains 0 yards, that's a -3.4.  If he fumbles and it's returned for 6, that's a -9.4. (I'm not sure how the extra point gets factored into all that).

Could also do the same thing with WPA (Win Probability Added).  That makes a Hill TD run in a 10-10 game more valuable than a TD run in a 35-3 game.

I think the NFL Advanced Analytics guys use both of these - but I haven't really seen it for College.  Bill Connelly's offensive efficiency is closest, yes.



July 8th, 2018 at 12:17 PM ^

Yeah, obviously I picked the extreme example, but I think there should be a difference between someone who runs for 20 yards and gets a TD and someone who just runs for 20 yards. This is especially true when the RB could have run for many more yards if the ball was just somewhere else on the field when the play began.


July 8th, 2018 at 2:18 PM ^

True - you simply can't run for 30 yards when the play starts from the opponent's 20.  Whereas you can from mid-field.  

You can only solve for "how much does this 20 yard run help in this particular time and place."

Case 1: 2nd-and-8 from the opponent's 20.  Typically, teams in this position will score 2.7 points on that drive (I'm making this # up, but w/ a big enough database you can figure this # out).  I.e., Expected Points = 2.7.  Typically, teams that run the ball in this spot will have an Expected Points = 2.9 on their next play.  So if Higdon runs for 20 yards, he should get credit for (6-2.9) = 3.1 "points added above expected points."

Case 2: 2nd-and-8 from your own 40.  Expected Points = 1.4.  Typically, teams that run the ball in this spot will have an Expected Points = 1.5 on their next play.  If Higdon runs for 20, that's now 1st-and-10 from the opponent's 40.  Expected Points there = 1.8.  So for that 20-yard run, Higdon gets credit for (1.8-1.5) = 0.3 "points added above expected points."

I think this metric would solve for your question.  Higdon's 20 yards in Case 1 is a lot more valuable than in Case 2.  I made up a bunch of numbers above, but I'd be very surprised if there are many situations where "X yards for a touchdown, where theoretically he could have run for more" aren't more valuable than "X yards somewhere else on the field, where he could have run for more but didn't."
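The arithmetic in the two cases can be written as one small function. This is only a sketch using the made-up expected-points values from the cases above; `points_added` and its parameters are names invented here, and a touchdown is counted as 6 with the extra point ignored, as in the post.

```python
TOUCHDOWN_POINTS = 6  # extra point ignored, matching the example numbers

def points_added(ep_baseline, ep_after=None, scored_td=False):
    """Credit = value of the resulting situation minus the baseline expectation.

    ep_baseline: expected points for a typical run from the pre-snap situation
    ep_after:    expected points of the new down/distance/field position
    scored_td:   if True, the result is worth a touchdown instead of ep_after
    """
    result_value = TOUCHDOWN_POINTS if scored_td else ep_after
    return result_value - ep_baseline

# Case 1: 20-yard TD run from the opponent's 20 (baseline 2.9).
case_1 = points_added(ep_baseline=2.9, scored_td=True)
# Case 2: 20-yard run from own 40 to the opponent's 40 (baseline 1.5, new EP 1.8).
case_2 = points_added(ep_baseline=1.5, ep_after=1.8)

print(round(case_1, 1), round(case_2, 1))
```

With a real expected-points table keyed by down, distance, and yard line, the two hard-coded baselines would become lookups, but the subtraction stays the same.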

Now, of course --- Higdon is one of 11.  Good luck figuring how his "points added above expected points" gets apportioned among everybody else on the field.

I tend to think football is the most analytically difficult of all the sports.  Baseball is fairly easy - basketball comes next.  Then ice hockey and soccer.  Football has a ways to go though.


July 8th, 2018 at 6:13 PM ^

I think there should be a difference between someone who runs for 20 yards and gets a TD and someone who just runs for 20 yards. This is especially true when the RB could have run for many more yards if the ball was just somewhere else on the field when the play began.

Forrest Gump disagrees


July 8th, 2018 at 12:18 PM ^

An OL efficiency parameter would be nice. I would say 2017 and 2008 probably had low scores. 2001-2002, '95-'97, '86 and '76 may have been our better OL?


July 8th, 2018 at 12:37 PM ^

Early 90s along with '99 had great OL. '06 line was good enough to give Henne some time to complete his passes. I thought the '86 OL played rather well; if not for that Gopher game, the regular season would have had 0 losses. I can't believe Cooper beat us in the bowl game that season.



July 8th, 2018 at 3:58 PM ^

I think QB sack stats go back quite a few years. You may be able to find a relationship between QB sacks and QB hurries and extrapolate that back to before QB hurry stats were kept. Yes, before Rick Leach's soph season, Michigan probably ran the ball 85 percent of the time, so much will need to be normalized before 1976. Though, I remember the '76 OL was quite good. The right side dominated defenses with Dufek and Donahue. Lytle and Huckleby got quite a few yards running right.
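One way to sketch that extrapolation is a simple least-squares line fit on the seasons where both stats exist, then applied to seasons that only have sacks. The season totals below are invented, and `fit_line` and `estimate_hurries` are hypothetical helper names.

```python
# Hypothetical seasons where both stats exist: (sacks_allowed, hurries_allowed).
known = [(18, 40), (25, 55), (30, 61), (22, 49), (36, 75), (15, 33)]

def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(known)

def estimate_hurries(sacks):
    """Estimated hurries for a season that only recorded sacks."""
    return intercept + slope * sacks

print(round(estimate_hurries(20), 1))
```

For the run-heavy seasons mentioned above, it would probably make more sense to fit on rates (sacks and hurries per pass attempt) rather than raw totals, which is the normalization step the post refers to.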



July 8th, 2018 at 4:46 PM ^

I am going to suggest something contrary to most other suggestions. Don't try to incorporate everything into your initial model.  You are going to miss things and there are going to be some counterintuitive forces acting in the model.  Start with a simple model that is easy to build on and let the data "tell you" where to go next.  I also suggest Monte Carlo simulations as a more intuitive way to test different ideas on small data sets.


(Full disclosure: most of the modeling I do is Bayesian, so I am biased towards Monte Carlo.)


July 9th, 2018 at 12:08 PM ^

I wonder if there's any value in further breaking down the types of 30-yard passing plays, i.e., plays where the ball traveled 30 yards in the air versus clever RPS plays where slot ninjas and tight ends caught a quick 5-yard dump-off on a crossing route and broke it for big YAC. Both will be listed the same in the stat sheets, yet they are completely different styles of play with different protection schemes. We've had plenty of both in different seasons under Harbaugh.

SMart WolveFan

July 11th, 2018 at 12:48 AM ^

Agree, especially when evaluating the QB. 

Efficiency on passes that "travel" 15 to 30 yards, specifically on obvious passing downs, is probably the best metric for QB skill.

Plus the risk/reward factor of extending the field should be considered both in long TDs and in interceptions thrown on 50/50 balls. I always thought a shared point system for plays, especially TDs, might be a more interesting way to track a player's impact.

For example, any TD play would be worth 6 points (1 TD) and an individual player can get a max of 5 points because nobody but Barry could do it alone. So, on some plays it would be:

Oline 1 point, QB 2 points, RB 1 point, WR/TE 2 points

Others Oline 2, RB 4

some QB 5, WR 1

RB 3 WR 2, Oline 1 

WR 4, Oline 1, QB 1


Then you can reduce individual plays down based on their TD expectancy.

An individual player would get a TD when he accumulated 6 points

So total TDs, either per game or season, would end in .0 to .5.

But Oline could also get credit for "TDs", I like that!
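The bookkeeping for that shared point system might look like the sketch below. It uses the five example splits from the post, treats each position group as a single unit (an assumption; the post talks about individual players), and the names `touchdown_splits`, `validate`, and `td_credit` are made up for illustration.

```python
from collections import Counter

TD_POINTS = 6
MAX_SHARE = 5  # no single player or unit may take all six points

# The five example splits from the post, one dict per touchdown.
touchdown_splits = [
    {"OL": 1, "QB": 2, "RB": 1, "WR/TE": 2},
    {"OL": 2, "RB": 4},
    {"QB": 5, "WR/TE": 1},
    {"RB": 3, "WR/TE": 2, "OL": 1},
    {"WR/TE": 4, "OL": 1, "QB": 1},
]

def validate(split):
    """Enforce the two rules: exactly 6 points per TD, at most 5 to any one unit."""
    assert sum(split.values()) == TD_POINTS, "each TD must hand out exactly 6 points"
    assert max(split.values()) <= MAX_SHARE, "no one gets more than 5 points"

totals = Counter()
for split in touchdown_splits:
    validate(split)
    totals.update(split)   # adds this play's points to each unit's running total

# Six accumulated points equal one credited touchdown.
td_credit = {unit: points / TD_POINTS for unit, points in totals.items()}
for unit, credit in sorted(td_credit.items()):
    print(f"{unit}: {credit:.2f} TDs")
```

Because points are handed out in whole numbers, the credits in this sketch come out in sixths of a touchdown, and the offensive line does end up with a nonzero total.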

SMart WolveFan

July 10th, 2018 at 11:26 PM ^

One thing I've started to do some research on is what I call "2 down efficiency" or "the percentage of drives a team makes a first down on either 1st or 2nd down".

I suspect there are drive-specific (obviously), game-finishing, and even season-long cumulative advantages when a team limits the number of 3rd downs put on tape for later opponents to prep for.
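A minimal sketch of how "2 down efficiency" could be computed, assuming it is measured per series (each fresh set of downs) rather than per whole drive, and that a failed series always reached 3rd down. The log format and function names are hypothetical.

```python
# Hypothetical log: for each series (set of downs), the down on which the
# offense moved the chains or scored, or None if the series failed.
series_log = [1, 2, 3, None, 2, 2, 4, 1, None, 3]

def two_down_efficiency(series_log):
    """Share of series converted on 1st or 2nd down."""
    converted_early = sum(1 for down in series_log if down is not None and down <= 2)
    return converted_early / len(series_log)

def third_downs_faced(series_log):
    """Series not converted on 1st or 2nd down.

    Assumes every failed series (None) got as far as 3rd down, which
    ignores turnovers on early downs.
    """
    return sum(1 for down in series_log if down is None or down >= 3)

print(two_down_efficiency(series_log))
print(third_downs_faced(series_log))
```

Tracking `third_downs_faced` alongside the rate gives a direct count of the 3rd-down snaps that end up on tape.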