Any interesting sports analytics storylines for 2018 football season?

July 8th, 2018 at 9:48 AM ^

Lol, I appreciate it.

Joined: 05/08/2009

MGoPoints: 1698

Hotel Putingrad

July 8th, 2018 at 10:31 AM ^

The guys at The Athletic use a lot of fancystats.

Joined: 12/10/2014

MGoPoints: 212736

July 8th, 2018 at 10:49 AM ^

I've never seen it analytically proven out ---- but I've become more and more convinced (my gut feel) that college teams are not throwing the deep ball (the proverbial "50-50" ball) nearly enough. I think there is more reward there than risk.

The CFBAnalysis subreddit is a decent source for data. I personally have drive-by-drive data for most D-1/FBS college football games played since 2005. Drive-by-drive data is nice but for a question like the above, you really need play-by-play data. That takes a whole lot longer to compile. Play-by-play data can also be purchased, but that's on the order of hundreds of dollars.

Joined: 12/04/2012

MGoPoints: 43891

Hugh White

July 8th, 2018 at 10:56 AM ^

Intriguing. You’re suggesting that the 50-50 Ball is not, in fact, a 50-50 Ball. It will need a new name.

Joined: 11/26/2012

MGoPoints: 12405

July 8th, 2018 at 11:11 AM ^

Most of them aren't true 50-50 balls - but that's the fairly common nomenclature.

If you had a rich-enough database, you could:

1. Look at the number of 30+ yard downfield passes that were thrown. Determine that X% of the time it resulted in a TD, Y% of the time it resulted in a 30+ yard gain (no TD), Z% of the time it resulted in an interception, et cetera ....

2. Take a historical team and --- in the context of all the other plays they ran that year --- increase the percentage of times they threw a 30+ yard downfield pass. Leverage the stats above. Simulate their drives/seasons thousands of times. See if their offensive efficiency increases or decreases in the aggregate.

That's not a perfect method. And you need a hell of a lot of data. And college football stats are goofier than NFL stats because there are so many mismatches (in step 1 above, you should throw out all the FBS/FCS games and a chunk of the FBS/FBS games too). But the above method would at least hint at an answer.
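
A rough sketch of those two steps, for anyone who wants to try it. Everything here is an assumption made for illustration: the outcome labels, the `ORDINARY_GAINS` stand-in for all non-deep plays, the flat 35-yard gain on a completed deep ball, and the toy drive model (7 points for a TD, no punts or field goals). Real rates would have to come from actual play-by-play data.

```python
import random
from collections import Counter

# Hypothetical stand-in for every non-deep play: a few plausible gains in yards.
ORDINARY_GAINS = [0, 2, 3, 4, 5, 7, 12]

def outcome_rates(deep_attempts):
    """Step 1: fraction of 30+ yard attempts that ended in each outcome."""
    counts = Counter(deep_attempts)
    total = sum(counts.values())
    return {outcome: n / total for outcome, n in counts.items()}

def simulate_drive(deep_share, rates, rng, start_yardline=25):
    """Step 2: play out one toy drive and return the points it scored."""
    to_goal = 100 - start_yardline
    down, to_first = 1, 10
    outcomes, weights = list(rates), list(rates.values())
    while down <= 4:
        if rng.random() < deep_share:
            result = rng.choices(outcomes, weights=weights)[0]
            if result == "td":
                return 7
            if result == "interception":
                return 0
            gain = 35 if result == "long_gain" else 0  # anything else: incomplete
        else:
            gain = rng.choice(ORDINARY_GAINS)
        gain = min(gain, to_goal)
        to_goal -= gain
        if to_goal == 0:
            return 7
        if gain >= to_first:
            down, to_first = 1, min(10, to_goal)
        else:
            down, to_first = down + 1, to_first - gain
    return 0  # turnover on downs; punts and field goals are ignored

def points_per_drive(deep_share, rates, n_drives=10_000, seed=0):
    """Average points per drive when deep_share of plays are deep shots."""
    rng = random.Random(seed)
    total = sum(simulate_drive(deep_share, rates, rng) for _ in range(n_drives))
    return total / n_drives
```

You would feed `outcome_rates` a list of labels such as "td", "long_gain", "interception" and "incomplete" pulled from real games, then compare `points_per_drive` at a team's actual deep-shot share against a higher one. The drive model is the weak link here; a serious version would replay the team's real play distribution by down and distance.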

Joined: 12/04/2012

MGoPoints: 43891

mongoose0614

July 8th, 2018 at 11:51 AM ^

The big variable is the penalty for PI in the NFL being a spot foul vs 15yd variety in college.

This skews the numbers pretty good for the reward ratio in NFL vs the college game.
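
To put rough numbers on that gap, here is a small sketch. It assumes the simplified versions of the two rules (NFL: ball placed at the spot of the foul; college: 15 yards, or the spot if the foul is less than 15 yards downfield) and ignores end-zone and half-the-distance details. The function name is made up for illustration.

```python
def pi_yards(yards_downfield, rule):
    """Yards awarded for defensive pass interference on a throw
    `yards_downfield` past the line of scrimmage (simplified rules)."""
    if rule == "nfl":
        return yards_downfield           # spot foul
    if rule == "college":
        return min(15, yards_downfield)  # capped at 15 yards
    raise ValueError(f"unknown rule: {rule}")
```

On a 40-yard shot, interference is worth the full 40 under the NFL version and only 15 under the college version, so any deep-ball reward estimate built on NFL data would likely overstate the college payoff.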

Joined: 01/12/2009

MGoPoints: 1647

Ghost of Fritz…

July 8th, 2018 at 1:55 PM ^

O.k., sounds interesting.

But how do you control for the fact that teams with better passing games in general are the ones that will make more 30+ yard pass attempts (precisely because they have the good QB, receivers, and pass-blocking o-line that make them comparatively better than other teams at that type of play)?

IOW, assume that the numbers show that teams that throw more 30+ yard passes in fact do measure out as having higher offensive efficiency.

Is that because they are attempting more 30+ yard passes? Or is that because they had the QB, receiver, o-line combo to make 30+ yard passes a good probability proposition in the first place (which then leads those teams to attempt more 30+ yard passes than average)?

For example, was Michigan's 2017 offense bad (in part) because they did not attempt enough 30+ yard passes?

Or was it bad because they had a problematic o-line, inexperienced receivers that were not good route runners, and questionable QBs (all of which led the OC to rarely attempt 30+ yard pass plays)?

What is the causal element? Mediocre offensive players (at least mediocre for long passes)? Or the lack of 30+ yard pass attempts?
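
The worry can be shown with a toy simulation. This is only a sketch on synthetic data: a hidden "talent" number drives both how often a team throws deep and how efficient it is, while the deep-ball rate itself has zero causal effect. All coefficients and names are invented for illustration.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

def simulate_teams(n=2000, seed=0):
    """Synthetic teams: talent drives both deep rate and yards per play."""
    rng = random.Random(seed)
    talent = [rng.gauss(0, 1) for _ in range(n)]
    deep_rate = [0.10 + 0.03 * t + rng.gauss(0, 0.02) for t in talent]
    yards_per_play = [5.5 + 0.8 * t + rng.gauss(0, 0.5) for t in talent]
    return talent, deep_rate, yards_per_play

talent, deep_rate, ypp = simulate_teams()
naive = corr(deep_rate, ypp)  # strongly positive, purely via talent
controlled = corr(residuals(deep_rate, talent), residuals(ypp, talent))  # near zero
```

In this made-up world the naive correlation is large even though throwing deep more often does nothing, and it vanishes once talent is held fixed. With real teams talent is not directly observable, which is exactly the difficulty raised above.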

Joined: 10/04/2014

MGoPoints: 23880

July 8th, 2018 at 2:25 PM ^

Agreed ---- there is A LOT that you would have to control for.

I admit that an analysis like I suggest would be answering questions at a "fairly high" and "directional" level.

And even if an analysis said "college football teams as a whole would become more efficient in the aggregate by throwing more deep balls", it may not add value for an individual team. E.g., an offense that already runs the ball for 7-yards-at-a-chunk and rarely turns it over. There's no value to adding "more long balls." Their offense is already very efficient.

Such is data analytics. Often "fairly high", "directional", and "in the aggregate" is as good an answer as one can get. Still doesn't mean it's an analysis not worth attempting.

Joined: 12/04/2012

MGoPoints: 43891

DoubleB

July 8th, 2018 at 3:24 PM ^

An offense that runs the ball for 7 yards a chunk SHOULD ONLY throw deep balls. LBs and safeties cheat up and you have 1-on-1 matchups (and maybe 1 on none with some well designed play action passes).

What they should NOT do is throw 3-step hitches for 5 yards a pop or complicated 5-step patterns.

Joined: 11/19/2008

MGoPoints: 7470

MJ14

July 8th, 2018 at 2:09 PM ^

I think you are just confused after watching Penn State throw up prayer after prayer and somehow coming down with way too many of those.

Joined: 01/09/2011

MGoPoints: 16453

July 8th, 2018 at 2:32 PM ^

I admit --- PSU 2016 made me think about this theory more, but I generally thought that well before 2 years ago.

Bill Connelly is a smart guy. There is a reason he has "explosiveness" as one of his five factors. "Efficiency" is one of his five factors too. But the best teams are BOTH explosive and efficient.

And --- in theory --- one way to increase "explosiveness" is to increase the "number of chances you have to make an explosive play."

You don't win the bets you don't make.

Joined: 12/04/2012

MGoPoints: 43891

July 8th, 2018 at 2:45 PM ^

This is awesome. Thanks!

Joined: 05/08/2009

MGoPoints: 1698

July 8th, 2018 at 3:30 PM ^

I don't want to put my personal e-mail out here on this board. And unfortunately MGoBlog doesn't have a Direct Message component.

But search for me on CFBAnalysis (hint: search for various forms of "Nittany" amongst user names) and send me a Direct Message there. I can then share out some of my dataset w/ you (if it would be useful).

Joined: 12/04/2012

MGoPoints: 43891

July 9th, 2018 at 6:37 AM ^

That would be awesome. I’ll see if I can find you.

Joined: 05/08/2009

MGoPoints: 1698

DoubleB

July 8th, 2018 at 3:21 PM ^

I agree with you 100%, although what is often forgotten is that to throw 30+ yard passes often, a team has to protect the QB for more than a nanosecond.

The pros definitely don't throw it downfield enough although the protection issue is even more acute.

Joined: 11/19/2008

MGoPoints: 7470

Hugh White

July 8th, 2018 at 11:04 AM ^

Suggestions:

Visit this blog: http://harvardsportsanalysis.org/.

HSAC itself has a “Resources” page linking to a wide array of data and other like-minded pages: http://harvardsportsanalysis.org/links-2/

Joined: 11/26/2012

MGoPoints: 12405

July 8th, 2018 at 2:48 PM ^

Looks very rich. Thanks!

Joined: 05/08/2009

MGoPoints: 1698

M-GO-Beek

July 8th, 2018 at 11:15 AM ^

I have always wondered about efficiency based on field location (but obviously not enough to actually evaluate it myself). For example, a WR or RB who gets a 1-yard touchdown has obviously helped the team and done their job, but when averaged into their stats, that one catch/carry doesn't help their overall numbers and isn't properly captured. What was the Hammering Panda's per-carry average? Yet he had numerous touchdowns. If you instead consider that he covered 100% of his total potential yards on many of his carries, I think his value would be better conveyed than by his lousy yards/carry. I think the offensive efficiency statistic (5 yards on 1st down, half the yards needed for a first down on 2nd, and all the yards on 3rd) is close, but it does not tell the same story. Unfortunately, I don't have a suggestion for a database to study this (otherwise, I may have actually looked at it myself).
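
As a sketch of the two measures mentioned in this post (function names are made up, and the success thresholds are the ones quoted above rather than any official definition):

```python
def share_of_available_yards(gain, yards_to_goal):
    """Fraction of the yards that were available on the play that the runner covered."""
    return gain / yards_to_goal

def is_successful(down, distance, gain):
    """Success rule as described above: 5 yards on 1st down, half the
    distance on 2nd down, the full distance on 3rd (and 4th) down."""
    if down == 1:
        return gain >= 5
    if down == 2:
        return gain >= distance / 2
    return gain >= distance
```

A 1-yard touchdown run gives `share_of_available_yards(1, 1) == 1.0`, while a 20-yard run from your own 20 gives `share_of_available_yards(20, 80) == 0.25`, which is the contrast the post is after.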

Joined: 02/03/2015

MGoPoints: 5667

July 8th, 2018 at 11:38 AM ^

I like that idea. I think the closest stat to that right now conceptually is EPA (Expected Points Added). E.g., on 1st-and-goal from the 1, if a college team runs the football, they will typically score 3.4 points on the play. If Hill scores a TD, that's a +2.6 for him. If he gains 0 yards, that's a -3.4. If he fumbles and it's returned for 6, that's a -9.4. (I'm not sure how the extra point gets factored into all that).
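
The arithmetic above, as a minimal sketch. The 3.4 figure is the one quoted in this post, and the function name is made up. Note the "no gain" case follows the post in treating the value after the play as 0; a fuller model would presumably use the expected points of the resulting 2nd-and-goal instead.

```python
def expected_points_added(expected_before, value_after):
    """EPA = value of the situation after the play minus expected points before it."""
    return value_after - expected_before

EP_FIRST_AND_GOAL_AT_1 = 3.4  # figure quoted in the post, not independently verified

# Value after the play for the three outcomes discussed (extra point ignored).
VALUE_AFTER = {
    "touchdown": 6.0,
    "no gain": 0.0,
    "fumble returned for a TD": -6.0,
}

for outcome, value in VALUE_AFTER.items():
    epa = expected_points_added(EP_FIRST_AND_GOAL_AT_1, value)
    print(f"{outcome}: {epa:+.1f}")
```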

Could also do the same thing with WPA (Win Probability Added). That makes a Hill TD run in a 10-10 game more valuable than a TD run in a 35-3 game.

I think the NFL Advanced Analytics guys use both of these - but I haven't really seen it for College. Bill Connelly's offensive efficiency is closest, yes.

Joined: 12/04/2012

MGoPoints: 43891

M-GO-Beek

July 8th, 2018 at 12:17 PM ^

Yeah, obviously I picked the extreme example, but I think there should be a difference between someone who runs for 20 yards and gets a TD and someone who just runs for 20 yards. This is especially true when the RB could have run for many more yards if the ball was just somewhere else on the field when the play began.

Joined: 02/03/2015

MGoPoints: 5667

July 8th, 2018 at 2:18 PM ^

True - you simply can't run for 30 yards when the play starts from the opponent's 20. Whereas you can from mid-field.

You can only solve for "how much does this 20 yard run help in this particular time and place."

Case 1: 2nd-and-8 from the opponent's 20. Typically, teams in this position will score 2.7 points on that drive (I'm making this # up, but w/ a big enough database you can figure this # out). I.e., Expected Points = 2.7. Typically, teams that run the ball in this spot will have Expected Points = 2.9 on their next play. So if Higdon runs for 20 yards and scores, he should get credit for (6-2.9) = 3.1 "points added above expected points."

Case 2: 2nd-and-8 from your own 40. Expected Points = 1.4. Typically, teams that run the ball in this spot will have Expected Points = 1.5 on their next play. If Higdon runs for 20, that's now 1st-and-10 from the opponent's 40. Expected Points there = 1.8. So if Higdon runs for 20 yards, he gets credit for (1.8-1.5) = 0.3 "points added above expected points."

I think this metric would solve for your question. Higdon's 20 yards in Case 1 is a lot more valuable than in Case 2. I made up a bunch of numbers above, but I'd be very surprised if there are many situations where "X yards for a touchdown, where theoretically he could have run for more" aren't more valuable than "X yards somewhere else on the field, where he could have run for more but didn't."
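
Here is how the two cases could be computed from lookup tables, as a sketch only. The tables hold the made-up numbers from this post; the state layout `(down, distance, yards_to_goal)` and all names are assumptions for illustration. A real version would fill the tables from play-by-play data.

```python
TOUCHDOWN = 6.0

# (down, distance, yards_to_goal) -> expected points. Made-up value from the post.
EXPECTED_POINTS = {
    (1, 10, 40): 1.8,  # 1st-and-10 at the opponent's 40
}

# Expected points after a typical run from each starting spot (also made up).
EP_AFTER_TYPICAL_RUN = {
    (2, 8, 20): 2.9,  # Case 1: 2nd-and-8 at the opponent's 20
    (2, 8, 60): 1.5,  # Case 2: 2nd-and-8 at your own 40
}

def points_above_expected(start, gain):
    """Credit for a run: value after the play minus the typical-run baseline."""
    down, distance, to_goal = start
    baseline = EP_AFTER_TYPICAL_RUN[start]
    if gain >= to_goal:
        after = TOUCHDOWN
    else:
        new_to_goal = to_goal - gain
        if gain >= distance:
            next_state = (1, min(10, new_to_goal), new_to_goal)
        else:
            next_state = (down + 1, distance - gain, new_to_goal)
        after = EXPECTED_POINTS[next_state]
    return after - baseline

case_1 = points_above_expected((2, 8, 20), 20)  # 6 - 2.9
case_2 = points_above_expected((2, 8, 60), 20)  # 1.8 - 1.5
```

The same 20-yard gain is worth about ten times as much in Case 1 as in Case 2 with these numbers, which is the field-position effect being asked about.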

Now, of course --- Higdon is one of 11. Good luck figuring how his "points added above expected points" gets apportioned among everybody else on the field.

I tend to think football is the most analytically difficult of all the sports. Baseball is fairly easy - basketball comes next. Then ice hockey and soccer. Football has a ways to go though.

Joined: 12/04/2012

MGoPoints: 43891

July 8th, 2018 at 2:50 PM ^

So it looks like this is blending drives with individual plays. Interesting.

Joined: 05/08/2009

MGoPoints: 1698

Muttley

July 8th, 2018 at 6:13 PM ^

I think there should be a difference between someone who runs for 20 yards and gets a TD and someone who just runs for 20 yards. This is especially true when the RB could have run for many more yards if the ball was just somewhere else on the field when the play began.

Forrest Gump disagrees

Joined: 07/07/2009

MGoPoints: -73245

Bluetotoy

July 9th, 2018 at 8:52 AM ^

Great idea and easy to convert to valuable fantasy stats.

Joined: 01/02/2017

MGoPoints: 186

July 8th, 2018 at 12:18 PM ^

An OL efficiency parameter would be nice. I would say 2017 and 2008 probably had low scores. 2001-2002, '95-'97, '86 and '76 may have been our better OL?

Joined: 11/10/2009

MGoPoints: 32169

July 8th, 2018 at 12:37 PM ^

Early 90s along with '99 had great OL. '06 line was good enough to give Henne some time to complete his passes. I thought the '86 OL played rather well; if not for that Gopher game, the regular season would have had 0 losses. I can't believe Cooper beat us in the bowl game that season.

Joined: 11/10/2009

MGoPoints: 32169

July 8th, 2018 at 2:53 PM ^

I wonder how far back stats like hurries, hits, knockdowns, etc. go. I would think that style of play for various eras would need to be normalized somehow.

Joined: 05/08/2009

MGoPoints: 1698

July 8th, 2018 at 3:58 PM ^

I think QB sack stats go back quite a few years. You may be able to find a relation between QB sacks and QB hurries and extrapolate that back to before QB hurry stats were kept. Yes, before Rick Leach's soph season, Michigan probably ran the ball 85 percent of the time, so much will need to be normalized before 1976. Though, I remember the '76 OL was quite good. The right side dominated defenses with Dufek and Donahue. Lytle and Huckleby got quite a few yards running right.
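
One simple way to do that extrapolation would be a straight-line fit of hurries on sacks over the seasons where both were tracked. This is a sketch under a big assumption (that the relationship is roughly linear and stable across eras, which the run-heavy 1970s may well break). The numbers below are placeholders chosen to lie on a line, not real season totals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def estimate_hurries(sacks, slope, intercept):
    """Estimated hurries for a season where only sacks were recorded."""
    return slope * sacks + intercept

# Placeholder values for seasons with both stats (NOT real data).
sacks_known = [20, 25, 30, 35]
hurries_known = [43, 53, 63, 73]

slope, intercept = fit_line(sacks_known, hurries_known)
estimate = estimate_hurries(28, slope, intercept)  # a season with sacks only
```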

Joined: 11/10/2009

MGoPoints: 32169

taistreetsmyhero

July 8th, 2018 at 3:52 PM ^

I spent some time compiling old UFRs into an Excel sheet, but I didn’t know how to write a script that could do it for me and it was mind-numbingly tedious.

Joined: 08/08/2012

MGoPoints: 39396

July 8th, 2018 at 4:02 PM ^

Save the UFR Excel sheet into a delimited file and write a Perl script to parse out what you need. I wonder if Brian does this to create stats on everything.
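
A minimal sketch of that parsing step, written in Python rather than Perl. It assumes the sheet was saved as tab-delimited text with a header row, and the column names `player`, `plus` and `minus` are hypothetical; they would need to match whatever the real UFR sheet uses.

```python
import csv
import io
from collections import defaultdict

def total_by_player(delimited_text, delimiter="\t"):
    """Sum (plus - minus) per player from delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    totals = defaultdict(int)
    for row in reader:
        totals[row["player"]] += int(row["plus"]) - int(row["minus"])
    return dict(totals)
```

For a file on disk, read it with `open(path, newline="")` and pass the text in, or hand the open file straight to `csv.DictReader` in place of the `io.StringIO` wrapper.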

Joined: 11/10/2009

MGoPoints: 32169