that makes one of us
- Member for
- 3 years 31 weeks
|1 week 5 days ago||yes, that's right. One||
yes, that's right.
One way to think about this is that the initial regression results were estimated using the smallest bins possible (i.e., one bin per observation). By binning, we artificially boost the R^2, and also introduce some bias in the regression estimates. This suggests that the first model should be the one we use to predict, not the last model (though we probably shouldn't put too much stock in these predictions anyway).
For DG, we have:
78.564 - .5677 × 146.1 = -4.37697
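For anyone who wants to check the arithmetic, here is a minimal sketch in R. The two coefficients and the 146.1 are copied from the line above; the variable names are my own labels, not anything from the original model object.

```r
# Coefficients copied from the comment above; the names are illustrative assumptions.
intercept <- 78.564
slope     <- -0.5677
x_dg      <- 146.1   # DG's value on the predictor

# Point prediction from the unbinned (first) model
prediction <- intercept + slope * x_dg
print(prediction)
```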
|1 week 6 days ago||Binning and R^2||
I don't think your binning/averaging approach works. It only compresses the data around the best fit line, which simply reduces the variance to be explained. Given the definition of R^2, it’s not surprising at all to see its value jump near 1.
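A quick simulation makes the point. This is only a sketch on made-up data (the sample size, slope, and number of bins are all assumptions, not the numbers from the post): averaging within bins throws away most of the noise around the line, so R^2 climbs even though the underlying relationship has not changed.

```r
set.seed(1)

# Simulated bivariate data with a weak linear relationship (assumed values)
n <- 1000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# R^2 on the raw, observation-level data
r2_raw <- summary(lm(y ~ x))$r.squared

# Split x into 20 equal-count bins and average x and y within each bin
breaks <- quantile(x, probs = seq(0, 1, length.out = 21))
bins   <- cut(x, breaks = breaks, include.lowest = TRUE)
x_bin  <- as.numeric(tapply(x, bins, mean))
y_bin  <- as.numeric(tapply(y, bins, mean))

# R^2 on the 20 bin averages
r2_binned <- summary(lm(y_bin ~ x_bin))$r.squared

print(c(raw = r2_raw, binned = r2_binned))
```

The binned fit uses 20 points instead of 1000, and its R^2 describes how well the line fits the bin averages, not how well it predicts an individual observation.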
If you’re worried about outliers and high leverage points, try using some form of robust regression (e.g., M-estimator). Though to be honest, I’m really not sure that would be worth it: to my eye, the plots don’t seem to reveal any huge outlier issues.
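If you do want to try it, a minimal sketch with `rlm()` from the MASS package (Huber M-estimation by default) might look like the following. The data here are simulated with a few contaminated points, purely as an assumption for illustration.

```r
library(MASS)  # provides rlm(), an M-estimator for linear models

set.seed(2)

# Simulated data with a true slope of 0.5 (assumed values)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

# Contaminate the 10 observations with the largest x by shifting y upward
idx    <- order(x, decreasing = TRUE)[1:10]
y[idx] <- y[idx] + 15

# Ordinary least squares versus Huber M-estimation
ols <- lm(y ~ x)
rob <- rlm(y ~ x, maxit = 50)

slope_ols <- unname(coef(ols)[2])
slope_rob <- unname(coef(rob)[2])
print(c(ols = slope_ols, robust = slope_rob))
```

The M-estimator downweights observations with large residuals, so the contaminated points pull its slope much less than they pull the OLS slope.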
Overall, I would say a .31 R^2 is actually pretty impressive given that this is just a bivariate relationship.
|30 weeks 1 day ago||Same reaction||
I don't comment often but I have to say I had the same reaction. This was a careless and insensitive post. This website is an important platform for you, and that comes with responsibility.
|50 weeks 1 day ago||Slow mo is awesome||
Thanks for figuring out
|50 weeks 3 days ago||good stuff||
I even like the light dose of personal anecdotes. Thanks!
|1 year 2 weeks ago||thanks dude||
|1 year 6 weeks ago||I like this feature||
I really like this new feature. Keep it coming!
|1 year 10 weeks ago||great read||
|1 year 11 weeks ago||Sexist!||
Ha! I was sooo ready to call you a dumb sexist f**k. But yeah, this is 100% right...
|1 year 13 weeks ago||wow!||
|1 year 42 weeks ago||delete||
|1 year 42 weeks ago||R?||
Was that an R reference on Mgoblog? It sure looks like it, but it all seems so improbable...
|1 year 44 weeks ago||thanks for the pointer. that||
thanks for the pointer. that was awesome
|1 year 45 weeks ago||This is awesome||
|1 year 49 weeks ago||+1||
|1 year 51 weeks ago||sounds good||
Where in A2 can you buy that stuff?
|1 year 51 weeks ago||Moosehead||
I'm a cheap Canadian
|1 year 51 weeks ago||Thanks!||
|2 years 3 days ago||Year-to-year dependence||
I did something very similar, looking at year-to-year dependence for fumble recoveries in 120 teams over a 10-year period. I ran a simple linear model, regressing fumble recoveries in year T on recoveries in year T-1. I also tried including "fixed effects" (i.e., team dummies) to control for unobserved heterogeneity between teams. The results are pretty clear:
Last year's fumble recovery rate explains only about 2% of the variance in this year's numbers (R^2). Also, the coefficient on the lagged dependent variable appears to be *negative*.
I also drew a neat picture that Brian used on the front page at some point.
The data and R code I used can be found here:
Here's the content of my email to Brian:
A sentence in Blue Seoul's recent Nebraska recap led me to re-visit your claims that fumble recoveries are random. I got data from teamrankings.com and drew a graph that you may (or may not) find useful (see attached for 2012 opponents).
In addition, I estimated a simple linear model using fumble recovery rate in year t to predict fumble recovery rate in year t+1. The model also controls for the baseline recovery rate of each team by allowing intercept shifts. Two things stand out. First, past recovery rates explain relatively little variation in current recovery rates (R-squared=0.13 in a model with lagged DV and team fixed effects). Second, the relationship between recovery rates last year and recovery rates this year is *negative* and statistically significant. If this general pattern holds true in the case of Michigan this year, we should expect the team to recover fumbles at a slightly lower rate than its baseline for the 2002-2012 period (i.e. 61.57%). Obviously, this is a ridiculously simple model, but it reinforces whatever evidence you were basing your previous comments on.
And the basic results:
> # Manual FE
> mod_fe = plm(recovery_rate ~ lag(recovery_rate, 1), model='within', data=dat)
> mod_pool = plm(recovery_rate ~ lag(recovery_rate, 1), model='pooling', data=dat)
> summary(mod_fe)
Oneway (individual) effect Within Model
Call:
plm(formula = recovery_rate ~ lag(recovery_rate, 1), data = dat, model = "within")
Unbalanced Panel: n=120, T=4-9, N=1069
Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max.
-113.000   -5.590    0.113    5.620   88.600
Coefficients :
                       Estimate Std. Error t-value  Pr(>|t|)
lag(recovery_rate, 1) -0.148795   0.032253 -4.6134 4.505e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 130790
Residual Sum of Squares: 127920
R-Squared : 0.021958
Adj. R-Squared : 0.019472
F-statistic: 21.2833 on 1 and 948 DF, p-value: 4.5054e-06
> summary(mod_pool)
Oneway (individual) effect Pooling Model
Call:
plm(formula = recovery_rate ~ lag(recovery_rate, 1), data = dat, model = "pooling")
Unbalanced Panel: n=120, T=4-9, N=1069
Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max.
-126.000   -5.840    0.208    5.880  107.000
Coefficients :
                       Estimate Std. Error t-value Pr(>|t|)
(Intercept)           51.308810   1.581618 32.4407   <2e-16 ***
lag(recovery_rate, 1) -0.026893   0.030823 -0.8725   0.3831
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 147510
Residual Sum of Squares: 147410
R-Squared : 0.00071295
Adj. R-Squared : 0.00071161
F-statistic: 0.761256 on 1 and 1067 DF, p-value: 0.38313
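A note on the two R-squared figures quoted above (about 2% and 0.13). One reading that seems to reconcile them, and this is my interpretation of the output, not something the output states, is that 0.02 is the within R-squared (variation left after removing team means) while 0.13 is the overall R-squared of the same model with the team dummies counted as explanatory variables. A minimal sketch using only the sums of squares printed above:

```r
# Sums of squares copied from the plm output above
tss_within <- 130790  # total SS after demeaning by team (within model)
rss_within <- 127920  # residual SS of the fixed-effects model
tss_pooled <- 147510  # total SS around the grand mean (pooling model)

# Share of within-team variation explained by the lagged rate alone
r2_within <- 1 - rss_within / tss_within

# Share of total variation explained by lagged rate plus team intercepts
# (assumes both models were fit on the same 1069 observations)
r2_overall <- 1 - rss_within / tss_pooled

print(round(c(within = r2_within, overall = r2_overall), 3))
```

Under that reading, most of the 0.13 comes from the team intercepts, not from last year's recovery rate.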
|2 years 5 days ago||Refugee irony||
Yeah, this was funny at first, but I think you're starting to overplay that refugee-irony bit. I'm sure you're right that the MSM will have a trip with this stuff, but that doesn't give us the right to trivialize.
|2 years 4 weeks ago||Delete me||
|2 years 8 weeks ago||Abstracts||
Most academic journal articles start with an abstract that states the question that is addressed and the answer that is given. So I guess I disagree with your characterization of academic writing as mystery novels.
|2 years 14 weeks ago||Wrong comparison group||
"Ex-NFL players are dying at a rate half that of the general population after they retire and are 59 percent less likely to commit suicide."
That's the wrong control group. Comparing how long athletes live to the general population, including fat dudes who have never exercised, is meaningless. The interesting counterfactual is not whether or not these guys would have lived longer if they exercised as much as me, but whether they would have lived longer *in the same physical condition except for the head trauma*.
|2 years 15 weeks ago||My vote goes here||
|2 years 17 weeks ago||right you are||
right you are
|2 years 17 weeks ago||I don't get it||
What was the point of this? I mean, the guy writes decently well, and I guess I wouldn't pass up free advertising for my book in the NY Times either, but really? There are tons of people working shitty jobs and double/night shifts, and this guy complains cause he retired in his mid-30s...
|2 years 19 weeks ago||Or this gem from last year: no suspension, no fine||
Chara hits Pacioretty
|2 years 22 weeks ago||A counterpoint||
I just feel it's my responsibility to say something here. Michigan is a great school, and I am honored to be learning and teaching there. But I think it's important to put things into perspective. The main reason you're going to spend thousands of dollars on college in the next few years is to learn useful stuff. In that regard, Michigan is certainly a fantastic environment, but it is also not *that* exceptional. Your life will (hopefully) last a long time, and the years of college should be fun ones, but loving a team and feeling at home on campus for 4 years shouldn't be #1 or even #10 on your list of priorities when looking for a school. Look at the long game.
|2 years 23 weeks ago||you're ugly||
yes you are
|2 years 23 weeks ago||+1||
Smith looks like a machine.