Is it just me or does it seem odd that "Penn" would show up as a popular word but not State? You'd think with the combo of Penn making the list and the endless number of threads on State, it would show up. I guess it's just another sign of their unimportance.
Summer is upon us, and with it, a bit of a lull in our mgoblogging fervor - there are simply not as many sports to talk about. The great wait for the football season begins.
With this in mind, what better time to celebrate this very blog in some bizarre and uniquely mgobloggish way? Hence I present: MGoStats, a statistical look at this blog over the years since its inception.
It began on December 4th, 2004, with the following post at 6:30am by some guy named Brian:
An inauspicious beginning, to say the least, but thus mgoblog was born. In the years since, we have all come here for a multitude of reasons: to celebrate the highs, commiserate during the lows, but mostly for one single reason, which is to hear what one Brian Cook has to say about all matters Michigan Football (and occasionally other sports).
So I found myself wondering: how much has Brian said over the years? A couple of Python scripts later, I had some answers. I wrote a trivial script to download the entire blog (old pages are available through links of the form http://www.mgoblog.com/?page=X, where higher X values link to older pages), and then a less trivial script to parse the downloaded content into a more manageable form. The Python SGML parser is amazing, for those of you who care about such things.
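For the curious, here is a rough sketch of what those two steps might look like today. It is not the original script: the old SGML parser module (`sgmllib`) was removed in Python 3, so this swaps in the standard library's `html.parser` instead, and the names `page_url`, `TextExtractor`, and `extract_text` are made up for illustration. Only the URL pattern comes from the post.

```python
from html.parser import HTMLParser

# URL form quoted in the post; higher page numbers are older pages.
BASE_URL = "http://www.mgoblog.com/?page={}"


def page_url(page):
    """Return the archive URL for a given page index (hypothetical helper)."""
    return BASE_URL.format(page)


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Each page could then be fetched with something like `urllib.request.urlopen(page_url(n))` and the decoded result passed to `extract_text`. Splitting the text into individual articles would depend on the site's markup, which is not shown here.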
What I found follows below. Note: there may be some errors, but I believe the results to be in the right ballpark.
Perhaps the single most amazing fact is that Brian himself has written something on the order of 3 million words (or typed about 17 million characters) over about 3600 articles. Wow! That's a lot of content, from his hands to our eyeballs.
|Who||Articles||Words (Millions)||Characters (Millions)|
The table shows these sums, as well as the sums across all contributed articles (including ones from Tim, TomVH, formerlyanonymous, and anyone else who has made the front page). It might be interesting to see how these counts (number of articles, number of words, number of comments made by users) play out on a week-by-week basis. So interesting one could even make a ... chart? Chart. Or actually, Charts.
The first chart I present is the number of articles published per week over the entire existence of mgoblog.
From the chart, one can observe some interesting facts. First, from mgoblog we should expect about 14 articles per week on average over the course of a year. Second, that number is notably higher in the fall (no surprise), and lower in the spring. Finally, and perhaps most interestingly, one can see the growth of the mgoblog community in the orange bars, which represent articles written by somebody other than Brian; this content, which now represents a significant portion of mgoblog, picked up halfway through last year and has continued to get stronger. Brian's efforts at making the blog more than just himself are clearly having an impact.
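A per-week tally like the one behind this chart can be sketched in a few lines. This is a guess at the approach, not the script actually used: it assumes each article has already been reduced to a (date, author) pair, and it assumes weeks start on Monday. The helper names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day` (assumed week boundary)."""
    return day - timedelta(days=day.weekday())


def articles_per_week(articles):
    """Count articles per week, split into Brian vs. everybody else.

    `articles` is an iterable of (published_date, author) pairs.
    Returns two Counters keyed by the Monday of each week.
    """
    brian, others = Counter(), Counter()
    for published, author in articles:
        bucket = brian if author == "Brian" else others
        bucket[week_start(published)] += 1
    return brian, others
```

The two counters map directly onto the two bar colors in the chart. Dividing the total article count by the number of weeks covered gives the yearly average.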
The second chart just shows the number of words on a per week basis:
The graph reflects the same trends seen above, but in word counts. Even early on, Brian was producing above 10,000 words per week during football season, and last year during the same season, we were spoiled with over 30,000 words per week about the sport and team we love.
Finally, I show the number of comments per article:
The big effect in this graph is the lack of comments before the switch to the new blog infrastructure (i.e., the Haloscan era). The other effect is the growth of the community: the difference in the number of comments in Fall '08 and Fall '09 is likely a sign of the increased importance of this site as a place for the broad UM football community. Aside: the one early outlier which has a large number of comments (Fall '06) is just full of a bunch of comment spam: Unverified Voracity 99 Bonus Guest. Who knows why it's there, but Brian should probably remove those comments.
I was also interested in what the longest articles were, but that should have been obvious: UFRs. Here are the ten longest articles (by number of letters in the article):
- 10. Upon Further Review: Defense vs Notre Dame (by Brian on September/16/2009, 48949 letters long)
- 9. Upon Further Review: Defense vs Iowa (by Brian on October/14/2009, 49477 letters long)
- 8. Upon Further Review: Defense vs Indiana (by Brian on September/30/2009, 49913 letters long)
- 7. Upon Further Review: Offense vs Iowa (by Brian on October/15/2009, 50279 letters long)
- 6. Upon Further Review: Offense vs Illinois (by Brian on November/5/2009, 50421 letters long)
- 5. Upon Further Review: Defense vs Purdue (by Brian on November/11/2009, 51002 letters long)
- 4. Upon Further Review: Offense vs Purdue (by Brian on November/12/2009, 51279 letters long)
- 3. Upon Further Review: Offense vs Notre Dame (by Brian on September/17/2009, 51572 letters long)
- 2. Upon Further Review: Offense vs Western Michigan (by Brian on September/10/2009, 51616 letters long)
- 1. Upon Further Review: Offense vs Indiana (by Brian on October/1/2009, 51721 letters long)
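A ranking like the one above could be produced roughly as follows. This is only a sketch: it assumes "letters" means alphabetic characters (the original count may well have included digits, spaces, or punctuation), and `letter_count` and `longest_articles` are illustrative names.

```python
import heapq


def letter_count(text):
    """Count alphabetic characters only (an assumption about 'letters')."""
    return sum(1 for ch in text if ch.isalpha())


def longest_articles(articles, n=10):
    """Return the n longest articles as (letters, title), longest first.

    `articles` is an iterable of (title, body) pairs.
    """
    scored = ((letter_count(body), title) for title, body in articles)
    return heapq.nlargest(n, scored)
```

Filtering out titles that start with "Upon Further Review" before calling `longest_articles` would give a non-UFR ranking.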
If you remove the UFRs from the list, these ten get the longest billing. A number of previews and various other summaries show up:
- 10. Michigan 2007, Part II: Defense (by Brian on August/31/2007, 28513 letters long)
- 9. Michigan State: Sometimes The Bar Eats You (by Brian on August/13/2007, 28636 letters long)
- 8. Purdue 2007: You're Killing Your Father, Larry (by Brian on August/23/2007, 29656 letters long)
- 7. Purdue 2008: Tiller On A Treadmill (by Brian on July/31/2008, 29964 letters long)
- 6. Illinois Preview: Redact This (by Brian on August/9/2007, 30014 letters long)
- 5. Michigan Preview 2005: A Tale Of Two Units, Part I (by Brian on August/30/2005, 30163 letters long)
- 4. Offense Unit By Unit, 2008 (by Brian on August/26/2008, 33989 letters long)
- 3. Michigan Preview Part I: Offense (by Brian on August/29/2006, 34844 letters long)
- 2. Penn State Preview: Stupefying (by Brian on July/20/2007, 35006 letters long)
- 1. Michigan 2007, Part I: Offense (by Brian on August/30/2007, 38809 letters long)
Most-Commented Upon Articles
I was also interested in the most commented-on articles. They were:
Nothing gets people revved up like the Offense's Units, or RAWK MUSIC, I guess.
Finally, I was generally curious as to what words show up in the blog. Sounds like a case for a ... chart? Nope. But close, a wordle:
The word cloud here shows a list of the most popular words used in this blog, with some editing done by y.t. to remove words like "the" (actually the most popular word on the site) and so forth.
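The counting behind a word cloud like this is simple to sketch. The stop-word list below is a tiny illustrative one, not the list actually used for the editing described above, and `top_words` is a hypothetical helper.

```python
import re
from collections import Counter

# Illustrative stop words only; the real hand-edited list is not known.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def top_words(text, n=50, stop_words=STOP_WORDS):
    """Return the n most common words in `text`, ignoring case and stop words."""
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)
```

Note that this treats "Penn" and "State" as separate words and never sees "St." as "state", so how a name is usually written affects which words make the list.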
Anyhow, that's all for now. An amazing amount of content, built up over the years on the backs of UFRs and other regular features we all know and love. Thanks, Brian, for all the hard work - it is truly staggering to see the sheer verbiage that has powered the site over the years.
I believe this could be accounted for by an odd obsession with the actor Sean Penn.
It's probably because some people will write Penn St. and others will write Penn State. So Penn is used more than St. and State.
That week to end 2009 with no comments. Seems strange to me. Sure, holidays and such. But there were also bowl games and stuff going on. I would have assumed at least a bit closer to average comment levels.
This is good stuff Coach. Always interesting to see info like this on the blog itself.
Perhaps, front-page worthy?
Brian averages 18 more words per post than everyone else. I must make my posts longer to suffice!
Short take: I find this diary amusing.
Long take: I started following this blog around when we were looking for a new coach and it hooked me. It was awesome then and that's why I got hooked, but I have to say that the move to Drupal, the addition of the board/diaries and the Mgomerger were all nothing short of brilliant. People who don't read as frequently as I do might not know that this is the most frequently visited college football blog there is, but it's a testament to Brian's foresight in those moves. There's more content now. Some of it (including 99%, maybe 100% of my contributions) doesn't add much, but the Decimated Defense posts, Mathlete's analysis, Tom VH's recruiting posts, Paul's videos and Tim and FA's general all-around coverage elevate this blog to more than what's possible with just one person's perspective, no matter how thoughtful and insightful that person might be. A big part of Brian's genius is in recognizing and tapping into that. When I have a real job and am making more than $9 an hour I'll show the full extent of my appreciation in monetary terms, but until then all I can say is Hail! This site is the embodiment of Leaders and Best.
Sidenote: Aarongoblue, by virtue of some glitch, accounts for something like 300 of the comments on the Offense By Unit 2008 post. If somebody wants to count, that number should be revised.
Nice work. Do the most-commented articles take into account the old days of Haloscan? IIRC, the comments sections over there got pretty long during the coaching search. IMO, probably not worth remembering. Also, does your program count the number of cat pictures posted?
FA, you must have some sharp eyes because I looked for Penn on that wordle for a while before I found it. Shocked that state isn't on there.
Each comment by THE KNOWLEDGE counts as ten comments.
In that wordle, is that Rodriguez with a G or a Q?
didn't make the wordle.
To be serious for a moment, Brian's accomplishment thus far is quite impressive. I have been a reader much longer than a contributor -- and most of my (ahem) contributions don't amount to much (see subject line above). The volume and quality of content that Brian has produced and disseminated is a rare commodity on the web. Even more extraordinary is that the content is related to our favorite academic and athletic institution. This is a great web community and it keeps me close to the University.
I'm surprised "chart" didn't make the wordle.
Also, in re: "state," oftentimes it's abbreviated to "St." And oftentimes we use "Little Brother" instead.
At a time when there is very little going on, I found myself very interested in this post. It is great to see the growth via charts.
I don't know if this is just a mistake on my computer or just what but it is showing me that the #1 non-UFR article has a post that is repeated like 200 times. Anyone else see this???
I admit, though, I was disappointed that this only counted articles that were front-paged. Diaries and board posts weren't in it.
I'd be more interested in stats about that. Which posters respond the most, which get the most plusses per response, which have the most words/characters per post, and who uses the longest words (just divide the number of characters by words).
and raise you an antidisestablishmentarianism.
This is a really amusing post, it's cool to see how much this site has grown recently.
"Tremendous" is not being used enough
Tremendous work will be put in to fixing this tremendous problem.
How could Kittens not make the wordle??? Here's hoping that next year "muppets" once again rules the mgoblog wordle.