MGoStatistics
Introduction
Summer is upon us, and with it, a bit of a lull in our mgoblogging fervor - there are simply not as many sports to talk about. The great wait for the football season begins.
With this in mind, what better time to celebrate this very blog in some bizarre and uniquely mgobloggish way? Hence I present: MGoStats, a statistical look at this blog over the years since its inception.
It began on December 4th, 2004, with the following post at 6:30am by some guy named Brian:
GoBlog()
{
HelloWorld
}
An inauspicious beginning, to say the least, but thus mgoblog was born. In the years since, we have all come here for a multitude of reasons: to celebrate the highs, commiserate during the lows, but mostly for one single reason, which is to hear what one Brian Cook has to say about all matters Michigan Football (and occasionally other sports).
So I found myself wondering: how much has Brian said over the years? A couple of python scripts later, I had some answers. I wrote a trivial script to download the entire blog (old pages are available through links of the form http://www.mgoblog.com/?page=X, where higher X values link to older pages), and then a less trivial script to parse the downloaded content into a more manageable form. The python SGML parser is amazing, for those of you who care about such things.
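For the curious, here is a rough sketch of what the download-and-parse step might look like. It is not the original script. The SGML parser module (sgmllib) is gone from Python 3, so this version leans on html.parser from the standard library instead. The names page_urls, fetch, TextExtractor and extract_text are made up for illustration, and starting the page numbering at 0 is an assumption.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

BASE_URL = "http://www.mgoblog.com/?page={}"  # link form quoted in the post


def page_urls(last_page, first_page=0):
    """Yield listing-page URLs; higher page numbers are older pages.

    Starting at page 0 is an assumption, not something the post states.
    """
    for page in range(first_page, last_page + 1):
        yield BASE_URL.format(page)


def fetch(url):
    """Download one page and return it as text (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text content of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A real run would loop over page_urls, call fetch on each URL, and then pick out the individual articles, authors, and dates from the markup. That last part depends on the site's actual HTML, which this sketch does not try to guess.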
What I found follows below. Note: there may be some errors, but I believe the results to be in the right ballpark.
Overall Results
Perhaps the single most amazing fact is that Brian himself has written something on the order of 3 million words (or typed about 17 million characters) over about 3600 articles. Wow! That's a lot of content, from his hands to our eyeballs.
Who | Articles | Words (Millions) | Characters (Millions) |
Brian | 3595 | 2.952 | 17.48 |
Total | 3976 | 3.258 | 19.19 |
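Totals like the ones in this table could be tallied along the following lines. This is a minimal sketch, assuming the parsed blog is a list of (author, text) pairs, which is a format invented for the example. It also assumes words are whitespace-separated and that characters include spaces; the actual counting rules behind the table may differ.

```python
from collections import defaultdict


def tally(articles):
    """Sum article, word, and character counts per author plus a "Total" row.

    `articles` is an iterable of (author, text) pairs (hypothetical format).
    Words are split on whitespace; characters include whitespace.
    """
    totals = defaultdict(lambda: {"articles": 0, "words": 0, "chars": 0})
    for author, text in articles:
        # Every article counts toward its author and toward the grand total.
        for key in (author, "Total"):
            row = totals[key]
            row["articles"] += 1
            row["words"] += len(text.split())
            row["chars"] += len(text)
    return dict(totals)
```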
The table shows these sums, as well as the sums across all contributed articles (including ones from Tim, TomVH, formerlyanonymous, and anyone else who has made the front page). It might be interesting to see how these counts (number of articles, number of words, number of comments made by users) play out on a week-by-week basis. So interesting one could even make a ... chart? Chart. Or actually, Charts.
Charts
The first chart I present is the number of articles published per week over the entire existence of mgoblog.
From the chart, one can observe some interesting facts. First, from mgoblog we should expect about 14 articles per week on average over the course of a year. Second, that number is notably higher in the fall (no surprise), and lower in the spring. Finally, and perhaps most interestingly, one can see the growth of the mgoblog community in the orange bars, which represent articles written by somebody other than Brian; this content, which now represents a significant portion of mgoblog, picked up halfway through last year and has continued to get stronger. Brian's efforts at making the blog more than just himself are clearly having an impact.
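A per-week count like the one charted here comes down to bucketing each article's date by the week it falls in. The sketch below shows one way to do that, assuming each article carries a datetime.date and that weeks start on Monday (both are choices made for the example, not details from the original script).

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def articles_per_week(dates):
    """Map each week's Monday to the number of articles posted that week."""
    return Counter(week_start(d) for d in dates)


def mean_per_week(counts):
    """Average over every week in the span, including weeks with no articles."""
    first, last = min(counts), max(counts)
    n_weeks = (last - first).days // 7 + 1
    return sum(counts.values()) / n_weeks
```

The same bucketing works for word counts or comment counts: add up the quantity of interest per week instead of counting articles.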
The second chart just shows the number of words on a per week basis:
The graph reflects the same trends seen above, but in word counts. Even early on, Brian was producing above 10,000 words per week during football season, and last year during the same season, we were spoiled with over 30,000 words per week about the sport and team we love.
Finally, I show the number of comments per article:
The big effect in this graph is the lack of comments before the switch to the new blog infrastructure (i.e., the Haloscan era). The other effect is the growth of the community: the difference in the number of comments in Fall '08 and Fall '09 is likely a sign of the increased importance of this site as a place for the broad UM football community. Aside: the one early outlier which has a large number of comments (Fall '06) is just full of a bunch of comment spam: Unverified Voracity 99 Bonus Guest. Who knows why it's there, but Brian should probably remove those comments.
Longest Articles
I was also interested in what the longest articles were, but that should have been obvious: UFRs. Here are the ten longest articles (by number of letters in the article):
- 10. Upon Further Review: Defense vs Notre Dame (by Brian on September/16/2009, 48949 letters long)
- 9. Upon Further Review: Defense vs Iowa (by Brian on October/14/2009, 49477 letters long)
- 8. Upon Further Review: Defense vs Indiana (by Brian on September/30/2009, 49913 letters long)
- 7. Upon Further Review: Offense vs Iowa (by Brian on October/15/2009, 50279 letters long)
- 6. Upon Further Review: Offense vs Illinois (by Brian on November/5/2009, 50421 letters long)
- 5. Upon Further Review: Defense vs Purdue (by Brian on November/11/2009, 51002 letters long)
- 4. Upon Further Review: Offense vs Purdue (by Brian on November/12/2009, 51279 letters long)
- 3. Upon Further Review: Offense vs Notre Dame (by Brian on September/17/2009, 51572 letters long)
- 2. Upon Further Review: Offense vs Western Michigan (by Brian on September/10/2009, 51616 letters long)
- 1. Upon Further Review: Offense vs Indiana (by Brian on October/1/2009, 51721 letters long)
If you remove the UFRs from the list, these ten get the longest billing. A number of previews and various other summaries show up:
- 10. Michigan 2007, Part II: Defense (by Brian on August/31/2007, 28513 letters long)
- 9. Michigan State: Sometimes The Bar Eats You (by Brian on August/13/2007, 28636 letters long)
- 8. Purdue 2007: You're Killing Your Father, Larry (by Brian on August/23/2007, 29656 letters long)
- 7. Purdue 2008: Tiller On A Treadmill (by Brian on July/31/2008, 29964 letters long)
- 6. Illinois Preview: Redact This (by Brian on August/9/2007, 30014 letters long)
- 5. Michigan Preview 2005: A Tale Of Two Units, Part I (by Brian on August/30/2005, 30163 letters long)
- 4. Offense Unit By Unit, 2008 (by Brian on August/26/2008, 33989 letters long)
- 3. Michigan Preview Part I: Offense (by Brian on August/29/2006, 34844 letters long)
- 2. Penn State Preview: Stupefying (by Brian on July/20/2007, 35006 letters long)
- 1. Michigan 2007, Part I: Offense (by Brian on August/30/2007, 38809 letters long)
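Both rankings above are just a top-ten selection by length, with an optional filter on the title. Here is a small sketch of that idea, assuming articles are (title, text) pairs and that "letters" means the full length of the text. Both are assumptions, and longest_articles is a hypothetical helper, not the function used to build the lists.

```python
import heapq


def longest_articles(articles, n=10, exclude_prefix=None):
    """Return the n longest (title, text) pairs, longest first.

    If `exclude_prefix` is given, titles starting with it are dropped first.
    """
    if exclude_prefix is not None:
        articles = [a for a in articles if not a[0].startswith(exclude_prefix)]
    return heapq.nlargest(n, articles, key=lambda a: len(a[1]))
```

Calling it with exclude_prefix="Upon Further Review" would give the second kind of list, and reversing the result gives the countdown order used above.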
Most-Commented Upon Articles
I was also interested in the most commented-on articles. They were:
Nothing gets people revved up like the Offense's Units, or RAWK MUSIC, I guess.
Word Usage
Finally, I was generally curious as to what words show up in the blog. Sounds like a case for a ... chart? Nope. But close, a wordle:
The word cloud here shows a list of the most popular words used in this blog, with some editing done by y.t. to remove words like "the" (actually the most popular word on the site) and so forth.
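The counting behind a word cloud like this is simple enough to sketch. The version below is an illustration only: the stop-word list is a tiny made-up sample (the list of words actually removed is not given), and the regular expression is just one reasonable way to split text into words.

```python
import re
from collections import Counter

# A small illustrative stop-word list; the real edit list is not known.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase words in `text`, leaving out the stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in stop_words)
```

Calling most_common(50) on the result would give the fifty most frequent words, which is the sort of input a word cloud generator works from.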
Anyhow, that's all for now. An amazing amount of content, built up over the years on the backs of UFRs and other regular features we all know and love. Thanks, Brian, for all the hard work - it is truly staggering to see the sheer verbiage that has powered the site over the years.