MGoStatistics

Submitted by Swayze Howell Sheen on

Introduction

Summer is upon us, and with it, a bit of a lull in our mgoblogging fervor - there are simply not as many sports to talk about. The great wait for the football season begins.

With this in mind, what better time to celebrate this very blog in some bizarre and uniquely mgobloggish way? Hence I present: MGoStats, a statistical look at this blog over the years since its inception.

It began on December 4th, 2004, with the following post at 6:30am by some guy named Brian:

 

GoBlog()
{
HelloWorld
}

An inauspicious beginning, to say the least, but thus mgoblog was born. In the years since, we have all come here for a multitude of reasons: to celebrate the highs, commiserate during the lows, but mostly for one single reason, which is to hear what one Brian Cook has to say about all matters Michigan Football (and occasionally other sports).

So I found myself wondering: how much has Brian said over the years? A couple of Python scripts later, I had some answers. I wrote a trivial script to download the entire blog (old pages are available through links of the form http://www.mgoblog.com/?page=X, where higher X values link to older pages), and then a less trivial script to parse the downloaded content into a more manageable form. The Python SGML parser is amazing, for those of you who care about such things.
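For the curious, here is a minimal sketch of what such a pair of scripts might look like. It is not the original code. The SGML parser module (sgmllib) no longer ships with Python 3, so this sketch swaps in html.parser.HTMLParser from the standard library. The function and class names are made up for illustration, and counting only non-whitespace characters is an assumption about how characters were tallied.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URL pattern quoted in the post; higher page numbers are older pages.
BASE_URL = "http://www.mgoblog.com/?page={}"


def page_url(page_number):
    """Build the listing URL for one page of the blog."""
    return BASE_URL.format(page_number)


def fetch_page(page_number):
    """Download one listing page and return its HTML as text (needs network)."""
    with urlopen(page_url(page_number)) as response:
        return response.read().decode("utf-8", errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words_and_chars(html):
    """Return (word_count, non_whitespace_character_count) for the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).split()
    return len(words), sum(len(word) for word in words)
```

A real run would loop fetch_page over increasing page numbers until the pages run out, and would also need to pick out each article's author, date, and comment count, which depends on the site's markup and is left out here.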

What I found follows below. Note: there may be some errors, but I believe the results to be in the right ballpark.

Overall Results

Perhaps the single most amazing fact is that Brian himself has written something on the order of 3 million words (or typed about 17 million characters) over about 3600 articles. Wow! That's a lot of content, from his hands to our eyeballs.

 

Who     Articles   Words (Millions)   Characters (Millions)
Brian   3595       2.952              17.48
Total   3976       3.258              19.19

The table shows these sums, as well as the sums across all contributed articles (including ones from Tim, TomVH, formerlyanonymous, and anyone else who has made the front page). It might be interesting to see how these counts (number of articles, number of words, number of comments made by users) play out on a week-by-week basis. So interesting one could even make a ... chart? Chart. Or actually, Charts.
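Before the charts, a few derived figures fall straight out of the table. The short sketch below only divides the table's numbers by one another; the dictionary layout and function names are invented for illustration.

```python
# Figures copied from the table above.
STATS = {
    "Brian": {"articles": 3595, "words": 2.952e6, "chars": 17.48e6},
    "Total": {"articles": 3976, "words": 3.258e6, "chars": 19.19e6},
}


def words_per_article(row):
    """Average article length in words."""
    return row["words"] / row["articles"]


def chars_per_word(row):
    """Average number of typed characters per word."""
    return row["chars"] / row["words"]


def share(part, whole, key):
    """Fraction of the whole accounted for by part, for one column."""
    return part[key] / whole[key]


if __name__ == "__main__":
    brian, total = STATS["Brian"], STATS["Total"]
    print(f"Brian, words per article: {words_per_article(brian):.0f}")
    print(f"Brian, characters per word: {chars_per_word(brian):.2f}")
    print(f"Brian's share of articles: {share(brian, total, 'articles'):.1%}")
    print(f"Brian's share of words: {share(brian, total, 'words'):.1%}")
```

By these numbers, Brian's average article runs a bit over 800 words, and he accounts for roughly 90% of both the articles and the words on the front page.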

Charts

The first chart I present is the number of articles published per week over the entire existence of mgoblog.

From the chart, one can observe some interesting facts. First, from mgoblog we should expect about 14 articles per week on average over the course of a year. Second, that number is notably higher in the fall (no surprise), and lower in the spring. Finally, and perhaps most interestingly, one can see the growth of the mgoblog community in the orange bars, which represent articles written by somebody other than Brian; this content, which now represents a significant portion of mgoblog, picked up halfway through last year and has continued to get stronger. Brian's efforts at making the blog more than just himself are clearly having an impact.
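The weekly counts behind a chart like this can be built by bucketing each article into the week it was published. The sketch below is one way to do it, not necessarily how the original script worked: it assumes each article has already been reduced to a (publication date, author) pair, and it starts weeks on Monday, which is an arbitrary choice.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing day."""
    return day - timedelta(days=day.weekday())


def articles_per_week(articles, main_author="Brian"):
    """Count articles per week, split into main author vs. everyone else.

    articles is an iterable of (publication_date, author) pairs, a
    hypothetical record layout chosen for this sketch.
    """
    by_main, by_others = Counter(), Counter()
    for published, author in articles:
        bucket = by_main if author == main_author else by_others
        bucket[week_start(published)] += 1
    return by_main, by_others


if __name__ == "__main__":
    sample = [
        (date(2009, 8, 9), "Brian"),
        (date(2009, 8, 3), "Brian"),
        (date(2010, 2, 20), "Tim"),
    ]
    by_brian, by_others = articles_per_week(sample)
    for monday in sorted(set(by_brian) | set(by_others)):
        print(monday, by_brian[monday], by_others[monday])
```

The two counters correspond to the two bar colors in the chart: one for Brian, one for everybody else. Summing word counts instead of adding 1 per article would give the per-week word totals.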

The second chart just shows the number of words on a per week basis:

The graph reflects the same trends seen above, but in word counts. Even early on, Brian was producing above 10,000 words per week during football season, and last year during the same season, we were spoiled with over 30,000 words per week about the sport and team we love.

Finally, I show the number of comments per article:

The big effect in this graph is the lack of comments before the switch to the new blog infrastructure (i.e., during the Haloscan era). The other effect is the growth of the community: the difference in the number of comments between Fall '08 and Fall '09 is likely a sign of the increased importance of this site as a gathering place for the broad UM football community. Aside: the one early outlier with a large number of comments (Fall '06) is "Unverified Voracity 9/9 + Bonus Guest!", which is just full of comment spam. Who knows why it's there, but Brian should probably remove those comments.

Longest Articles

I was also interested in what the longest articles were, but that should have been obvious: UFRs. Here are the ten longest articles (by number of letters in the article):

 

If you remove the UFRs from the list, these ten are the longest. A number of previews and various other summaries show up:

 

Most-Commented-Upon Articles

I was also interested in the most commented-on articles. They were:

 

 

  • 10. The Feagin Reveal (by Brian on August/9/2009, 202 comments)
  • 9. An Interview With Compliance Guy (by Brian on February/24/2010, 213 comments)
  • 8. It (by Brian on July/30/2009, 218 comments)
  • 7. Hello: Shawn Conway (by Tim on February/20/2010, 236 comments)
  • 6. Unverified Voracity 9/9 + Bonus Guest! (by Brian on September/9/2005, 244 comments)
  • 5. Morgan Trent Is Not All In (by Brian on May/10/2010, 284 comments)
  • 4. Need A Whiskey To Boycott? (by Brian on February/25/2009, 299 comments)
  • 3. " (by Brian on November/9/2009, 402 comments)
  • 2. Noise, Piped-In And Otherwise (by Brian on September/15/2009, 410 comments)
  • 1. Offense Unit By Unit, 2008 (by Brian on August/26/2008, 529 comments)

Nothing gets people revved up like the Offense's Units, or RAWK MUSIC, I guess.

Word Usage

Finally, I was generally curious as to what words show up in the blog. Sounds like a case for a ... chart? Nope. But close, a wordle:

The word cloud here shows a list of the most popular words used in this blog, with some editing done by y.t. to remove words like "the" (actually the most popular word on the site) and so forth.
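The counting that feeds a word cloud like this is simple to sketch. The version below tallies words and drops a small stop-word list. The list here is a tiny illustrative one (the only removed word named above is "the"), and the function name is invented. It produces the counts only, not the picture.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; an assumption, not the list actually used.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def top_words(text, limit=10, stop_words=STOP_WORDS):
    """Return the most common words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = "The offense and the defense: the offense wins."
    print(top_words(sample, limit=3))
```

Feeding it the combined text of every article would give the ranked list from which the biggest words in the cloud are drawn.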

Anyhow, that's all for now. An amazing amount of content, built up over the years on the backs of UFRs and other regular features we all know and love. Thanks Brian for all the hard work - it is truly staggering to see the sheer verbiage that has powered the site over the years.
