MGoStatistics

Submitted by Swayze Howell Sheen on May 15th, 2010 at 11:19 PM

Introduction

Summer is upon us, and with it comes a bit of a lull in our mgoblogging fervor - there are simply not as many sports to talk about. The great wait for the football season begins.

With this in mind, what better time to celebrate this very blog in some bizarre and uniquely mgobloggish way? Hence I present: MGoStats, a statistical look at this blog over the years since its inception.

It began on December 4th, 2004, with the following post at 6:30am by some guy named Brian:

 

GoBlog()
{
HelloWorld
}

An inauspicious beginning, to say the least, but thus mgoblog was born. In the years since, we have all come here for a multitude of reasons: to celebrate the highs, commiserate during the lows, but mostly for one single reason, which is to hear what one Brian Cook has to say about all matters Michigan Football (and occasionally other sports).

So I found myself wondering: how much has Brian said over the years? A couple of Python scripts later, I had some answers. I wrote a trivial script to download the entire blog (old pages are available through links of the form http://www.mgoblog.com/?page=X, where higher X values link to older pages), and then a less trivial script to parse the downloaded content into a more manageable form. The Python SGML parser is amazing, for those of you who care about such things.
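For the curious, here is a minimal sketch of what the download-and-parse step could look like. It is not the original script: the SGML parser module (sgmllib) no longer ships with Python 3, so this uses the standard library's html.parser as a stand-in. The names page_urls, fetch, TextExtractor and visible_text are made up for illustration, and the page range is an assumption (only the ?page=X pattern comes from the post).

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URL pattern quoted in the post; higher page numbers are older pages.
BASE_URL = "http://www.mgoblog.com/?page={}"


def page_urls(first_page, last_page):
    """Build the list of archive URLs (the page range is an assumption)."""
    return [BASE_URL.format(n) for n in range(first_page, last_page + 1)]


def fetch(url):
    """Download one page and return it as text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

A real run would loop over page_urls(...), call fetch on each URL, and pull out the individual articles; how articles are delimited depends on the site's markup, which this sketch does not try to guess.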

What I found follows below. Note: there may be some errors, but I believe the results to be in the right ballpark.

Overall Results

Perhaps the single most amazing fact is that Brian himself has written something on the order of 3 million words (or typed about 17 million characters) over about 3600 articles. Wow! That's a lot of content, from his hands to our eyeballs.

 

Who     Articles   Words (Millions)   Characters (Millions)
Brian   3595       2.952              17.48
Total   3976       3.258              19.19

The table shows these sums, as well as the sums across all contributed articles (including ones from Tim, TomVH, formerlyanonymous, and anyone else who has made the front page). It might be interesting to see how these counts (number of articles, number of words, number of comments made by users) play out on a week-by-week basis. So interesting one could even make a ... chart? Chart. Or actually, Charts.
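As a rough sketch of how sums like these could be tallied: the function below assumes each article has already been reduced to an (author, text) pair, counts words by splitting on whitespace, and counts characters with len. Those counting rules are my assumptions for illustration, the name tally is hypothetical, and the real script may well have counted differently.

```python
from collections import defaultdict


def tally(articles):
    """Sum article, word, and character counts per author, plus a 'Total' row.

    `articles` is an iterable of (author, text) pairs (an assumed format).
    """
    totals = defaultdict(lambda: {"articles": 0, "words": 0, "chars": 0})
    for author, text in articles:
        # Every article counts toward its author and toward the overall total.
        for key in (author, "Total"):
            row = totals[key]
            row["articles"] += 1
            row["words"] += len(text.split())
            row["chars"] += len(text)
    return dict(totals)
```

Calling tally on the parsed articles and reading off the "Brian" and "Total" rows would give the two rows of a table like the one above.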

Charts

The first chart I present is the number of articles published per week over the entire existence of mgoblog.

From the chart, one can observe some interesting facts. First, from mgoblog we should expect about 14 articles per week on average over the course of a year. Second, that number is notably higher in the fall (no surprise), and lower in the spring. Finally, and perhaps most interestingly, one can see the growth of the mgoblog community in the orange bars, which represent articles written by somebody other than Brian; this content, which now represents a significant portion of mgoblog, picked up halfway through last year and has continued to get stronger. Brian's efforts at making the blog more than just himself are clearly having an impact.
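A per-week count like the one charted here can be sketched by mapping each article's date to the Monday of its week and counting. Starting weeks on Monday is an assumption on my part, and week_start, articles_per_week and average_per_week are illustrative names, not the original code.

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def articles_per_week(dates):
    """Count articles per week, keyed by that week's Monday."""
    return Counter(week_start(d) for d in dates)


def average_per_week(dates):
    """Average articles per week over the whole span, empty weeks included."""
    weeks = articles_per_week(dates)
    first, last = min(weeks), max(weeks)
    span = (last - first).days // 7 + 1
    return sum(weeks.values()) / span
```

Splitting the counts by author (Brian versus everyone else) would just mean keeping one Counter per group.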

The second chart just shows the number of words on a per-week basis:

The graph reflects the same trends seen above, but in word counts. Even early on, Brian was producing above 10,000 words per week during football season, and last year during the same season, we were spoiled with over 30,000 words per week about the sport and team we love.

Finally, I show the number of comments per article:

The big effect in this graph is the lack of comments before the switch to the new blog infrastructure (i.e., during the Haloscan era). The other effect is the growth of the community: the difference in the number of comments between Fall '08 and Fall '09 is likely a sign of the increased importance of this site as a gathering place for the broad UM football community. Aside: the one early outlier with a large number of comments (Fall '06) is just full of comment spam; the post is Unverified Voracity 9/9 + Bonus Guest! Who knows why it's there, but Brian should probably remove those comments.

Longest Articles

I was also interested in what the longest articles were, but that should have been obvious: UFRs. Here are the ten longest articles (by number of letters in the article):

 

If you remove the UFRs from the list, these ten get the longest billing. A number of previews and various other summaries show up:

 

Most Commented-On Articles

I was also interested in the most commented-on articles. They were:

  • 10. The Feagin Reveal (by Brian on August/9/2009, 202 comments)
  • 9. An Interview With Compliance Guy (by Brian on February/24/2010, 213 comments)
  • 8. It (by Brian on July/30/2009, 218 comments)
  • 7. Hello: Shawn Conway (by Tim on February/20/2010, 236 comments)
  • 6. Unverified Voracity 9/9 + Bonus Guest! (by Brian on September/9/2005, 244 comments)
  • 5. Morgan Trent Is Not All In (by Brian on May/10/2010, 284 comments)
  • 4. Need A Whiskey To Boycott? (by Brian on February/25/2009, 299 comments)
  • 3. " (by Brian on November/9/2009, 402 comments)
  • 2. Noise, Piped-In And Otherwise (by Brian on September/15/2009, 410 comments)
  • 1. Offense Unit By Unit, 2008 (by Brian on August/26/2008, 529 comments)

Nothing gets people rev'd up like the Offense's Units, or RAWK MUSIC, I guess.

Word Usage

Finally, I was generally curious as to what words show up in the blog. Sounds like a case for a ... chart? Nope. But close, a wordle:

The word cloud here shows a list of the most popular words used in this blog, with some editing done by y.t. to remove words like "the" (actually the most popular word on the site) and so forth.
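The counting behind a word cloud like this is simple enough to sketch. The version below lowercases the text, pulls out runs of letters and apostrophes, drops a small stopword list, and returns the most common words. The stopword list and the tokenizing rule are illustrative assumptions, not the edits actually made for the wordle above, and top_words is a made-up name.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in `text`, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)
```

Feeding the resulting (word, count) pairs to any word-cloud tool would then size each word by its count.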

Anyhow, that's all for now. An amazing amount of content, built up over the years on the backs of UFRs and other regular features we all know and love. Thanks, Brian, for all the hard work - it is truly staggering to see the sheer verbiage that has powered the site over the years.

Comments

formerlyanonymous

May 15th, 2010 at 11:35 PM ^

Is it just me or does it seem odd that "Penn" would show up as a popular word but not State? You'd think with the combo of Penn making the list and the endless number of threads on State, it would show up. I guess it's just another sign of their unimportance.

formerlyanonymous

May 15th, 2010 at 11:38 PM ^

That week to end 2009 with no comments. Seems strange to me. Sure, holidays and such. But there were also bowl games and stuff going on. I would have assumed at least a bit closer to average comment levels.

MGoShoe

May 15th, 2010 at 11:44 PM ^

...of the top 10 most commented on articles are within the last 15 months. That in and of itself shows the tremendous growth of the MGoBlog community and the popularity of MGoBlog.

Brian's output is impressive indeed and we're definitely all the better for it.

mejunglechop

May 16th, 2010 at 1:42 AM ^

Short take: I find this diary amusing.

Long take: I started following this blog around when we were looking for a new coach and it hooked me, it was awesome then and that's why I got hooked, but I have to say that the move to Drupal, the addition of the board/diaries and the Mgomerger were all nothing short of brilliant. People who don't read as frequently as I do might not know that this is the most frequently visited college football blog there is, but it's a testament to Brian's foresight in those moves. There's more content now- some of it (including 99%, maybe 100% of my contributions), doesn't add much, but the Decimated Defense posts, Mathlete's analysis, Tom VH's recruiting posts, Paul's videos and Tim and FA's general all around coverage elevate this blog to more than what's possible with just one person's perspective, no matter how thoughtful and insightful that person might be. A big part of Brian's genius is in recognizing and tapping into that. When I have a real job and am making more than $9 an hour I'll show the full extent of my appreciation in monetary terms, but until then all I can say is Hail! This site is the embodiment of Leaders and Best.

Sidenote: Aarongoblue, by virtue of some glitch accounts for something like 300 of the comments on the Offense By Unit 2008 post. If somebody wants to count, that number should be revised.

bacon

May 16th, 2010 at 7:57 AM ^

Nice work. Does the most commented articles take into account the old days of haloscan? IIRC, the comments sections over there got pretty long during the coaching search. IMO, probably not worth remembering. Also, does your program count the number of cat pictures posted?

FA, you must have some sharp eyes because I looked for Penn on that wordle for a while before I found it. Shocked that state isn't on there.

MadMonkey

May 16th, 2010 at 10:05 AM ^

didn't make the wordle.

To be serious for a moment, Brian's accomplishment thus far is quite impressive. I have been a reader much longer than a contributor -- and most of my (ahem) contributions don't amount to much (see subject line above). The volume and quality of content that Brian has produced and disseminated is a rare commodity on the web. Even more extraordinary is that the content is related to our favorite academic and athletic institution. This is a great web community and it keeps me close to the University.

learmanj

May 16th, 2010 at 2:21 PM ^

I don't know if this is just a mistake on my computer or just what but it is showing me that the #1 non-UFR article has a post that is repeated like 200 times. Anyone else see this???

Seth

May 16th, 2010 at 2:23 PM ^

I admit, though, I was disappointed that this only counted articles that were front-paged. Diaries and board posts weren't in it.

I'd be more interested in stats about that. Which posters respond the most, which get the most plusses per response, which have the most words/characters per post, and who uses the longest words (just divide the number of characters by words).

OneFootIn

May 17th, 2010 at 4:52 PM ^

Great stuff that shows what I had been imagining was happening, namely, that this site was really taking off as more folks realized how cool it is. As a professor who writes for a living, I am just amazed at someone writing three million words period, much less three million that are so insightful, funny, and generally excellent. But I agree with previous posts. What is now making this a unique site, and thus truly a Meeeechigan site, is the phenomenal community that has sprung up here, adding its own millions of words. I am moving away from Ann Arbor this summer and am so glad I will have mgoblog to stay in touch with all things Blue. More stats please!