Very OT: Connecting IMDb & Google spreadsheet help

Submitted by Bleedin9Blue on March 18th, 2010 at 8:18 PM

I've been working on a pretty fun little Google spreadsheet. What it allows you to do is simply paste in the IMDb link to a movie (for Avatar it would be this: http://www.imdb.com/title/tt0499549/) and the sheet will then pull information about the director(s), writer(s), lead actor(s), year of release, IMDb rating, top 250 or bottom 100 ranking, and I'm currently working on grabbing the "genres" too.

It's pretty interesting, I've been doing it because I want to start cataloging all of the movies that I've seen and rate them myself while also comparing my ratings to the IMDb ones. I also have visions of eventually making it so that by simply typing in the name of an actor, director, etc. I can see all of the movies that I've seen that that person is involved with and by how much my rankings differ compared to IMDb's.

You can see what I've done so far here

But, I didn't realize that Google restricts people to only 50 "import" functions. Since each row in the sheet uses multiple "import" functions, I would reach my limit of 50 far before I'd reach the end of all the movies that I've seen. Thus, I'm turning to the internet for help. I've looked on Google and posted on forums with no success, so now I must turn to MGoBlog. I know that there are CEs that frequent this board, so I would especially appreciate your help. I'm trying to remake my spreadsheet in Excel, but I've found that the only way to make external connections to specific URLs seems to involve far too much work on my part- I have to specify what tables to import and then grab the appropriate data. Worse, there's no function that you type in to import a table, it has to be done through a wizard which means that I have to import every table individually AND keep it in the Excel file forever.

So, does anyone know of some way around this? I'm essentially looking for an Excel version of Google spreadsheet's ImportHTML/ImportXML/ImportData. All of my research seems to indicate that there's no identical function in Excel and I haven't found any workarounds will be acceptable for what I'm doing.

Any help would be appreciated. Thanks.

Note 1: I apologize for posting something OT, normally I try to avoid that but I'm just out of ideas at this point.

Note 2: If you open the Google spreadsheet link that I posted, you might see some random stuff after the first 2 lines, that's just me experimenting with whatever isn't working at the moment- currently that's either getting genres to work or finding a way to grab the whole release date and not just the year.

Comments

BlockM

March 18th, 2010 at 8:34 PM ^

I'm not sure, but you may be able to throw together a relatively quick Perl (or something similar) script to open up the HTML source file from the website and parse it. This would work best if you're just going to import it once to get the data, rather than having it go check every time you open the spreadsheet.

Once you have parsed the data you want, it would be pretty simple to fill in the cells on the spreadsheet automatically as well I think.

Bleedin9Blue

March 18th, 2010 at 8:40 PM ^

I've never used Perl so that would be a little bit difficult. Obviously this isn't a complex job but I've generally found that the initial hump of trying to learn how to properly import something and then work with what you've imported into a language you don't really know can be difficult.

Additionally, one of the allures of Google spreadsheets was that it updates automatically very often. For example, I know that Avatar was #67 earlier today on the top 250 because that was what I was using to test some of my functions, now it shows up as #68. So, to lose the ability to update often would be sad.

Regardless, I guess I can look into Perl a little bit.

BlockM

March 18th, 2010 at 8:46 PM ^

I had never used Perl before my little project last summer of bringing all the UFR's into a database, but it was surprisingly easy to get the hang of (at least for relatively simple scripts).

I suppose you could have the scripts run every time you open the spreadsheet, but I'm not sure how much time you'd have to wait every time you wanted to see the stuff. The wait would get longer each time you add a new movie though...

Either way, after a very quick Google search, the following site seems to at least provide the basics on getting web data using Perl: http://www.perl.com/pub/a/2002/08/20/perlandlwp.html.

Good luck, and let us know how it goes!

goblue_westcoast

March 18th, 2010 at 9:28 PM ^

Perl would not be my first choice for languages to pick up. You could probably get many of the same goodies with something simpler like Python. Python also has libraries to google data APIs so you can keep writing to your google spreadsheet. There is no such Perl library implementation of the google data APIs.

As I said though, IMDB might not be too happy that you are scraping data.

Bleedin9Blue

March 18th, 2010 at 9:36 PM ^

Now that is a good idea, especially with what you said about have some good APIs.

I'm not overly worried about IMDb since if I make this in Excel I can have it only grab data once whenever the file is opened. Thus, it should only be using a very small amount of bandwidth. And since I'm not profiting from what I'm doing, it's purely personal, I don't think that they'll have any cause to do anything.

Of course, if I do succeed in making this and a lot of people start using it (which is extremely doubtful) then they might get angry. But, that's way too far down the road for me to care, I'm doing this at least partly because it's fun and gives me something to do after work.

If you do think that IMDb might be more upset then I suspect, then I'd be happy to here it and any suggestions that you might have (short of not doing this at all) to mitigate their anger/minimize the amount of things I do that they may frown upon.

goblue_westcoast

March 18th, 2010 at 8:42 PM ^

IMDB apparently has content licensing requirements now (new since March 1, 2010). http://www.imdb.com/licensing/

I was going to suggest that you use the URLFetch javascript support and call an IMDB API. Then you would just have a row with a cell containing each url that you need.

That was when I came across the bad news about content licensing. I would suggest looking at the scripting APIs that are available. They are pretty powerful.

Mirasola

March 18th, 2010 at 11:27 PM ^

I can't help you, but think it's very interesting what you're putting together. IMDb ratings align very closely with my movie tastes 95% of the time, and I'm always interested to see what other people think.

I just wanted to say that if you are like me, I would highly recommend Cinema Paradiso (and most of the other foreign movies on the top 250 list). It is my favorite movie and I am glad to see it on the list of top movies.

Sorry I didn't offer any help, but good luck with this.

kielpedia

March 19th, 2010 at 12:41 AM ^

In my professional life I am actually a Flex/Java Developer and have been thinking about making an AIR app mashup of Theatre listings, imdb info, and rotten tomatoes reviews.

Unfortunately, I havent had the time to devote to it and I dont think Rotten Tomatoes even has a public API. However, I still think it could be pretty cool especially since Adobe is coming out with their AIR app store called Shibuya which is in Beta right now. Though I havent read it, they provide code that you can just plug in to your source that will automatically provide free trials and stuff. Its pretty interesting.

goblue_westcoast

March 19th, 2010 at 12:55 AM ^

Not a bad idea for a mashup mobile app. Does air run on iPhone or android?

You could use yelp for theatre ratings. Not sure about rotten tomatoes but one alternative is to setup a search feed to google and have them return you movies reviews in a wellformed API.

wike05

September 14th, 2014 at 11:11 AM ^

Im interested in this.  I keep a list of my movie collection in a google spreadsheet, and would like to pull generes, moive release date, and plot.