Hollywood + data

Ever heard of the Information is Beautiful awards? It’s a visualization competition put together by David McCandless, of Information is Beautiful fame.

Part of it is a series of monthly challenges, each built around a curated dataset. Jen Lowe and I have teamed up for the current one, which is about the movie industry, and we are going to deliver a competitive entry! Even our drafts are rocking! But while we were looking for that one great idea, I explored the data.

Part of the dataset is the total box office earnings of over 600 movies released in the US in the last 5 years. I crossed that list with the user-contributed plot keywords on IMDb, then ran a regression to estimate how much each keyword adds to a movie’s box office. (I only kept keywords mentioned more than 5 times, just over 2,500 of them; otherwise the full list runs past 20,000.) The full list is at the end of the article.
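The post doesn’t say exactly which regression was run, so here is a minimal sketch under stated assumptions: a linear model of box office gross on one 0/1 indicator per keyword, fitted by least squares with an optional ridge penalty `lam`. The penalty matters at full scale, because with just over 2,500 keywords and roughly 600 movies there are more coefficients than movies, and plain least squares then has no unique solution. The titles, dollar figures, and function names (`keyword_effects`, `ridge_fit`, `solve`) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data (made up), standing in for 600+ movies and IMDb keywords.
# Grosses were constructed as 20 + 150*superhero + 40*sequel - 10*small-town ($M).
BOX_OFFICE = {
    "Movie A": 210.0, "Movie B": 170.0, "Movie C": 60.0, "Movie D": 10.0,
    "Movie E": 160.0, "Movie F": 50.0, "Movie G": 20.0, "Movie H": 10.0,
    "Movie I": 95.0,  # no keyword data: dropped by the join
}
KEYWORDS = {
    "Movie A": {"superhero", "sequel"},
    "Movie B": {"superhero"},
    "Movie C": {"sequel"},
    "Movie D": {"small-town"},
    "Movie E": {"superhero", "small-town"},
    "Movie F": {"sequel", "small-town"},
    "Movie G": set(),
    "Movie H": {"small-town", "time-travel"},  # "time-travel" is rare: filtered out
    "Movie J": {"sequel"},  # no box office figure: dropped by the join
}


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y, leaving the intercept unpenalized."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        xtx[i][i] += lam
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def keyword_effects(box_office, keywords, min_count, lam=0.0):
    """Return (intercept, {keyword: estimated $ effect})."""
    titles = sorted(set(box_office) & set(keywords))  # inner join on title
    counts = Counter(kw for t in titles for kw in keywords[t])
    vocab = sorted(kw for kw, c in counts.items() if c > min_count)
    # One row per movie: intercept column, then a 0/1 column per kept keyword.
    X = [[1.0] + [1.0 if kw in keywords[t] else 0.0 for kw in vocab] for t in titles]
    y = [box_office[t] for t in titles]
    beta = ridge_fit(X, y, lam)
    return beta[0], dict(zip(vocab, beta[1:]))


if __name__ == "__main__":
    intercept, effects = keyword_effects(BOX_OFFICE, KEYWORDS, min_count=1)
    for kw, coef in sorted(effects.items(), key=lambda kv: -kv[1]):
        print(f"{kw:12s} {coef:+8.1f}")
```

The toy call keeps keywords seen more than once; on the real data you would pass `min_count=5` to match the "more than 5 times" cut, plus a positive `lam`. Each coefficient then reads as the estimated dollars a keyword adds to (or subtracts from) a movie’s gross, holding the other keywords fixed.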