I Love Nic Cage

Nic Cage is everything I love about Hollywood. He consistently headlines a scattershot of inconsistent movies spanning all genres. He bailed on the name "Coppola," because that would be too easy, and named himself "Cage" instead. He married Elvis' daughter because obviously. He bought a castle. 

More than anything, Cage is a damn fine actor (a fact I was reminded of while watching his 2014 film, Joe, this morning). But despite that skill, he's not too proud to give us the pulpy gems we all want. I'm sure they also help finance his awesome spending habits (did I mention the castle?).

So as he apparently turned 50 last month, let's celebrate Nic Cage, our National Treasure, with a data-driven review.

33 Years of Cage

Based on the data available at IMDB, Cage films have grossed over $5 Billion at the Box Office alone. That probably underestimates the real total, and doesn't even take into account the fourteen Laser Disc copies of The Rock that anybody ever bought. Compare that to roughly $2.5 Billion in Production Budgets, and he is doubling his investors' money. Nice work, Cage.

Below we've plotted cumulative Budget (Gray) against cumulative Box Office Gross (Teal) from the start of his career through today. The pink line shows the average IMDB rating. Hover over the graph for the numbers, and click on any movie title for the film's details. Scroll left for the full history.
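For the curious, the two cumulative lines are just running totals over the films sorted by year. Here is a minimal Python sketch of that idea. The film rows, titles, and dollar figures are made-up placeholders (not real IMDB numbers), and the function name is mine for illustration.

```python
from itertools import accumulate

# Hypothetical rows: (year, title, budget, gross), in millions of dollars.
# These are placeholder figures, NOT real IMDB data.
films = [
    (2001, "Film A", 10.0, 25.0),
    (1999, "Film B", 40.0, 30.0),
    (2003, "Film C", 50.0, 145.0),
]

def cumulative_totals(rows):
    """Sort films by year and attach running totals of budget and gross."""
    ordered = sorted(rows, key=lambda r: r[0])
    budgets = accumulate(r[2] for r in ordered)
    grosses = accumulate(r[3] for r in ordered)
    return [(r[0], r[1], b, g) for r, b, g in zip(ordered, budgets, grosses)]

totals = cumulative_totals(films)

# The last row holds the career totals; their ratio is the "money multiple".
final_budget, final_gross = totals[-1][2], totals[-1][3]
ratio = final_gross / final_budget
```

With the real numbers quoted above (over $5 Billion gross against roughly $2.5 Billion in budgets), the same ratio comes out to about 2.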

What a storied career.

The Cage Network

Next I had to explore the network of individuals that were lucky enough to share the stage with Cage at least once in their career. It's a pretty complex network diagram, so here are the quick notes:

  • Each colored bubble is somebody you should be jealous of (an actor that got to work with Cage)
  • The black bubble is Cage himself
  • Bubbles are sized by the number of movies the actor has ever appeared in
  • Colors are based on the first movie the actor appeared in with Cage
  • The links are based on the number of films any of these actors ever appeared in together (Cage film or otherwise)
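If you want to see how a network like this hangs together, here is a rough Python sketch of the bubble sizes and link weights described in the list above. It assumes a simple mapping of movie title to cast, which is my own stand-in and not the actual IMDB schema. The actor names and the `build_network` function are hypothetical, and bubble colors are left out.

```python
from collections import Counter
from itertools import combinations

# Hypothetical casts: movie title -> set of actor names (placeholders).
casts = {
    "Movie 1": {"Cage", "Actor A", "Actor B"},
    "Movie 2": {"Cage", "Actor B", "Actor C"},
    "Movie 3": {"Actor A", "Actor B"},  # non-Cage films still count for links
}

def build_network(casts, hub="Cage"):
    """Return (nodes, links) for everyone who shared a film with the hub."""
    # Bubble size: number of films each actor appears in.
    film_counts = Counter(actor for cast in casts.values() for actor in cast)

    # The circle: the hub plus anyone who appeared in a film with the hub.
    circle = set()
    for cast in casts.values():
        if hub in cast:
            circle |= cast

    # Link weight: number of films each pair inside the circle shares.
    links = Counter()
    for cast in casts.values():
        for a, b in combinations(sorted(cast & circle), 2):
            links[(a, b)] += 1

    nodes = {actor: film_counts[actor] for actor in circle}
    return nodes, links

nodes, links = build_network(casts)
```

Pairs are stored in sorted order so that ("Actor A", "Actor B") and ("Actor B", "Actor A") land on the same link.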

Play around, clicking on an actor to expose some stats and the links they share with the rest of the network. I'd recommend selecting the "Fullscreen" link for full effect.

It's fun to find bridge actors (e.g., Ed Harris) as miscolored spots placed neatly between clusters of like-colored spots. You can also pull out island movies: apparently nobody from Moonstruck dared get in Nic Cage's way again. As I mentioned in an earlier post, I generally find these graphs to be gimmicky and uninformative. But they are very pretty, and in a Nic Cage post I'm OK with a few gimmicks.

A Sporadic Career

Cage's career is all over the place: films of all genres, all budgets, and with widely varying viewer appreciation. To see this, I turn to the trusty bubble chart. Bubble charts are super easy to read and convey data across four dimensions (X, Y, Size, Color). Below we've plotted each movie by year, IMDB Rating, Box Office Gross, and genre.

The result is a shotgun blast of crazy that epitomizes Cage's career.

Perhaps the most obvious trend is that immediately after The Rock, Cage found his place in big-budget Adventure/Thrillers. Unfortunately for him, critical reception of these movies has waned over the two decades that followed. Toggle off the other genres, and an obvious downward trend emerges. Luckily, as indicated by the big bubbles to the right, the cash keeps rolling in.

About the Data

All of this data is sourced from IMDB, which publishes a flat-file version of their entire database via public FTP sites. For somebody constantly looking for large public datasets, this is a goldmine. It has plenty of flaws, but they are easy to overcome given how interesting the dataset's subject matter is. Everybody loves movies.

Thankfully, software engineer Davide Alberani put together a great Python toolkit to parse the public files and build an easily searchable local instance of the database, complete with indexes and foreign references. Oddly, my initial attempts to load the data into my MSSQL instance had major performance issues, resulting in a projected completion time of 10 days. I'm not that patient, even for Cage. Instead, I threw the files into a MySQL database (shudder), which only took a couple of hours.

From there, the analysis is a piece of cake. Initially, I had planned to do an interactive network map of actors up to three (3) degrees away from Cage, but the explosive nature of the resulting data rendered the D3 visual unusable in a web browser. Dialing down the analysis to the immediate Cage circle helped, and removing less prolific actors brought the visual to a workable, if still poorly performing, state.
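To give a feel for why the degrees matter, here is a small Python sketch of a breadth-first search that collects everyone within N degrees of a starting actor. This is an illustration under my own assumptions, not the query I actually ran: the `co_stars` adjacency map, the single-letter actor names, and the `within_degrees` function are all hypothetical.

```python
from collections import deque

# Hypothetical co-star map: actor -> actors they have appeared with.
co_stars = {
    "Cage": ["A", "B"],
    "A": ["Cage", "C"],
    "B": ["Cage", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}

def within_degrees(co_stars, start, max_degree):
    """Breadth-first search: {actor: degree} for all within max_degree hops."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if degree[actor] == max_degree:
            continue  # do not expand past the requested degree
        for other in co_stars.get(actor, ()):
            if other not in degree:
                degree[other] = degree[actor] + 1
                queue.append(other)
    return degree

one_degree = within_degrees(co_stars, "Cage", 1)
three_degrees = within_degrees(co_stars, "Cage", 3)
```

In this toy map the circle only grows from three people to six. On the real data, every extra degree multiplies the crowd by each actor's full list of co-stars, which is what made the three-degree version too heavy for the browser.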

In my next post, I'll put my Cage obsession aside and perform a much broader analysis of 100 years of cinema with all kinds of actors, directors, writers and producers.

Until then, more Cage: