Four More Years

In the first post I analyzed a year's worth of emails. It was fun enough, but I had four more years stashed away somewhere in backup storage that could complete the story. Many hours later, after acquiring a very peculiar piece of $60 software, and after frequent emails with a shockingly prompt and helpful support rep named Antonio, I was able to jam all that information into a database.

The result is over 90,000 emails from 2,300 unique senders with content totaling 14.5M words.

Let's dig into it.

Five Years of Email, by Week and Time of Day

Turns out five years of data is much more interesting than just one. It looks like I missed a few months in 2012 due to poor backup management - hence the dead zone. Those emails are likely sitting on some Exchange server somewhere, but digging them up will be a big pain.

Check out the top of the graph - the first year out of college was totally idyllic. Very few emails, only a handful of super late nights. It was a different time... Starting in late 2010 I apparently started to hate myself and worked way too often at all hours. Thankfully I was young and single, so working frequently until 3 or 4a and eating baked beans from a can was an acceptable habit to have.

Social Network Analysis

With over 2,000 unique "From" addresses, and countless more in the "To" and "CC" fields of the emails, this data is prime for some social network analysis. By looking at who emails whom and when, one can probably build a fairly accurate set of social clusters.

I could have used a traditional force-directed network graph, but I've always thought those to be unnecessarily noisy and complex. I usually see them at conferences, where vendors pitching pie-in-the-sky concepts are looking to dazzle prospects with a bunch of bouncing bubbles and complex talk about healthcare referral patterns or prescription behavioral clusters. I'm much more fond of the adjacency matrix, which can get at similar results in a much neater visual.

Adjacency Matrix with Email Co-occurrence

An adjacency matrix with the right clustering identifies friend groups and client teams that are frequently on the same email together.

The graph shows co-occurrence of addresses on an individual email. This includes directed relationships (John emailed Lindsay) as well as lateral "co-occurrence" (somebody else emailed both John and Lindsay). I've grouped addresses by color according to the first time they showed up in the data. Red is 2009, Blue is 2010, Yellow is 2011, and so on.
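For the curious, here is a rough sketch of how that co-occurrence counting could work. This is not the code behind the graph; the email dictionaries with "from", "to", and "cc" keys, the function names, and the sample addresses are all assumptions made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(emails):
    """Count how often each unordered pair of addresses shares an email."""
    pair_counts = Counter()
    for email in emails:
        # Every address on the message: the sender plus all recipients.
        # Using a set means an address listed twice is only counted once.
        addresses = {email["from"], *email.get("to", []), *email.get("cc", [])}
        # Sorting gives each pair a single canonical order, so (a, b) and
        # (b, a) land in the same bucket.
        for a, b in combinations(sorted(addresses), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def to_matrix(pair_counts):
    """Turn pair counts into a symmetric adjacency matrix (list of lists)."""
    addresses = sorted({addr for pair in pair_counts for addr in pair})
    index = {addr: i for i, addr in enumerate(addresses)}
    matrix = [[0] * len(addresses) for _ in addresses]
    for (a, b), n in pair_counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return addresses, matrix


# Hypothetical sample data: one directed email and one lateral co-occurrence.
emails = [
    {"from": "john@example.com", "to": ["lindsay@example.com"], "cc": []},
    {"from": "boss@example.com",
     "to": ["john@example.com", "lindsay@example.com"], "cc": []},
]
counts = cooccurrence_counts(emails)
addresses, matrix = to_matrix(counts)
```

Note that a single email with n addresses produces n(n-1)/2 pairs, so a chain with 100 people on it contributes 4,950 pairs all by itself.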

Problematically, this creates a huge amount of data. Email chains contain dozens or hundreds of recipients. Multiply that by the tens of thousands of emails, and you've got an explosive data problem. So, instead of an interactive version that would crash most readers' browsers, I've opted for a screenshot.

This graph did immediately highlight very real social clusters that I've identified qualitatively day-to-day. Each magnified bubble is a cluster of either friends or clients of mine that frequently show up on the same emails together.

Relationships Over Time

Over five years, through many role changes and a few promotions, my relationships with various co-workers have evolved. This is especially true of the early years as a consultant, where it wasn't uncommon to hole up in a conference room with only a few other team members for months at a time, only to repeat the process with a different set of colleagues as soon as the project ended.

Below, I've plotted email frequency month-by-month for some of my top email buddies. Hover over a line to highlight the complete trajectory of the relationship and see whatever random name I assigned to the co-worker.
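A minimal sketch of the month-by-month tally, assuming each email has been reduced to a (sender, timestamp) pair. The names, dates, and the `monthly_counts` helper are hypothetical, not pulled from the real dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime


def monthly_counts(emails):
    """Map each sender to a Counter of emails per "YYYY-MM" month."""
    series = defaultdict(Counter)
    for sender, sent_at in emails:
        series[sender][sent_at.strftime("%Y-%m")] += 1
    return series


# Hypothetical sample: (pseudonym, timestamp) pairs.
sample = [
    ("Spencer", datetime(2010, 11, 3, 1, 5)),
    ("Spencer", datetime(2010, 11, 17, 1, 20)),
    ("Spencer", datetime(2010, 12, 1, 0, 45)),
    ("Lindsay", datetime(2010, 11, 9, 14, 30)),
]
series = monthly_counts(sample)
```

Months with no emails simply don't appear in a sender's Counter, and looking one up returns 0, which is what makes a line go flat once a co-worker moves on.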

This graph is a pure shot of nostalgia. I don't expect anyone to share that sentiment (what with the fake names and all...), but hovering over each line is like tracing back a friendship to the start. Peaks highlight some crazy projects, valleys show some blissful moments where we were able to get out of each other's hair for a bit. Occasionally the lines go flat as my co-worker moved on to another company. Pour one out.

Working Patterns by Co-Worker

Everybody works a bit differently. I happen to be most productive late into the night. Others prefer to wake up early so they can properly shut down after work. Some people are even disciplined enough to turn on at 9a and shut off at 5p, though I've never understood how that would work.

By plotting email frequencies in 30-minute intervals across a five-year dataset, work patterns emerge for each of my colleagues. I've categorized them into 4 buckets:

  • Midnight Oil. I fall very neatly into this category. I'm terrible at shutting off, and find myself most productive between 7p and 2a.
  • Early to rise. A very appealing habit, but one I'm afraid I'll never adopt. These folks are up at 5a or earlier to carpe the worm or however that saying goes.
  • Late starter. I'm not going to say that all of these folks are developers, but there is a strong correlation. Roll in at 10 or 11a, but stick around until the job is done.
  • Nine to five. A totally foreign concept to me. Clock in, work hard, clock out. Completely admirable but seemingly unattainable.
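Here is one way the 30-minute bucketing and the four categories could be sketched. The hour windows and cutoff percentages below are assumptions invented for this example; the actual sorting was not done with these exact rules.

```python
from datetime import datetime


def half_hour_histogram(timestamps):
    """Count emails in each of the 48 half-hour slots of the day."""
    bins = [0] * 48
    for ts in timestamps:
        bins[ts.hour * 2 + ts.minute // 30] += 1
    return bins


def classify(bins):
    """Assign a work-pattern bucket using assumed, illustrative thresholds."""
    total = sum(bins)
    if total == 0:
        return "Unclassified"

    def share(start_hour, end_hour):
        # Fraction of all emails sent in [start_hour, end_hour).
        return sum(bins[start_hour * 2:end_hour * 2]) / total

    night = share(19, 24) + share(0, 2)  # 7p to 2a
    if night >= 0.4:
        return "Midnight Oil"
    if share(4, 8) >= 0.25:              # already emailing before 8a
        return "Early to rise"
    if share(9, 17) >= 0.85:             # nearly everything inside 9a to 5p
        return "Nine to five"
    if share(11, 20) >= 0.7 and share(4, 10) < 0.1:
        return "Late starter"
    return "Unclassified"


# Hypothetical send times for two made-up co-workers.
night_owl = [datetime(2011, 3, 1, 22, 10), datetime(2011, 3, 1, 23, 40),
             datetime(2011, 3, 2, 0, 15), datetime(2011, 3, 2, 1, 5),
             datetime(2011, 3, 2, 14, 0)]
clock_puncher = [datetime(2011, 3, 1, 9, 5), datetime(2011, 3, 1, 10, 30),
                 datetime(2011, 3, 1, 13, 0), datetime(2011, 3, 1, 16, 45)]

night_owl_label = classify(half_hour_histogram(night_owl))
clock_puncher_label = classify(half_hour_histogram(clock_puncher))
```

The order of the checks matters: somebody who emails heavily at night gets tagged Midnight Oil even if most of their remaining emails fall in the 9a to 5p window.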

The graph below shows how a number of my co-workers stack up. Click on a button for one of the categories to spend a day in those workers' shoes.

My favorite here is obviously Spencer, who apparently has a shot of productivity at 1a and uses it to send me a whole lot of emails. This "Small Multiples" graph, closely related to the sparkline, is a great way to densely stack up dozens of series and find patterns and aberrations.

Up to now I've really only focused on who is sending emails and when. Next time, I'll rip apart the bodies of the emails to find out what people are actually saying with all of these messages.