We swear a lot. I think.
2,133 curses (by a relatively limited definition) in 5 years of email. That means 147 of every 1 million words in my inbox are swears.
Having not thought about this at any point previously in my life, I have no idea if 147 Curses Per Million is a particularly high ratio. We need a baseline - let's check out some external sources.
- My coworkers apparently drop a couple swears in every 100 emails or so. Result: 147 CPM
- The Declaration of Independence is a bit over 1000 words, and they managed to not curse once. Nice work, boys. Result: 0.0 CPM
- James Joyce's dialogue in Ulysses didn't shy from a curse or two. The book was banned back in the 1920s for being so obscene, but they banned everything fun back then. It weighs in at a hefty 264,922 words, of which 114 are curses. Result: 430 CPM
- The Wu Tang Clan may be for the children, but they use plenty of adult words. In Bring da Ruckus, they drop 18 curses in 455 words. Result: 39,560 CPM
It looks like we fall somewhere between Thomas Jefferson and ODB. Helpful perspective.
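For anyone who wants to check the math, here is a minimal sketch of the Curses Per Million calculation. The function name is my own invention for illustration, and the inbox total uses the rounded 14.5 million word count quoted later in this post, so treat that last figure as approximate.

```python
def curses_per_million(curses: int, total_words: int) -> float:
    """Scale a raw curse count to a rate per one million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return curses / total_words * 1_000_000


# Counts quoted in the post
ulysses_cpm = curses_per_million(114, 264_922)
ruckus_cpm = curses_per_million(18, 455)
# Inbox: 2,133 curses over a rounded total of 14.5 million words
inbox_cpm = curses_per_million(2_133, 14_500_000)

print(round(ulysses_cpm), round(ruckus_cpm), round(inbox_cpm))
```

Normalizing per million words is what lets a 455-word song sit on the same scale as a 264,922-word novel.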
So when are we dropping all those F-bombs?
Frequency of Cursing by Hour
The graph above shows the percentage of emails by hour that have a curse word in the body. Here you can see massive spikes between 7p and 2a - the after hours crowd. While I can't speak for everyone, usually when I'm working late I'm pretty grumpy. Or I'm just really excited about what I'm working on. Anger and passion both seem like good ingredients for increased foul language.
Not many people are cursing after lunch. When you've just had a turkey sandwich on a baguette while surfing TMZ (or whatever people do during lunch), you're probably just not that pissed off. In fact, another way to read this graph is a very strong correlation between the number of swears and the amount of time since one's last meal. Maybe everybody just gets a serious case of the "hanger" and starts spewing profanity into my inbox. I'll go with that theory.
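The hourly numbers boil down to "what share of emails sent in each hour contain at least one curse." Below is a rough sketch of that calculation, not the actual pipeline behind the chart. The tiny word list, the function names, and the sample emails are all made up for illustration, since I'm not publishing my exact definition of a curse here.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical, deliberately short word list (an assumption, not the real one)
CURSES = {"damn", "hell", "crap"}
WORD_RE = re.compile(r"[a-z']+")


def has_curse(body: str) -> bool:
    """True if any whole word in the body is on the curse list."""
    return any(word in CURSES for word in WORD_RE.findall(body.lower()))


def curse_share_by_hour(emails):
    """emails: iterable of (sent_at, body) pairs.

    Returns {hour: percent of that hour's emails containing a curse}.
    """
    totals = Counter()
    cursing = Counter()
    for sent_at, body in emails:
        totals[sent_at.hour] += 1
        if has_curse(body):
            cursing[sent_at.hour] += 1
    return {hour: 100.0 * cursing[hour] / totals[hour] for hour in sorted(totals)}


# Made-up sample data
sample = [
    (datetime(2014, 1, 6, 13, 5), "Lunch was great, see you at 2."),
    (datetime(2014, 1, 6, 23, 40), "Damn, this build is broken again."),
    (datetime(2014, 1, 6, 23, 55), "Fixed it, going to bed."),
]
print(curse_share_by_hour(sample))
```

Matching on whole words keeps "hello" from counting as "hell," and dividing by each hour's total keeps the quiet overnight hours from being swamped by the busy daytime ones.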
So who are these sailor-mouthed degenerates bombarding me with ugly language? I've decided to de-identify the data (because I don't name names, dammit), but we can take a look anyway.
A Few Bad Apples - Curses by Sender
It's very clear there are four major culprits here. Chung, Cleo, Brad, and Andres (apparently I found the best random name generator ever) account for over 30% of the total curses. They're followed by a very long trail - there were 168 unique senders that contributed at least one expletive to the pile. Though the names are randomized, they do reflect the reality that the top 21 offenders are all male.
But let's talk about Chung, am I right?! He clocks in at 209 total swears, and leads in every category save for "Damn," which, let's be honest, is a fake swear anyways. Chung - you know who you are.
If you recall the initial metric, 147 out of every 1 million words is an obscenity. That leaves 999,853 OTHER words that we haven't even touched yet. I've profiled the 14.5 million words in over 90,000 emails and put together the list of the most used ones below. I used some text search tools to collapse conjugations into their root, combine plural and singular nouns, and so on. I also stripped out "Nick", "Stepro", the name of my company, and over 500 common "stop words" to neaten up the data.
For space, I collapsed a lot of words into a single gray line. While those gray lines are not interactive, you can click on the orange bars for more details on a few of the choices.
I really can't explain how happy I was to find out that there are more instances of the word "shit" in my inbox than of the word "graph". Unexpected and just wonderful.
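If you want to try the word-count cleanup at home, here is a toy version in plain Python. To be clear, this is a stand-in and not what I ran: the real work used full-text search tooling with a 500+ word stop list, while this sketch uses a handful of stop words and a crude "strip the trailing s" rule that I made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop list (the real one had over 500 entries)
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "we"}
# Names stripped out of the counts, as described in the post
EXCLUDED = {"nick", "stepro"}


def normalize(word: str) -> str:
    """Crude plural collapsing: drop a trailing 's' from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def top_words(bodies, n=10):
    """Count normalized words across email bodies, skipping stop words."""
    counts = Counter()
    for body in bodies:
        for raw in re.findall(r"[a-z]+", body.lower()):
            if raw in STOP_WORDS or raw in EXCLUDED:
                continue
            counts[normalize(raw)] += 1
    return counts.most_common(n)


# Made-up sample data
sample = [
    "Nick, the graphs are ready.",
    "We need a graph for the deck.",
    "Decks and graphs, please.",
]
print(top_words(sample, 2))
```

A real stemmer handles cases this rule gets wrong (irregular plurals, verb tenses, words that just happen to end in "s"), which is exactly why I leaned on existing tooling instead of rolling my own.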
"Would some rockstar on the bench put together a straw-man deck to communicate the value and scope of this engagement to our client? We'll leverage an SME to wordsmith it, add the takeaways from the last meeting and polish the deliverable. Oh yeah, and synergy."
Consultants love to make up words. In their defense, they have to talk a lot. I imagine it's just a lot more fun if you invent cool words to say. Below is a somewhat arbitrary list of 14 buzzwords and their popularity over time amongst those communicating with me. Apparently we got a lot of leverage in our engagements in 2013…
The Graphs and Data
Most of the charts in this post were built with the open-source library NVD3, which sits on top of D3 but makes it super easy to deploy simple, interactive visuals with minimal code. I've found it a big time-saver in moments like these, where a simple Bar/Line/Bubble chart will suffice. However, as often happens when I choose the easy option, I keep butting up against edge cases that require forking the code or building an alternative from scratch.
As is likely obvious, the big Curse Balls and the Very Tall Bar Graph do not use the NVD3 library.
On the data processing side, I used MSSQL Full Text Search to comb through the bodies of 90,000 emails and pull out root words. Performance was pretty remarkable, especially given I was using a local instance of MSSQL on my laptop. It took only a few minutes to rip through all of that information. I'll probably explore it further, or try one of the few alternatives like Lucene.