For the Back End class at GA, we have to submit a final project. One thing that is near and dear to me is E-Discovery. I didn’t realize that you needed a law degree and a license to read emails, but it appears that you do - and it pays. So, when the final project was brought up, I thought I could develop some basic assessment tools - an app that will help assess the size and the contents of the email data collected. I’m taking the Data Science classes at Coursera too, so I thought I could plot some info and subset it in R. Pretty cool, right? Ruby & R!
I wasn’t sure where to start with the Ruby script, so I turned to my old pal Google and we started to chat about Ruby scripts and email. Finally, Google mentioned Sau Sheong Chang and his book Exploring Everyday Things with R and Ruby, part of which deals with examining email with Ruby and R - bingo!
The Ruby script below is pretty much straight from the book - I didn’t change much. It basically reads my Gmail account and spits out a CSV file. For my final project, things will have to be different, but this is a start.
Chang also provides an R script to plot the data, which I changed a little to suit my needs. I found that the time frames for my inbox and sent mail are a bit different, which is not surprising since I get a lot more mail than I send out. So I had to adjust the R script to account for the differing date ranges. Eventually, I just subsetted the data to May and then merged the two sets, filling in “NA” for any day where one of them had no mail.
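To make the subset-and-merge idea concrete, here is a rough sketch of the same step in plain Ruby. It is not Chang's script or my R code, just an illustration: the sample dates, the year, and the helper names `daily_counts` and `merge_counts` are all made up for this example, and `nil` stands in for R's NA.

```ruby
require "date"

# Hypothetical sample data: one Date per message, standing in for the
# date column of the inbox and sent-mail CSV files.
inbox_dates = [
  Date.new(2013, 4, 30), Date.new(2013, 5, 1), Date.new(2013, 5, 1),
  Date.new(2013, 5, 2), Date.new(2013, 5, 3)
]
sent_dates = [Date.new(2013, 5, 1), Date.new(2013, 5, 3), Date.new(2013, 6, 1)]

# Count messages per day, keeping only dates that fall inside the range.
def daily_counts(dates, range)
  dates.select { |d| range.cover?(d) }.tally
end

# Build one row per day in the range: [day, inbox count, sent count].
# A day missing from either table yields nil, playing the role of NA.
def merge_counts(inbox, sent, range)
  range.map { |day| [day, inbox[day], sent[day]] }
end

may = Date.new(2013, 5, 1)..Date.new(2013, 5, 31)
merged = merge_counts(daily_counts(inbox_dates, may), daily_counts(sent_dates, may), may)

# Print the first few merged rows.
merged.first(4).each do |day, inbox, sent|
  puts "#{day} inbox=#{inbox.inspect} sent=#{sent.inspect}"
end
```

Walking the full date range, rather than only the days that appear in the data, is what gives both series the same length, so they can be plotted against one shared axis.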
Below are the plots rendered from the R script - as you can see, I get a whole lot more email than I send.
Where does all that email come from?