Email, Gource, Hadoop, and Python
I never knew that one of the guys (Andrew C) who works at Catalyst wrote a fantastic time series visualisation tool called Gource. It's incredible what people have done with it - just look on YouTube. The focus of use seems to have been on analysis of source code repository activity, but I think there is more mileage to be had from Gource than this. I wrote a simple Map/Reduce map chain for Hadoop (not really necessary for my volume of data) that stripped out the from/to/date information from all my mbox history since 1996. It really is simple - all you need is to generate a file in Gource's custom log format (pipe-separated timestamp, user, change type and path, sorted by time) - eg.:
0970518767|"DJ Adams" <DJ_Adams@rank.com>|M|Andrew_Powis/RVSUK/FES/Rank@rank.com ...
and then pump this through Gource:<pre> gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 --log-format custom email-log.txt </pre>
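For a mailbox of modest size, the log file fed to the command above can also be produced without Hadoop. What follows is only a minimal sketch, not the Map/Reduce job described in this post: it assumes Python's standard `mailbox` module can read the mbox file, treats the sender as the Gource "user" and each To/Cc address as the "file", and always writes `M` as the change type, mirroring the sample line. The function name `gource_lines` and the script and mbox file names in the usage comment are hypothetical.

```python
import mailbox
import sys
from email.utils import getaddresses, parsedate_to_datetime


def gource_lines(messages):
    """Yield (timestamp, line) pairs, one per sender/recipient pair."""
    for msg in messages:
        try:
            when = parsedate_to_datetime(str(msg.get("Date", "")))
        except (TypeError, ValueError):
            continue  # skip messages with a missing or unparseable Date header
        # '|' is the field separator in the custom format, so keep it out of names.
        sender = " ".join(str(msg.get("From", "")).replace("|", " ").split())
        if not sender:
            continue
        headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        ts = int(when.timestamp())
        for _name, address in getaddresses([str(h) for h in headers]):
            if address:
                # Zero-padded to 10 digits to match the sample line (an assumption).
                yield ts, f"{ts:010d}|{sender}|M|{address}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical names): python mbox2gource.py mail.mbox > email-log.txt
    # Gource expects entries in chronological order, hence the sort.
    for _ts, line in sorted(gource_lines(mailbox.mbox(sys.argv[1]))):
        print(line)
```

Sorting in memory is fine for a personal archive; for a volume that really does need Hadoop, the same per-message logic would sit in the map step and the ordering would be left to the framework.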
You can record it as a video too:
gource --start-position 0.28 --stop-position 0.29 --title 'Communication sphere since 1996' -s 1 \
  --log-format custom email-log.txt -1280x720 -o - | ffmpeg -y -r 60 \
  -f image2pipe -vcodec ppm -i - -vcodec libx264 -preset ultrafast -crf 1 -threads 0 -bf 0 gource-video-of-email.mp4
And this is what it looks like:
Posted by PiersHarding at April 12, 2012 8:27 PM