{"id":427,"date":"2008-07-12T14:51:59","date_gmt":"2008-07-12T19:51:59","guid":{"rendered":"http:\/\/www.mccambridge.org\/blog\/?p=427"},"modified":"2022-09-11T00:40:38","modified_gmt":"2022-09-11T00:40:38","slug":"the-great-mail-merge-of-2008","status":"publish","type":"post","link":"http:\/\/www.mccambridge.org\/blog\/2008\/07\/the-great-mail-merge-of-2008\/","title":{"rendered":"The Great Mail Merge of 2008"},"content":{"rendered":"

So, like a lot of computer people, I have the odd klepto-esque habit of saving all<\/em> of my email.\u00a0 Now, this wouldn’t be anything newsworthy if I had done a decent job of it, and just kept some nice little archive folder somewhere, or fed it all into Gmail and been done with it.<\/p>\n

Unfortunately, what I actually<\/em> kept over the years is a mess of “I’m about to reformat this machine, copy all the mail off and I’ll deal with it later” backups.\u00a0 In fact, I have no fewer than 123 mbox files from past Thunderbird installs, 4 more mboxes from an Evolution backup, 4 Outlook PSTs, and for good measure two Outlook Express profile folders and a maildir from… well, I actually have no idea where that’s from… maybe KMail once upon a time?<\/p>\n

So, upwards of 132 independent message sources.\u00a0 Nice work, Colin.<\/p>\n

First off, some interesting stats about this pile of mail:<\/p>\n

\n
Earliest Date<\/dt>\n
March 15, 2002<\/dd>\n
Latest Date<\/dt>\n
June 21, 2007<\/dd>\n
Total Emails Archived<\/dt>\n
15493<\/dd>\n
Number of Duplicate Copies<\/dt>\n
12567<\/dd>\n
Percent of Messages With \u22651 Duplicate<\/dt>\n
27.87%<\/dd>\n
Average Number of Duplicates (of those with \u22651)<\/dt>\n
2.910<\/dd>\n
Maximum Number of Duplicates<\/dt>\n
14<\/dd>\n<\/dl>\n
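For the curious, here is a rough sketch of how numbers like these could be computed. It is not the script that produced the table above. It assumes that two messages count as duplicates when they share a Message-ID header, that the total counts unique messages, and that a duplicate is any extra copy beyond the first. Under that reading the table roughly hangs together: 15493 × 27.87% × 2.910 comes to about 12,565, close to the 12567 duplicate copies listed. The function names iter_message_ids and duplicate_stats are made up for illustration, and only mbox files are handled.

```python
import mailbox
from collections import Counter


def iter_message_ids(mbox_paths):
    # Yield the Message-ID header of every message in the given mbox files.
    # Assumption: Message-ID identifies a message; messages without one are skipped.
    for path in mbox_paths:
        for message in mailbox.mbox(path):
            message_id = message.get('Message-ID')
            if message_id:
                yield str(message_id).strip()


def duplicate_stats(message_ids):
    # Count how many copies of each Message-ID were seen.
    copies = Counter(message_ids)
    # Extra copies beyond the first, only for messages that have any.
    extras = [n - 1 for n in copies.values() if n > 1]
    unique = len(copies)
    return {
        'unique_messages': unique,
        'duplicate_copies': sum(extras),
        'percent_with_duplicates': 100.0 * len(extras) / unique if unique else 0.0,
        'average_duplicates': sum(extras) / len(extras) if extras else 0.0,
        'max_duplicates': max(extras, default=0),
    }
```

Something like duplicate_stats(iter_message_ids(list_of_mbox_paths)) would then give one dictionary covering every file at once, where list_of_mbox_paths is whatever list of mbox paths you have collected.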

And for posterity’s sake (aka, the next<\/em> time I have to do this…) here are some tips on how to clean up the mess:<\/p>\n