So, like a lot of computer people, I have the odd klepto-esque habit of saving all of my email. Now, this wouldn’t be anything newsworthy if I had done a decent job of it, and just kept some nice little archive folder somewhere, or fed it all into Gmail and been done with it.
Unfortunately, what I actually kept over the years is a mess of “I’m about to reformat this machine, copy all the mail off and I’ll deal with it later” backups. In fact, I have no fewer than 123 mbox files from past Thunderbird installs, 4 more mboxes from an Evolution backup, 4 Outlook PSTs, and for good measure two Outlook Express profile folders and a maildir from… well, I actually have no idea where that’s from… maybe KMail once upon a time?
So, upwards of 132 independent message sources. Nice work, Colin.
First off, some interesting stats about this pile of mail:
- Earliest Date: March 15, 2002
- Latest Date: June 21, 2007
- Total Emails Archived: 15493
- Number of Duplicate Copies: 12567
- Percent of Messages With ≥1 Duplicate: 27.87%
- Average Number of Duplicates (of those with ≥1): 2.910
- Maximum Number of Duplicates: 14
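If you want numbers like these for your own pile, here is a minimal Python sketch of one way to get them. It is not how I produced the stats above. It assumes everything has already been converted to mbox files, and that two messages are duplicates when their Message-ID headers match, which is a simplification. The function name `duplicate_stats` and the dictionary keys are made up for illustration.

```python
import mailbox
from collections import Counter


def duplicate_stats(mbox_paths):
    """Count copies of each message across mbox files, keyed by Message-ID.

    Assumption: a matching Message-ID means "same message". Messages with
    no Message-ID header are skipped in this sketch.
    """
    copies = Counter()
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                msg_id = str(msg.get("Message-ID", "")).strip()
                if msg_id:
                    copies[msg_id] += 1
        finally:
            box.close()

    unique = len(copies)
    # Extra copies beyond the first, for messages seen more than once.
    extras = [n - 1 for n in copies.values() if n > 1]
    return {
        "unique": unique,
        "duplicate_copies": sum(extras),
        "percent_with_duplicates": 100.0 * len(extras) / unique if unique else 0.0,
        "avg_duplicates": sum(extras) / len(extras) if extras else 0.0,
        "max_duplicates": max(extras) if extras else 0,
    }
```

Call it with a list of paths, for example `duplicate_stats(["old-inbox.mbox", "backup-2005.mbox"])` (hypothetical file names), and print the resulting dictionary.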
And for posterity’s sake (aka, the next time I have to do this…) here are some tips on how to clean up the mess:
- Use Thunderbird + the Remove Duplicates (Alternate) Plugin
I really can’t say enough about the “Remove Duplicate Messages (Alternate)” plugin. I highly recommend it over the non-Alternate version. Here’s the basic idea. Install the plugin. Right-click a Thunderbird folder and select “Set Original message folder(s) for next duplicate search.” Then, right-click some other folder and select “Remove Duplicates…”. Up pops a window (after a few brief seconds of churn) with a list showing all duplicate (or triplicate or more) messages, side by side to make it abundantly clear that they are true duplicates. Hit [OK] and they’re gone. Perfect. Clean, simple, and effective.
- How to Import mail from Outlook PSTs
The one key point to make here is that the only program I trust to read Outlook’s PST format is Outlook. I’ve seen a few open source / third party tools, such as LibPST, but mostly they’re shareware “recovery” apps, and they just scare me :). Besides, if you have Outlook to make the PST, just use it to read it. Or ask a friend. Whatever.
The magic to getting your messages out of Outlook is: Thunderbird! Just install it on the same machine as Outlook, have Outlook running with your PST opened (File->Open->Outlook data file…), and use Thunderbird’s Tools->Import… feature to suck in all the messages from Outlook. Remove those you weren’t interested in and you’re done. The rest are now present in Thunderbird.
- How to Import mail from Maildirs
The magic here is a neat little shell script by Joerg Reinhardt, which I found on linuxquestions.org. Drill is, run it like: `sh md2mb.sh <maildir>` and you’ll get an mbox out named `maildir.mbox`.
- How to Import mail from Outlook Express
Yeah, I know. Outlook Express is old, not geeky, etc., but back in the day (these messages are dated from 2002) I was young and naive, so here we are. How to deal? Well, the simplest way I found is just to copy my dbx files back over top of a blank identity in Outlook Express on an XP box. Use a VM or an old machine, either way. Then install Thunderbird alongside, and import just as you would to extract messages from PSTs. Notes: I was not able to get `readdbx` from libdbx working, nor was I able to open the dbx’s in Outlook 2003 by trying to import them using the Import/Export tools. Sad face.
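A side note on the Maildir step: if the shell script isn’t handy, Python’s standard `mailbox` module can probably do the same job. This is a rough sketch, not a port of Joerg’s script. The function name `maildir_to_mbox` is my own, it assumes nothing else is writing to the output file, and it ignores any Maildir subfolders.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a Maildir into an mbox file.

    Returns the number of messages copied. Assumptions: the Maildir has
    the usual cur/new/tmp layout, subfolders are ignored, and no other
    program is touching mbox_path while this runs (no locking is done).
    """
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # created if it does not exist yet
    copied = 0
    try:
        for msg in src:
            # Converting to mboxMessage carries read/replied flags across.
            dst.add(mailbox.mboxMessage(msg))
            copied += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return copied
```

Something like `maildir_to_mbox("Maildir", "maildir.mbox")` should leave you with an mbox that Thunderbird can pick up like any of the others.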
And there you have it: how to build your very own email archive Frankenstein, bootstrapped up from over a hundred pieces and jolted into life with a dash of Thunderbird. (And yes, Jason, I know you could write me a VBA app in 5 minutes to do this whole mess in Outlook… but you’re not here :-P)