Update 2016-05-25. I found a much easier way.
Remove Requests
We don’t like to bother our customers with lots of emails, but from time to time we let them know about new products and sales. Some of them use the remove link at the bottom of the email to unsubscribe from the list. I have a rule in Apple Mail that automatically routes these remove responses to a folder. We typically get a couple of remove requests each time we send out a mailing, and we have been processing them by hand. I wanted to automate the process a bit and build a master list of email addresses that do not want our mailings. That way, if someone who opted out earlier orders from us under a different name or address, we won’t start sending them email again.
After some digging in the ~/Library/Mail folder, I found that our Remove mailbox is stored at
~/Library/Mail/V2/Mailboxes/Remove.mbox/
I cd into that folder and redirect the output of a recursive grep to a file on the desktop. Since these messages are replies from the people who want off the list, their address is on the From: line, so that is what I search for. Note the period after the search term: it tells grep to start in the current directory. The -r option tells it to search every file in the subdirectories below as well.
grep -r "From: " . > ~/Desktop/Remove.txt
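If you’d rather skip some of the cleanup, grep can do a little more of the work up front. This is a minimal sketch (the function name collect_from_lines is just an illustration): -h drops the “filename:” prefix that grep -r adds to every match, and anchoring the pattern with ^ keeps indented or quoted “From:” text in message bodies out of the results.

```shell
# Sketch: print every header-style "From: " line found under a mailbox folder.
#   -r  search the folder and all of its subfolders
#   -h  leave off the "filename:" prefix grep -r normally adds
#   ^   only match lines that begin with "From: " (skips "> From: ..." quotes)
collect_from_lines() {
  grep -rh '^From: ' "${1:-.}"
}
```

Usage: collect_from_lines ~/Library/Mail/V2/Mailboxes/Remove.mbox > ~/Desktop/Remove.txt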
Then I open the file in BBEdit and delete the lines I don’t need, such as anything containing our company name.
I only want the email addresses, so I run a grep find-and-replace in BBEdit with an empty replacement, using this pattern to remove everything up to and including the opening angle bracket:
.*<
This pattern, also replaced with nothing, removes the closing bracket and everything after it:
>.*
Sort the lines, use BBEdit’s Process Duplicate Lines to drop the repeats, and you’re done. I then import the email addresses into my MySQL database.
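The BBEdit steps can also be run as one shell pipeline. Here is a rough sketch, assuming the input is the Remove.txt file from the grep step with the company lines already deleted. extract_addresses is a made-up name, and lowercasing every address before removing duplicates is my assumption, not part of the original steps. Lines of the form “Name <address>” are reduced to the address, just like the .*< and >.* replacements, and a bare “From: address” line keeps only its last word.

```shell
# Sketch: read grep output on stdin, print one address per line,
# sorted with duplicates removed.
#   1. strip Windows-style carriage returns
#   2. "Name <addr>" -> addr (same effect as replacing .*< and >.* with nothing);
#      lines without brackets keep only their last whitespace-separated word
#   3. keep only lines that still contain an @
#   4. lowercase (assumption: treat addresses case-insensitively)
#   5. sort and drop duplicates
extract_addresses() {
  tr -d '\r' |
    sed -E -e 's/.*<([^>]*)>.*/\1/' -e 's/.*[[:space:]]//' |
    grep '@' |
    tr '[:upper:]' '[:lower:]' |
    sort -u
}
```

Usage: extract_addresses < ~/Desktop/Remove.txt > ~/Desktop/Remove-clean.txt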
Bad Domain
When our mail server can’t find a domain, it keeps retrying for a few days and sends warning messages along the way. When it finally gives up, it sends a message with the subject “Returned mail: see transcript for details”. The nice thing about these messages is that the failed address is easy to find. It looks like this:
The following address(es) failed:
  abby612@earthink.net
    retry timeout exceeded
In this case the server probably couldn’t find the domain because the address was typed as earthink.net when it was most likely meant to be earthlink.net. When I pull the addresses out of the failure messages, I correct obvious typos before adding them to the bad email database.
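If the same typos keep turning up, a small sed table can fix them before the addresses go into the database. This sketch assumes one lowercase address per line. Only the earthink.net to earthlink.net pair comes from the example above; the hotmial.com and gmial.com entries and the function name fix_domain_typos are hypothetical stand-ins for whatever typos you actually see.

```shell
# Sketch: rewrite known domain typos to the intended domain.
# Expects one lowercase address per line on stdin.
# Only earthink.net -> earthlink.net is from the post; the other
# two pairs are hypothetical examples of the same kind of slip.
fix_domain_typos() {
  sed -E \
    -e 's/@earthink\.net$/@earthlink.net/' \
    -e 's/@hotmial\.com$/@hotmail.com/' \
    -e 's/@gmial\.com$/@gmail.com/'
}
```

Usage: fix_domain_typos < ~/Desktop/Failed.txt. Addresses that match no entry pass through unchanged.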
The code to find the addresses is:
grep -r "^ .*@.*$" . > ~/Desktop/Failed.txt
The pattern matches lines that start with a space and contain an @ somewhere after it. That is looser than “an indented line holding just an email address”, but in practice there are surprisingly few false positives, such as lines that contain other text besides the address.
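Because the server also sends warning messages before it gives up, it can help to look only at messages with the final “Returned mail” subject. This sketch assumes that subject appears verbatim on a Subject: header line in each bounce file, and it uses a stricter pattern than ^ .*@.*$ (an indented line holding a single token with exactly one @) to cut down on false positives. The function name extract_failed_addresses is hypothetical.

```shell
# Sketch: print each failed address once, looking only at final-failure bounces.
# Assumption: each such file contains the literal header line
#   Subject: Returned mail: see transcript for details
extract_failed_addresses() {
  # -l lists matching files instead of lines; -F treats the subject as plain text
  grep -rlF 'Subject: Returned mail: see transcript for details' "${1:-.}" |
    while IFS= read -r file; do
      # Indented line, one token, one @ (a tighter version of "^ .*@.*$")
      grep -hE '^[[:space:]]+[^[:space:]@]+@[^[:space:]@]+[[:space:]]*$' "$file"
    done |
    sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' |
    sort -u
}
```

Usage: run extract_failed_addresses > ~/Desktop/Failed.txt from inside the bounce folder, or pass the folder path as the first argument. Warning messages that list a still-pending address are skipped because their subject differs.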
Bad Addresses
It’s much harder to pull out the bad email addresses, because the bounce-message format varies tremendously from one email provider to another. I ended up searching for lines that contain an “@” and doing a lot of manual cleanup. I’ll see if I can figure out a better way next month.
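As a possible starting point, rather than matching whole lines, grep can print only the address-shaped pieces with -o, and then your own domain and the usual mailer-daemon and postmaster senders can be dropped. This is a sketch, not a fix for the format problem: the address regex is a rough approximation, the second argument (your own domain) and the function name candidate_bad_addresses are hypothetical, and message IDs from other servers will still show up and need manual review.

```shell
# Sketch: list every address-shaped token in the bounce messages so the
# manual review starts from a short, de-duplicated list.
#   $1 = folder to search (defaults to the current directory)
#   $2 = your own domain (hypothetical parameter), used to drop your own addresses
candidate_bad_addresses() {
  # -o prints only the matched text, one match per line
  grep -rhoE '[[:alnum:]._%+-]+@[[:alnum:].-]+\.[[:alpha:]]+' "${1:-.}" |
    tr '[:upper:]' '[:lower:]' |
    grep -viF "${2:?usage: candidate_bad_addresses DIR OUR_DOMAIN}" |
    grep -vE '^(mailer-daemon|postmaster)@' |
    sort -u
}
```

Usage: candidate_bad_addresses . ourdomain.example > ~/Desktop/BadCandidates.txt, then review the list by hand before importing anything.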