I wanted to find all of the emails in my customer database that were duplicates. This could be because several people at the same business share an address or because someone is in the database twice with slightly different spellings of their name e.g. Kathy and Kathleen, or with and without a middle initial, etc.
Before I started I set all of the email addresses that were empty strings to NULL, since joins don’t match on NULL but do on empty strings.
Then on the first try I used this:
SELECT t1.name, t1.email, t2.name, t2.email
FROM customers AS t1
INNER JOIN customers AS t2
ON t1.email = t2.email
WHERE t1.name <> t2.name
Or in WHERE notation.
SELECT t1.name, t1.email, t2.name, t2.email
FROM customers AS t1, customers AS t2
WHERE t1.email = t2.email
AND t1.name <> t2.name
I got 384 records, but I got a lot of duplicates where t1.name and t2.name are reversed. i.e.
Helen Jaffa hjaffa@ino.net Helen A Jaffa hjaffa@ino.net
Helen A Jaffa hjaffa@ino.net Helen Jaffa hjaffa@ino.net
To eliminate duplicates I used a UNION and two OUTER JOINS
SELECT t1.email, t1.name
FROM customers AS t1
LEFT OUTER JOIN customers AS t2 ON t1.email = t2.email
WHERE t1.name <> t2.name
UNION
SELECT t2.email, t2.name
FROM customers AS t2
LEFT OUTER JOIN customers AS t1 ON t1.email = t2.email
WHERE t1.name <> t2.name
This yielded 343 records and no duplicates.