Major enhancements of Duplicate Contact Manager for Thunderbird
This is a (so far) unofficial major update of the Duplicate Contact Manager,
which I tend to call version 0.9.
The previously available version 0.8.2 was a good starting point,
but since I urgently needed a more sophisticated tool, I started improving it myself.
My changes were motivated, and tested, using my personal
address book with about 1000 fairly well hand-maintained entries and
the automatically generated collected addresses book with some 2500
entries, including many duplicates and odd variants of names, etc.
The changelog is:
- fields of the retained duplicate entry can be edited
- can compare across two different address books
- new option to first collect all duplicates and then handle them
- card matching is less aggressive and more fault tolerant:
only very likely duplicates (and more of these) are presented
- the less complete duplicate is selected for removal by default
- automatic removal option now also removes less complete duplicates
- made the overall search process interruptible and repeatable
- moved Thunderbird menu entry into Address Book → Tools
- many other small improvements, e.g., on progress bar and final info
- internal: major code cleanup (a lot still remains TODO)
- internal: added README.txt and COPYING.txt files
- TODO: check/update/improve French and Spanish translations
- TODO: update online documentation
When a potential duplicate is presented, the card marked "red" by default
is not simply the right-hand one, but the one having less
information. If automatic removal is chosen, this one will be removed.
To determine which of two likely duplicates is chosen for deletion,
the following multi-stage comparison process is performed:
- abstraction: ignore some fields for comparison, e.g. PhotoType, UUID
- normalization: remove leading/trailing white space, expand umlauts
and ligatures, correct order of first and last name and name prefixes,
interpret both primary and secondary emails (if present) as a set, and
convert to lower case.
- consider a card as less complete (less information, to be deleted) if
each of its fields either has the same (abstracted) value as the
corresponding field of the other card or is empty.
- if the same fields are present for both cards and all have equal
abstracted values, prefer to keep the one with higher character weight,
i.e., with more uppercase letters and more special (non-letter) characters.
- if both cards are still equivalent, prefer to keep the first one.
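The abstraction and normalization steps above can be sketched in Python as follows. This is a minimal illustration, not the add-on's actual code: it assumes a card is a plain dict from field names (such as FirstName or PrimaryEmail) to strings, and the prefix list, the expansion table, the `Emails` key, and all function names are hypothetical choices made for this sketch.

```python
import unicodedata
from typing import Dict, FrozenSet, Tuple, Union

# Fields ignored for comparison (the text names PhotoType and UUID as examples).
IGNORED_FIELDS = {"PhotoType", "UUID"}
EMAIL_FIELDS = ("PrimaryEmail", "SecondaryEmail")

# Hypothetical, non-exhaustive list of name prefixes.
NAME_PREFIXES = {"van", "von", "der", "den", "de", "da", "di", "la", "le"}

# Umlaut and ligature expansions, applied after lower-casing.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
              "æ": "ae", "œ": "oe", "ﬁ": "fi", "ﬂ": "fl"}

Abstracted = Dict[str, Union[str, FrozenSet[str]]]


def normalize_text(value: str) -> str:
    """Trim white space, lower-case, and expand umlauts and ligatures."""
    text = unicodedata.normalize("NFC", value).strip().lower()
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def normalize_names(first: str, last: str) -> Tuple[str, str]:
    """Move trailing prefix words of the first name ("peter van") to the last name."""
    first_words = first.split()
    moved = []
    while first_words and first_words[-1] in NAME_PREFIXES:
        moved.insert(0, first_words.pop())
    return " ".join(first_words), " ".join(moved + last.split())


def normalize_display(display: str) -> str:
    """Turn "last, first" into "first last" and collapse inner white space."""
    if "," in display:
        last, first = (part.strip() for part in display.split(",", 1))
        display = f"{first} {last}"
    return " ".join(display.split())


def abstract_card(card: Dict[str, str]) -> Abstracted:
    """Build an abstracted copy used only for comparison; `card` is not modified."""
    texts = {field: normalize_text(value) for field, value in card.items()
             if field not in IGNORED_FIELDS and field not in EMAIL_FIELDS}
    if "FirstName" in texts or "LastName" in texts:
        # Missing name fields become empty strings here.
        texts["FirstName"], texts["LastName"] = normalize_names(
            texts.get("FirstName", ""), texts.get("LastName", ""))
    if "DisplayName" in texts:
        texts["DisplayName"] = normalize_display(texts["DisplayName"])
    result: Abstracted = dict(texts)
    # Primary and secondary addresses are compared as one set.
    emails = frozenset(normalize_text(card[f]) for f in EMAIL_FIELDS
                       if card.get(f, "").strip())
    if emails:
        result["Emails"] = emails
    return result
```

With these rules, a card with FirstName "Peter van " and DisplayName "Müller, Peter van" abstracts to the same names as one with FirstName "Peter" and DisplayName "Peter van Müller".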
Note that these string manipulations are performed only for the
comparison; they do not change the actual card fields.
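The selection rules (less complete card first, then character weight, then keep the first card) could be sketched in Python like this. It assumes cards are dicts of raw field strings; `abstract`, `is_less_complete`, `char_weight`, and `card_to_delete` are hypothetical names, and `abstract` only trims and lower-cases as a simplified stand-in for the full normalization. Not counting white space as "special" is an assumption, and since the text does not say what happens when neither card is less complete than the other, the sketch returns None in that case.

```python
from typing import Dict, Optional

Card = Dict[str, str]

# Fields ignored for comparison (examples named in the text).
IGNORED_FIELDS = {"PhotoType", "UUID"}


def abstract(value: str) -> str:
    # Simplified stand-in for the full normalization: trim and lower-case only.
    return value.strip().lower()


def relevant(card: Card) -> Card:
    """Drop fields that are ignored for comparison."""
    return {f: v for f, v in card.items() if f not in IGNORED_FIELDS}


def is_less_complete(a: Card, b: Card) -> bool:
    """True if every field of `a` is empty or has the same abstracted value in `b`."""
    b_rel = relevant(b)
    return all(abstract(v) == "" or abstract(v) == abstract(b_rel.get(f, ""))
               for f, v in relevant(a).items())


def char_weight(card: Card) -> int:
    """Count uppercase letters plus non-letter characters (white space excluded)."""
    return sum(1 for v in relevant(card).values() for ch in v
               if ch.isupper() or not (ch.isalpha() or ch.isspace()))


def card_to_delete(first: Card, second: Card) -> Optional[str]:
    """Return "first" or "second" for the card to delete, or None if undecided."""
    first_less = is_less_complete(first, second)
    second_less = is_less_complete(second, first)
    if first_less and second_less:
        # Same fields with equal abstracted values: keep the card with the
        # higher character weight; on a tie, keep the first card.
        return "first" if char_weight(first) < char_weight(second) else "second"
    if second_less:
        return "second"
    if first_less:
        return "first"
    return None  # neither card is less complete than the other
```

For instance, a card whose only extra content is an empty WorkPhone field, compared against one that has a phone number, is the one selected for deletion.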
Here is an example: the card on the right will be preferred for deletion.
| Field              | Left card                      | Right card          |
|--------------------|--------------------------------|---------------------|
| FirstName          | "Peter"                        | "Peter van "        |
| LastName           | "van Müller"                   | " mueller"          |
| DisplayName        | "Peter van Müller"             | "Müller, Peter van" |
| PrimaryEmail       | "Peter.vanMueller@company.com" | "email@example.com" |
| SecondaryEmail     | "P.van.Mueller@gmx.de"         | ""                  |
| WorkPhone          | "+49 89 12345678"              | ""                  |
| PopularityIndex    | "5"                            | "3"                 |
| AllowRemoteContent | "1" (yes)                      | "0" (no)            |
| UUID               | "23434-446-234234-45dd"        | "asdfzdkjqwrtlnkkq" |
Questions and comments are welcome.
Last modified: Sat Apr 7 18:09:09 CEST 2012