Major enhancements of Duplicate Contact Manager for Thunderbird

This is a so far unofficial major update of the Duplicate Contact Manager, which I tend to call Version 0.9.

Version 0.8.2, the latest one available so far, was a good starting point, but since I urgently needed a more sophisticated tool, I started improving it myself. My changes have been motivated and checked using my personal address book, with about 1000 fairly well hand-managed entries, and using the automatically generated collected address book, with about 2500 entries including many duplicates and odd variants of names etc.
The changelog is:

When a potential duplicate is presented, the card marked "red" by default is no longer simply the right-hand one, but the one containing less information. If automatic removal is chosen, this card will be removed. To determine which one of two likely duplicates is chosen for deletion, the following multi-stage comparison process is performed:
  1. abstraction: ignore some fields for comparison, e.g. PhotoType, UUID
  2. normalization: remove leading/trailing white space, expand umlauts and ligatures, correct order of first and last name and name prefixes, interpret both primary and secondary emails (if present) as a set, and convert to lower case.
  3. consider a card as less complete (less information, to be deleted) if each of its fields either has the same (abstracted) value as in the other card or is empty.
  4. if the same fields are present for both cards and all have equal abstracted values, prefer to keep the one with higher character weight: with more uppercase letters and more special (non-letter) characters.
  5. if still both cards are equivalent, prefer to keep the first one.
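
Steps 3 to 5 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the add-on's actual code: the names `is_subsumed`, `char_weight` and `pick_for_deletion` are hypothetical, cards are assumed to be plain dictionaries whose values have already been abstracted and normalized (with emails held as a set), spaces are assumed not to count as special characters, and returning `None` when neither card subsumes the other is an assumption, since that case is not specified above.

```python
def is_subsumed(a, b):
    """Step 3: True if every field of card a is empty or matches card b."""
    for key, value in a.items():
        if not value:
            continue  # empty fields carry no information
        other = b.get(key)
        if isinstance(value, (set, frozenset)):
            # set-valued fields (e.g. emails) must be contained in b's set
            if not value <= (other or set()):
                return False
        elif value != other:
            return False
    return True


def char_weight(raw_card):
    """Step 4: count uppercase letters and non-letter characters.

    Assumption: white space is not counted as a special character.
    """
    text = "".join(raw_card.values())
    return sum(1 for ch in text
               if ch.isupper() or not (ch.isalpha() or ch.isspace()))


def pick_for_deletion(left_raw, right_raw, left_abs, right_abs):
    """Return "left", "right", or None (no automatic preference)."""
    left_in_right = is_subsumed(left_abs, right_abs)
    right_in_left = is_subsumed(right_abs, left_abs)
    if right_in_left and not left_in_right:
        return "right"  # right card has less information
    if left_in_right and not right_in_left:
        return "left"   # left card has less information
    if left_in_right and right_in_left:
        # Step 4: keep the card with the higher character weight.
        # Step 5: on a tie, keep the first (left) card.
        if char_weight(left_raw) < char_weight(right_raw):
            return "left"
        return "right"
    return None  # assumption: neither card subsumes the other
```

The raw (unnormalized) cards are passed separately because the character weight only makes sense on the original spelling; after conversion to lower case there would be no uppercase letters left to count.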

Note that the string manipulations are computed only for the comparison; that is, they do not change the actual card fields.
Here is an example: the card on the right will be preferred for deletion.

Field                Left card                         Right card
FirstName:           "Peter"                           "Peter van  "
LastName:            "van Müller"                      " mueller"
DisplayName:         "Peter van Müller"                "Müller, Peter van"
PrimaryEmail:        "Peter.vanMueller@company.com"    "p.van.mueller@gmx.de"
SecondaryEmail:      "P.van.Mueller@gmx.de"            ""
WorkPhone:           "+49 89 12345678"                 ""
PopularityIndex:     "5"                               "3"
AllowRemoteContent:  "1" (yes)                         "0" (no)
UUID:                "23434-446-234234-45dd"           "asdfzdkjqwrtlnkkq"
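
To see why the right card counts as less complete, here is a rough Python sketch of the abstraction and normalization steps applied to the example values. It is only an illustration under assumptions, not the add-on's code: the helper names are hypothetical, first and last name are simply merged into one `Name` value instead of being reordered field by field, and treating PopularityIndex and AllowRemoteContent as ignored fields is an assumption made so that the example works out (the text above only names PhotoType and UUID).

```python
# Fields skipped in the comparison. The last two entries are an assumption.
IGNORED = {"PhotoType", "UUID", "PopularityIndex", "AllowRemoteContent"}
# Fields that get special treatment below.
HANDLED = {"FirstName", "LastName", "DisplayName",
           "PrimaryEmail", "SecondaryEmail"}

# Expansion of umlauts and ligatures (applied after lowercasing).
TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                          "ß": "ss", "æ": "ae", "œ": "oe"})


def norm_text(value):
    """Trim and collapse white space, lowercase, expand umlauts."""
    return " ".join(value.split()).lower().translate(TRANSLIT)


def norm_display(value):
    """Turn 'Last, First' into 'First Last' before normalizing."""
    if "," in value:
        last, first = value.split(",", 1)
        value = first + " " + last
    return norm_text(value)


def abstract(card):
    """Return a comparison-only view; the card itself is not changed."""
    name = norm_text(card.get("FirstName", "") + " " + card.get("LastName", ""))
    emails = {norm_text(card.get(k, ""))
              for k in ("PrimaryEmail", "SecondaryEmail")} - {""}
    view = {"Name": name,
            "DisplayName": norm_display(card.get("DisplayName", "")),
            "Emails": emails}
    for key, value in card.items():
        if key not in IGNORED and key not in HANDLED:
            view[key] = norm_text(value)
    return view


left = {"FirstName": "Peter", "LastName": "van Müller",
        "DisplayName": "Peter van Müller",
        "PrimaryEmail": "Peter.vanMueller@company.com",
        "SecondaryEmail": "P.van.Mueller@gmx.de",
        "WorkPhone": "+49 89 12345678", "PopularityIndex": "5",
        "AllowRemoteContent": "1", "UUID": "23434-446-234234-45dd"}
right = {"FirstName": "Peter van  ", "LastName": " mueller",
         "DisplayName": "Müller, Peter van",
         "PrimaryEmail": "p.van.mueller@gmx.de", "SecondaryEmail": "",
         "WorkPhone": "", "PopularityIndex": "3",
         "AllowRemoteContent": "0", "UUID": "asdfzdkjqwrtlnkkq"}

la, ra = abstract(left), abstract(right)
print(la["Name"] == ra["Name"])        # both names normalize identically
print(ra["Emails"] <= la["Emails"])    # right emails are a subset of left
```

Under these assumptions the names and display names of both cards normalize to the same string, the right card's email set is contained in the left one's, and its WorkPhone is empty, so every field of the right card is either matched or empty.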

Questions and comments are welcome.

-----------------------------------------------------------------------------------------------
URL: http://David.von-Oheimb.de/perlen/Duplicate_Contact_Manager.html Last modified: Sat Apr 7 18:09:09 CEST 2012