Matching Algorithms Used with Matching Methods

The matching method and its corresponding matching algorithms are part of the matching rule's matching criteria. They help determine how a specific field in one record is compared to the same field in another record and whether the fields are considered matches.

We've provided an exact matching method and a variety of fuzzy matching methods. If the exact matching method is selected, then the exact matching algorithm is automatically used to compare the fields.
If one of the fuzzy matching methods is selected, then a variety of fuzzy matching algorithms are used to compare the fields. A field can be compared using more than one matching algorithm, and each matching algorithm is given a matching score based on how closely it matches the fields. The matching algorithms compare fields without regard to case.

Matching Algorithms Available with Exact Matching Method

Matching Algorithm | Description
Exact | Determines whether two strings are the same. For example, salesforce.com and Salesforce are not considered a match because they're not exactly the same, and return a match score of 0.
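
The exact comparison can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name `exact_match_score` is hypothetical, and the score of 100 for a match is an assumption carried over from the fuzzy examples (the text above only states the non-match score of 0). Case is ignored because the compared fields are not case sensitive.

```python
def exact_match_score(a: str, b: str) -> int:
    """Return 100 when both values are identical ignoring case, else 0.

    Hypothetical helper: the 100 for a match is an assumed value.
    """
    return 100 if a.casefold() == b.casefold() else 0


print(exact_match_score("Salesforce", "SALESFORCE"))      # 100
print(exact_match_score("salesforce.com", "Salesforce"))  # 0
```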

Matching Algorithms Available with Fuzzy Matching Methods

Matching Algorithm | Description
1. Acronym | Determines whether a business name matches its acronym. For example, Advanced Micro Devices and its acronym AMD are considered a match and return a match score of 100.
2. Edit Distance | Determines the similarity between two strings based on the number of deletions, insertions, and character replacements needed to transform one string into the other. For example, VP Sales matches VP of Sales with a match score of 73.
3. Initials | Determines the similarity of two sets of initials in personal names. For example, the first name Jonathan and its initial J match and return a match score of 100.
4. Jaro-Winkler Distance | Determines the similarity between two strings based on the number of character replacements needed to transform one string into the other. This method is best for short strings, such as personal names. For example, Johnny matches Johny with a match score of 97.
5. Keyboard Distance | Determines the similarity between two strings based on the number of deletions, insertions, and character replacements needed to transform one string into the other, weighted by the position of the keys on the keyboard.
6. Kullback Liebler Distance | Determines the similarity between two strings based on the percentage of words in common. For example, Director of Engineering matches Engineering Director with a match score of 65.
7. Metaphone 3 | Determines the similarity between two strings based on their sounds. This algorithm attempts to account for the irregularities among languages and works well for first and last names. For example, Joseph matches Josef with a match score of 100.
8. Name Variant | Determines whether two names are variations of each other. For example, Bob is a variation of Robert and returns a match score of 100. Bob is not a variation of Bill and returns a match score of 0.
9. Syllable Alignment | Determines the similarity between two strings based on their sounds. First, the character strings are converted into syllable strings. Then the syllable strings are compared and scored using the Edit Distance algorithm. This matching algorithm works well for company names.
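
The Acronym and Initials entries describe all-or-nothing checks, so they are easy to illustrate. The sketch below is a simplified guess at the idea, not the actual algorithms: the function names are hypothetical, the acronym is assumed to be the first letter of every whitespace-separated word, and real implementations may treat stop words, punctuation, or multi-letter initials differently.

```python
def acronym_score(name: str, candidate: str) -> int:
    """Return 100 if candidate equals the first letters of the words in name, else 0."""
    acronym = "".join(word[0] for word in name.split())
    if not acronym:
        return 0  # an empty name has no acronym to compare
    return 100 if acronym.casefold() == candidate.strip().casefold() else 0


def initials_score(first_name: str, initial: str) -> int:
    """Return 100 if initial is the first letter of first_name, else 0."""
    name = first_name.strip()
    letter = initial.strip().rstrip(".")  # accept "J" as well as "J."
    if not name or len(letter) != 1:
        return 0
    return 100 if name[0].casefold() == letter.casefold() else 0


print(acronym_score("Advanced Micro Devices", "AMD"))  # 100
print(acronym_score("Advanced Micro Devices", "IBM"))  # 0
print(initials_score("Jonathan", "J"))                 # 100
```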

For example, Syllable Alignment gives Department of Energy and Department of Labor a relatively low match score of 59 because the syllable sequences of these two company names differ more than their character sequences ("energy" sounds very different from "labor"). Edit Distance gives the two strings a score of 74. Syllable Alignment therefore works better here, because the two strings should not be considered a match.
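
To make the Edit Distance idea concrete, here is a minimal Python sketch of the classic Levenshtein distance turned into a 0 to 100 score. The function names are hypothetical, and the normalization (dividing by the length of the longer string) is an assumption; the exact scoring formula used by the product is not documented here.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the deletions, insertions, and replacements needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # replace, or keep when equal
            ))
        previous = current
    return previous[-1]


def edit_distance_score(a: str, b: str) -> int:
    """Scale the distance to 0-100, ignoring case (assumed normalization)."""
    a, b = a.casefold(), b.casefold()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    return round(100 * (longest - levenshtein(a, b)) / longest)


print(edit_distance_score("VP Sales", "VP of Sales"))  # 73
```

Under this assumed normalization the VP Sales pair scores 73, which agrees with the example in the table. The same sketch gives Department of Energy and Department of Labor a score of 70 rather than the 74 quoted above, so the product evidently normalizes or weights edits somewhat differently.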
