Matching Algorithms Used with Matching Methods

The matching method and its corresponding matching algorithms are part of the matching rule's matching criteria. They help determine how a specific field in one record is compared to the same field in another record and whether the fields are considered matches.

We've provided an exact matching method and a variety of fuzzy matching methods. If the exact matching method is selected, then the exact matching algorithm is automatically used to compare the fields.
If one of the fuzzy matching methods is selected, then a variety of fuzzy matching algorithms are used to compare the fields. A field can be compared using more than one matching algorithm, and each matching algorithm is given a match score based on how closely it matches the fields. The comparisons made by the matching algorithms are not case sensitive.

Matching Algorithms Available with Exact Matching Method

Matching Algorithm | Description
Exact | Determines whether two strings are the same. For example, salesforce.com and Salesforce are not considered a match because they're not exactly the same, and return a match score of 0.
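As a minimal sketch of the Exact algorithm combined with the case-insensitive comparison described earlier, the check could look like the following. The function name is hypothetical, and returning 100 for a match is an assumption based on the scores the other algorithms return for a full match.

```python
def exact_match_score(a: str, b: str) -> int:
    """Return 100 if the two strings are identical ignoring case, else 0."""
    # Assumption: case-insensitivity is approximated with str.casefold().
    return 100 if a.casefold() == b.casefold() else 0


print(exact_match_score("salesforce.com", "Salesforce"))  # 0
print(exact_match_score("Salesforce", "SALESFORCE"))      # 100
```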

Matching Algorithms Available with Fuzzy Matching Methods

Matching Algorithm | Description
1. Acronym | Determines whether a business name matches its acronym. For example, Advanced Micro Devices and its acronym AMD are considered a match and return a match score of 100.
2. Edit Distance | Determines the similarity between two strings based on the number of deletions, insertions, and character replacements needed to transform one string into the other. For example, VP Sales matches VP of Sales with a match score of 73.
3. Initials | Determines the similarity of two sets of initials in personal names. For example, the first name Jonathan and its initial J match and return a match score of 100.
4. Jaro-Winkler Distance | Determines the similarity between two strings based on the number of character replacements needed to transform one string into the other. This method is best for short strings, such as personal names. For example, Johnny matches Johny with a match score of 97.
5. Keyboard Distance | Determines the similarity between two strings based on the number of deletions, insertions, and character replacements needed to transform one string into the other, weighted by the position of the keys on the keyboard.
6. Kullback-Liebler Distance | Determines the similarity between two strings based on the percentage of words in common. For example, Director of Engineering matches Engineering Director with a match score of 65.
7. Metaphone 3 | Determines the similarity between two strings based on their sounds. This algorithm attempts to account for the irregularities among languages and works well for first and last names. For example, Joseph matches Josef with a match score of 100.
8. Name Variant | Determines whether two names are variations of each other. For example, Bob is a variation of Robert and returns a match score of 100. Bob is not a variation of Bill and returns a match score of 0.
9. Syllable Alignment | Determines the similarity between two strings based on their sounds. First, the character strings are converted into syllable strings. Then the syllable strings are compared and scored using the Edit Distance algorithm. This matching algorithm works well for company names.

For example, Syllable Alignment gives Department of Energy and Department of Labor a relatively low match score of 59 because the syllable sequences of these two company names differ more than their character sequences ("energy" sounds very different from "labor"). Edit Distance gives the two strings a score of 74. Syllable Alignment works better here because the two strings should not be considered a match.
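To make the Edit Distance description concrete, here is a minimal sketch of the classic Levenshtein distance scaled to a 0 to 100 score. The function names and the normalization (dividing by the length of the longer string) are assumptions for illustration. The formula the product actually uses isn't stated here, so scores for other string pairs may differ from the ones quoted above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and replacements."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace (or keep) the character
            ))
        previous = current
    return previous[-1]


def similarity_score(a: str, b: str) -> int:
    """Scale the distance to 0-100 by the longer string's length (assumed normalization)."""
    a, b = a.casefold(), b.casefold()  # comparisons are not case sensitive
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - edit_distance(a, b) / longest))


print(similarity_score("VP Sales", "VP of Sales"))  # 73
```

Under this assumed normalization, VP Sales and VP of Sales differ by three inserted characters out of eleven, which gives 73.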

Matching Methods Used with Matching Rules

The Exact matching method looks for strings that match a pattern exactly. It can be used for almost any field, including custom fields. If you're using international data, we recommend that you use the Exact matching method with your matching rules.

The Fuzzy matching methods look for strings that match a pattern approximately.



Matching Method | Matching Algorithms | Scoring Method | Threshold | Special Handling
Exact | Exact
Fuzzy: First Name | Exact, Initials, Jaro-Winkler, Name Variant | Maximum | 85 | The Middle Name field, if used in your matching rule, is compared by the Fuzzy: First Name matching method.
Fuzzy: Last Name | Exact, Keyboard Distance, Metaphone 3 | Maximum | 90
Fuzzy: Company Name | Acronym, Exact, Syllable Alignment | Maximum | 70 | Removes words such as Inc and Corp before comparing fields. Also, company names are normalized. For example, IBM is normalized to International Business Machines.
Fuzzy: Phone | Exact | Weighted Average | 80 | Phone numbers are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field. This process works best with North American data.
  • International code (Exact, 10% of field's match score)
  • Area code (Exact, 50% of field's match score)
  • Next 3 digits (Exact, 30% of field's match score)
  • Last 4 digits (Exact, 10% of field's match score)

For example, suppose these two phone numbers are being compared: 1-415-555-1234 and 1-415-555-5678.

All sections match exactly except the last 4 digits, so the field has a match score of 90, which is considered a match because it exceeds the threshold of 80.

Fuzzy: City | Edit Distance, Exact | Maximum | 85
Fuzzy: Street | Exact | Weighted Average | 80 | Addresses are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field. This process works best with North American data.
  • Street Name (Edit Distance, 50% of field's match score)
  • Street Number (Exact, 20% of field's match score)
  • Street Suffix (Exact, 15% of field's match score)
  • Suite Number (Exact, 15% of field's match score)

For example, suppose these two billing streets are being compared: 123 Market Street, Suite 100 and 123 Market Drive, Suite 300.

Because only the street number and street name match, the field has a match score of 70, which is not considered a match because it's less than the threshold of 80.

Fuzzy: ZIP | Exact | Weighted Average | 80 | ZIP codes are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field.
  • First 5 digits (Exact, 90% of field's match score)
  • Next 4 digits (Exact, 10% of field's match score)

For example, suppose these two ZIP codes are being compared: 94104-1001 and 94104.

Because only the first 5 digits match, the field has a match score of 90, which is considered a match because it exceeds the threshold of 80.

Fuzzy: Title | Acronym, Exact, Kullback-Liebler Distance | Maximum | 50
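The two scoring methods in the table above can be sketched as follows. This is an illustrative sketch, not the product's implementation. All function and variable names are hypothetical, the section weights are copied from the table, and treating a score equal to the threshold as a match is an assumption. In the first-name example, only the Jaro-Winkler score of 97 comes from this document; the other algorithm scores are placeholders.

```python
from typing import Dict

# Section weights from the table, as a percentage of the field's match score.
PHONE_WEIGHTS = {"international_code": 10, "area_code": 50,
                 "next_3_digits": 30, "last_4_digits": 10}
STREET_WEIGHTS = {"street_name": 50, "street_number": 20,
                  "street_suffix": 15, "suite_number": 15}
ZIP_WEIGHTS = {"first_5_digits": 90, "next_4_digits": 10}


def weighted_average(section_scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Combine per-section scores (0-100) using percentage weights that sum to 100."""
    return sum(section_scores[name] * weight for name, weight in weights.items()) / 100


def maximum(algorithm_scores: Dict[str, int]) -> int:
    """Maximum scoring: the field score is the best score any algorithm produced."""
    return max(algorithm_scores.values())


def is_match(score: float, threshold: int) -> bool:
    # Assumption: a score equal to the threshold counts as a match.
    return score >= threshold


# 1-415-555-1234 vs 1-415-555-5678: only the last 4 digits differ.
phone = weighted_average(
    {"international_code": 100, "area_code": 100, "next_3_digits": 100, "last_4_digits": 0},
    PHONE_WEIGHTS,
)
print(phone, is_match(phone, 80))        # 90.0 True

# 123 Market Street, Suite 100 vs 123 Market Drive, Suite 300.
street = weighted_average(
    {"street_name": 100, "street_number": 100, "street_suffix": 0, "suite_number": 0},
    STREET_WEIGHTS,
)
print(street, is_match(street, 80))      # 70.0 False

# 94104-1001 vs 94104: only the first 5 digits match.
zip_code = weighted_average({"first_5_digits": 100, "next_4_digits": 0}, ZIP_WEIGHTS)
print(zip_code, is_match(zip_code, 80))  # 90.0 True

# Johnny vs Johny under Fuzzy: First Name (threshold 85).
first_name = maximum({"Exact": 0, "Initials": 0, "Jaro-Winkler": 97, "Name Variant": 0})
print(first_name, is_match(first_name, 85))  # 97 True
```

With Maximum scoring, a single strong algorithm is enough to carry the field over its threshold. With Weighted Average, a mismatch in a heavily weighted section, such as the area code, drops the field below the threshold on its own.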
