The fuzzy matching methods look for strings that approximately match a pattern.
Matching Method | Matching Algorithms | Scoring Method | Threshold | Special Handling |
---|---|---|---|---|
Exact | Exact | | | |
Fuzzy: First Name | Exact, Initials, Jaro-Winkler, Name Variant | Maximum | 85 | The Middle Name field, if used in your matching rule, is compared by the Fuzzy: First Name matching method. |
Fuzzy: Last Name | Exact, Keyboard Distance, Metaphone 3 | Maximum | 90 | |
Fuzzy: Company Name | Acronym, Exact, Syllable Alignment | Maximum | 70 | Removes words such as Inc and Corp before comparing fields. Also, company names are normalized. For example, IBM is normalized to International Business Machines. |
Fuzzy: Phone | Exact | Weighted Average | 80 | Phone numbers are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field. This process works best with North American data. For example, suppose these two phone numbers are being compared: 1-415-555-1234 and 1-415-555-5678. All sections match exactly except the last 4 digits, so the field has a match score of 90, which is considered a match because it exceeds the threshold of 80. |
Fuzzy: City | Edit Distance, Exact | Maximum | 85 | |
Fuzzy: Street | Exact | Weighted Average | 80 | Addresses are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field. This process works best with North American data. For example, suppose these two billing streets are being compared: 123 Market Street, Suite 100 and 123 Market Drive, Suite 300. Because only the street number and street name match, the field has a match score of 70, which is not considered a match because it's less than the threshold of 80. |
Fuzzy: ZIP | Exact | Weighted Average | 80 | ZIP codes are broken into sections and compared by those sections. Each section has its own matching method and match score. The section scores are weighted to come up with one score for the field. For example, suppose these two ZIP codes are being compared: 94104-1001 and 94104. Because only the first 5 digits match, the field has a match score of 90, which is considered a match because it exceeds the threshold of 80. |
Fuzzy: Title | Acronym, Exact, Kullback-Leibler Distance | Maximum | 50 | |
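
The two scoring methods in the table can be illustrated with a short Python sketch. It is not the product's implementation: the function names, the per-algorithm scores, and the section weights are all hypothetical, and the weights were picked only so that the phone example reproduces the score of 90 quoted above. The sketch also assumes that a score equal to the threshold counts as a match, which the table does not state.

```python
from typing import Mapping


def maximum_score(algorithm_scores: Mapping[str, float]) -> float:
    """'Maximum' scoring: keep the best score produced by any algorithm."""
    return max(algorithm_scores.values())


def weighted_average_score(
    section_scores: Mapping[str, float],
    section_weights: Mapping[str, float],
) -> float:
    """'Weighted Average' scoring: combine per-section scores by weight."""
    total_weight = sum(section_weights[name] for name in section_scores)
    weighted_sum = sum(
        score * section_weights[name] for name, score in section_scores.items()
    )
    return weighted_sum / total_weight


def is_match(score: float, threshold: float) -> bool:
    # Assumption: a score equal to the threshold counts as a match.
    return score >= threshold


# Hypothetical per-algorithm scores for two first names
# (Fuzzy: First Name, threshold 85).
first_name_scores = {"exact": 0, "initials": 0, "jaro_winkler": 88, "name_variant": 100}
first_name = maximum_score(first_name_scores)

# Hypothetical section weights for a phone number, chosen only so that a
# mismatch in the last 4 digits yields the score of 90 from the table.
phone_weights = {"country": 10, "area": 40, "prefix": 40, "line": 10}
# 1-415-555-1234 vs 1-415-555-5678: every section matches except the last one.
phone_sections = {"country": 100, "area": 100, "prefix": 100, "line": 0}
phone = weighted_average_score(phone_sections, phone_weights)

print(first_name, is_match(first_name, 85))
print(phone, is_match(phone, 80))
```

With Maximum scoring, one strong algorithm result is enough to carry the field, which suits names where a nickname or a typo defeats exact comparison. With Weighted Average scoring, a mismatch in a low-weight section (here the last 4 digits) lowers the score without necessarily dropping it below the threshold.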