Open In App

Finding Similar Type of Names | Fuzzy Search in SQL

Last Updated : 10 May, 2021
Improve
Improve
Like Article
Like
Save
Share
Report

Fuzzy Search :
A technique of finding the strings that match a pattern approximately (rather than exactly). Users / Reviewers often capture names inaccurately. Typing mistake is a very common mistake that reviewer does while capturing the names which leads to inconsistency in Data. But sometimes, we need to search or match this inaccurate data anyway. For example, users should match existing customer records rather than creating unwanted duplicates.

Problem Statement :
Finding all kind of error types that described in the below table e.g. Spelling Mistake, Incomplete Name, Formatting Issue, Suffix Missing.

     FirstName           LastName             FirstName          LastName                   Error Type                  
Benjamin Sheriff Benjamin Sherrif Spelling Mistake
Cocoa Fairchil Cocoa Fairchild Spelling Mistake
Danuzio Pinheiro Danuzio Pinherio Spelling Mistake
Jeff Voight Jeff Voigt Spelling Mistake
Marie Prendergast Marie Prendergrast Spelling Mistake
Michael Heinrich Michael Heinricy Spelling Mistake
Nghia Le Nghia Lee Spelling Mistake
Leonardo Madrigal Leonardo Madrigal Del Valle Incomplete Name
Yolanda Balcazar Yolanda Balcazar-Rodriguez Incomplete Name
J Garcia Martinez J Garcia-Martinez Formatting Issue
Lauren Lord – Tuley Lauren Lord-Tuley Formatting Issue
Virginia Pulley Alberts Virginia Pulley-Alberts Formatting Issue
Connie Marks Connie Marks Iv Suffix Missing
Daniel Ladd Daniel Ladd Jr Suffix Missing
Rodrigo Delgado Rodrigo Delgado Jr Suffix Missing
Thomas Griffin Thomas Griffin Iii Suffix Missing
Wayne Nance Wayne Nance Ii Suffix Missing

Approach to solve :
SOUNDEX() Function can find the inconsistency in names. SOUNDEX() can evaluate the similarity of two names. SOUNDEX() only works well when we do have 1 or 2 tokens. Names generally contain 1-2 tokens, so it works fine with Name. But if we want to find similar names for the company then it’s not that useful as it contains multiple tokens.

Data :
Benjamin Sheriff and Benjamin Sherrif where FirstName is same but any of the LastName has a spelling error. Likewise, there are many errors in this data like formatting issues, incomplete names.

               FirstName                            LastName              
Benjamin Sheriff
Benjamin Sherrif
Virginia Pulley Alberts
Virginia Pulley-Alberts
Geoffery Bown
Geoffrey Bown
Anthoni Castoldi
Anthony Castoldi
Rodrigo Delgado
Rodrigo Delgado Jr
Leonardo Madrigal
Leonardo Madrigal Del Valle
Mitchel Myers
Mitchell Myers

Code :
Case-1: FirstName is exact the same but LastName is a similar type.
Case-2: LastName is exact the same but FirstName is a similar type.

Let’s discuss it one by one as follows.

Case-1 : 
FirstName is exact the same but LastName is similar type :

select distinct ss.firstname,ss.lastname,sd.firstname,sd.lastname
from load as ss, load as sd
where ss.firstName=sd.firstName and SOUNDEX(ss.lastName)=SOUNDEX(sd.lastname)
      and
      left(ss.lastname,2)=left(sd.lastname,2)
      and
      ss.lastName<>sd.lastName

Output : 

               FirstName                            LastName                              FirstName                            LastName              
Benjamin Sheriff Benjamin Sherrif
Benjamin Sherrif Benjamin Sheriff
Leonardo Madrigal Leonardo Madrigal Del Valle
Leonardo Madrigal Del Valle Leonardo Madrigal
Rodrigo Delgado Rodrigo Delgado Jr
Rodrigo Delgado Jr Rodrigo Delgado
Virginia Pulley Alberts Virginia Pulley-Alberts
Virginia Pulley-Alberts Virginia Pulley Alberts

Case-2 :
LastName is exact same but FirstName is similar type :

select distinct ss.lastname,ss.firstname,sd.lastname,sd.firstname
from load as ss, load as sd
where ss.lastname=sd.lastname and SOUNDEX(ss.firstname)=SOUNDEX(sd.firstname)
      and
      left(ss.firstname,2)=left(sd.firstname,2)
      and
      ss.firstname<>sd.firstname

Output : 

S.No.                FirstName                            LastName                              FirstName                            LastName              
1. Anthoni Castoldi Anthony Castoldi
2. Anthony Castoldi Anthoni Castoldi
3. Geoffery Bown Geoffrey Bown
4. Geoffrey Bown Geoffery Bown
5. Mitchel Myers Mitchell Myers
6. Mitchell Myers Mitchel Myers

Reference :
Github Link – https://github.com/SuryaSD/Finding-Similar-Types-of-Names-Fuzzy-Search


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads