Skip to content
Related Articles
Get the best out of our app
GeeksforGeeks App
Open App
geeksforgeeks
Browser
Continue

Related Articles

Finding Similar Type of Names | Fuzzy Search in SQL

Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article

Fuzzy Search :
A technique of finding the strings that match a pattern approximately (rather than exactly). Users / Reviewers often capture names inaccurately. Typing mistake is a very common mistake that reviewer does while capturing the names which leads to inconsistency in Data. But sometimes, we need to search or match this inaccurate data anyway. For example, users should match existing customer records rather than creating unwanted duplicates.

Problem Statement :
Finding all kind of error types that described in the below table e.g. Spelling Mistake, Incomplete Name, Formatting Issue, Suffix Missing.

     FirstName          LastName            FirstName         LastName                  Error Type                  
BenjaminSheriffBenjaminSherrifSpelling Mistake
CocoaFairchilCocoaFairchildSpelling Mistake
DanuzioPinheiroDanuzioPinherioSpelling Mistake
JeffVoightJeffVoigtSpelling Mistake
MariePrendergastMariePrendergrastSpelling Mistake
MichaelHeinrichMichaelHeinricySpelling Mistake
NghiaLeNghiaLeeSpelling Mistake
LeonardoMadrigalLeonardoMadrigal Del ValleIncomplete Name
YolandaBalcazarYolandaBalcazar-RodriguezIncomplete Name
JGarcia MartinezJGarcia-MartinezFormatting Issue
LaurenLord – TuleyLaurenLord-TuleyFormatting Issue
VirginiaPulley AlbertsVirginiaPulley-AlbertsFormatting Issue
ConnieMarksConnieMarks IvSuffix Missing
DanielLaddDanielLadd JrSuffix Missing
RodrigoDelgadoRodrigoDelgado JrSuffix Missing
ThomasGriffinThomasGriffin IiiSuffix Missing
WayneNanceWayneNance IiSuffix Missing

Approach to solve :
SOUNDEX() Function can find the inconsistency in names. SOUNDEX() can evaluate the similarity of two names. SOUNDEX() only works well when we do have 1 or 2 tokens. Names generally contain 1-2 tokens, so it works fine with Name. But if we want to find similar names for the company then it’s not that useful as it contains multiple tokens.

Data :
Benjamin Sheriff and Benjamin Sherrif where FirstName is same but any of the LastName has a spelling error. Likewise, there are many errors in this data like formatting issues, incomplete names.

               FirstName                           LastName              
BenjaminSheriff
BenjaminSherrif
VirginiaPulley Alberts
VirginiaPulley-Alberts
GeofferyBown
GeoffreyBown
AnthoniCastoldi
AnthonyCastoldi
RodrigoDelgado
RodrigoDelgado Jr
LeonardoMadrigal
LeonardoMadrigal Del Valle
MitchelMyers
MitchellMyers

Code :
Case-1: FirstName is exact the same but LastName is a similar type.
Case-2: LastName is exact the same but FirstName is a similar type.

Let’s discuss it one by one as follows.

Case-1 : 
FirstName is exact the same but LastName is similar type :

select distinct ss.firstname,ss.lastname,sd.firstname,sd.lastname
from load as ss, load as sd
where ss.firstName=sd.firstName and SOUNDEX(ss.lastName)=SOUNDEX(sd.lastname)
      and
      left(ss.lastname,2)=left(sd.lastname,2)
      and
      ss.lastName<>sd.lastName

Output : 

               FirstName                           LastName                             FirstName                           LastName              
BenjaminSheriffBenjaminSherrif
BenjaminSherrifBenjaminSheriff
LeonardoMadrigalLeonardoMadrigal Del Valle
LeonardoMadrigal Del ValleLeonardoMadrigal
RodrigoDelgadoRodrigoDelgado Jr
RodrigoDelgado JrRodrigoDelgado
VirginiaPulley AlbertsVirginiaPulley-Alberts
VirginiaPulley-AlbertsVirginiaPulley Alberts

Case-2 :
LastName is exact same but FirstName is similar type :

select distinct ss.lastname,ss.firstname,sd.lastname,sd.firstname
from load as ss, load as sd
where ss.lastname=sd.lastname and SOUNDEX(ss.firstname)=SOUNDEX(sd.firstname)
      and
      left(ss.firstname,2)=left(sd.firstname,2)
      and
      ss.firstname<>sd.firstname

Output : 

S.No.               FirstName                           LastName                             FirstName                           LastName              
1.AnthoniCastoldiAnthonyCastoldi
2.AnthonyCastoldiAnthoniCastoldi
3.GeofferyBownGeoffreyBown
4.GeoffreyBownGeofferyBown
5.MitchelMyersMitchellMyers
6.MitchellMyersMitchelMyers

Reference :
Github Link – https://github.com/SuryaSD/Finding-Similar-Types-of-Names-Fuzzy-Search

My Personal Notes arrow_drop_up
Last Updated : 10 May, 2021
Like Article
Save Article
Similar Reads