# FuzzyWuzzy Python library

There are many ways to compare strings in Python. Some of the main methods are:

- Using regex
- Simple comparison
- Using difflib

One of the easiest methods is the **fuzzywuzzy** library, which gives a similarity score out of 100 for two strings: the higher the score, the more similar the strings are. This article shows how to get started with the fuzzywuzzy library.

FuzzyWuzzy is a Python library for string matching. Fuzzy string matching is the process of finding strings that approximately match a given pattern. Basically, it uses Levenshtein distance to calculate the differences between sequences.
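To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming algorithm. It only illustrates the distance itself: the function name `levenshtein` is hypothetical and not part of fuzzywuzzy, and fuzzywuzzy reports similarity scores out of 100 rather than raw edit distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Hypothetical helper: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in each row
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))            # 3
print(levenshtein("geeksforgeeks", "geeksgeeks"))  # 3 (delete "for")
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.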

FuzzyWuzzy was developed and open-sourced by SeatGeek, a service for finding sports and concert tickets. Their original use case is discussed on their blog.

**Requirements of fuzzywuzzy**

- Python 2.4 or higher
- python-Levenshtein

**Install via pip:**

```shell
pip install fuzzywuzzy
pip install python-Levenshtein
```
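After installing, you can check from Python whether both packages are importable. This is a small sketch using only the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical. Note that the pip package `python-Levenshtein` is imported under the module name `Levenshtein`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Hypothetical helper: True if module_name can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# The pip package "python-Levenshtein" is imported as "Levenshtein".
for name in ("fuzzywuzzy", "Levenshtein"):
    print(name, "installed:", is_installed(name))
```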

**How to use this library?**

First, import these modules:

```python
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
```

Simple ratio usage:

```python
>>> fuzz.ratio('geeksforgeeks', 'geeksgeeks')
87
>>> # Exact match
>>> fuzz.ratio('GeeksforGeeks', 'GeeksforGeeks')
100
>>> fuzz.ratio('geeks for geeks', 'Geeks For Geeks')
80
```
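The simple ratio can be approximated with the standard library alone. The sketch below assumes the scoring used when python-Levenshtein is not installed, where fuzzywuzzy falls back to `difflib.SequenceMatcher`; with python-Levenshtein installed, scores may differ slightly. The name `simple_ratio` is hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Hypothetical helper: similarity from 0 to 100.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matching characters and T is the total length of both strings.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio("geeksforgeeks", "geeksgeeks"))     # 87 (2*10/23)
print(simple_ratio("GeeksforGeeks", "GeeksforGeeks"))  # 100
```

The comparison is case-sensitive, which is why uppercase and lowercase versions of the same words do not score 100.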

Partial ratio usage:

```python
>>> fuzz.partial_ratio("geeks for geeks", "geeks for geeks!")
100
>>> # The second string has an extra exclamation mark, but the first string
>>> # appears in it unchanged, so the partial score is still 100
>>> fuzz.partial_ratio("geeks for geeks", "geeks geeks")
64
>>> # The score is lower because the first string has an extra token ("for")
>>> # in the middle that the second string lacks
```
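The idea behind the partial ratio is to compare the shorter string against same-length pieces of the longer one and keep the best score. The sketch below is a simplified sliding-window version built on `difflib`; fuzzywuzzy itself chooses candidate windows from matching blocks, so its scores can differ. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on difflib.SequenceMatcher."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sliding_partial_ratio(a: str, b: str) -> int:
    """Hypothetical helper: best ratio of the shorter string against every
    window of the same length in the longer string."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # assumption: treat an empty input as no match
    best = 0
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, simple_ratio(shorter, window))
        if best == 100:
            break  # cannot do better than an exact match
    return best


print(sliding_partial_ratio("geeks for geeks", "geeks for geeks!"))  # 100
```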

Now, token sort ratio and token set ratio:

```python
>>> # Token Sort Ratio
>>> fuzz.token_sort_ratio("geeks for geeks", "for geeks geeks")
100
>>> # This gives 100 because the words are the same, irrespective of their order
>>> # Token Sort Ratio vs Token Set Ratio when a word is repeated
>>> fuzz.token_sort_ratio("geeks for geeks", "geeks for for geeks")
88
>>> fuzz.token_set_ratio("geeks for geeks", "geeks for for geeks")
100
>>> # The score is 100 in the second case because token_set_ratio
>>> # treats duplicate words as a single word
```
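Both token ratios can be sketched on top of `difflib`. Token sort sorts the words before comparing; token set compares the sorted common words against the common words plus each string's leftovers and keeps the best of those pairwise scores. This is a simplified sketch that assumes lowercasing and whitespace splitting as the only preprocessing; the function names are hypothetical and not fuzzywuzzy's API.

```python
from difflib import SequenceMatcher


def simple_ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100 based on difflib.SequenceMatcher."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def tokens(s):
    # Assumption: lowercase and split on whitespace only
    return s.lower().split()


def token_sort_ratio_sketch(a, b):
    """Compare the strings after sorting their words alphabetically."""
    return simple_ratio(" ".join(sorted(tokens(a))), " ".join(sorted(tokens(b))))


def token_set_ratio_sketch(a, b):
    """Compare common words against common words plus each side's extras."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a or not set_b:
        return 0  # assumption: no words on one side means no match
    common = " ".join(sorted(set_a & set_b))
    combined_a = (common + " " + " ".join(sorted(set_a - set_b))).strip()
    combined_b = (common + " " + " ".join(sorted(set_b - set_a))).strip()
    return max(
        simple_ratio(common, combined_a),
        simple_ratio(common, combined_b),
        simple_ratio(combined_a, combined_b),
    )


print(token_sort_ratio_sketch("geeks for geeks", "for geeks geeks"))      # 100
print(token_sort_ratio_sketch("geeks for geeks", "geeks for for geeks"))  # 88
print(token_set_ratio_sketch("geeks for geeks", "geeks for for geeks"))   # 100
```

Because the set version works on unique words, the repeated "for" disappears before any comparison happens.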

Now suppose we have a list of options and want to find the closest match(es). For this we can use the **process** module:

```python
>>> query = 'geeks for geeks'
>>> choices = ['geek for geek', 'geek geek', 'g. for geeks']
>>> # Get a list of matches ordered by score (limit defaults to 5)
>>> process.extract(query, choices)
[('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]
>>> # If we want only the top match
>>> process.extractOne(query, choices)
('g. for geeks', 95)
```
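Conceptually, `process.extract` scores every choice against the query and returns the best ones. Here is a minimal sketch of that idea using a plain `difflib` ratio as the scorer; fuzzywuzzy's default scorer is `WRatio` with string preprocessing, so the scores below will not match the ones shown above. The function names are hypothetical.

```python
import heapq
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity from 0 to 100 based on difflib.SequenceMatcher."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def extract_sketch(query, choices, scorer=simple_ratio, limit=5):
    """Hypothetical helper: (choice, score) pairs, highest score first."""
    scored = ((choice, scorer(query, choice)) for choice in choices)
    return heapq.nlargest(limit, scored, key=lambda pair: pair[1])


def extract_one_sketch(query, choices, scorer=simple_ratio):
    """Hypothetical helper: the single best (choice, score) pair, or None."""
    best = extract_sketch(query, choices, scorer, limit=1)
    return best[0] if best else None


print(extract_sketch("geeks for geeks", ["geek for geek", "geek geek", "g. for geeks"]))
print(extract_one_sketch("geeks for geeks", ["apple", "geeks for geeks", "geek"]))
```

`heapq.nlargest` avoids sorting the whole list when only the top few matches are needed.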

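There is one more commonly used ratio called **WRatio** (weighted ratio). It is sometimes better to use WRatio instead of the simple ratio, because by default it lowercases both strings and replaces non-alphanumeric characters before comparing them, and it combines the simple, partial and token-based ratios into a single score.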

```python
>>> fuzz.WRatio('geeks for geeks', 'Geeks For Geeks')
100
>>> fuzz.WRatio('geeks for geeks!!!', 'geeks for geeks')
100
>>> # whereas the simple ratio gives a lower score for the same pair
>>> fuzz.ratio('geeks for geeks!!!', 'geeks for geeks')
91
```
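Much of the difference above comes from preprocessing. The sketch below shows only that step: it normalizes both strings (lowercase, keep runs of ASCII letters and digits) before a `difflib` ratio. It is not WRatio, which also weighs partial and token-based ratios; the `normalize` and `normalized_ratio` names and the ASCII-only rule are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(s):
    # Hypothetical helper: lowercase and keep only runs of ASCII letters/digits
    return " ".join(re.findall(r"[a-z0-9]+", s.lower()))


def normalized_ratio(a, b):
    """Hypothetical helper: difflib ratio (0 to 100) on normalized strings."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0  # assumption: nothing left to compare means no match
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(normalized_ratio("geeks for geeks", "Geeks For Geeks"))     # 100
print(normalized_ratio("geeks for geeks!!!", "geeks for geeks"))  # 100
# Without normalization the same pair scores lower
print(round(100 * SequenceMatcher(None, "geeks for geeks!!!", "geeks for geeks").ratio()))  # 91
```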

**Full Code**

```python
# Python code showing all the ratios together
# (make sure you have installed the fuzzywuzzy module)
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

s1 = "I love GeeksforGeeks"
s2 = "I am loving GeeksforGeeks"
print("FuzzyWuzzy Ratio:", fuzz.ratio(s1, s2))
print("FuzzyWuzzy PartialRatio:", fuzz.partial_ratio(s1, s2))
print("FuzzyWuzzy TokenSortRatio:", fuzz.token_sort_ratio(s1, s2))
print("FuzzyWuzzy TokenSetRatio:", fuzz.token_set_ratio(s1, s2))
print("FuzzyWuzzy WRatio:", fuzz.WRatio(s1, s2), "\n")

# For the process module
query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
print("List of ratios:")
print(process.extract(query, choices), "\n")
print("Best among the above list:", process.extractOne(query, choices))
```

Output:

```
FuzzyWuzzy Ratio: 84
FuzzyWuzzy PartialRatio: 85
FuzzyWuzzy TokenSortRatio: 84
FuzzyWuzzy TokenSetRatio: 86
FuzzyWuzzy WRatio: 84

List of ratios:
[('g. for geeks', 95), ('geek for geek', 93), ('geek geek', 86)]

Best among the above list: ('g. for geeks', 95)
```

The FuzzyWuzzy library is built on top of the difflib library, and python-Levenshtein, when installed, is used to speed it up. This makes it a simple and convenient choice for fuzzy string matching in Python.
