Find the k most frequent words from data set in Python
Given the data set, we can find k number of most frequent words.
The solution of this problem already present as Find the k most frequent words from a file. But we can solve this problem very efficiently in Python with the help of some high performance modules.
In order to do this, we’ll use a high performance data type module, which is collections. This module got some specialized container datatypes and we will use counter class from this module.
Input : "John is the son of John second. Second son of John second is William second." Output : [('second', 4), ('John', 3), ('son', 2), ('is', 2)] Explanation : 1. The string will converted into list like this : ['John', 'is', 'the', 'son', 'of', 'John', 'second', 'Second', 'son', 'of', 'John', 'second', 'is', 'William', 'second'] 2. Now 'most_common(4)' will return four most frequent words and its count in tuple. Input : "geeks for geeks is for geeks. By geeks and for the geeks." Output : [('geeks', 5), ('for', 3)] Explanation : most_common(2) will return two most frequent words and their count.
- Import Counter class from collections module.
- Split the string into list using split(), it will return the lists of words.
- Now pass the list to the instance of Counter class
- The function 'most-common()' inside Counter will return the list of most frequent words from list and its count.
Below is Python implementation of above approach :
[('Geeks', 5), ('to', 4), ('and', 4), ('article', 3)]