Python program to find the most frequent word in a list of strings
Last Updated : 15 Apr, 2023
Given a list of strings, write a Python program to find the word with the most occurrences across all the strings.
Example:
Input : test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
Output : gfg
Explanation : gfg occurs 3 times in total, more than any other word.
Input : test_list = ["geeks love gfg", "geeks are best"]
Output : geeks
Explanation : geeks occurs 2 times in total, more than any other word.
Method #1 : Using loop + max() + split() + defaultdict()
Here, split() breaks each string into words, and a defaultdict(int) stores the running count of each word. Finally, max() with the key parameter returns the word with the highest count.
Python3
from collections import defaultdict

test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print("The original list is : " + str(test_list))

# Count how many times each word appears across all strings
temp = defaultdict(int)
for sub in test_list:
    for wrd in sub.split():
        temp[wrd] += 1

# max() with key=temp.get returns the word with the highest count
res = max(temp, key=temp.get)
print("Word with maximum frequency : " + str(res))
Output
The original list is : ['gfg is best for geeks', 'geeks love gfg', 'gfg is best']
Word with maximum frequency : gfg
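Time Complexity: O(n), where n is the total number of words across all strings: each word is counted once, and max() makes a single pass over the distinct words.
Auxiliary Space: O(n) in the worst case, for one dictionary entry per distinct word.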
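Method #1 does not say which word wins when two words tie. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order), that wraps the same loop in a hypothetical helper most_frequent_word() and shows that max() returns the first word that reaches the top count:

```python
from collections import defaultdict


def most_frequent_word(strings):
    # Count every whitespace-separated word across all strings
    counts = defaultdict(int)
    for sentence in strings:
        for word in sentence.split():
            counts[word] += 1
    # max() returns the first maximal key it encounters; dicts iterate in
    # insertion order, so a tie goes to the word that was seen first
    return max(counts, key=counts.get)


# "geeks" and "gfg" both occur twice; "geeks" is seen first
print(most_frequent_word(["geeks love gfg", "gfg geeks"]))  # geeks
```

Note that max() raises ValueError if the list contains no words at all, so an empty input needs a separate check.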
Method #2 : Using list comprehension + mode()
Here, a list comprehension collects all the words from every string, and mode() from the statistics module returns the most frequent one.
Python3
from statistics import mode

test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print("The original list is : " + str(test_list))

# Flatten all strings into a single list of words
temp = [wrd for sub in test_list for wrd in sub.split()]

# mode() returns the most common word
res = mode(temp)
print("Word with maximum frequency : " + str(res))
Output
The original list is : ['gfg is best for geeks', 'geeks love gfg', 'gfg is best']
Word with maximum frequency : gfg
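mode() has its own tie rule. As of Python 3.8, statistics.mode() returns the first mode it encounters (earlier versions raise StatisticsError when there is a tie), and statistics.multimode() returns every tied value. A short sketch on a made-up two-string list:

```python
from statistics import mode, multimode

# Flatten a small sample list in which "geeks" and "gfg" both occur twice
words = [w for s in ["geeks love gfg", "gfg geeks"] for w in s.split()]
# words == ['geeks', 'love', 'gfg', 'gfg', 'geeks']

print(mode(words))       # geeks  (first mode encountered, Python 3.8+)
print(multimode(words))  # ['geeks', 'gfg']  (all tied words, first-seen order)
```

If every tied word matters, multimode() avoids silently dropping all but one of them.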
Method #3: Using list() and Counter()
- Append all the words from every string to an empty list, then count each word with Counter().
- Find the highest count and return the word that has it.
Below is the implementation:
Python3
from collections import Counter


def mostFrequentWord(words):
    # Collect every word from every string
    lis = []
    for i in words:
        for j in i.split():
            lis.append(j)

    # Count occurrences of each word
    freq = Counter(lis)

    # Track the highest count seen so far (max_count avoids shadowing built-in max)
    max_count = 0
    word = None
    for i in freq:
        if freq[i] > max_count:
            max_count = freq[i]
            word = i
    return word


words = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print("The original list is : " + str(words))
print("Word with maximum frequency : " + mostFrequentWord(words))
Output
The original list is : ['gfg is best for geeks', 'geeks love gfg', 'gfg is best']
Word with maximum frequency : gfg
Methods #1 to #3 have the same complexity:
Time Complexity: O(n), where n is the total number of words across all strings
Space Complexity: O(n)
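Method #3 builds the intermediate list lis before counting. A sketch of an alternative, assuming that list is not needed elsewhere: Counter.update() can count each string's words directly. The helper name most_frequent_word() and the choice to return None when there are no words are illustrative assumptions, not part of the original method:

```python
from collections import Counter


def most_frequent_word(strings):
    freq = Counter()
    for sentence in strings:
        # update() with an iterable adds 1 for each element it yields
        freq.update(sentence.split())
    if not freq:
        return None  # assumption: no words at all -> no answer
    # most_common(1) returns a list holding a single (word, count) pair
    return freq.most_common(1)[0][0]


print(most_frequent_word(["gfg is best for geeks", "geeks love gfg", "gfg is best"]))  # gfg
print(most_frequent_word([]))  # None
```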
Method #4: Using Counter() and reduce()
This approach uses the reduce() function from the functools module to build a single list of words, and the most_common() method of the Counter class from the collections module to find the most frequent word:
Python3
from collections import Counter
from functools import reduce


def most_frequent_word(test_list):
    # Concatenate the word lists of all strings into one list
    all_words = reduce(lambda a, b: a + b, [sub.split() for sub in test_list])
    # Count each word and take the single most common one
    word_counts = Counter(all_words)
    return word_counts.most_common(1)[0][0]


test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print("The original list is:", test_list)
print("Word with most frequency:", most_frequent_word(test_list))
Output
The original list is: ['gfg is best for geeks', 'geeks love gfg', 'gfg is best']
Word with most frequency: gfg
Explanation:
We use the reduce() function to join the word lists of all the strings in test_list into a single list.
We then create a Counter object from that list to count how often each word occurs.
Finally, we call the most_common(1) method to get the word with the highest count and return it.
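Time complexity: O(n² * k) in the worst case, where n is the number of strings in test_list and k is the average number of words per string. Each a + b inside reduce() builds a new list, so the growing word list is copied at every step; counting the words afterwards takes only O(n * k).
Auxiliary Space: O(n * k), since all the words are stored in a list before the Counter object is created.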
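Concatenating with a + b inside reduce() creates a new list at every step, so the words gathered so far are copied again and again. A sketch of the same method that flattens lazily with itertools.chain.from_iterable instead, which keeps the flattening step linear in the total number of words:

```python
from collections import Counter
from itertools import chain


def most_frequent_word(test_list):
    # chain.from_iterable yields the words of each split string in turn,
    # without building intermediate concatenated lists
    all_words = chain.from_iterable(sub.split() for sub in test_list)
    word_counts = Counter(all_words)
    return word_counts.most_common(1)[0][0]


test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print(most_frequent_word(test_list))  # gfg
```

Like the reduce() version, this raises IndexError when there are no words, because most_common(1) then returns an empty list.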
Method #5: Using heapq
- We split each string in the input list into individual words with the split() method, using a list comprehension. This gives all_words, a list of word lists.
- We create a Counter object from every word in all_words. A Counter is a dict subclass that maps each element to the number of times it occurs.
- We use the heapq.nlargest() function with key=word_counts.get to get the word with the highest count.
- We return the most frequent word.
Python3
import heapq
from collections import Counter


def most_frequent_word(test_list):
    # Split each string into its list of words
    all_words = [sub.split() for sub in test_list]
    # Count words across all the sublists
    word_counts = Counter(word for sublist in all_words for word in sublist)
    # nlargest(1, ...) returns a one-element list holding the top word
    return heapq.nlargest(1, word_counts, key=word_counts.get)[0]


test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print("The original list is:", test_list)
print("Word with most frequency:", most_frequent_word(test_list))
Output
The original list is: ['gfg is best for geeks', 'geeks love gfg', 'gfg is best']
Word with most frequency: gfg
Time complexity: O(n + k), where n is the total number of words in the input list and k is the number of unique words. Building the Counter object takes O(n). heapq.nlargest(1, ...) only has to track the single best word, so it makes one O(k) pass over the unique words; in general, heapq.nlargest(m, ...) costs O(k log m) because it maintains a heap of size m.
Auxiliary space: O(n), since the list of split strings holds all n words; the Counter object adds O(k) for the unique words.
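heapq.nlargest() extends naturally from the single most frequent word to the top k words. A minimal sketch, where top_k_words() is a hypothetical helper name; Counter.most_common(k) returns the same words together with their counts:

```python
import heapq
from collections import Counter


def top_k_words(strings, k):
    # Count every whitespace-separated word across all strings
    counts = Counter(word for s in strings for word in s.split())
    # nlargest(k, ...) keeps a heap of size k: O(u log k) for u unique words.
    # It matches sorted(..., reverse=True)[:k], so ties keep first-seen order.
    return heapq.nlargest(k, counts, key=counts.get)


test_list = ["gfg is best for geeks", "geeks love gfg", "gfg is best"]
print(top_k_words(test_list, 3))  # ['gfg', 'is', 'best']
```

Here "is", "best" and "geeks" all occur twice; "is" and "best" are returned because they were counted before "geeks".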