Python – Get word frequency in percentage
Last Updated : 09 Mar, 2023
Given a list of strings, the task is to write a Python program to get a percentage share of each word in the strings list.
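Computational Explanation: share of word X = (occurrences of X) / (total words). The examples and programs below report this share as a fraction between 0 and 1; multiply it by 100 to express it as a percentage.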
Example:
Input : test_list = ["Gfg is best for geeks", "All love Gfg", "Gfg is best for CS", "For CS geeks Gfg is best"]
Output : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
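Explanation : The share of each word among all words in the list is computed. Gfg occurs 4 times and there are 19 words in total, so its share is 4/19 ≈ 0.2105. Words are compared case-sensitively, so "for" and "For" are counted separately.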
Input : test_list = ["Gfg is best for geeks", "All love Gfg"]
Output : {'Gfg': 0.25, 'is': 0.125, 'best': 0.125, 'for': 0.125, 'geeks': 0.125, 'All': 0.125, 'love': 0.125}
Explanation : The share of each word among all words in the list is computed. Gfg occurs 2 times out of 8 words, so its share is 2/8 = 0.25.
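As a quick check of the arithmetic in the first example, the short sketch below recomputes the share of Gfg from the counts stated in its explanation (4 occurrences out of 19 words). The variable names are chosen here purely for illustration.

```python
# Counts taken from the first example's explanation.
gfg_count = 4
total_words = 19

share = gfg_count / total_words   # fraction between 0 and 1
percentage = share * 100          # the same value expressed as a percentage

print(share)                      # 0.21052631578947367
print(round(percentage, 2))       # 21.05
```

The fraction is what the programs in this article print; the multiplication by 100 is only needed when a true percentage is wanted.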
Method #1: Using sum() + Counter() + join() + split()
In this method, the strings are first joined into one string with join(), which is then broken into words with split(). Counter() maps each word to its frequency, and sum() over the Counter's values gives the total number of words. Dividing each frequency by this total yields the share of each word.
Python3
from collections import Counter

test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]
print("The original list is : " + str(test_list))

# join all strings, split into words and count each word
joined = " ".join(test_list)
mappd = Counter(joined.split())

# total number of words
total_val = sum(mappd.values())

# share of each word = its count / total words
res = {key: val / total_val for key, val in mappd.items()}
print("Percentage share of each word : " + str(res))
Output
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
Time Complexity: O(n), where n is the total number of words across all strings
Auxiliary Space: O(n)
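The same steps can be wrapped in a reusable function. The sketch below is only one possible arrangement: the name word_share and its as_percent and ignore_case parameters are hypothetical and do not come from the original method. It assumes that an empty input should yield an empty dictionary, and that case-insensitive counting means folding every word with str.casefold().

```python
from collections import Counter


def word_share(strings, as_percent=False, ignore_case=False):
    """Map each word to its share of all words (hypothetical helper)."""
    words = " ".join(strings).split()
    if ignore_case:
        # Assumption: "for" and "For" should count as the same word.
        words = [w.casefold() for w in words]
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when there are no words at all.
        return {}
    scale = 100 if as_percent else 1
    return {word: count / total * scale for word, count in counts.items()}


test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]

print(word_share(test_list))
print(word_share(test_list, as_percent=True, ignore_case=True))
```

With the default arguments the function returns the same fractions as Method #1. With ignore_case=True the keys are the folded (lowercase) words, so "for" and "For" are merged into a single entry.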
Method #2: Using combined one-liner
This is the same approach as Method #1, with the join, split and count steps combined into a single expression to give a compact solution.
Python3
from collections import Counter

test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]
print("The original list is : " + str(test_list))

# join, split and count in a single expression
mappd = Counter(" ".join(test_list).split())

# note: sum() is evaluated again for every key
res = {key: val / sum(mappd.values()) for key, val in mappd.items()}
print("Percentage share of each word : " + str(res))
Output
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
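Complexity of Method #2:
Time Complexity: O(n + k^2), where n is the total number of words and k is the number of distinct words, because sum(mappd.values()) is evaluated again for every key in the comprehension. Computing the total once, as in Method #1, gives O(n).
Auxiliary Space: O(n)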
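Because the compact version evaluates sum(mappd.values()) once per key, a variant that computes the total a single time may be preferable for large inputs. The sketch below is one way to do this, under the assumption that feeding Counter from a generator (rather than from a joined string) is acceptable; the names counts and total are illustrative.

```python
from collections import Counter

test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]

# Count words straight from a generator, without building a joined string.
counts = Counter(word for line in test_list for word in line.split())

# Compute the total once and reuse it for every key.
total = sum(counts.values())
res = {word: count / total for word, count in counts.items()}

print("Percentage share of each word : " + str(res))
```

This keeps the code short while doing a single pass for counting and a single pass for the division.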
Method #3: Using join(), split() and count()
First join all the elements of the list with spaces, then split the resulting string on whitespace to get a list of words. Iterate over this list and check whether each word is already a key in the dictionary. If it is not, add the word as a key whose value is its number of occurrences, obtained with count(), divided by the length of the word list. That value is the word's frequency share.
Python3
test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]
print("The original list is : " + str(test_list))

# join all strings and split into a list of words
joined = " ".join(test_list)
p = joined.split()

d = dict()
for i in p:
    # compute the share only the first time a word is seen
    if i not in d:
        d[i] = p.count(i) / len(p)
print("Percentage share of each word : " + str(d))
Output
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
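Time Complexity: O(n^2) in the worst case, where n is the total number of words, because count() scans the whole word list once for every distinct word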
Auxiliary Space: O(n)
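The repeated count() calls are what make this method quadratic. If a plain dictionary is still preferred over Counter, the counts can be accumulated in a single pass with dict.get(). This is a sketch of that alternative, not the method described above; the names words, counts and shares are illustrative.

```python
test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]

words = " ".join(test_list).split()

# Accumulate counts in one pass; get() supplies 0 for unseen words.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Divide each count by the total number of words.
shares = {word: count / len(words) for word, count in counts.items()}

print("Percentage share of each word : " + str(shares))
```

The resulting dictionary has the same keys, in the same first-seen order, and the same values as the count() based version.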
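Method #4: Using join(), split() and operator.countOf()
This works like Method #3, but the occurrences of each word are counted with operator.countOf() instead of the list's count() method.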
Python3
import operator as op

test_list = ["Gfg is best for geeks",
             "All love Gfg",
             "Gfg is best for CS",
             "For CS geeks Gfg is best"]
print("The original list is : " + str(test_list))

# join all strings and split into a list of words
joined = " ".join(test_list)
p = joined.split()

d = dict()
for i in p:
    # compute the share only the first time a word is seen
    if i not in d:
        d[i] = op.countOf(p, i) / len(p)
print("Percentage share of each word : " + str(d))
Output
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
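Time Complexity: O(n^2) in the worst case, where n is the total number of words, because countOf() scans the whole word list once for every distinct word
Auxiliary Space: O(n)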
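To see why this method behaves like the count() version, the small sketch below compares the two calls on a short word list. The sample sentence and the word "missing" are made up for the demonstration.

```python
import operator as op

# Illustrative word list, built from the first two strings of the example.
words = "Gfg is best for geeks All love Gfg".split()

# For a list, countOf(words, x) and words.count(x) report the same number.
for word in ("Gfg", "is", "missing"):
    print(word, words.count(word), op.countOf(words, word))

# countOf() also accepts other iterables, such as a generator.
gen_count = op.countOf((w for w in words), "Gfg")
print(gen_count)
```

Both calls walk the whole sequence, so swapping one for the other does not change the overall cost of the loop.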
Method #5: Using reduce() and Counter()
1. Initialize a list of strings test_list.
2. Import Counter from the collections module and reduce from the functools module.
3. Use the reduce function to iterate over the list of strings test_list.
4. For each string y in test_list, split the string into individual words using the split() method.
5. Use Counter() to count the frequency of each word in the list of words.
6. Add the Counter object to the accumulated value x using the + operator.
7. The result of the reduce function is a Counter object that contains the frequency of each word in all the strings in test_list.
8. Use the sum() function to calculate the total number of words in the list.
9. Use a dictionary comprehension to iterate over the key-value pairs in the Counter object.
10. For each key-value pair, calculate the percentage share of the word by dividing the value by the total number of words.
11. Store the percentage share of each word in a dictionary.
12. Print the dictionary of percentage shares.
Python3
from collections import Counter
from functools import reduce

test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]
print("The original list is : " + str(test_list))

# fold the per-string word counts into one Counter
word_counts = reduce(lambda x, y: x + Counter(y.split()), test_list, Counter())

total_words = sum(word_counts.values())
res = {key: val / total_words for key, val in word_counts.items()}
print("Percentage share of each word : " + str(res))
Output
The original list is : ['Gfg is best for geeks', 'All love Gfg', 'Gfg is best for CS', 'For CS geeks Gfg is best']
Percentage share of each word : {'Gfg': 0.21052631578947367, 'is': 0.15789473684210525, 'best': 0.15789473684210525, 'for': 0.10526315789473684, 'geeks': 0.10526315789473684, 'All': 0.05263157894736842, 'love': 0.05263157894736842, 'CS': 0.10526315789473684, 'For': 0.05263157894736842}
Time Complexity: O(N*(M + K)), where N is the number of strings in the list, M is the maximum number of words in a string and K is the number of distinct words. Each string is split and counted in O(M), and each + builds a new Counter from the accumulated counts, which can cost up to O(K) per string.
Auxiliary Space: O(K + M) for the accumulated Counter plus the temporary Counter that reduce creates for each string. This is at most O(N*M).
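Since each + in the reduce call builds a fresh Counter, an in-place alternative may be worth considering. The sketch below uses Counter.update() in an ordinary loop; it is a variation on this method rather than the method itself, and it keeps the same variable names for easy comparison.

```python
from collections import Counter

test_list = ["Gfg is best for geeks", "All love Gfg",
             "Gfg is best for CS", "For CS geeks Gfg is best"]

word_counts = Counter()
for line in test_list:
    # update() adds the counts in place instead of building a new Counter.
    word_counts.update(line.split())

total_words = sum(word_counts.values())
res = {key: val / total_words for key, val in word_counts.items()}

print("Percentage share of each word : " + str(res))
```

The result matches the reduce() version, while only one Counter object is kept alive across the whole loop.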