Python program to extract Keywords from a list
Given a list of strings, extract all the words that are Python keywords.
Input : test_list = ["Gfg is True", "Its a global win", "try Gfg"]
Output : ['is', 'True', 'global', 'try']
Explanation : Every string in the result list is a valid Python keyword.
Input : test_list = ["try Gfg"]
Output : ['try']
Explanation : try is used in try/except blocks, hence it is a keyword.
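To make the notion of a "keyword" concrete, the short sketch below probes the standard-library keyword module directly. The sample words are purely illustrative and are not part of the original examples; the point is that keyword.iskeyword() performs an exact, case-sensitive match.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
print(keyword.kwlist[:3])

# Illustrative sample words (an assumption, chosen to show case sensitivity).
samples = ["try", "True", "true", "Gfg", "global"]

# iskeyword() returns True only for an exact, case-sensitive keyword match.
checks = {word: keyword.iskeyword(word) for word in samples}
print(checks)
# {'try': True, 'True': True, 'true': False, 'Gfg': False, 'global': True}
```

Note that "True" is a keyword while "true" is not, which is why the capitalised word is picked up in the examples above.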
Method #1 : Using iskeyword() + split() + loop
This is one way to perform the task. Each string is broken into words with split(), and each word is tested with keyword.iskeyword(). An outer loop extends this check to every string in the list.
Python3
import keyword

# Sample input
test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]

print("The original list is : " + str(test_list))

# Check every whitespace-separated word of every string
res = []
for sub in test_list:
    for word in sub.split():
        if keyword.iskeyword(word):
            res.append(word)

print("Extracted Keywords : " + str(res))
Output
The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is', 'True', 'yield', 'return', 'global', 'try']
Time Complexity: O(n * m), where n is the number of strings in the list and m is the average length of each string.
Auxiliary Space: O(n)
Method #2: Using list comprehension
This is another way to perform the task. It uses the same functions as the method above, but expresses the nested loops as a single, more compact list comprehension.
Python3
import keyword

test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]

print("The original list is : " + str(test_list))

res = [ele for sub in test_list
       for ele in sub.split() if keyword.iskeyword(ele)]

print("Extracted Keywords : " + str(res))
Output
The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is', 'True', 'yield', 'return', 'global', 'try']
Time Complexity: O(n * m), where n is the number of strings in the list and m is the average length of each string.
Auxiliary Space: O(n)
Method #3: Using re.findall()
This approach uses a regular expression to extract the words that match Python keywords. We use the re module to build a pattern from keyword.kwlist, iterate over the given list, and call findall() on each string to collect every matching word. Finally, we remove any duplicates from the list of extracted keywords.
- Define a function to extract keywords from a list using regular expressions.
- Create a regular expression that matches Python keywords.
- Iterate over the given list and use the re.findall() function to extract all words that match the regular expression.
- Remove any duplicates from the list of extracted keywords.
- Return the list of extracted keywords.
Python3
import re
import keyword


def extract_keywords(string_list):
    python_keywords = set(keyword.kwlist)
    # One alternation pattern that matches any keyword as a whole word
    pattern = re.compile(r'\b(' + '|'.join(python_keywords) + r')\b')
    extracted_keywords = []
    for string in string_list:
        words = pattern.findall(string)
        extracted_keywords.extend(words)
    # set() removes duplicates but does not preserve order
    return list(set(extracted_keywords))


string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg"]
print(extract_keywords(string_list))
Output (the order of the elements may vary, because a set is unordered)
['True', 'yield', 'return', 'try', 'is', 'global']
Time Complexity: O(n*m), where n is the number of strings in the list and m is the average length of each string.
Space Complexity: O(k), where k is the number of unique Python keywords.
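One practical difference between the regular-expression approach and the split()-based methods is how punctuation is handled. The sketch below illustrates this with a hypothetical input string that is not part of the original examples; the re.escape() call and the non-capturing group are small additions assumed here for safety and are not required by the keyword list itself.

```python
import keyword
import re

# Hypothetical input with punctuation attached to some words.
text = "Should we try, or pass?"

# split() keeps punctuation glued to the word, so "try," and "pass?" are missed.
from_split = [w for w in text.split() if keyword.iskeyword(w)]

# A word-boundary pattern matches the keyword itself and ignores the punctuation.
pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, keyword.kwlist)) + r")\b"
)
from_regex = pattern.findall(text)

print(from_split)  # ['or']
print(from_regex)  # ['try', 'or', 'pass']
```

If the input is known to contain only whitespace-separated words, both approaches give the same keywords; the regular expression is mainly useful for free-form text.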
Method #4: Using set intersection
Step-by-step approach:
- Create a set of all Python keywords using the keyword module.
- Loop through each string in the string_list.
- Split the string into words using the split() method.
- Convert the list of words into a set using set().
- Find the intersection of the keyword set and the word set using the & operator.
- Add the intersecting words to a list.
- Remove duplicates from the list using list(set(...)).
- Return the final list of extracted keywords.
Python3
import keyword


def extract_keywords(string_list):
    python_keywords = set(keyword.kwlist)
    extracted_keywords = []
    for string in string_list:
        # Unique words of this string
        words = set(string.split())
        # Words that are also Python keywords
        intersect = words & python_keywords
        extracted_keywords += list(intersect)
    # set() removes duplicates but does not preserve order
    return list(set(extracted_keywords))


string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg"]
print(extract_keywords(string_list))
Output (the order of the elements may vary, because a set is unordered)
['return', 'try', 'global', 'yield', 'is', 'True']
Time Complexity: O(n * m), where n is the number of strings in string_list and m is the average number of words in each string.
Auxiliary Space: O(k), where k is the number of unique keywords extracted from the string_list.
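Because sets are unordered, removing duplicates with list(set(...)) can return the keywords in a different order from the one in which they appear. If first-seen order matters, one possible variation is sketched below. The function name extract_keywords_ordered and the extra input string "is it True" are hypothetical and only serve to show duplicates being dropped.

```python
import keyword


def extract_keywords_ordered(string_list):
    """Return unique keywords in first-seen order (hypothetical helper)."""
    python_keywords = set(keyword.kwlist)
    seen = {}  # dict keys keep insertion order in Python 3.7+
    for string in string_list:
        for word in string.split():
            if word in python_keywords:
                seen.setdefault(word, None)
    return list(seen)


string_list = ["Gfg is True", "Gfg will yield a return",
               "Its a global win", "try Gfg", "is it True"]
result = extract_keywords_ordered(string_list)
print(result)  # ['is', 'True', 'yield', 'return', 'global', 'try']
```

The membership test against a set keeps each lookup constant time on average, so the overall cost stays proportional to the total number of words.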
Method #5: Using NumPy
- Convert the list of strings into a NumPy array of strings using np.array(test_list, dtype='U').
- Split the array of strings into an array of lists of words using np.char.split(arr).
- Flatten the array of lists of words into a 1D array of words using np.concatenate(words).
- Use np.vectorize(keyword.iskeyword) to wrap the keyword.iskeyword function for use with NumPy arrays.
- Apply the wrapped function to the 1D array of words using is_kw(flat_words) to get a boolean mask.
- Filter out the non-keywords from the flattened array of words using boolean indexing, flat_words[is_kw(flat_words)].
Python3
import numpy as np
import keyword

test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]

print("The original list is : " + str(test_list))

arr = np.array(test_list, dtype='U')
words = np.char.split(arr)
flat_words = np.concatenate(words)
is_kw = np.vectorize(keyword.iskeyword)
keywords = flat_words[is_kw(flat_words)]

print("Extracted Keywords : " + str(keywords))
Output:
The original list is : ['Gfg is True', 'Gfg will yield a return', 'Its a global win', 'try Gfg']
Extracted Keywords : ['is' 'True' 'yield' 'return' 'global' 'try']
Time Complexity: O(n * m), where n is the number of strings in the input list and m is the maximum number of words in any string.
Auxiliary Space: O(n * m), because we create a new NumPy array for the words and a flattened 1D array of words.
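np.vectorize is a convenience wrapper that still calls the Python function once per element. As a possible alternative, the sketch below builds the boolean mask with np.isin() against keyword.kwlist instead. This swaps in a different NumPy function from the one used above and is only an illustration, assuming NumPy is installed.

```python
import keyword

import numpy as np

test_list = ["Gfg is True", "Gfg will yield a return",
             "Its a global win", "try Gfg"]

# Split every string into words and flatten into one 1D array of words.
flat_words = np.concatenate(np.char.split(np.array(test_list, dtype="U")))

# np.isin() marks each word that appears in the keyword list.
mask = np.isin(flat_words, keyword.kwlist)
keywords = flat_words[mask]

print(keywords.tolist())  # ['is', 'True', 'yield', 'return', 'global', 'try']
```

Boolean indexing keeps the words in their original order, so the result matches the output of the np.vectorize version.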
Last Updated :
11 May, 2023