
Python | Extract words from given string

In Python, we sometimes need to get all the words present in a string. Doing this by hand, character by character, is tedious, so having a shorthand for the task is useful. This article also covers cases in which punctuation marks have to be ignored.

Input: GeeksForGeeks is the best Computer Science Portal 
Output: ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal'] 
Explanation: Each whitespace-separated word of the input string is extracted into a list.

Python Extract Words From String

Python Extract String Words using split()

In Python, the split() method splits a string into a list of words. Called with no argument, it splits on any run of whitespace, and it is the most generic and recommended way to accomplish this task. Its drawback is that it fails when the string contains punctuation marks, because the punctuation stays attached to the neighbouring words.




# initializing string 
test_string = "GeeksForGeeks is best Computer Science Portal"
 
# printing original string
print ("The original string is : " +  test_string)
 
# using split()
# to extract words from string
res = test_string.split()
 
# printing result
print ("The list of words is : " +  str(res))

Output

The original string is : GeeksForGeeks is best Computer Science Portal
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']

Time Complexity: O(n)
Auxiliary Space: O(n), for the list of words
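To see the punctuation drawback described above, here is a small check on a noisier string (the sample text is only illustrative). It also contrasts split() with no argument, which treats any run of whitespace as a single separator, against split(' '), which splits on every individual space.

```python
# A noisy sample string with extra spaces and punctuation (illustrative only)
text = "GeeksForGeeks,    is best @# Computer Science Portal.!!!"

# split() with no argument splits on runs of whitespace and drops empty strings
words = text.split()
print(words)  # punctuation stays glued to the words, and '@#' becomes a "word"

# split(' ') splits on every single space, so the run of four spaces yields empty strings
single_space = text.split(' ')
print(single_space)
```

So split() alone is only enough when the input is already clean text separated by whitespace.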

Python Extract String Words using find()

In Python, the find() method can also be used to extract words. find() is called on a string and takes the substring to search for, plus optional start and end positions that limit the search. It returns the lowest index at which the substring is found, or -1 if it is not present. By repeatedly searching for the next space, starting just after the previous one, we can slice out each word.




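def extract_words_using_find(input_string):
    words, start = [], 0
    while True:
        # find() returns the index of the next space at or after start, or -1
        space_index = input_string.find(' ', start)
        if space_index == -1:
            words.append(input_string[start:])  # text after the last space
            return [word for word in words if word]  # skip empty pieces from repeated spaces
        words.append(input_string[start:space_index])
        start = space_index + 1
 
sentence = "GeeksForGeeks is best Computer Science Portal"
result_words = extract_words_using_find(sentence)
print(result_words)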

Output

['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']

Time Complexity: O(n)
Auxiliary Space: O(n), for the list of words
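The find() calls that the function above relies on can be checked in isolation. This sketch looks up the first two spaces of the sample sentence, slices out the word between them, and shows the -1 returned for a missing substring ('Java' is just an arbitrary word that does not occur).

```python
sentence = "GeeksForGeeks is best Computer Science Portal"

# find(sub) returns the lowest index where sub starts
first_space = sentence.find(' ')
# find(sub, start) begins the search at index start
second_space = sentence.find(' ', first_space + 1)
print(first_space, second_space)  # 13 16

# the word between two consecutive spaces
print(sentence[first_space + 1:second_space])  # is

# -1 signals that the substring does not occur
print(sentence.find('Java'))  # -1
```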

Python Extract String Words using List Comprehension

In Python, you can also extract words using a list comprehension. Here the string is first split on whitespace; the comprehension then strips punctuation from both ends of each piece with strip(string.punctuation) and keeps only the pieces that are alphanumeric afterwards, which also drops leftover tokens such as '@#'.




import string

# Initializing string
test_string = "GeeksForGeeks,    is best @# Computer Science Portal.!!!"
 
# Using list comprehension and isalnum() method to extract words from string
res = [word.strip(string.punctuation) for word in test_string.split() if word.strip(string.punctuation).isalnum()]
 
# Printing result
print("The list of words is:", res)

Output

The list of words is: ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']

Time Complexity: O(n), where n is the length of the test string
Auxiliary Space: O(n)
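The comprehension above calls strip() twice per token. A minimal variant, assuming Python 3.8+ for the assignment expression (:=), strips each token once; the hypothetical helper extract_words and the second sample sentence are only for illustration. The second call also shows a caveat: strip() only trims the ends of a token, so words with inner punctuation such as "It's" or "well-known" fail isalnum() and are dropped.

```python
import string

def extract_words(text):
    # Strip punctuation from the ends of each whitespace-separated token once,
    # using an assignment expression (Python 3.8+), and keep purely alphanumeric results
    return [stripped for token in text.split()
            if (stripped := token.strip(string.punctuation)).isalnum()]

print(extract_words("GeeksForGeeks,    is best @# Computer Science Portal.!!!"))
# strip() never touches inner punctuation, so "It's" and "well-known" fail isalnum()
print(extract_words("It's a well-known portal!"))  # ['a', 'portal']
```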

Python Extract String Words using Regex

In Python, we can also extract words using regular expressions. When a string contains special characters and punctuation marks, the conventional split() approach fails, so a regular expression is needed instead. re.findall() returns a list of all non-overlapping matches of a pattern; with the pattern \w+ it returns every run of word characters (letters, digits and underscores), so punctuation marks are skipped.




# using regex( findall() )
import re
 
# initializing string 
test_string = "GeeksForGeeks,    is best @# Computer Science Portal.!!!"
 
# printing original string
print ("The original string is : " +  test_string)
 
# using regex( findall() )
# to extract words from string
res = re.findall(r'\w+', test_string)
 
# printing result
print ("The list of words is : " +  str(res))

Output

The original string is : GeeksForGeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']

Time Complexity: O(n)
Auxiliary Space: O(n)
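Because \w also matches digits and underscores, and an apostrophe is not a word character, \w+ can split or keep tokens in ways you may not want. The sketch below compares it with a letters-only pattern that also allows inner apostrophes; that second pattern is an illustrative choice, not part of the original method, and the sample text is made up.

```python
import re

text = "It's user_id 42, café!"

# \w+ matches runs of letters, digits and underscores (Unicode-aware for str patterns)
print(re.findall(r'\w+', text))  # ['It', 's', 'user_id', '42', 'café']

# Letters only, optionally joined by apostrophes: [^\W\d_] means "a word character
# that is not a digit or an underscore", i.e. a letter
letters_only = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
print(letters_only)  # ["It's", 'user', 'id', 'café']
```

Which pattern is right depends on whether numbers and identifiers should count as words in your data.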

Python Extract String Words using re.sub() + string.punctuation

This method also uses regular expressions, but instead of matching words it deletes punctuation: string.punctuation holds all ASCII punctuation characters, re.sub() removes every one of them from the string, and split() then breaks the cleaned string into words.




# using regex() + string.punctuation
import re
import string
 
# initializing string 
test_string = "GeeksForGeeks,    is best @# Computer Science Portal.!!!"
 
# printing original string
print ("The original string is : " +  test_string)
 
# using re.sub() + string.punctuation to extract words from string;
# re.escape() makes each punctuation character literal inside the [...] class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()
 
# printing result
print ("The list of words is : " +  str(res))

Output

The original string is : GeeksForGeeks,    is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']

Time Complexity: O(n)
Auxiliary Space: O(n)
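Deleting punctuation outright glues together words that were separated only by punctuation, for example "well-known" or "portal...for". A small sketch on a made-up sentence compares deletion with replacing punctuation by a space, and with str.translate(), which performs the same deletion without a regular expression.

```python
import re
import string

text = "GeeksForGeeks is a well-known portal...for geeks!"
pattern = '[' + re.escape(string.punctuation) + ']'

# deleting punctuation glues neighbouring words together
removed = re.sub(pattern, '', text).split()
# replacing punctuation with a space keeps them apart
spaced = re.sub(pattern, ' ', text).split()
# str.translate with a deletion table gives the same result as removed, without regex
translated = text.translate(str.maketrans('', '', string.punctuation)).split()

print(removed)     # ['GeeksForGeeks', 'is', 'a', 'wellknown', 'portalfor', 'geeks']
print(spaced)      # ['GeeksForGeeks', 'is', 'a', 'well', 'known', 'portal', 'for', 'geeks']
print(translated)
```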

Python Extract String Words using NLP Libraries

Python has several natural language processing (NLP) packages that offer more sophisticated word extraction (tokenization). One such library is NLTK (Natural Language Toolkit), used below. Note that word_tokenize() returns punctuation marks as separate tokens, and it needs the Punkt tokenizer data to be downloaded once before first use.




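import nltk
# one-time download of the tokenizer data word_tokenize relies on ("punkt" on older NLTK releases)
nltk.download('punkt_tab', quiet=True)
text = "GeeksForGeeks is the best Computer Science Portal ."
words = nltk.word_tokenize(text)
print(words)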

Output

['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal', '.']

Time Complexity: O(n)
Auxiliary Space: O(n)

