In Python, we often need to extract all the words present in a string. Doing this by hand, character by character, is tedious, so it helps to know the shorthand techniques for the task. This article also covers the cases in which punctuation marks have to be ignored.
Input: GeeksForGeeks is the best Computer Science Portal
Output: ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal']
Explanation: Each word is extracted from the given string and returned in a list.
Python Extract Words From String
Python Extract String Words using Split()
In Python, the split() function splits a string into a list of words. It is the most generic method and the recommended one for this task. Its drawback is that it splits only on whitespace, so any punctuation marks in the string stay attached to the words.
Python3
test_string = "GeeksForGeeks is best Computer Science Portal"
print("The original string is : " + test_string)
res = test_string.split()
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks is best Computer Science Portal
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n), since the resulting list of words grows with the length of the string
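The drawback mentioned above is easy to see on a small input. The following is a minimal sketch, assuming a sample sentence that is not part of the example above; it shows that split() leaves punctuation attached to the words and treats a stand-alone run of symbols as a word of its own.

```python
# Hypothetical sample sentence containing punctuation (an assumption for illustration).
text = "GeeksForGeeks, is best @# Computer Science Portal.!!!"

# split() with no arguments separates on runs of whitespace only.
words = text.split()

print(words)
# ['GeeksForGeeks,', 'is', 'best', '@#', 'Computer', 'Science', 'Portal.!!!']
```

The methods later in this article deal with exactly these leftover characters.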
Python Extract String Words using Find()
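In Python, the find() function can also be used to extract the words of a string. The find() method is called on a string and takes the substring to search for, optionally followed by start and end positions that limit the search. It returns the lowest index of the substring if found, or -1 if the substring is not present. Searching repeatedly for the next space, each time starting just past the previous one, gives the boundaries of every word.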
Python3
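def extract_words_using_find(input_string):
    words, start = [], 0
    while start < len(input_string):
        end = input_string.find(' ', start)  # next space, or -1 if none
        end = len(input_string) if end == -1 else end
        if end > start:  # skip empty pieces between repeated spaces
            words.append(input_string[start:end])
        start = end + 1
    return words
sentence = "GeeksForGeeks is best Computer Science Portal"
result_words = extract_words_using_find(sentence)
print(result_words)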
Output
['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n), since the resulting list of words grows with the length of the string
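To make the return values of find() concrete, here is a small sketch on a shortened sentence (chosen for illustration, not taken from the example above). It shows the index returned for the first space, how the optional start argument moves the search forward, and the -1 result for a missing substring.

```python
sentence = "GeeksForGeeks is best"

# Lowest index at which a space occurs.
first_space = sentence.find(" ")

# Passing a start position continues the search after the first space.
second_space = sentence.find(" ", first_space + 1)

# A substring that does not occur yields -1.
missing = sentence.find("Portal")

# The indexes returned by find() can be used directly as slice boundaries.
first_word = sentence[:first_space]
second_word = sentence[first_space + 1:second_space]

print(first_space, second_space, missing)  # 13 16 -1
print(first_word, second_word)             # GeeksForGeeks is
```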
Python Extract String Words using List Comprehension
In Python, you can extract words from a string using list comprehension. A list comprehension provides a concise way to iterate over the whitespace-separated pieces of a string, strip punctuation from both ends of each piece, and keep only the pieces that satisfy a condition (here, being alphanumeric).
Python3
import string
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"
res = [word.strip(string.punctuation) for word in test_string.split() if word.strip(string.punctuation).isalnum()]
print("The list of words is:", res)
Output
The list of words is: ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
The time complexity of the program is O(n), where n is the length of the test string.
The space complexity of the program is also O(n), where n is the length of the test string.
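One consequence of the isalnum() filter is worth seeing on its own: strip() only removes punctuation from the ends of a piece, so a word with punctuation inside it (such as a hyphen) fails the isalnum() test and is dropped. The sketch below uses a hypothetical sentence to show this, along with one possible alternative that keeps every non-empty piece instead.

```python
import string

# Hypothetical sample sentence with hyphenated words (an assumption for illustration).
text = "A well-known, state-of-the-art portal!"

# Strip punctuation from both ends of each whitespace-separated piece.
stripped = [word.strip(string.punctuation) for word in text.split()]

# The isalnum() filter rejects pieces that still contain a hyphen.
alnum_only = [word for word in stripped if word.isalnum()]

# Alternative filter: keep any piece that is not empty after stripping.
non_empty = [word for word in stripped if word]

print(stripped)    # ['A', 'well-known', 'state-of-the-art', 'portal']
print(alnum_only)  # ['A', 'portal']
print(non_empty)   # ['A', 'well-known', 'state-of-the-art', 'portal']
```

Which filter is appropriate depends on whether hyphenated words should count as words for the task at hand.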
Python Extract String Words using Regex
In Python, words can also be extracted using regular expressions. When the string contains special characters and punctuation marks, splitting on whitespace alone leaves them attached to the words, so a regular expression is a better fit. The re.findall() function returns a list of all non-overlapping matches of a pattern; with the pattern \w+ it returns every run of letters, digits, and underscores, which leaves the punctuation marks out.
Python3
import re
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"
print("The original string is : " + test_string)
res = re.findall(r'\w+', test_string)
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks, is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n)
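The \w+ pattern has a few edge cases that are easy to overlook: it counts digits and underscores as word characters, and it splits a contraction at the apostrophe. The sketch below illustrates this on a hypothetical input, and shows one possible alternative pattern (an assumption, not part of the method above) that matches ASCII letters only and allows a single inner apostrophe.

```python
import re

# Hypothetical input chosen to expose the edge cases.
text = "don't stop_here 42"

# \w matches letters, digits, and the underscore.
default_words = re.findall(r"\w+", text)

# Alternative sketch: ASCII letters, optionally followed by an apostrophe and more letters.
letters_only = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

print(default_words)  # ['don', 't', 'stop_here', '42']
print(letters_only)   # ["don't", 'stop', 'here']
```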
Python Extract String Words using Regex() + String.Punctuation
This method also uses regular expressions, but here string.punctuation, the string constant that holds all ASCII punctuation characters, is used to strip every punctuation mark from the string before the result is split into words.
Python3
import re
import string
test_string = "GeeksForGeeks, is best @# Computer Science Portal.!!!"
print("The original string is : " + test_string)
# re.escape() keeps characters such as ], \ and ^ literal inside the character class
res = re.sub('[' + re.escape(string.punctuation) + ']', '', test_string).split()
print("The list of words is : " + str(res))
Output
The original string is : GeeksForGeeks, is best @# Computer Science Portal.!!!
The list of words is : ['GeeksForGeeks', 'is', 'best', 'Computer', 'Science', 'Portal']
Time Complexity: O(n)
Auxiliary Space: O(n)
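Deleting punctuation outright joins the parts of hyphenated words and contractions together. The following sketch, on a hypothetical sentence, compares deleting each punctuation mark with replacing it by a space; the second variant is an illustrative alternative, not the method described above.

```python
import re
import string

# Hypothetical input with inner punctuation (an assumption for illustration).
text = "state-of-the-art, isn't it?"

# Character class that matches any single ASCII punctuation character.
punct_class = "[" + re.escape(string.punctuation) + "]"

# Deleting punctuation glues the surrounding letters together.
deleted = re.sub(punct_class, "", text).split()

# Replacing punctuation with a space separates them instead.
spaced = re.sub(punct_class, " ", text).split()

print(deleted)  # ['stateoftheart', 'isnt', 'it']
print(spaced)   # ['state', 'of', 'the', 'art', 'isn', 't', 'it']
```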
Python Extract String Words using NLP Libraries
Python has a number of natural language processing (NLP) packages that provide sophisticated word extraction features. The NLTK (Natural Language Toolkit) is one such library. Here is an illustration of word extraction using NLTK. Note that word_tokenize() returns punctuation marks as separate tokens rather than discarding them.
Python3
import nltk
# word_tokenize() needs NLTK's Punkt tokenizer data, downloaded once via nltk.download()
test_string = "GeeksForGeeks is the best Computer Science Portal ."
words = nltk.word_tokenize(test_string)
print(words)
Output
['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal', '.']
Time Complexity: O(n)
Auxiliary Space: O(n)
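If only the words are wanted, the punctuation tokens can be filtered out after tokenizing. The sketch below does not call NLTK itself, so that it runs without the library installed: the tokens list is hard-coded as an assumption of what a tokenizer that separates punctuation would return for the sentence above.

```python
# Assumed tokenizer output for the sample sentence (hard-coded for illustration).
tokens = ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal', '.']

# Keep only tokens made up entirely of letters and digits.
words_only = [token for token in tokens if token.isalnum()]

print(words_only)
# ['GeeksForGeeks', 'is', 'the', 'best', 'Computer', 'Science', 'Portal']
```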