Python | Split a sentence into list of words


Given a sentence, write a Python program that converts it into a list of words.

Examples: 

Input : 'Hello World'
Output : ['Hello', 'World']

Method 1: Split a sentence into a list using split()

The simplest way to convert a sentence into a list of words is the built-in split() string method. Called without arguments, it splits the string on runs of whitespace and returns a list in which each word is a separate item. split() also accepts an optional separator and a maximum number of splits, which gives some control over the output.

Python3




# Driver code
sentence = "Geeks For geeks"
print(sentence.split())


Output

['Geeks', 'For', 'geeks']

Time Complexity: O(n), where n is the length of the input string.
Auxiliary Space: O(n), since the resulting list of words holds at most n characters in total.
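As a rough illustration of the other ways split() can be called, the sketch below compares the default behaviour with an explicit separator and with the maxsplit argument. The sample strings and variable names are made up for this example.

```python
sentence = "Geeks  For   geeks"  # note the repeated spaces

# With no argument, split() treats any run of whitespace as one separator
default_words = sentence.split()

# With an explicit separator, every single space counts, so empty strings appear
explicit_words = sentence.split(" ")

# maxsplit limits how many splits are performed, counting from the left
first_and_rest = "Geeks For geeks".split(maxsplit=1)

print(default_words)    # ['Geeks', 'For', 'geeks']
print(explicit_words)   # ['Geeks', '', 'For', '', '', 'geeks']
print(first_and_rest)   # ['Geeks', 'For geeks']
```

For ordinary sentences the argument-free form is usually the one you want, because it never produces empty strings.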

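Method 2: Split a sentence into a list using a for loop

We can also iterate over the result of split() with a for loop, written here as a list comprehension, and collect each word into a new list. The comprehension adds nothing over calling split() directly, but it is a convenient place to transform or filter each word, for example by lowercasing it.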

Python3




def convert(sentence):
    # Iterate over the words produced by split() and collect them in a list
    return [word for word in sentence.split()]


# Driver code
sentence = 'Geeksforgeeks is a portal for geeks'
print(convert(sentence))


Output

['Geeksforgeeks', 'is', 'a', 'portal', 'for', 'geeks']

Time Complexity: O(n), where n is the length of the input string.
Auxiliary Space: O(n), since the resulting list of words holds at most n characters in total.
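For comparison, here is a minimal sketch of what splitting on whitespace looks like when written as an explicit for loop instead of calling split(). The helper name split_with_loop is hypothetical, and the sketch only aims to match split() for ordinary spaces, tabs, and newlines.

```python
def split_with_loop(sentence):
    """Hypothetical helper: split on whitespace using an explicit loop."""
    words = []
    current = []  # characters of the word currently being built
    for char in sentence:
        if char.isspace():
            # A whitespace character ends the current word, if there is one
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(char)
    # Flush the last word when the sentence does not end with whitespace
    if current:
        words.append("".join(current))
    return words


print(split_with_loop("Geeksforgeeks is a portal for geeks"))
# ['Geeksforgeeks', 'is', 'a', 'portal', 'for', 'geeks']
```

In practice split() is shorter and faster; the loop is mainly useful for understanding what the method does.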

Method 3: Split a sentence into a list using join() 

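If the input is a list of strings rather than a single string, we can first combine the items with join() and then call split() on the result. Note that the separator matters: joining several sentences with an empty string fuses the last word of one sentence with the first word of the next, so a space separator is the safer choice for multi-item lists.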

Python3




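def convert(lst):
    # Accept either a single string or a list of strings
    if isinstance(lst, str):
        lst = [lst]
    # Join with a space so words at sentence boundaries are not fused
    return ' '.join(lst).split()


# Driver code
lst = 'Hello Geeks for geeks'
print(convert(lst))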


Output

['Hello', 'Geeks', 'for', 'geeks']
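To see why the separator passed to join() matters for a list of several strings, consider the following small sketch. The helper name words_from_sentences and the sample list are assumptions made for illustration only.

```python
def words_from_sentences(sentences):
    # Hypothetical helper: `sentences` is assumed to be a list of strings
    return " ".join(sentences).split()


sentences = ["Hello Geeks", "for geeks"]

# An empty separator glues 'Geeks' and 'for' together
fused = "".join(sentences).split()
print(fused)   # ['Hello', 'Geeksfor', 'geeks']

# A space separator keeps the words apart
words = words_from_sentences(sentences)
print(words)   # ['Hello', 'Geeks', 'for', 'geeks']
```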

Method 4: Split a sentence into a list using nltk

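The word_tokenize() function from the nltk library can also be used. It takes a string as input and returns a list of tokens. Unlike split(), it treats punctuation marks as separate tokens rather than leaving them attached to words. It relies on the Punkt tokenizer data, which must be downloaded once before first use.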

Python3




import nltk
nltk.download('punkt')
 
string = "This is a sentence"
lst = nltk.word_tokenize(string)
print(lst)


Output

['This', 'is', 'a', 'sentence']

Method 5: Split a sentence into a list using re

Another approach is to use a regular expression to extract the words from the sentence. Here is an example of how this could be done using the re module:

Python3




import re


def split_sentence(sentence):
    # Return every maximal run of word characters as a separate word
    return re.findall(r'\b\w+\b', sentence)


# Driver code
sentence = 'Hello Geeks for geeks'
print(split_sentence(sentence))
# This code is contributed by Edula Vinay Kumar Reddy


Output

['Hello', 'Geeks', 'for', 'geeks']

This approach uses a regular expression to match any sequence of word characters (letters, digits, and underscores) delimited by word boundaries, that is, the start or end of the string or a transition to a non-word character. The findall() function returns a list of all non-overlapping matches in the string. This is useful when you need to split a sentence into words while discarding punctuation. Note that an apostrophe is not a word character, so a contraction such as "don't" is returned as two items, 'don' and 't'.

The time complexity of the split_sentence function is O(n), where n is the length of the input string, because findall() scans the string once from left to right to collect the matches.

The space complexity of the split_sentence function is also O(n), since the returned list holds at most n characters in total.
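The difference from split() shows up once the sentence contains punctuation. The sketch below uses a made-up sample sentence; the second pattern, which keeps apostrophes inside words, is an assumption added for illustration and is not part of the method above.

```python
import re

sentence = "Hello, Geeks! Don't stop_learning in 2023."

# split() leaves punctuation attached to the neighbouring word
plain = sentence.split()
print(plain)
# ['Hello,', 'Geeks!', "Don't", 'stop_learning', 'in', '2023.']

# \w matches letters, digits and underscores, so punctuation is dropped
# and the contraction is cut at the apostrophe
regex_words = re.findall(r"\b\w+\b", sentence)
print(regex_words)
# ['Hello', 'Geeks', 'Don', 't', 'stop_learning', 'in', '2023']

# Hypothetical variant: allow apostrophes between runs of word characters
with_contractions = re.findall(r"\w+(?:'\w+)*", sentence)
print(with_contractions)
# ['Hello', 'Geeks', "Don't", 'stop_learning', 'in', '2023']
```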

Method 6: Split a sentence into a list using a lambda function

APPROACH:

This approach splits the input sentence into a list of words using the split() method and then filters that list with filter() and a lambda function. The lambda uses isdigit() to check every character of a word; it returns True if the word contains no digits and False otherwise, so only digit-free words are kept. Because filter() returns a lazy iterator, list() is used to convert the result into a list of valid words.

ALGORITHM:

1. Split the input sentence into a list of words using the split() method.
2. Apply a filter to the list of words using the filter() method and a lambda function that checks if each word contains any digits using isdigit() method.
3. Convert the resulting filter object into a list of valid words using the list() method.
4. Return the list of valid words.

Python3




sentence = 'Hello Geeks for geeks'
words = list(filter(lambda word: not any(char.isdigit()
                                         for char in word), sentence.split()))
print(words)  # Output: ['Hello', 'Geeks', 'for', 'geeks']


Output

['Hello', 'Geeks', 'for', 'geeks']

Time complexity:

1. Splitting the sentence into a list of words using the split() method takes O(n) time, where n is the length of the sentence.
2. Checking whether a single word contains any digits using isdigit() takes O(m) time, where m is the length of that word.
3. Applying the filter with filter() runs this check once per word. Since the word lengths add up to at most n, all checks together take O(n) time.
4. Converting the resulting filter object into a list with list() takes O(k) additional time, where k is the number of valid words, and k is at most n.
5. Therefore, the overall time complexity of this approach is O(n).

Space complexity:

1. This approach uses O(n + k) space, where n is the length of the sentence and k is the number of valid words in the list.
2. This is because we create a list of words using the split() method, which takes O(n) space, and then create a new list of valid words using the filter() and list() methods, which takes O(k) space.
3. The lambda function used in the filter() method does not use any extra space.
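Because the sample sentence above contains no digits, the filter leaves it unchanged. The following sketch, with a made-up input that does include digits, shows the filter actually removing words; the equivalent list comprehension is included only for comparison.

```python
sentence = "Hello Geeks for geeks 2023 py3"

# Keep only the words in which no character is a digit
words = list(filter(lambda word: not any(ch.isdigit() for ch in word),
                    sentence.split()))
print(words)  # ['Hello', 'Geeks', 'for', 'geeks']

# The same filter expressed as a list comprehension
same_words = [w for w in sentence.split()
              if not any(ch.isdigit() for ch in w)]
print(same_words == words)  # True
```

Note that both '2023' and 'py3' are dropped, since the test rejects any word containing at least one digit, not only words made up entirely of digits.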



Last Updated : 18 May, 2023