Python | Delimited String List to String Matrix
Last Updated :
22 Apr, 2023
Sometimes, while working with Python strings, we need to convert a list of delimiter-joined strings into a string matrix (a list of lists) by splitting each string on its delimiter. Let's discuss certain ways in which this task can be performed.
Method #1 : Using loop + split()
This is one way to perform the task. We iterate over each string and split it on the delimiter using split().
Python3
# Initialize the list of delimited strings
test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']
print("The original list is : " + str(test_list))

# Split each string on ':' and collect the resulting rows
res = []
for sub in test_list:
    res.append(sub.split(':'))

print("The list after conversion : " + str(res))
Output :
The original list is : ['gfg:is:best', 'for:all', 'geeks:and:CS']
The list after conversion : [['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]
Method #2 : Using list comprehension + split()
Another way to perform this task is a modification of the above method: a list comprehension serves as a one-line shorthand for the loop.
Python3
# Initialize the list of delimited strings
test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']
print("The original list is : " + str(test_list))

# Split every string on ':' in a single list comprehension
res = [sub.split(':') for sub in test_list]

print("The list after conversion : " + str(res))
Output :
The original list is : ['gfg:is:best', 'for:all', 'geeks:and:CS']
The list after conversion : [['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]
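The time and space complexity are the same for both methods above:
Time Complexity: O(n), where n is the number of strings and each string's length is treated as bounded by a constant (O(n*m) in general, for average string length m)
Space Complexity: O(n) under the same assumption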
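One edge case worth keeping in mind with split() is that an empty string splits to [''] rather than []. The following is a minimal sketch of a reusable helper that makes this choice explicit; the name to_matrix and the skip_empty flag are hypothetical and not part of the methods above.

```python
def to_matrix(strings, sep=':', skip_empty=True):
    """Split each delimited string into a row (a list of substrings).

    to_matrix and skip_empty are illustrative names, not a library API.
    """
    rows = []
    for s in strings:
        if skip_empty and s == '':
            # ''.split(':') returns [''], so skip empty inputs explicitly
            continue
        rows.append(s.split(sep))
    return rows


test_list = ['gfg:is:best', '', 'for:all']
print(to_matrix(test_list))
# [['gfg', 'is', 'best'], ['for', 'all']]
print(to_matrix(test_list, skip_empty=False))
# [['gfg', 'is', 'best'], [''], ['for', 'all']]
```

Whether empty inputs should be dropped or kept as a row depends on the data, which is why the sketch leaves it as a parameter.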
Method #3 : Using csv.reader and StringIO
This method uses the csv module’s reader() function to split each string on a specified delimiter. Rows may contain different numbers of columns.
Python3
import csv
from io import StringIO

test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']
print("The original list is:", test_list)

res = []
for sub in test_list:
    # Wrap the string in a file-like object so csv.reader can parse it
    s = StringIO(sub)
    reader = csv.reader(s, delimiter=':')
    # Each string holds a single line, so keep only the first parsed row
    res.append(list(reader)[0])

print("The list after conversion:", res)
Output
The original list is: ['gfg:is:best', 'for:all', 'geeks:and:CS']
The list after conversion: [['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]
Time Complexity: O(n)
Auxiliary Space: O(n)
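As a possible simplification, csv.reader accepts any iterable that yields one string per row, so the list itself can be passed in without StringIO. The sketch below also shows one reason to prefer csv over split(): a quoted field keeps an embedded delimiter. The quoted sample string is made up for illustration.

```python
import csv

test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']

# csv.reader accepts any iterable that yields one string per row,
# so the list can be passed in directly without a StringIO wrapper.
res = list(csv.reader(test_list, delimiter=':'))
print(res)
# [['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]

# A quoted field keeps its embedded delimiter (default quotechar is '"').
quoted = list(csv.reader(['a:"b:c":d'], delimiter=':'))
print(quoted)
# [['a', 'b:c', 'd']]
```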
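Method #4 : Using re.split()
This method splits each string with re.split(). Because re.split() interprets its first argument as a regular-expression pattern, a literal delimiter that contains regex metacharacters (for example '|' or '.') should be escaped with re.escape() first.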
Python3
import re

def delimited_list_to_matrix(string_list, delimiter):
    # Escape the delimiter so regex metacharacters are matched literally
    pattern = re.escape(delimiter)
    matrix = []
    for string in string_list:
        matrix.append(re.split(pattern, string))
    return matrix

test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']
print("The original list is : " + str(test_list))

delimiter = ':'
print(delimited_list_to_matrix(test_list, delimiter))
Output
The original list is : ['gfg:is:best', 'for:all', 'geeks:and:CS']
[['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]
Time Complexity: O(n*m), where n is the number of strings in the input list and m is the average length of each string. The code iterates over every string, and each re.split() call scans its string in O(m) time for a simple literal delimiter.
Auxiliary Space: O(n*m), because the substrings stored in the result together hold roughly as many characters as the input.
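The main reason to reach for re.split() over str.split() is that a pattern can match more than one delimiter. As a small sketch (the mixed_list input is hypothetical), a character class splits on either ':' or ';':

```python
import re

# Hypothetical input that mixes ':' and ';' as delimiters
mixed_list = ['gfg:is;best', 'for;all', 'geeks:and:CS']

# The character class [:;] matches either delimiter
pattern = re.compile(r'[:;]')
res = [pattern.split(s) for s in mixed_list]
print(res)
# [['gfg', 'is', 'best'], ['for', 'all'], ['geeks', 'and', 'CS']]

# A literal delimiter with regex meaning should be escaped first
pipe_split = re.split(re.escape('|'), 'a|b|c')
print(pipe_split)
# ['a', 'b', 'c']
```

For a single fixed delimiter, str.split() is usually the simpler choice.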
Method #5 : Using pandas
Step-by-step approach:
- Import the pandas library using the import pandas as pd statement. This will allow you to use the pandas library under the shorthand pd.
- Define a function called delimited_list_to_matrix that takes in two parameters: a list of strings called string_list, and a delimiter character called delimiter.
- Inside the function, create a pandas DataFrame from the string_list using the pd.DataFrame function. The columns parameter specifies the column name of the DataFrame.
- Split each string in the DataFrame into multiple columns using the specified delimiter character. This is done by calling the str.split method on the ‘string’ column of the DataFrame, with the expand=True parameter to create a new column for each split element.
- Convert the resulting DataFrame to a matrix using the to_numpy method. This returns a rectangular NumPy array of the DataFrame data; rows with fewer substrings than the widest row are padded with a missing value.
- Return the resulting matrix.
- Call the delimited_list_to_matrix function with a test input consisting of a list of three strings and the ‘:’ delimiter.
- Print the resulting matrix.
Python3
import pandas as pd

def delimited_list_to_matrix(string_list, delimiter):
    # Load the strings into a single-column DataFrame
    df = pd.DataFrame(string_list, columns=['string'])
    # Split on the delimiter; expand=True gives one column per substring
    df = df['string'].str.split(delimiter, expand=True)
    # Convert the DataFrame to a NumPy array
    matrix = df.to_numpy()
    return matrix

test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']
delimiter = ':'
print(delimited_list_to_matrix(test_list, delimiter))
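Output :
[['gfg' 'is' 'best']
 ['for' 'all' None]
 ['geeks' 'and' 'CS']]
Because expand=True produces a rectangular table, the shorter row is padded with a missing value (shown as None here; the exact marker may differ between pandas versions).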
Time Complexity: O(n*m), where n is the number of strings in the list and m is the maximum number of substrings in any string.
Auxiliary Space: O(n*m)
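A rectangular result does not strictly require pandas. As a rough sketch in plain Python, ragged rows can be padded to a common width after splitting; the fill value '' is an assumed placeholder for missing cells and can be replaced with None or anything else that suits the data.

```python
test_list = ['gfg:is:best', 'for:all', 'geeks:and:CS']

# Split first, then find the widest row (0 if the input list is empty)
rows = [s.split(':') for s in test_list]
width = max((len(r) for r in rows), default=0)

# Pad short rows on the right; fill is a hypothetical placeholder value
fill = ''
matrix = [r + [fill] * (width - len(r)) for r in rows]
print(matrix)
# [['gfg', 'is', 'best'], ['for', 'all', ''], ['geeks', 'and', 'CS']]
```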