How To Break Up A Comma Separated String In Pandas Column

Last Updated : 06 Feb, 2024

Pandas is a Python library for data manipulation and analysis. It provides tabular data structures such as the DataFrame, along with methods for working with them. Sometimes a column stores several values packed into a single comma-separated string, and that string has to be broken apart before the data can be organized properly. This article shows how to break up a comma-separated string in a pandas column using two approaches based on str.split().

Break Up A Comma Separated String In Pandas Column

Using str.split()

Let us first look at what this approach requires:

Requirements:

  • Pandas library: In this approach, we import the Pandas library and use the `DataFrame()` constructor to create a 2D data structure, or table.
  • str.split() method: This method splits each string of comma-separated values into individual strings based on a delimiter. It accepts the delimiter as its first argument and, optionally, `expand=True` to return the pieces as separate columns instead of a list.

Python3




import pandas as pd
# Example DataFrame
data = {'Category': ['Fruits', 'Vegetables', 'Dairy'],
        'Contains': ['Apple,Orange,Banana', 'Carrot,Potato,Tomato,Cucumber', 'Milk,Cheese,Yogurt']}
df = pd.DataFrame(data)
 
# Split the 'Contains' column by commas and create a new column 'Contains_list'
df['Contains_list'] = df['Contains'].str.split(',')
 
# Display the DataFrame
print(df)


Output:

     Category                       Contains  \
0      Fruits            Apple,Orange,Banana
1  Vegetables  Carrot,Potato,Tomato,Cucumber
2       Dairy             Milk,Cheese,Yogurt

                        Contains_list
0             [Apple, Orange, Banana]
1  [Carrot, Potato, Tomato, Cucumber]
2              [Milk, Cheese, Yogurt]

The example above works as follows:

  • The script begins by importing the Pandas library as ‘pd’.
  • A DataFrame named ‘df’ is created from a dictionary ‘data’ containing the ‘Category’ and ‘Contains’ columns.
  • The comma-separated strings in the ‘Contains’ column are split into lists using the str.split(',') method.
  • The resulting lists are stored in a new column, ‘Contains_list’, which is added to the DataFrame.
  • The DataFrame ‘df’ is printed, showing the original columns alongside the newly created ‘Contains_list’ column.
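
As a small follow-up sketch (not part of the original example), the code below looks at what the new column holds and uses the `.str` accessor on it to read each list's length and first element. The column names `Item_count` and `First_item` are made up for illustration.

```python
import pandas as pd

data = {'Category': ['Fruits', 'Vegetables', 'Dairy'],
        'Contains': ['Apple,Orange,Banana', 'Carrot,Potato,Tomato,Cucumber', 'Milk,Cheese,Yogurt']}
df = pd.DataFrame(data)
df['Contains_list'] = df['Contains'].str.split(',')

# Inspect a single cell of the new column: it holds the split items
first_cell = df['Contains_list'].iloc[0]
print(first_cell)

# The .str accessor also works element-wise on the lists
# ('Item_count' and 'First_item' are hypothetical column names)
df['Item_count'] = df['Contains_list'].str.len()
df['First_item'] = df['Contains_list'].str[0]
print(df[['Category', 'Item_count', 'First_item']])
```

Keeping the items in a single list column like this is convenient when rows have different numbers of items, because no padding columns are needed.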

Using str.split() with expand=True

We again create the same DataFrame, but this time pass the “expand=True” parameter so that each item is placed in its own column.

Python3




import pandas as pd
 
# Example DataFrame
data = {'Category': ['Fruits', 'Vegetables', 'Dairy'],
        'Contains': ['Apple,Orange,Banana', 'Carrot,Potato,Tomato,Cucumber', 'Milk,Cheese,Yogurt']}
df = pd.DataFrame(data)
 
# Split the 'Contains' column by commas and expand it into separate columns
df[['Item1', 'Item2', 'Item3', 'Item4']] = df['Contains'].str.split(',', expand=True)
 
# Display the modified DataFrame
print(df)


Output:

     Category                       Contains   Item1   Item2   Item3     Item4
0      Fruits            Apple,Orange,Banana   Apple  Orange  Banana      None
1  Vegetables  Carrot,Potato,Tomato,Cucumber  Carrot  Potato  Tomato  Cucumber
2       Dairy             Milk,Cheese,Yogurt    Milk  Cheese  Yogurt      None

  • The str.split(',', expand=True) method splits each element of the ‘Contains’ column by commas and expands the result into separate columns.
  • Since the maximum number of items after splitting is 4 (in the second row), we create 4 new columns (‘Item1’, ‘Item2’, ‘Item3’, ‘Item4’) to accommodate the split values.
  • The resulting DataFrame shows each item from the original comma-separated string in its own column. Rows with fewer than four items are padded with None.
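
The example above hard-codes four column names because the longest row is known to have four items. When that count is not known in advance, one possible variation is to build the names from the shape of the split result. This is only a sketch; the variable names `items` and `result` are assumptions made for illustration.

```python
import pandas as pd

data = {'Category': ['Fruits', 'Vegetables', 'Dairy'],
        'Contains': ['Apple,Orange,Banana', 'Carrot,Potato,Tomato,Cucumber', 'Milk,Cheese,Yogurt']}
df = pd.DataFrame(data)

# Split into as many columns as the longest row needs
items = df['Contains'].str.split(',', expand=True)

# Replace the default integer labels (0, 1, 2, ...) with Item1, Item2, ...
items.columns = [f'Item{i + 1}' for i in range(items.shape[1])]

# Attach the new columns to the original DataFrame (aligned on the index)
result = df.join(items)
print(result)
```

Because the column names are generated from `items.shape[1]`, the same code keeps working if a row with more items is added to the data.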

