
How to Split and Explode a Pandas DataFrame String Entry into Separate Rows

Last Updated : 24 Oct, 2023

When working with data, you may find that a single string entry in a data frame holds several values that need to be split into separate rows. This can be tedious to do by hand, especially when the data is large and complex. The pandas library provides functions that make this task easy and efficient. In this article, we'll look at how to convert delimited string entries in a dataframe into separate rows using two pandas methods: split and explode.

What is a Pandas DataFrame?

A DataFrame in pandas is a two-dimensional tabular data structure with labelled axes (rows and columns). Some of its properties are:

  • It is heterogeneous: different columns can hold different data types.
  • It is size-mutable: rows and columns can be added or removed.
  • Both axes are labelled: rows by an index and columns by column names.
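These properties can be seen in a small sketch. The column names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one numeric column and one string column
people = pd.DataFrame({'age': [31, 45], 'name': ['Alice', 'Bob']})

# Heterogeneous: each column keeps its own data type
print(people.dtypes)

# Size-mutable: a new column can be added after creation
people['city'] = ['Paris', 'Oslo']

# Labelled axes: a row index and column labels
print(people.index.tolist())    # row labels
print(people.columns.tolist())  # column labels
print(people.shape)             # (number of rows, number of columns)
```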

Pandas is an open-source Python library widely used in data science for working with labelled, tabular, and time-series data. It provides fast, efficient tools for manipulating and analyzing such data.

The Problem

Sometimes a data frame stores several values as a single delimited string in one or more columns, and we want each value in its own row.

What is Splitting and Exploding String Entries?

Splitting and exploding string entries are common operations performed on data frames. These operations are useful when the data present is in the form of strings separated by delimiters such as spaces, commas, etc.

Splitting

Splitting is the process of dividing a single string entry into several pieces according to a specified delimiter or pattern. It is frequently used when a single entry holds several values separated by a common character or sequence, such as a space, semicolon, or comma.

Python




import pandas as pd
 
data = {'Names': ['Alice,Bob,Charlie', 'David,Eve', 'Frank']}
df = pd.DataFrame(data)
print(df)


Output:

               Names
0  Alice,Bob,Charlie
1          David,Eve
2              Frank

After splitting, with expand=True so that each piece goes into its own column

Python




df[['Name1', 'Name2', 'Name3']] = df['Names'].str.split(',', expand=True)
print(df)


Output:

               Names  Name1 Name2    Name3
0  Alice,Bob,Charlie  Alice   Bob  Charlie
1          David,Eve  David   Eve     None
2              Frank  Frank  None     None
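For comparison, here is a minimal sketch of the same call without expand=True. It keeps a single column and stores each split result as a list, which is the form that explode works on. It also shows the n parameter, which limits the number of splits. The variable names are illustrative only.

```python
import pandas as pd

names = pd.Series(['Alice,Bob,Charlie', 'David,Eve', 'Frank'])

# Without expand=True, each entry becomes a Python list
as_lists = names.str.split(',')
print(as_lists.tolist())

# n=1 allows at most one split per entry, so the rest stays together
first_and_rest = names.str.split(',', n=1, expand=True)
print(first_and_rest)
```

Rows with fewer pieces than the widest row are padded with missing values, as in the Name2 and Name3 columns of the output above.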

Exploding

Exploding means creating a separate row for each element of a column that holds lists or other list-like values. The explode method in pandas does this: it takes a column containing lists or arrays and generates a new row for each element, repeating the values in the other columns. In this article, explode is used to turn the lists produced by splitting into separate rows.

Python




import pandas as pd
 
# Create a list of names for each row
d = {'Names': [['Alice', 'Bob', 'Charlie'], ['David', 'Eve'], ['Frank']]}
df = pd.DataFrame(d)
print(df)


Output:

                   Names
0  [Alice, Bob, Charlie]
1           [David, Eve]
2                [Frank]

After using the explode method

Python




result = df.explode('Names')
print(result)


Output:

     Names
0    Alice
0      Bob
0  Charlie
1    David
1      Eve
2    Frank
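Note that the exploded rows keep the index label of the row they came from, so labels repeat (0, 0, 0, 1, 1, 2). If a fresh 0..n-1 index is preferred, one option is the ignore_index parameter, sketched below; calling reset_index(drop=True) on the result is an alternative.

```python
import pandas as pd

df = pd.DataFrame({'Names': [['Alice', 'Bob', 'Charlie'], ['David', 'Eve'], ['Frank']]})

# Default: each new row keeps the index label of its source row
kept = df.explode('Names')
print(kept.index.tolist())

# ignore_index=True renumbers the result (requires pandas 1.1.0 or later)
renumbered = df.explode('Names', ignore_index=True)
print(renumbered.index.tolist())
```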

Split and Explode a Pandas DataFrame String Entry into Separate Rows

pandas provides an explode method that separates list-like entries into rows. The steps below combine str.split with explode to turn comma-separated strings into separate rows.

Step 1: Import the Pandas Library

First, we import the libraries we need. Only one is required: pandas.

Python




# Import the pandas library
import pandas as pd


Step 2: Create a dataframe

Next, we need a dataframe to work with. For demo purposes, we'll create a dummy dataframe using the pd.DataFrame() constructor in pandas.

Python




# Create a dummy dataframe
df = pd.DataFrame({'id': [1, 2, 3], 'data': ['x, y', 'z, w', 'a']})
# Print the dataframe
print(df)


Output:

   id  data
0   1  x, y
1   2  z, w
2   3     a

Step 3: Split the String Entry into a List

Next, we use the str.split() method to split each string on the delimiter ', ' (a comma followed by a space). If the entries are separated by spaces instead, use ' ' as the delimiter.

Python




# Split the data column on ', ' so each entry becomes a list
df['data'] = df['data'].str.split(', ')
# Print the dataframe
print(df)


Output:

   id    data
0   1  [x, y]
1   2  [z, w]
2   3     [a]
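Splitting on the exact string ', ' assumes every entry uses a comma followed by exactly one space. For less regular input, one possible approach is to use a regular expression as the delimiter. The sketch below assumes hypothetical sample values with inconsistent spacing.

```python
import pandas as pd

# Hypothetical entries with inconsistent spacing around the commas
messy = pd.Series(['x, y', 'z,w', ' a ,  b '])

# Trim the ends, then split on a comma with optional surrounding whitespace.
# The regex argument requires pandas 1.4.0 or later.
parts = messy.str.strip().str.split(r'\s*,\s*', regex=True)
print(parts.tolist())
```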

Step 4: Explode the List into Separate Rows

Finally, the DataFrame.explode() method separates the list values into different rows, giving the desired final dataframe (shown below).

Python




# Use the explode method to give each list element its own row
df = df.explode('data')
# Print the final dataframe
print(df)


Output:

   id data
0   1    x
0   1    y
1   2    z
1   2    w
2   3    a
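Steps 2 to 4 can also be written as a single chained expression. This is a sketch of one possible style, not the only way to do it: assign builds a copy with the split column, so the original dataframe is left unchanged, and reset_index(drop=True) replaces the repeated index labels with 0..n-1.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'data': ['x, y', 'z, w', 'a']})

# Split, explode, and renumber the rows in one expression
result = (
    df.assign(data=df['data'].str.split(', '))
      .explode('data')
      .reset_index(drop=True)
)
print(result)
```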

Conclusion

In conclusion, by combining the str.split and explode methods in pandas, delimited string entries in the raw data can be converted to separate rows for easier data manipulation.


