
Unnest (Explode) Multiple List Columns In A Pandas Dataframe

Last Updated : 20 Feb, 2024

Pandas is an open-source tool for manipulating and analyzing data. Have you ever encountered a dataset in which a column stores a list in each cell? Most Pandas operations expect one scalar value per cell, so such a column usually needs to be split so that each list element gets its own row. In this article, we will discuss how to unnest, or explode, multiple list columns in a Pandas data frame.

What is Pandas?

Pandas is an open-source data manipulation and analysis tool built on top of the Python programming language. It provides powerful data structures, such as DataFrame and Series, that allow users to easily manipulate and analyze data.

What are nested list columns?

Nested list columns are columns in a DataFrame where each cell contains a list of values, rather than a single scalar value. This occurs when the data is structured hierarchically, with each cell representing a collection of related sub-values.
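A minimal sketch of what such a column looks like in practice. The frame and its column names (`id`, `tags`) are made up for illustration and are not part of the example used later in this article.

```python
import pandas as pd

# Hypothetical two-row frame; 'tags' is a nested list column
df = pd.DataFrame({'id': [1, 2], 'tags': [['a', 'b'], ['c']]})

# A column of lists is stored with the generic object dtype
print(df['tags'].dtype)

# For each column, check whether every cell holds a Python list
is_list_col = df.apply(
    lambda col: col.map(lambda v: isinstance(v, list)).all())
print(is_list_col)
```

A check like this can help decide which columns to pass to the unnesting methods described below.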

Why unnest multiple list columns?

Unnesting multiple list columns in a data frame can be useful for several reasons:

  • Data simplification: Unnesting converts complex nested data into a simpler tabular form, making it easier to understand and manipulate.
  • Improved analysis: Unnested data can be analyzed more easily with Pandas and other data analysis tools, since it can be combined, filtered and processed with standard operations.
  • Improved visualization: Unnested data can be visualized more effectively in charts and graphs.
  • Compatibility: Unnested data is often needed for certain types of analysis, such as machine learning modeling, which typically requires tabular data as input.
  • Data integration: Unnesting can facilitate the integration of data from different sources or systems by aligning the data structure with a more standard table format.
  • Normalization: Unnesting can be a step towards data normalization, which can improve data quality and reduce redundancy.

Ways to unnest multiple list columns in a Pandas dataframe:

  • Using the explode function
  • Using the pandas.Series.explode function
  • Using pandas.Series with a lambda function

Using the explode function

The explode function flattens list-like values in a Series or in DataFrame columns by giving each list element its own row, while repeating the values of the other columns. In this method, we will see how to unnest multiple list columns using the explode function of the data frame.

Syntax:

df = df.explode(['column-1', 'column-2']).reset_index(drop=True)

Here,

  • column-1, column-2: These are the columns that you want to unnest. Within each row, their lists must contain the same number of elements. Passing a list of columns requires Pandas 1.3 or later.
  • df: It is the data frame that has those nested columns.
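As a possible variation on the syntax above, explode also accepts an ignore_index parameter, which renumbers the rows directly so the separate reset_index call is not needed. The sketch below uses a small made-up frame (columns `key`, `a`, `b` are illustrative only) to show that both forms give the same result.

```python
import pandas as pd

# Hypothetical frame with two list columns of equal per-row length
df = pd.DataFrame({'key': ['x', 'y'],
                   'a': [[1, 2], [3, 4]],
                   'b': [['p', 'q'], ['r', 's']]})

# ignore_index=True labels the resulting rows 0..n-1
out = df.explode(['a', 'b'], ignore_index=True)
print(out)

# Same result as exploding and then resetting the index
same = out.equals(df.explode(['a', 'b']).reset_index(drop=True))
print(same)
```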

Implementations:

In this example, we have created a dataset, which has three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, out of which Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the explode function.

Python3




# Import the Pandas library
import pandas as pd
 
# Create a data frame that has nested columns
df = pd.DataFrame({'Name': ['Arun', 'Aniket', 'Ishita', 'Raghav', 'Vinayak'],
                   'Favourite Ice-cream': [['Strawberry', 'Choco-chips'],
                                           ['Vanilla', 'Black Currant'],
                                           ['Butterscotch', 'Chocolate'],
                                           ['Mango', 'Choco-chips'],
                                           ['Kulfi', 'Kaju-Kishmish']],
                   'Favourite Soft-Drink': [['Coca Cola', 'Lemonade'],
                                            ['Thumbs Up', 'Sprite'],
                                            ['Moutain Dew', 'Fanta'],
                                            ['Mirinda', 'Maaza'],
                                            ['7Up', 'Sprite']]})
 
# Print the actual data frame
print('Actual dataframe:\n', df)
 
# Unnest the nested columns
df = df.explode(['Favourite Ice-cream', 'Favourite Soft-Drink']
                ).reset_index(drop=True)
 
# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)


Output:

Actual dataframe:
       Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
       Name Favourite Ice-cream Favourite Soft-Drink
0     Arun          Strawberry            Coca Cola
1     Arun         Choco-chips             Lemonade
2   Aniket             Vanilla            Thumbs Up
3   Aniket       Black Currant               Sprite
4   Ishita        Butterscotch          Moutain Dew
5   Ishita           Chocolate                Fanta
6   Raghav               Mango              Mirinda
7   Raghav         Choco-chips                Maaza
8  Vinayak               Kulfi                  7Up
9  Vinayak       Kaju-Kishmish               Sprite
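Exploding several columns at once only works when the lists in a row have matching lengths. The sketch below, on a small made-up frame (columns `x` and `y` are illustrative), shows one way to check this beforehand and what happens when the lengths differ.

```python
import pandas as pd

# Hypothetical frame: row 'B' has one item in 'x' but two in 'y'
df = pd.DataFrame({'Name': ['A', 'B'],
                   'x': [[1, 2], [3]],
                   'y': [['a', 'b'], ['c', 'd']]})

# Row-wise check that both list columns have the same length
same_len = df['x'].map(len) == df['y'].map(len)
print(same_len.tolist())

# Exploding the mismatched frame raises a ValueError
try:
    df.explode(['x', 'y'])
    raised = False
except ValueError as err:
    raised = True
    print('explode failed:', err)

# Exploding only the rows that pass the check works
ok = df[same_len].explode(['x', 'y'], ignore_index=True)
print(ok)
```

How to treat the mismatched rows (drop them, pad the shorter list, or explode the columns separately) depends on the data and is left open here.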

Using the pandas.Series.explode function

The pandas.Series.explode function splits a Series containing list-like values into multiple rows, one for each element of the list, and repeats the index label for every new row. In this method, we apply it to each list column of the data frame to unnest multiple list columns.

Syntax:

df = df.set_index(['column-3']).apply(pd.Series.explode).reset_index()

Here,

  • column-3: It is the column that is already unnested. It is moved to the index so that only the list columns are exploded.
  • df: It is the data frame that has those nested columns.
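To see why the unnested column is moved to the index first, it may help to look at what pandas.Series.explode does to a single Series. The sketch below uses a made-up Series with names as index labels.

```python
import pandas as pd

# Hypothetical Series of lists, indexed by name
s = pd.Series([['Coca Cola', 'Lemonade'], ['7Up']],
              index=['Arun', 'Vinayak'])

# Each list element becomes its own row; the index label is repeated
exploded = s.explode()
print(exploded)
```

Because the index label travels with every element, calling reset_index afterwards turns it back into a regular column with one entry per exploded row.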

Implementations:

In this example, we have created a dataset with three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, of which the Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using the pandas.Series.explode function.

Python3




# Import the Pandas library
import pandas as pd
 
# Create a data frame that has nested columns
df = pd.DataFrame({'Name': ['Arun','Aniket','Ishita', 'Raghav','Vinayak'],
                   'Favourite Ice-cream':[['Strawberry', 'Choco-chips'],
                                          ['Vanilla', 'Black Currant'],
                                          ['Butterscotch', 'Chocolate'],
                                          ['Mango', 'Choco-chips'],
                                          ['Kulfi', 'Kaju-Kishmish']],
                   'Favourite Soft-Drink':[['Coca Cola', 'Lemonade'],
                                           ['Thumbs Up', 'Sprite'],
                                           ['Moutain Dew', 'Fanta'],
                                           ['Mirinda', 'Maaza'],
                                           ['7Up', 'Sprite']]})
 
# Print the actual data frame
print('Actual dataframe:\n', df)
 
# Unnest the nested columns
df = df.set_index(['Name']).apply(pd.Series.explode).reset_index()
 
# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)


Output:

Actual dataframe:
       Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
       Name Favourite Ice-cream Favourite Soft-Drink
0     Arun          Strawberry            Coca Cola
1     Arun         Choco-chips             Lemonade
2   Aniket             Vanilla            Thumbs Up
3   Aniket       Black Currant               Sprite
4   Ishita        Butterscotch          Moutain Dew
5   Ishita           Chocolate                Fanta
6   Raghav               Mango              Mirinda
7   Raghav         Choco-chips                Maaza
8  Vinayak               Kulfi                  7Up
9  Vinayak       Kaju-Kishmish               Sprite

Using pandas.Series with a lambda function

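A lambda function is an anonymous function that can take any number of arguments but can contain only one expression. In this method, each list column is expanded with pd.Series inside a lambda function, so that every list element gets its own column, and the result is stacked back into rows. This approach also works on older Pandas versions that lack explode, but it is usually slower than the explode-based methods.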

Syntax:

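df = df.set_index('column-3').apply(lambda x: x.apply(pd.Series).stack()).reset_index().drop(columns='level_1')

Here,

  • column-3: It is the column that is already unnested. It is moved to the index so that only the list columns are expanded.
  • level_1: It is the helper column created by stack, holding the position of each element within its list. It is dropped at the end.
  • df: It is the data frame that has those nested columns.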
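The chained expression above can be hard to read, so here is a sketch of its two inner steps applied to a single made-up Series of lists (the intermediate names `wide` and `long` are illustrative only).

```python
import pandas as pd

# Hypothetical Series of lists with a named index
s = pd.Series([['Mango', 'Kulfi'], ['Vanilla', 'Chocolate']],
              index=pd.Index(['Raghav', 'Aniket'], name='Name'))

# Step 1: turn each list into a row of a DataFrame (columns 0 and 1)
wide = s.apply(pd.Series)
print(wide)

# Step 2: stack the columns into rows; the result has a two-level
# index made of the name and the position within the list
long = wide.stack()
print(long)
```

In the full expression, the outer apply runs these two steps on every list column, and reset_index exposes the second index level as the level_1 column.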

Implementations:

In this example, we have created a dataset with three columns, Name, Favourite Ice-cream and Favourite Soft-Drink, of which the Favourite Ice-cream and Favourite Soft-Drink columns are nested. We have unnested those columns using pandas.Series with a lambda function.

Python3




# Import the Pandas library
import pandas as pd
 
# Create a data frame that has nested columns
df = pd.DataFrame({'Name': ['Arun','Aniket','Ishita', 'Raghav','Vinayak'],
                   'Favourite Ice-cream':[['Strawberry', 'Choco-chips'],
                                          ['Vanilla', 'Black Currant'],
                                          ['Butterscotch', 'Chocolate'],
                                          ['Mango', 'Choco-chips'],
                                          ['Kulfi', 'Kaju-Kishmish']],
                   'Favourite Soft-Drink':[['Coca Cola', 'Lemonade'],
                                           ['Thumbs Up', 'Sprite'],
                                           ['Moutain Dew', 'Fanta'],
                                           ['Mirinda', 'Maaza'],
                                           ['7Up', 'Sprite']]})
 
# Print the actual data frame
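print('Actual dataframe:\n', df)
 
# Unnest the nested columns: expand each list into its own columns, stack
# them into rows, then drop the helper column created by stack()
# (the axis must be given by keyword; drop('level_1', 1) fails in Pandas 2.0+)
df = df.set_index('Name').apply(
    lambda x: x.apply(pd.Series).stack()).reset_index().drop(columns='level_1')
 
# Print the unnested data frame
print('\nDataframe after unnesting:\n', df)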


Output:

Actual dataframe:
       Name        Favourite Ice-cream   Favourite Soft-Drink
0     Arun  [Strawberry, Choco-chips]  [Coca Cola, Lemonade]
1   Aniket   [Vanilla, Black Currant]    [Thumbs Up, Sprite]
2   Ishita  [Butterscotch, Chocolate]   [Moutain Dew, Fanta]
3   Raghav       [Mango, Choco-chips]       [Mirinda, Maaza]
4  Vinayak     [Kulfi, Kaju-Kishmish]          [7Up, Sprite]
Dataframe after unnesting:
       Name Favourite Ice-cream Favourite Soft-Drink
0     Arun          Strawberry            Coca Cola
1     Arun         Choco-chips             Lemonade
2   Aniket             Vanilla            Thumbs Up
3   Aniket       Black Currant               Sprite
4   Ishita        Butterscotch          Moutain Dew
5   Ishita           Chocolate                Fanta
6   Raghav               Mango              Mirinda
7   Raghav         Choco-chips                Maaza
8  Vinayak               Kulfi                  7Up
9  Vinayak       Kaju-Kishmish               Sprite

