Concatenate strings from several rows using Pandas groupby
The pandas DataFrame.groupby() method splits data into groups based on some criteria. In abstract terms, grouping defines a mapping from labels to group names.
To concatenate strings from several rows using DataFrame.groupby(), perform the following steps:
- Group the data with DataFrame.groupby() on the column(s) that identify which rows belong together.
- Concatenate the strings in each group with the join function, applied to the target column through transform() and a lambda function.
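The two steps above can be tried without any CSV file. The following is a minimal sketch on a small in-memory DataFrame; the names and branch values are made-up sample data, not the contents of the files used in the examples.

```python
import pandas as pd

# Hypothetical sample data; the article's CSV files are not reproduced here.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Asha", "Ben", "Cara"],
    "branch": ["CSE", "ECE", "IT", "ME", "CSE"],
})

# Step 1: group the rows by Name.
# Step 2: join each group's branch strings; transform() writes the joined
# string back to every row of that group.
df["branch"] = df.groupby("Name")["branch"].transform(lambda x: " ".join(x))

# Rows within a group are now identical, so keep only one per group.
result = df.drop_duplicates().reset_index(drop=True)
print(result)
```

Because transform() keeps the original number of rows, the drop_duplicates() call is what reduces the frame to one row per name.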
We will use a CSV file with two columns; the contents of the file are shown in the image below:
Example 1: We will concatenate the values in the branch column for rows that share the same Name.
Python3
# import pandas library
import pandas as pd

# read csv file
df = pd.read_csv("Book2.csv")

# concatenate the strings within each Name group
df['branch'] = df.groupby(['Name'])['branch'].transform(lambda x: ' '.join(x))

# drop duplicate data
df = df.drop_duplicates()

# show the dataframe
print(df)
Output:
Example 2: We can perform Pandas groupby on multiple columns as well.
We will use a CSV file with three columns; the contents of the file are shown in the image below:
Apply groupby on the Name and year columns:
Python3
# import pandas library
import pandas as pd

# read a csv file
df = pd.read_csv("Book1.csv")

# concatenate the strings within each (Name, year) group
df['branch'] = df.groupby(['Name', 'year'])['branch'].transform(lambda x: ' '.join(x))

# drop duplicate data
df = df.drop_duplicates()

# show the dataframe
print(df)
Output:
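The multi-column case can also be tried on a small in-memory DataFrame. The sketch below uses made-up sample data and, as a variation on the approach above, swaps transform() for agg(): agg() returns one row per (Name, year) pair, so the drop_duplicates() step is not needed.

```python
import pandas as pd

# Hypothetical sample data standing in for the three-column CSV file.
df = pd.DataFrame({
    "Name": ["Asha", "Asha", "Asha", "Ben", "Ben"],
    "year": [1, 1, 2, 1, 1],
    "branch": ["CSE", "IT", "ME", "ECE", "EEE"],
})

# Group by both keys and join the branch strings of each group.
# reset_index() turns the (Name, year) index back into ordinary columns.
combined = (
    df.groupby(["Name", "year"])["branch"]
      .agg(" ".join)
      .reset_index()
)
print(combined)
```

Which variant fits better depends on the goal: transform() is convenient when the other columns of the original frame should be kept, while agg() is shorter when only the grouping keys and the joined column are needed.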