
Split a column in Pandas dataframe and get part of it


Sometimes only part of each value in a DataFrame column is needed, for example the prefix of an ID. In that case you can split the column on a delimiter and keep just the piece you want.

Pandas provides the .str accessor, which performs fast vectorized string operations on a Series (such as a single DataFrame column). One of its many methods is str.split, which splits each string on a delimiter and returns a Series of lists. To get the nth part of each string, split the column on the delimiter and then apply .str[n-1] to the returned object (indexing starts at 0), i.e. DataFrame.columnName.str.split(" ").str[n-1].
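As a small illustration of the str[n-1] rule, here is a sketch on a made-up Series of space-delimited strings (the data and the variable names are hypothetical). It also shows that .str[i] and .str.get(i) select the same element from each list.

```python
import pandas as pd

# Hypothetical sample data: space-delimited strings
s = pd.Series(["alpha beta gamma", "delta epsilon zeta"])

n = 2                                   # we want the 2nd part, i.e. position n-1 == 1
parts = s.str.split(" ")                # Series whose elements are Python lists
second = parts.str[n - 1]               # index into each list
second_via_get = parts.str.get(n - 1)   # equivalent accessor method

print(second.tolist())
```

Both second and second_via_get hold the second word of each string.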

Let’s make it clear by examples.

Code #1: Print the Series holding the first part of the split column.




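import pandas as pd
import numpy as np

# Note the comma after 'Geek3_id': without it Python concatenates the two
# strings, the list ends up with only 4 items and DataFrame() raises a ValueError
df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

# df looks like this (Geek_R holds random numbers):
#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Split each ID at '_' and keep the first part (index 0)
print(df.Geek_ID.str.split('_').str[0])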


Output:

0    Geek1
1    Geek2
2    Geek3
3    Geek4
4    Geek5
dtype: object
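If you need every part rather than just one, str.split also accepts expand=True, which returns a DataFrame with one column per part (labelled 0, 1, ...). A minimal sketch, assuming hypothetical new column names Geek_name and Geek_suffix:

```python
import pandas as pd

df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id']})

# expand=True returns a DataFrame instead of a Series of lists
parts = df['Geek_ID'].str.split('_', expand=True)

# Store each part in its own (hypothetically named) column
df['Geek_name'] = parts[0]
df['Geek_suffix'] = parts[1]
print(df)
```

Assigning the parts one column at a time keeps the mapping between part index and column name explicit.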

 
Code #2: Convert the returned Series to a Python list.




import pandas as pd
import numpy as np

df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

# df looks like this (Geek_R holds random numbers):
#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Take the first part of each ID and turn the resulting Series into a list
print(df.Geek_ID.str.split('_').str[0].tolist())


Output:

['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']
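When a value does not contain enough parts, .str[i] does not raise an IndexError; it yields NaN for that row, and the NaN carries through into the list. A small sketch with hypothetical IDs, one of which lacks the '_' delimiter:

```python
import pandas as pd

# Hypothetical IDs; the last one has no '_' delimiter
ids = pd.Series(['Geek1_id', 'Geek2_id', 'Geek3'])

second = ids.str.split('_').str[1]      # 'Geek3' splits into one part -> NaN
filled = second.fillna('').tolist()     # replace missing parts before listing

print(second.tolist())
print(filled)
```

Calling fillna before tolist() is one way to avoid NaN values in the final list.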

 
Code #3: Print a list of the second part (index 1) of each split value.




import pandas as pd
import numpy as np

df = pd.DataFrame({'Geek_ID': ['Geek1_id', 'Geek2_id', 'Geek3_id',
                               'Geek4_id', 'Geek5_id'],
                   'Geek_A': [1, 1, 3, 2, 4],
                   'Geek_B': [1, 2, 3, 4, 6],
                   'Geek_R': np.random.randn(5)})

# df looks like this (Geek_R holds random numbers):
#     Geek_ID  Geek_A  Geek_B         Geek_R
# 0  Geek1_id       1       1  random number
# 1  Geek2_id       1       2  random number
# 2  Geek3_id       3       3  random number
# 3  Geek4_id       2       4  random number
# 4  Geek5_id       4       6  random number

# Take the second part (index 1) of each ID as a list
print(df.Geek_ID.str.split('_').str[1].tolist())


Output:

['id', 'id', 'id', 'id', 'id']
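If the number of delimiters varies between rows, a fixed positive index may not select the same logical part every time. The sketch below, on hypothetical values, uses a negative index to take the last part, and str.rsplit with n=1 to split only at the last delimiter so the prefix stays intact.

```python
import pandas as pd

# Hypothetical values with a varying number of '_' delimiters
s = pd.Series(['Geek1_id', 'Geek_2_id', 'Geek_3_x_id'])

# str[-1] picks the last part regardless of how many parts there are
last = s.str.split('_').str[-1]

# rsplit with n=1 splits only at the last '_', keeping everything before it
prefix = s.str.rsplit('_', n=1).str[0]

print(last.tolist())
print(prefix.tolist())
```

Here last is 'id' for every row, while prefix keeps 'Geek1', 'Geek_2' and 'Geek_3_x' whole.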


Last Updated : 21 Jan, 2019