Split a column in Pandas dataframe and get part of it

When a part of any column in Dataframe is important and the need is to take it separate, we can split a column on the basis of the requirement.

We can use Pandas .str accessor, it does fast vectorized string operations for Series and Dataframes and returns a string object. Pandas str accessor has number of useful methods and one of them is str.split, it can be used with split to get the desired part of the string. To get the nth part of the string, first split the column by delimiter and apply str[n-1] again on the object returned, i.e. Dataframe.columnName.str.split(" ").str[n-1].

Let’s make it clear by examples.

Code #1: Print a data object of the splitted column.

filter_none

edit
close

play_arrow

link
brightness_4
code

import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID':['Geek1_id', 'Geek2_id', 'Geek3_id'
                                         'Geek4_id', 'Geek5_id'],
                'Geek_A': [1, 1, 3, 2, 4],
                'Geek_B': [1, 2, 3, 4, 6],
                'Geek_R': np.random.randn(5)})
  
# Geek_A  Geek_B   Geek_ID    Geek_R
# 0       1       1  Geek1_id    random number
# 1       1       2  Geek2_id    random number
# 2       3       3  Geek3_id    random number
# 3       2       4  Geek4_id    random number
# 4       4       6  Geek5_id    random number
  
print(df.Geek_ID.str.split('_').str[0])

chevron_right


Output:

0    Geek1
1    Geek2
2    Geek3
3    Geek4
4    Geek5
dtype: object

 
Code #2: Print a list of returned data object.

filter_none

edit
close

play_arrow

link
brightness_4
code

import pandas as pd
import numpy as np
df = pd.DataFrame({'Geek_ID':['Geek1_id', 'Geek2_id', 'Geek3_id',
                                         'Geek4_id', 'Geek5_id'],
                'Geek_A': [1, 1, 3, 2, 4],
                'Geek_B': [1, 2, 3, 4, 6],
                'Geek_R': np.random.randn(5)})
  
# Geek_A  Geek_B   Geek_ID    Geek_R
# 0       1       1  Geek1_id    random number
# 1       1       2  Geek2_id    random number
# 2       3       3  Geek3_id    random number
# 3       2       4  Geek4_id    random number
# 4       4       6  Geek5_id    random number
  
print(df.Geek_ID.str.split('_').str[0].tolist())

chevron_right


Output:

['Geek1', 'Geek2', 'Geek3', 'Geek4', 'Geek5']

 
Code #3: Print a list of elements.

filter_none

edit
close

play_arrow

link
brightness_4
code

import pandas as pd
import numpy as np
  
df = pd.DataFrame({'Geek_ID':['Geek1_id', 'Geek2_id', 'Geek3_id',
                                         'Geek4_id', 'Geek5_id'],
                'Geek_A': [1, 1, 3, 2, 4],
                'Geek_B': [1, 2, 3, 4, 6],
                'Geek_R': np.random.randn(5)})
  
# Geek_A  Geek_B   Geek_ID    Geek_R
# 0       1       1  Geek1_id    random number
# 1       1       2  Geek2_id    random number
# 2       3       3  Geek3_id    random number
# 3       2       4  Geek4_id    random number
# 4       4       6  Geek5_id    random number
  
print(df.Geek_ID.str.split('_').str[1].tolist())

chevron_right


Output:

['id', 'id', 'id', 'id', 'id']


My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.