
String manipulations in Pandas DataFrame

String manipulation is the process of changing, parsing, splitting, joining, or analyzing strings. Raw string data is often not in a form suitable for analysis or for producing a description of the data, so it usually has to be cleaned first. Python is well known for its string-handling abilities, and Pandas extends them to whole columns: its built-in string methods, exposed through the .str accessor, apply an operation to every element of a Series or DataFrame column at once. This article walks through the most commonly used of these methods.

Create a String Dataframe using Pandas

First, let's create some string data in Pandas. The examples in this article use a Series (a single column); the same .str methods work on any string column of a DataFrame.






# Importing the necessary libraries
import pandas as pd
import numpy as np
 
# df is a Series of strings with one missing value (np.nan)
df = pd.Series(['Gulshan', 'Shashank', 'Bablu',
                'Abhishek', 'Anand', np.nan, 'Pratap'])
 
print(df)

Output:

 

Change Column Datatype in Pandas

Let’s change the dtype of the Series we just created to the string type. There are several ways to do this, shown in the examples below.






# the dtype can also be changed after
# the Series has been created
print(df.astype('string'))

Output:

 

Example 1: Creating the Series with dtype = 'string':




# create the Series with dtype = 'string'
import pandas as pd
import numpy as np
 
df = pd.Series(['Gulshan', 'Shashank', 'Bablu', 'Abhishek',
                'Anand', np.nan, 'Pratap'], dtype='string')
 
print(df)

Output:

 

Example 2: Creating the Series with dtype = pd.StringDtype():




# create the Series with dtype = pd.StringDtype()
import pandas as pd
import numpy as np
 
df = pd.Series(['Gulshan', 'Shashank', 'Bablu', 'Abhishek',
                'Anand', np.nan, 'Pratap'], dtype=pd.StringDtype())
 
print(df)

Output:

 
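As a quick check of the three approaches above, the following minimal sketch builds the same data each way and compares the resulting dtypes. The variable names (names, converted, by_alias, by_class) are only illustrative, and the exact text printed for the dtype may vary between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data: two names and one missing value
names = ['Gulshan', 'Shashank', np.nan]

converted = pd.Series(names).astype('string')          # convert after creation
by_alias = pd.Series(names, dtype='string')            # string alias
by_class = pd.Series(names, dtype=pd.StringDtype())    # dtype object

# All three Series should carry the same string extension dtype
print(converted.dtype, by_alias.dtype, by_class.dtype)

# The missing entry is still recognised as missing after conversion
print(pd.isna(converted[2]))
```

Whichever form is used, the .str methods shown in the rest of this article work the same way.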

String Manipulations in Pandas

Now let's look at string manipulation itself. We first create one Series and then apply every string operation to it, so that the results are easy to compare.

Example:




# create a Series to use in the
# string manipulation examples
import pandas as pd
import numpy as np
 
df = pd.Series(['night_fury1', 'Is  ', 'Geeks, forgeeks',
                '100', np.nan, '  Contributor '])
print(df)

Output:

 

Let’s have a look at various methods provided by this library for string manipulations.




# lower()
print(df.str.lower())

Output:

0        night_fury1
1                 is 
2    geeks, forgeeks
3                100
4                NaN
5        contributor 
dtype: object




# upper()
print(df.str.upper())

Output:

 




# strip()
print(df)
print('\nAfter using the strip:')
print(df.str.strip())

Output:

 
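Because every .str method returns a new Series, the methods shown so far can be chained. The sketch below, which uses a small illustrative Series named raw, strips the surrounding whitespace and then lowercases the result; missing values pass through unchanged.

```python
import numpy as np
import pandas as pd

# Illustrative data with stray whitespace and a missing value
raw = pd.Series(['  Contributor ', 'Is  ', np.nan])

# strip() removes leading/trailing whitespace, lower() then lowercases
clean = raw.str.strip().str.lower()

print(clean.tolist())
```

If only one side needs trimming, pandas also provides str.lstrip() and str.rstrip().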




# split(pattern)
print(df)
print('\nAfter using split:')
print(df.str.split(','))
 
# now we can use [] or get() to fetch
# an element of each list by position
print('\nusing []:')
print(df.str.split(',').str[0])
 
print('\nusing get():')
print(df.str.split(',').str.get(1))

Output:

 

 
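Instead of a Series of lists, split() can also return the pieces as separate columns by passing expand=True. A minimal sketch on an illustrative Series s:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Geeks, forgeeks', '100', np.nan])

# expand=True returns a DataFrame with one column per piece;
# rows with fewer pieces are padded with missing values
parts = s.str.split(',', expand=True)

print(parts)
print(parts.shape)  # (3, 2)
```

Note that the second piece of the first row keeps its leading space (' forgeeks'), so a follow-up str.strip() is often useful.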




# len()
print("length of the Series: ", len(df))
print("length of each value in the Series:")
print(df.str.len())

Output:

 
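The per-value lengths can be used as a filter. The sketch below, on an illustrative Series s, keeps only the values longer than four characters; the row with the missing value has no length, so the comparison is not true for it and the row is dropped.

```python
import numpy as np
import pandas as pd

s = pd.Series(['night_fury1', 'Is  ', '100', np.nan])

# Length of each value; the missing value has no length
lengths = s.str.len()

# Keep only values with more than 4 characters
long_values = s[lengths > 4]

print(long_values.tolist())  # ['night_fury1']
```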




# cat(sep=pattern)
print(df)
 
print("\nafter using cat:")
print(df.str.cat(sep='_'))
 
print("\nworking with NaN using cat:")
print(df.str.cat(sep='_', na_rep='#'))

Output:

 
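Besides joining all values of one Series into a single string, cat() can concatenate two Series element by element when a second Series is passed in. A minimal sketch, with illustrative Series first and second:

```python
import numpy as np
import pandas as pd

first = pd.Series(['night', 'geeks', np.nan])
second = pd.Series(['fury', 'forgeeks', 'x'])

# Element-wise concatenation; na_rep stands in for missing values
joined = first.str.cat(second, sep='_', na_rep='?')

print(joined.tolist())  # ['night_fury', 'geeks_forgeeks', '?_x']
```

Without na_rep, a row with a missing value on either side yields a missing result.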




# get_dummies()
print(df.str.get_dummies())

Output:

 
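get_dummies() is most useful when each value holds several labels joined by a separator. The sketch below uses an illustrative Series tags whose labels are separated by '|'; each distinct label becomes an indicator column.

```python
import numpy as np
import pandas as pd

tags = pd.Series(['python|pandas', 'pandas', np.nan])

# One indicator column per distinct label; a missing value gives all zeros
dummies = tags.str.get_dummies(sep='|')

print(dummies)
```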




# startswith(pattern)
print(df.str.startswith('G'))

Output:

 




# endswith(pattern)
print(df.str.endswith('1'))

Output:

 
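startswith() and endswith() are often used as filters. Since a missing value produces a missing result rather than True or False, the na parameter can be used to decide how such rows are treated. A minimal sketch on an illustrative Series s:

```python
import numpy as np
import pandas as pd

s = pd.Series(['night_fury1', 'Geeks, forgeeks', np.nan])

# na=False treats missing values as "no match", giving a clean boolean mask
mask = s.str.endswith('1', na=False)

print(s[mask].tolist())  # ['night_fury1']
```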




# replace(a,b)
print(df)
print("\nAfter using replace:")
print(df.str.replace('Geeks', 'Gulshan'))

Output:

 
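replace() can treat its first argument either as literal text or as a regular expression, controlled by the regex parameter. Since the default has differed between pandas versions, the sketch below passes regex explicitly in both calls; the Series s is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['a.b', 'a-b', np.nan])

# Literal replacement: only an actual '.' is replaced
literal = s.str.replace('.', '_', regex=False)

# Regular expression: the character class matches '.' or '-'
pattern = s.str.replace(r'[.-]', '_', regex=True)

print(literal.tolist()[:2])  # ['a_b', 'a-b']
print(pattern.tolist()[:2])  # ['a_b', 'a_b']
```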




# repeat(value)
print(df.str.repeat(2))

Output:

 




# count(pattern)
print(df.str.count('n'))

Output:

 
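The pattern passed to count() is interpreted as a regular expression, so characters with a special meaning need escaping when they should be matched literally. A minimal sketch on an illustrative Series s:

```python
import pandas as pd

s = pd.Series(['a.b.c', 'abc'])

# Unescaped '.' matches any character, so every character is counted
any_char = s.str.count('.')

# Escaped '\.' counts only literal dots
dots = s.str.count(r'\.')

print(any_char.tolist())  # [5, 3]
print(dots.tolist())      # [2, 0]
```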




# find(substring)
# returns the lowest index at which the
# substring starts in each value;
# -1 means the substring was not found
print(df.str.find('n'))

Output:

 




# findall(pattern)
# the pattern is a regular expression;
# [] is an empty list, meaning nothing
# in that row matched the pattern
print(df.str.findall('n'))

Output:

 
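Because findall() accepts a regular expression, it can extract more than a fixed character. The sketch below pulls every digit out of each value of an illustrative Series s:

```python
import pandas as pd

s = pd.Series(['night_fury1', '100', 'abc'])

# \d matches a single digit; each match becomes one list element
digits = s.str.findall(r'\d')

print(digits.tolist())  # [['1'], ['1', '0', '0'], []]
```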




# islower()
print(df.str.islower())

Output:

 




# isupper()
print(df.str.isupper())

Output:

 




# isnumeric()
print(df.str.isnumeric())

Output:

 
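isnumeric() checks whether every character in a value is a numeric character, so strings containing a decimal point or a minus sign are reported as False. The sketch below shows this on an illustrative Series s and, as one possible alternative for actual conversion, uses pd.to_numeric with errors='coerce'.

```python
import pandas as pd

s = pd.Series(['100', '1.5', '-3', 'abc'])

# '.' and '-' are not numeric characters
print(s.str.isnumeric().tolist())  # [True, False, False, False]

# to_numeric parses real numbers and turns unparseable values into NaN
numbers = pd.to_numeric(s, errors='coerce')
print(numbers.tolist())
```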




# swapcase()
print(df.str.swapcase())

Output:

 

