Python | Pandas Series.str.partition()
Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
str.partition() works much like str.split(). Instead of splitting the string at every occurrence of the separator/delimiter, it splits the string only at the first occurrence. With split(), the separator is not stored anywhere; only the text around it is kept in the new list or data frame. With str.partition(), the separator is stored as well, so each string yields three parts: the text before the separator, the separator itself, and the text after it.
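The difference between the two methods can be seen in a minimal sketch. The names below are hypothetical sample values, not the article's CSV data.

```python
import pandas as pd

# Hypothetical sample values, chosen so one name contains two separators.
names = pd.Series(["Smith, John, Jr.", "Doe, Jane"])

# split() breaks at every ", " and discards the separator.
split_result = names.str.split(", ")

# partition() breaks only at the first ", " and keeps the separator.
partition_result = names.str.partition(", ", expand=False)

print(split_result.tolist())
print(partition_result.tolist())
```

For the first name, split() produces three pieces, while partition() leaves everything after the first separator untouched in its last piece.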
The .str accessor has to be prefixed every time this method is called on a Series. It distinguishes the pandas method from Python's built-in str.partition(); a Series has no partition() method of its own, so omitting .str raises an AttributeError.
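A short sketch of what happens with and without the accessor, using a hypothetical one-element Series:

```python
import pandas as pd

names = pd.Series(["Doe, Jane"])  # hypothetical sample value

# Without .str, the Series itself has no partition() method.
try:
    names.partition(", ")
except AttributeError as exc:
    error_message = str(exc)
    print("Error:", error_message)

# With .str, the method is applied to every string in the Series.
parts = names.str.partition(", ")
print(parts)
```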
Syntax: Series.str.partition(sep=' ', expand=True)
sep: String value, the separator or delimiter at which to split each string. Default is ' ' (a single space). Older pandas releases named this parameter pat.
expand: Boolean value. If True, returns a data frame with the three parts in separate columns. If False, returns a Series in which each element is a tuple of the three parts. Default is True.
Return Type: Data frame or Series of tuples, depending on the expand parameter
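The effect of the two parameters can be sketched as follows. The strings are hypothetical, and the separator is left at its default so the example does not depend on the parameter's name.

```python
import pandas as pd

s = pd.Series(["alpha beta gamma", "one two"])  # hypothetical sample values

# Defaults: split at the first single space, expand into a data frame.
as_frame = s.str.partition()

# expand=False keeps the three parts together in one element per row.
as_series = s.str.partition(expand=False)

print(type(as_frame).__name__)
print(as_frame)
print(type(as_series).__name__)
print(as_series)
```

With expand=True the result has three columns labelled 0, 1 and 2; with expand=False each row holds one three-part tuple.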
In the following examples, the data frame used contains data about some employees. An image of the data frame before any operations is attached below.
Example #1: Splitting String into List
In this example, the Name column is split at the first occurrence of ', '. The expand parameter is set to False so that the parts of each name are kept together in a single element (pandas stores them as a tuple) instead of being expanded into a data frame.
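A minimal sketch of this example follows. The data frame is a hypothetical stand-in for the employee CSV; the names, the Team column and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the employee data; only the Name column matters here.
data = pd.DataFrame({
    "Name": ["Thomas, Douglas", "Ruiz, Maria", "Grant, Jerry"],
    "Team": ["Marketing", "Finance", "Finance"],
})

# Overwrite Name with the (before, separator, after) parts of each value.
data["Name"] = data["Name"].str.partition(", ", expand=False)

print(data)
```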
As shown in the output image, each value in the Name column was split at the first occurrence of ', '. As can be seen, ', ' is also stored as a separate element.
Note: Do not be confused by the two adjacent commas in each entry: one belongs to the ', ' element itself and the other is the element separator.
Example #2: Splitting String into Data frame
In this example, the first name and last name are separated from the Name column and stored in separate columns of the data frame.
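A minimal sketch of this example follows. The data frame is a hypothetical stand-in for the employee CSV, and it assumes names are written as "Last, First"; if the real data uses the opposite order, the two column assignments swap.

```python
import pandas as pd

# Hypothetical stand-in for the employee data (assumed "Last, First" format).
data = pd.DataFrame({
    "Name": ["Thomas, Douglas", "Ruiz, Maria", "Grant, Jerry"],
    "Team": ["Marketing", "Finance", "Finance"],
})

# expand=True returns a data frame with columns 0, 1 and 2:
# text before the separator, the separator, and text after it.
parts = data["Name"].str.partition(", ", expand=True)

# Column 1 holds only the separator, so it is not copied over.
data["Last Name"] = parts[0]
data["First Name"] = parts[2]

# Drop the original column now that its parts are stored separately.
data = data.drop(columns=["Name"])

print(parts)
print(data)
```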
As shown in the output image, the Name column was separated into a data frame with 3 columns (the string before the comma, the separator itself, and the string after the comma). That data frame was then used to create new columns in the original data frame. The old Name column was dropped using the .drop() method.
New Data frame-
Data frame with Added columns-