Python | Pandas Series.str.get_dummies()

  • Last Updated : 23 Aug, 2019

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas str.get_dummies() splits each string in the caller series at the passed separator and returns a data frame with one column for every distinct value produced by the split. For each row, a column holds 1 if the original string at that index contains the column name as one of its split pieces, and 0 otherwise.

Since this is a string operation, it has to be called through the .str accessor, i.e. Series.str.get_dummies(). Calling get_dummies() directly on a Series raises an AttributeError.
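The role of the .str prefix can be sketched as follows. The series values are made up for illustration and are not taken from the article's data set:

```python
import pandas as pd

# Hypothetical sample values (not from the article's data set)
teams = pd.Series(["Marketing", "Finance"])

try:
    # Missing .str prefix: a Series has no get_dummies attribute
    teams.get_dummies()
    raised = False
except AttributeError:
    raised = True

print(raised)

# With the .str prefix the call works and returns a data frame
ok = teams.str.get_dummies()
print(ok)
```

The top-level function pd.get_dummies() is a separate API and is not what fails here. Only the method lookup on the Series itself does.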

Syntax: Series.str.get_dummies(sep='|')

sep: String value, separator to split strings at

Return type: Data frame with binary values only
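As a minimal sketch of the syntax above with the default separator, the small series below uses invented, pipe-separated values. Each distinct token becomes one column of the returned data frame:

```python
import pandas as pd

# Hypothetical sample data: pipe-separated tokens per row
tags = pd.Series(["a|b", "b|c", "a"])

# No argument passed, so the default separator "|" is used
dummies = tags.str.get_dummies()
print(dummies)
```

Row 0 contains "a" and "b", so those two columns hold 1 and column "c" holds 0, and so on for the other rows.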

To download the data set used in following examples, click here.

In the following examples, the data frame used contains data of some employees. The image of data frame before any operations is attached below.

Example #1: Separating different strings on whitespace.

In this example, the strings in the Team column are split at " " (whitespace) and a data frame is returned with one column for each distinct value produced by the split. A value in the returned data frame is 1 if the column name occurs in the text value at the same index of the original data frame.

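# importing pandas
import pandas as pd
# making data frame from csv (the source URL is missing here;
# "employees.csv" is an assumed local copy of the data set)
data = pd.read_csv("employees.csv")
# making dataframe using get_dummies()
dummies = data["Team"].str.get_dummies(" ")
# display
print(dummies)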

The output can be compared with the original data frame: if the column name occurs in the Team value at the same index, the value is 1, otherwise 0.

Important points:

  • If the string is not null (and not empty), then at least one column will have the value 1 at the same index.
  • If the value is null, then all columns will have the value 0 at that index (this can be seen at the 2nd element in the above example).
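Both points can be checked on a small stand-in series. The Team values below are hypothetical, with a missing entry placed at the 2nd position to mirror the example above:

```python
import pandas as pd

# Hypothetical Team values; the 2nd element is missing
team = pd.Series(["Business Development", None, "Finance"])

# Split on whitespace, as in Example #1
dummies = team.str.get_dummies(sep=" ")
print(dummies)

# Number of columns set to 1 in each row
ones_per_row = dummies.sum(axis=1)
print(ones_per_row.tolist())
```

The row for the missing value sums to 0, while "Business Development" sets two columns because the split produces two pieces.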

Example #2: Splitting at multiple points/Static value column

In this example, a static value ("Hello gfg family") is assigned to a new column. The get_dummies() method is then applied with "g" as the separator. Since "g" occurs twice in the string, the split yields three pieces ("Hello ", "f" and " family") and therefore three columns. Every row holds the same string, so the values are identical in all rows.

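# importing pandas
import pandas as pd
# making data frame from csv (the source URL is missing here;
# "employees.csv" is an assumed local copy of the data set)
data = pd.read_csv("employees.csv")
# string for new column
string = "Hello gfg family"
# creating new column
data["New_column"] = string
# creating dummies
df = data["New_column"].str.get_dummies("g")
# display
print(df)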

The new data frame has 3 columns, and every row has the same values (1 in each column).
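The same result can be reproduced without the CSV file. The sketch below assumes a three-row stand-in frame with a made-up "Name" column, since only the static column matters here:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the employee data
data = pd.DataFrame({"Name": ["A", "B", "C"]})

# Same static string in every row
data["New_column"] = "Hello gfg family"

# "g" occurs twice, so each string splits into three pieces
df = data["New_column"].str.get_dummies("g")
print(df.shape)
print(list(df.columns))
```

Note that the pieces keep their surrounding spaces ("Hello " and " family"), because only the separator itself is removed.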
