Python | Pandas Series.str.get_dummies()

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas str.get_dummies() splits each string in the caller series at the passed separator and returns a data frame with one column for every distinct value produced by the split. For each row, a column holds 1 if the original string at that index contains that column's name as one of its split pieces, and 0 otherwise.
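As a minimal sketch of this behaviour, the snippet below applies str.get_dummies() to a small series. The colour strings are made-up sample data, not part of the employee data set used later.

```python
import pandas as pd

# Made-up sample data (an assumption for illustration only)
s = pd.Series(["red|blue", "blue", "green|red"])

# One column per distinct piece; 1 where the row's string contains that piece
dummies = s.str.get_dummies(sep="|")
print(dummies)
```

Each row of the result lines up with the string at the same index in the original series.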

Since this is a string operation, it has to be called through the .str accessor every time, as Series.str.get_dummies(). Calling get_dummies() directly on the series throws an error.
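The following sketch shows what happens when the .str prefix is left out. The series contents are made-up sample data, and the variable names are only for illustration.

```python
import pandas as pd

# Made-up sample data (an assumption for illustration only)
s = pd.Series(["a|b", "b"])

try:
    # No .str prefix: a Series has no get_dummies attribute
    s.get_dummies()
    raised = False
except AttributeError as err:
    raised = True
    print("AttributeError:", err)

# With the .str prefix the call works
via_accessor = s.str.get_dummies()
print(via_accessor)
```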



Syntax: Series.str.get_dummies(sep='|')

Parameters:
sep: String value, separator to split strings at (default '|')

Return type: Data frame with binary values only
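To illustrate the sep parameter and the return type, here is a small sketch comparing the default separator with an explicit one. The two strings are made-up sample data.

```python
import pandas as pd

# Made-up sample data (an assumption for illustration only)
s = pd.Series(["a|b", "a,b"])

# Default separator is "|": "a,b" is kept as a single piece
default_split = s.str.get_dummies()

# Explicit separator ",": now "a|b" is kept as a single piece
comma_split = s.str.get_dummies(sep=",")

print(type(default_split))
print(default_split)
print(comma_split)
```

In both cases the result is a data frame holding only 0s and 1s; only the choice of separator changes which pieces become columns.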

To download the data set used in the following examples, click here.

In the following examples, the data frame used contains data on some employees. The image of the data frame before any operations is attached below.

 
Example #1: Separating different strings on whitespace.

In this example, the strings in the Team column are split at " " (whitespace), and a data frame is returned with a column for every distinct value produced by the split. A value in the returned data frame is 1 if that column's name appears in the text at the same index of the original data frame.


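# importing pandas
import pandas as pd

# making data frame from csv at url
# (the URL is missing from this copy; csv_url is a placeholder for the data set's location)
data = pd.read_csv(csv_url)

# making dataframe using get_dummies()
dummies = data["Team"].str.get_dummies(sep=" ")

# display
dummies.head(10)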



Output:
The output can be compared with the original data frame. Wherever a column's string occurs in the Team value at the same index, the value is 1; otherwise it is 0.
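For readers without the data set at hand, the comparison can be sketched on a tiny stand-in data frame. The names and teams below are made-up values, and only the Team column name is taken from the example above.

```python
import pandas as pd

# Made-up stand-in for the employee data set (an assumption for illustration)
data = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["Boston Celtics", "Brooklyn Nets", "Boston Celtics"],
})

# Split the Team strings at whitespace
dummies = data["Team"].str.get_dummies(sep=" ")

# Put the original column next to the dummy columns for comparison
side_by_side = data[["Team"]].join(dummies)
print(side_by_side)
```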

 
Important points:

  • If the string is not null (and not empty), then at least one column will have the value 1 at the same index.
  • If the value is null, then all columns will have the value 0 at that index (this can be seen at the 2nd element in the above example).
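Both points can be checked with a short sketch that sums each row of the result. The series below is made-up sample data with a null value placed at the 2nd element.

```python
import pandas as pd

# Made-up sample data with a null at the 2nd element (an assumption for illustration)
s = pd.Series(["x y", None, "y"])

dummies = s.str.get_dummies(sep=" ")

# Number of 1s in each row: 0 for the null row, at least 1 for the others
row_totals = dummies.sum(axis=1)
print(dummies)
print(row_totals.tolist())
```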
Example #2: Splitting at multiple points/Static value column

In this example, a static value ("Hello gfg family") is used for the new column. The get_dummies() method is then applied and the string is split at "g". Since "g" occurs more than once, there will be more than one column, and because the string is the same in every row, every row will also have the same values.


# importing pandas
import pandas as pd

# making data frame from csv at url
# (the URL is missing from this copy; csv_url is a placeholder for the data set's location)
data = pd.read_csv(csv_url)

# string for new column
string = "Hello gfg family"

# creating new column
data["New_column"] = string

# creating dummies
df = data["New_column"].str.get_dummies(sep="g")

# display
df.head(10)


    
    

Output:
The new data frame has 3 columns, one for each piece of the string left after splitting at "g", and every row has the same values.
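The three columns can be reproduced without the data set. In the sketch below, the three-row data frame is a made-up stand-in; the string and the separator are the ones used in the example.

```python
import pandas as pd

# Made-up stand-in data frame (an assumption for illustration)
data = pd.DataFrame({"Name": ["Player A", "Player B", "Player C"]})

# Same static string in every row
data["New_column"] = "Hello gfg family"

# Splitting at "g" leaves three pieces: "Hello ", "f" and " family"
df = data["New_column"].str.get_dummies(sep="g")
print(df.columns.tolist())
print(df)
```

Note that the pieces keep their surrounding spaces, so the column names are "Hello " and " family" rather than "Hello" and "family".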






Improved By: Akanksha_Rai