Skip to content
Related Articles

Related Articles

Improve Article

Python Pandas – get_dummies() method

  • Last Updated : 13 Oct, 2020
Geek Week

pandas.get_dummies() is used for data manipulation. It converts categorical data into dummy or indicator variables.

syntax:  pandas.get_dummies(data, prefix=None, prefix_sep=’_’, dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Parameters:

  • data: whose data is to be manipulated.
  • prefix: String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Default value is None.
  • prefix_sep: Separator/delimiter to use if appending any prefix. Default is ‘_’
  • dummy_na: It adds a column to indicate NaN values, default value is false, If false NaNs are ignored.
  • columns: Column names in the DataFrame that needs to be encoded. Default value is None, If columns is None then all the columns with object or category dtype will be converted.
  • sparse: It  specify whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). default value is False.
  • drop_first: Remove first level to get k-1 dummies out of k categorical levels.
  • dtype: Data type for new columns. Only a single dtype is allowed. Default value is np.uint8.

Returns: Dataframe (Dummy-coded data)

Example 1:



Python3




import pandas as pd
 
con = pd.Series(list('abcba'))
print(pd.get_dummies(con))

 
 Output:

Output 

 Example 2:

Python




import pandas as pd
import numpy as np
 
 
# list
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))

Output:

Nan column is not there as dummy_na is False by default

Example 3: (To get NaN column)

Python




import pandas as pd
import numpy as np
 
 
# list
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li, dummy_na=True))

Output:



Example 4:

Python3




import pandas as pd
import numpy as np
 
 
# dictionary
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})
 
print(pd.get_dummies(diff, prefix=['column1', 'column2']))

Output:

 Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.  

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning – Basic Level Course




My Personal Notes arrow_drop_up
Recommended Articles
Page :