Python Pandas – get_dummies() method

  • Difficulty Level : Basic
  • Last Updated : 13 Oct, 2020

pandas.get_dummies() converts categorical data into dummy (indicator) variables: each distinct category value becomes its own column, which is set for the rows where that value occurs.

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Parameters:

  • data: The array-like, Series, or DataFrame whose values are to be encoded.
  • prefix: String prepended to the names of the new dummy columns. When calling get_dummies on a DataFrame, pass a list with length equal to the number of columns being encoded, or a dictionary mapping column names to prefixes. Default is None.
  • prefix_sep: Separator placed between the prefix and the category value. Default is '_'.
  • dummy_na: If True, adds a column that indicates NaN values. Default is False, in which case NaNs are ignored.
  • columns: Names of the DataFrame columns to encode. Default is None, in which case all columns with object or category dtype are converted.
  • sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). Default is False.
  • drop_first: If True, removes the first level so that k categorical levels yield k-1 dummies. Default is False.
  • dtype: Data type of the new columns. Only a single dtype is allowed. The default is np.uint8 in pandas versions before 2.0 and bool from pandas 2.0 onward.
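To make the drop_first and dtype parameters more concrete, here is a minimal sketch. The variable names (s, full, reduced) are illustrative only, and dtype=int is passed explicitly so that the result does not depend on the default dtype of the installed pandas version.

```python
import pandas as pd

s = pd.Series(list("abcba"))

# Full encoding: one indicator column per level (a, b, c).
full = pd.get_dummies(s, dtype=int)

# drop_first=True removes the first level ("a"), leaving k-1 = 2 columns.
reduced = pd.get_dummies(s, drop_first=True, dtype=int)

print(list(full.columns))     # ['a', 'b', 'c']
print(list(reduced.columns))  # ['b', 'c']
print(reduced)
```

With drop_first=True, rows whose value is the dropped level ("a") contain only zeros, so the dropped level is still recoverable from the remaining columns.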

Returns: DataFrame (dummy-coded data)
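The following sketch illustrates what the returned DataFrame looks like when only some columns are encoded. The DataFrame and its column names (color, size, qty) are made up for illustration; columns and prefix_sep are the parameters described above.

```python
import pandas as pd

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({"color": ["red", "blue", "red"],
                   "size": ["S", "M", "S"],
                   "qty": [1, 2, 3]})

# Encode only "color"; "size" and "qty" are passed through unchanged.
# prefix_sep="-" joins the column name and the category value with a dash.
encoded = pd.get_dummies(df, columns=["color"], prefix_sep="-", dtype=int)

print(list(encoded.columns))  # ['size', 'qty', 'color-blue', 'color-red']
print(encoded)
```

The untouched columns come first in the result, followed by the new indicator columns; the original color column is removed.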

Example 1:

Python3

import pandas as pd
 
con = pd.Series(list('abcba'))
print(pd.get_dummies(con))

 
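Output: a DataFrame with five rows (one per element) and three indicator columns, a, b and c. Each row has a single entry set, in the column that matches its value.

Example 2: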

Python3

import pandas as pd
import numpy as np

# A list that contains a missing value (NaN)
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))

Output: a DataFrame with the indicator columns a, s and t. There is no NaN column, because dummy_na is False by default.
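As a quick check of how the missing value is treated, the sketch below sums each row of the result. It reuses the list from Example 2; the name dummies is illustrative, and dtype=int is passed explicitly so the sums do not depend on the pandas version.

```python
import numpy as np
import pandas as pd

li = ['s', 'a', 't', np.nan]
dummies = pd.get_dummies(li, dtype=int)

# Only the three real categories get a column.
print(list(dummies.columns))         # ['a', 's', 't']

# Each row sums to 1, except the NaN row, which is all zeros.
print(dummies.sum(axis=1).tolist())  # [1, 1, 1, 0]
```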

Example 3: (To get NaN column)

Python3

import pandas as pd
import numpy as np

# A list that contains a missing value (NaN)
li = ['s', 'a', 't', np.nan]

# dummy_na=True adds an indicator column for NaN
print(pd.get_dummies(li, dummy_na=True))

Output: the same columns a, s and t, plus an extra NaN column that is set only for the last element.

Example 4:

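Python3

import pandas as pd

# DataFrame built from a dictionary: R and T are categorical, S_ is numeric
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# One prefix per encoded column: 'column1' for R, 'column2' for T
print(pd.get_dummies(diff, prefix=['column1', 'column2']))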

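Output: column S_ is left unchanged, while R and T are replaced by indicator columns named column1_a, column1_c, column1_d and column2_a, column2_c, column2_d.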
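The column naming in Example 4 can be verified with a short sketch like the one below. It also passes the prefixes as a dictionary, which get_dummies accepts as an alternative to a list; the names by_list and by_dict are illustrative, and dtype=int is set explicitly to keep the result independent of the pandas version.

```python
import pandas as pd

diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# Prefixes given as a list: matched to the encoded columns in order.
by_list = pd.get_dummies(diff, prefix=['column1', 'column2'], dtype=int)

# Prefixes given as a dict: matched to the encoded columns by name.
by_dict = pd.get_dummies(diff, prefix={'R': 'column1', 'T': 'column2'}, dtype=int)

print(list(by_list.columns))
# ['S_', 'column1_a', 'column1_c', 'column1_d', 'column2_a', 'column2_c', 'column2_d']
print(by_list.equals(by_dict))  # True
```

The dictionary form may be easier to read when many columns are encoded, since each prefix is tied to its column by name rather than by position.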

