Python Pandas – get_dummies() method

Last Updated : 13 Oct, 2020

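pandas.get_dummies() converts categorical data into dummy (indicator) variables, a transformation often called one-hot encoding. Each distinct category value becomes its own column, and each row is flagged in the column that matches its value.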

Syntax: pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

Parameters:

data: The data to encode. It can be an array-like, a Series, or a DataFrame.

prefix: String to prepend to the new dummy column names. When calling get_dummies on a DataFrame, pass a list with length equal to the number of columns being encoded, or a dictionary mapping column names to prefixes. Default value is None.

prefix_sep: Separator/delimiter placed between the prefix and the category value. Default is '_'.

dummy_na: Adds a column to indicate NaN values. Default value is False; if False, NaNs are ignored.

columns: Column names in the DataFrame that need to be encoded. Default value is None; if columns is None, all columns with object, string, or category dtype are converted.

sparse: Specifies whether the dummy-encoded columns should be backed by a SparseArray (True) or a regular NumPy array (False). Default value is False.

drop_first: Removes the first level to get k-1 dummies out of k categorical levels. Default value is False.

dtype: Data type for the new columns. Only a single dtype is allowed. Default value is np.uint8 in pandas versions before 2.0 and bool from pandas 2.0 onward.

Returns: DataFrame containing the dummy-coded data.
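The columns, drop_first, dtype, and sparse parameters described above can be sketched together as follows. The DataFrame and its column names (color, size, price) are hypothetical sample data, not taken from this article, and dtype=int is passed only so the indicators are 0/1 regardless of the pandas version.

```python
import pandas as pd

# Hypothetical sample data used only for illustration
df = pd.DataFrame({
    "color": ["red", "green", "red"],
    "size": ["S", "M", "L"],
    "price": [10, 12, 9],
})

# Encode only the 'color' column; 'size' and 'price' pass through unchanged
only_color = pd.get_dummies(df, columns=["color"], dtype=int)

# drop_first=True removes the first (sorted) level of each encoded column,
# leaving k-1 indicator columns for k levels
reduced = pd.get_dummies(df, columns=["color", "size"],
                         drop_first=True, dtype=int)

# sparse=True backs the indicator columns with SparseArray storage
sparse_df = pd.get_dummies(df["color"], sparse=True)

print(list(only_color.columns))
print(list(reduced.columns))
print(sparse_df.dtypes)
```

Here the untouched columns come first and the new indicator columns are appended after them. Dropping the first level is mainly useful when the indicators feed a linear model, where the full set of k columns would be perfectly collinear.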

Example 1:

Python3

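import pandas as pd

# Series of single-character category labels: a, b, c, b, a
con = pd.Series(list('abcba'))

# One indicator column is created per distinct label
print(pd.get_dummies(con))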

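Output (0/1 as printed by pandas before 2.0; True/False in 2.0 and later):

   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
3  0  1  0
4  1  0  0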

Example 2:

Python3

import pandas as pd
import numpy as np

# List of labels that includes a missing value (NaN)
li = ['s', 'a', 't', np.nan]
print(pd.get_dummies(li))

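Output (0/1 as printed by pandas before 2.0; True/False in 2.0 and later):

   a  s  t
0  0  1  0
1  1  0  0
2  0  0  1
3  0  0  0

There is no NaN column because dummy_na is False by default. The row for the NaN entry (index 3) is all zeros.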

Example 3: (To get NaN column)

Python3

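import pandas as pd
import numpy as np

# Same list as in Example 2, including a missing value (NaN)
li = ['s', 'a', 't', np.nan]

# dummy_na=True adds a separate indicator column for NaN
print(pd.get_dummies(li, dummy_na=True))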

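Output (0/1 as printed by pandas before 2.0; True/False in 2.0 and later):

   a  s  t  NaN
0  0  1  0    0
1  1  0  0    0
2  0  0  1    0
3  0  0  0    1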
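One way to see the effect of dummy_na is to sum the indicators in each row. The sketch below reuses the list from Examples 2 and 3; the variable names and the dtype=int argument are additions for illustration, chosen so the sums are plain integers on any pandas version.

```python
import numpy as np
import pandas as pd

li = ['s', 'a', 't', np.nan]

# Without a NaN column, the missing entry matches no indicator
without_na = pd.get_dummies(li, dtype=int)

# With dummy_na=True, the missing entry gets its own indicator
with_na = pd.get_dummies(li, dummy_na=True, dtype=int)

row_sums_without = without_na.sum(axis=1).tolist()
row_sums_with = with_na.sum(axis=1).tolist()

print(row_sums_without)  # [1, 1, 1, 0]
print(row_sums_with)     # [1, 1, 1, 1]
```

A row that sums to 0 marks a value that was missing in the input, so with the default setting missing data is silently encoded as "none of the categories".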

Example 4:

Python3

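import pandas as pd

# DataFrame built from a dictionary: R and T hold strings, S_ holds integers
diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# One prefix per encoded column: 'column1' for R, 'column2' for T.
# The numeric column S_ is not encoded and is kept as-is.
print(pd.get_dummies(diff, prefix=['column1', 'column2']))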
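In Example 4, the two prefixes are matched in order to the two string columns R and T, while the integer column S_ passes through unchanged. The sketch below uses the same DataFrame to list the resulting column names and also shows the dictionary form of prefix with a custom prefix_sep. The variable names by_list and by_dict and the '-' separator are illustrative choices.

```python
import pandas as pd

diff = pd.DataFrame({'R': ['a', 'c', 'd'],
                     'T': ['d', 'a', 'c'],
                     'S_': [1, 2, 3]})

# List form: prefixes are assigned to the encoded columns in order
by_list = pd.get_dummies(diff, prefix=['column1', 'column2'])

# Dict form: prefixes are assigned by column name; '-' replaces the default '_'
by_dict = pd.get_dummies(diff,
                         prefix={'R': 'column1', 'T': 'column2'},
                         prefix_sep='-')

print(list(by_list.columns))
print(list(by_dict.columns))
```

The dictionary form can be easier to maintain than the list form, since it does not depend on the order of the columns in the DataFrame.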