Python | Pandas.factorize()

  • Last Updated : 27 Sep, 2018

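pandas.factorize() encodes a one-dimensional array as integers by identifying its distinct values: each distinct value is assigned an integer code, and every element is replaced by the code of its value. It is available both as the top-level function pandas.factorize() and as the method Series.factorize().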
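As a minimal sketch of the two call forms (the sample data and variable names are made up for illustration), the function and the method should produce the same codes. The function returns the distinct values as a NumPy array when given an array, while the Series method returns them as an Index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for this illustration
values = ['x', 'y', 'x', 'z']

# Top-level function applied to a NumPy array
codes_func, uniques_func = pd.factorize(np.array(values, dtype=object))

# Method form on a Series built from the same data
codes_meth, uniques_meth = pd.Series(values).factorize()

print(codes_func)    # one integer code per input element
print(uniques_func)  # distinct values in order of first appearance
print(codes_meth)
print(uniques_meth)
```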

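Parameters:
values : 1D sequence to factorize.
sort : [bool, default False] Sort the uniques and remap the codes so that each code still points to the same value.
na_sentinel : [int, default -1] Code used to mark missing values. This parameter was deprecated in pandas 1.5 and removed in pandas 2.0; newer releases take use_na_sentinel (bool, default True) instead.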

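Return: A tuple (codes, uniques). codes is an integer array holding the numeric representation of values, and uniques holds the distinct values (an ndarray, Index, or Categorical, depending on the input).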
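The relationship between the two returned objects can be illustrated with a short sketch: indexing the uniques with the codes should rebuild the original values. The sample array below is an assumption chosen for illustration, and it contains no missing values (a -1 code would not map back to a unique).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data without missing values
values = np.array(['b', 'd', 'd', 'c', 'a'], dtype=object)

codes, uniques = pd.factorize(values)

# Taking the uniques at the positions given by the codes
# reconstructs the original sequence
restored = uniques.take(codes)

print(codes)
print(uniques)
print(restored)
```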

Code: Demonstrating how the factorize() method works

# importing libraries
import numpy as np
import pandas as pd

# recent pandas releases expect an array-like (ndarray, Series, Index)
# rather than a plain list, so the data is wrapped in a NumPy array
values = np.array(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'], dtype=object)

# codes follow the order in which each value first appears
labels, uniques = pd.factorize(values)
print("Numeric Representation : \n", labels)
print("Unique Values : \n", uniques)

# sorting the uniques; the codes are remapped to match
label1, unique1 = pd.factorize(values, sort=True)
print("\n\nNumeric Representation : \n", label1)
print("Unique Values : \n", unique1)

# missing values are marked with the sentinel code -1 and left out of the uniques
# (pandas < 2.0 also accepted a custom code, e.g. na_sentinel=-101)
with_missing = np.array(['b', None, 'd', 'c', None, 'a'], dtype=object)
label2, unique2 = pd.factorize(with_missing)
print("\n\nNumeric Representation : \n", label2)
print("Unique Values : \n", unique2)

# when factorizing a pandas object, the type of uniques differs:
# for a Categorical, uniques is also a Categorical
a = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
label3, unique3 = pd.factorize(a)
print("\n\nNumeric Representation : \n", label3)
print("Unique Values : \n", unique3)
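For missing values in pandas 1.5 and later, the use_na_sentinel flag decides whether they receive the sentinel code -1 or a regular code of their own. The sketch below assumes such a pandas version; the sample array and variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value
values = np.array([1, 2, 1, np.nan])

# Default behaviour: the missing value is coded as -1
# and does not appear among the uniques
codes_default, uniques_default = pd.factorize(values)

# With use_na_sentinel=False the missing value gets its own code
# and is included in the uniques
codes_keep, uniques_keep = pd.factorize(values, use_na_sentinel=False)

print(codes_default, uniques_default)
print(codes_keep, uniques_keep)
```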
