The pandas.factorize() method returns a numeric representation of an array by identifying its distinct values and encoding each one as an integer code. It is available both as the top-level function pandas.factorize() and as the method Series.factorize().
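To illustrate the two entry points, here is a minimal sketch that factorizes the same Series through the function and through the method. The sample data is made up for illustration and is not from the article.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series(["b", "d", "d", "c", "a"])

# Function form: pass the Series as the first argument.
codes_func, uniques_func = pd.factorize(s)

# Method form: call factorize() on the Series itself.
codes_meth, uniques_meth = s.factorize()

print(codes_func)    # integer codes, in order of first appearance
print(codes_meth)    # same codes as the function form
print(uniques_meth)  # the distinct values, returned as an Index for Series input
```

Both calls assign codes in order of first appearance, so either form can be used depending on whether a Series is already at hand.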
Parameters:
values : 1D sequence.
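sort : [bool, default False] Sort the uniques and reorder the codes so that they still map to the same values.
na_sentinel : [int, default -1] Code used to mark missing values. This parameter was deprecated in pandas 1.5 and removed in 2.0, where use_na_sentinel (bool, default True) takes its place.
Return: A tuple (codes, uniques). codes is an integer array giving the position of each element in uniques, and uniques holds the distinct values.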
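The relationship between the two return values can be checked with a short round trip. This is a minimal sketch with made-up input values: taking the uniques at the positions given by the codes should rebuild the original sequence, whether or not sort is set.

```python
import pandas as pd

# Hypothetical input values, chosen only for illustration.
values = ["b", "d", "d", "c", "a"]

# With sort=True the uniques are ordered and the codes are adjusted to match.
codes, uniques = pd.factorize(values, sort=True)

# Each code is an index into uniques, so take() reconstructs the input.
restored = uniques.take(codes)

print(codes)
print(uniques)
print(list(restored) == values)
```

Because the codes are plain positions into uniques, sorting changes which integer each value receives but never breaks the mapping.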
Code: Explaining the working of factorize() method
# importing libraries
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# encode each distinct value as an integer, in order of first appearance
labels, uniques = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'])
print("Numeric Representation : \n", labels)
print("Unique Values : \n", uniques)
# sorting the uniques; the codes are reassigned to match
label1, unique1 = pd.factorize(['b', 'd', 'd', 'c', 'a', 'c', 'a', 'b'],
                               sort=True)
print("\n\nNumeric Representation : \n", label1)
print("Unique Values : \n", unique1)
# Missing values are marked with the sentinel value
# (na_sentinel was removed in pandas 2.0, so this call needs an older pandas)
label2, unique2 = pd.factorize(['b', None, 'd', 'c', None, 'a'],
                               na_sentinel=-101)
print("\n\nNumeric Representation : \n", label2)
print("Unique Values : \n", unique2)
# When factorizing a pandas object, the type of uniques differs:
# for a Categorical input, uniques is also a Categorical
a = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
label3, unique3 = pd.factorize(a)
print("\n\nNumeric Representation : \n", label3)
print("Unique Values : \n", unique3)