Pandas offers several ways to count the distinct values in a DataFrame column. This article walks through four of them using one small example DataFrame.
Creating a Reference Pandas DataFrame
Consider the table below, which will be loaded into a DataFrame. The columns are height, weight, and age, and each of the 8 rows is the record of one student.
height | weight | age
165    | 63.5   | 20
165    | 64     | 22
164    | 63.5   | 22
158    | 54     | 21
167    | 63.5   | 23
160    | 62     | 22
158    | 64     | 20
165    | 64     | 21
The first step is to create the DataFrame for the table above, using the students' names as the index:
Python3
import pandas as pd

# Height, weight, and age of 8 students, indexed by name
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

print(df)
Output
       height  weight  age
Steve     165    63.5   20
Ria       165    64.0   22
Nivi      164    63.5   22
Jane      158    54.0   21
Kate      167    63.5   23
Lucy      160    62.0   22
Ram       158    64.0   20
Niki      165    64.0   21
Count Distinct Values of a Pandas Dataframe Column
The distinct values of a Pandas DataFrame column can be counted in the following ways:
- Using pandas.unique()
- Using DataFrame.nunique()
- Using Series.value_counts()
- Using a for loop
Count Distinct Values of a Column Using unique()
This example counts the distinct values in the 'height' column of df. pd.unique() returns an array containing each distinct value once, and len() gives the size of that array, which is the number of distinct heights.
Python3
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

# pd.unique() returns the distinct values; len() counts them
n = len(pd.unique(df['height']))
print("No.of.unique values :", n)
Output
No.of.unique values : 5
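As a small aside, the same count can be obtained from the Series method unique(), which also shows which values were found. The sketch below rebuilds only the 'height' column as a standalone Series (the name heights is chosen here for illustration) and relies on unique() returning values in order of first appearance.

```python
import pandas as pd

# Standalone copy of the 'height' column from the example DataFrame
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

# Series.unique() returns the distinct values in order of first appearance
distinct_heights = heights.unique()
n_distinct = len(distinct_heights)

print("Distinct heights :", list(distinct_heights))
print("No.of.unique values :", n_distinct)
```

Inspecting the distinct values themselves is often a useful sanity check before relying on the count alone.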
Pandas Count Distinct Values Using DataFrame.nunique()
This example calls nunique() on the whole DataFrame. With axis=0 (the default), it counts the distinct values down each column and returns a Series holding one count per column.
Python3
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

# axis=0 counts distinct values per column
n = df.nunique(axis=0)
print("No.of.unique values in each column :\n", n)
Output
No.of.unique values in each column :
height 5
weight 4
age 4
dtype: int64
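For contrast, here is a minimal sketch of what changing the axis does: axis=1 counts distinct values across each row instead of down each column. The small scores frame and its column names are hypothetical, chosen only so that the row-wise and column-wise results differ.

```python
import pandas as pd

# Hypothetical data: three answers (q1..q3) for three respondents
scores = pd.DataFrame({
    'q1': [1, 2, 3],
    'q2': [1, 5, 3],
    'q3': [1, 2, 4]})

per_column = scores.nunique(axis=0)  # distinct values in each column
per_row = scores.nunique(axis=1)     # distinct values in each row

print(per_column)
print(per_row)
```

In the students DataFrame every row holds three different numbers, so axis=1 would simply report 3 for each student; the row-wise form is mainly useful when the columns share a common scale.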
nunique() can also be called on a single column. This example selects the 'height' column and prints its number of distinct values.
Python3
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

# Series.nunique() counts distinct values in one column
n = df['height'].nunique()
print("No.of.unique values in height column :", n)
Output
No.of.unique values in height column : 5
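One difference worth knowing when a column has missing values: nunique() ignores NaN by default, while pd.unique() keeps NaN as a value of its own. The sketch below uses a short hypothetical Series with one missing height to illustrate this; the dropna parameter of nunique() controls the behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical heights with one missing entry
heights = pd.Series([165.0, 165.0, np.nan, 158.0])

without_nan = heights.nunique()           # NaN is ignored by default
with_nan = heights.nunique(dropna=False)  # NaN counted as a distinct value
via_unique = len(pd.unique(heights))      # pd.unique() keeps NaN

print(without_nan, with_nan, via_unique)
```

The example DataFrame in this article has no missing values, so both approaches agree on it.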
Count Distinct Values of a Column Using Series.value_counts()
value_counts() returns a Series with one entry per distinct value, holding how often that value occurs. This example converts the frequencies of the 'height' column to a list and takes its length, which equals the number of distinct values.
Python3
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

# One frequency per distinct value, so the list length is the distinct count
li = list(df['height'].value_counts())
print("No.of.unique values :", len(li))
Output
No.of.unique values : 5
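Beyond the count, value_counts() also tells how often each value occurs, which the other approaches do not. The sketch below rebuilds the 'height' column as a standalone Series (the name heights is illustrative) and reads a few frequencies from the result.

```python
import pandas as pd

# Standalone copy of the 'height' column from the example DataFrame
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

# Index = distinct values, values = how many times each occurs
counts = heights.value_counts()

print(counts)
print("No.of.unique values :", len(counts))
print("Most frequent height :", counts.idxmax())
```

Taking len(counts) directly avoids the intermediate list, since the result already has one row per distinct value.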
Pandas Count Unique Values Using a for Loop
The count can also be coded by hand with a for loop. To count the distinct values in the 'height' column, keep a counter cnt and a list visited that holds the values seen so far. The loop iterates over the 'height' column and, for each value, checks whether it is already in visited. If it is not, the value is appended to visited and cnt is incremented by 1.
Python3
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]},
    index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])

cnt = 0
visited = []

# Iterate over the values directly; the index holds names, not positions
for value in df['height']:
    if value not in visited:
        visited.append(value)
        cnt += 1

print("No.of.unique values :", cnt)
print("unique values :", visited)
Output
No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]
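Checking membership in the visited list scans the list each time, which gets slow for long columns. A possible variation, sketched here as an assumption rather than as part of the original method, tracks seen values in a set (constant-time lookups on average) while a separate list preserves the order of first appearance. The names seen and ordered_unique are illustrative.

```python
import pandas as pd

# Standalone copy of the 'height' column from the example DataFrame
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

seen = set()         # fast membership checks
ordered_unique = []  # distinct values in order of first appearance

for value in heights:
    if value not in seen:
        seen.add(value)
        ordered_unique.append(value)

print("No.of.unique values :", len(ordered_unique))
print("unique values :", ordered_unique)
```

For real workloads the built-in nunique() remains the simpler choice; the loop is mainly useful for understanding what the count means.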