How to Count Distinct Values of a Pandas Dataframe Column?

Let’s see how to count the distinct values of a Pandas Dataframe column.

Consider a tabular structure as given below which has to be created as Dataframe. The columns are height, weight and age. The records of 8 students form the rows. 

        height  weight  age
Steve      165    63.5   20
Ria        165    64     22
Nivi       164    63.5   22
Jane       158    54     21
Kate       167    63.5   23
Lucy       160    62     22
Ram        158    64     20
Niki       165    64     21

The first step is to create the Dataframe for the above table. Look at the code snippet below.

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
# show the Dataframe
df


Output:



Dataframe

Method 1: Using for loop.

With the Dataframe created, one can hard-code a for loop that counts the number of unique values in a specific column. For example, suppose one wishes to count the number of unique values in the column height. The idea is to use a variable cnt for storing the count and a list visited that holds the previously visited values. The for loop iterates through the ‘height’ column and, for each value, checks whether it is already in the visited list. If the value was not visited previously, it is appended to visited and the count is incremented by 1.

Below is the implementation:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
# variable to hold the count
cnt = 0
  
# list to hold visited values
visited = []
  
# loop for counting the unique
# values in height; iterate over
# the values directly, because the
# index holds names, not positions
for value in df['height']:

    if value not in visited:

        visited.append(value)

        cnt += 1
  
print("No.of.unique values :",
      cnt)
  
print("unique values :",
      visited)


Output :

No.of.unique values : 5
unique values : [165, 164, 158, 167, 160]

But this method is not efficient when the Dataframe grows to thousands of rows, because every membership check scans the visited list. Pandas provides three more efficient methods, which are listed below:

  • pandas.unique()
  • Dataframe.nunique()
  • Series.value_counts()
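
As a quick preview, the three calls listed above can be tried side by side. This is only a minimal sketch: it assumes the height values from the example table, stored in a standalone Series named heights (a name chosen here for illustration), and each call is covered in its own section.

```python
import pandas as pd

# Only the 'height' values of the example table are needed here
heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

# pandas.unique() returns the unique values; their number is the length
n_unique = len(pd.unique(heights))

# nunique() returns the number of unique values directly
n_nunique = heights.nunique()

# value_counts() returns one entry per unique value
n_value_counts = len(heights.value_counts())

print(n_unique, n_nunique, n_value_counts)
```

All three approaches should agree on the count for the same column.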

Method 2: Using unique().



The unique function takes a 1-D array-like, Series or Index as input and returns its unique values in order of first appearance, without sorting them. For a numeric Series such as the height column, the return value is a NumPy array; the exact return type depends on the input passed. If an Index is supplied as input, then the return value is also an Index.

Syntax: pandas.unique(values)

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
# counting unique values
n = len(pd.unique(df['height']))
  
print("No.of.unique values :",
      n)


Output:

No.of.unique values : 5

Method 3: Using Dataframe.nunique().

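This method returns the number of unique values along the specified axis. With axis=0 (the default) it counts per column, and with axis=1 it counts per row. With dropna=True (the default), missing values are not counted. The syntax is:

Syntax: DataFrame.nunique(axis=0, dropna=True)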

Example:

Python3



# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
# count the unique values
# in each column
n = df.nunique(axis=0)
  
print("No.of.unique values in each column :\n",
      n)


Output:

No.of.unique values in each column :
height    5
weight    4
age       4
dtype: int64
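
The axis and dropna parameters of nunique() can be seen on a smaller table. This is a sketch under stated assumptions: the Dataframe named sample and its missing weight are hypothetical and were made up for this illustration, since the student table above has no missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing weight
sample = pd.DataFrame({
    'height': [165, 165, 164],
    'weight': [63.5, np.nan, 63.5],
})

# Per column (axis=0); NaN is ignored by default
per_column = sample.nunique()

# Per column, counting NaN as a value of its own
per_column_with_nan = sample.nunique(dropna=False)

# Per row (axis=1): unique values within each row
per_row = sample.nunique(axis=1)

print(per_column)
print(per_column_with_nan)
print(per_row)
```

Here the weight column has one unique value by default and two once NaN is counted.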

To get the number of unique values in a single column, call nunique() on that column:

Syntax: DataFrame.col_name.nunique()

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
# count no. of unique 
# values in height column
n = df.height.nunique()
  
print("No.of.unique values in height column :",
      n)


Output:

No.of.unique values in height column : 5

Method 4: Using Series.value_counts().

This method returns a Series containing the frequency of each unique value in the specified column, so the length of the result is the number of unique values.

Syntax: Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Example:

Python3

# import library
import pandas as pd
  
# create a Dataframe
df = pd.DataFrame({
    'height': [165, 165, 164,
               158, 167, 160,
               158, 165],

    'weight': [63.5, 64, 63.5,
               54, 63.5, 62,
               64, 64],

    'age': [20, 22, 22,
            21, 23, 22,
            20, 21]},

    index=['Steve', 'Ria', 'Nivi',
           'Jane', 'Kate', 'Lucy',
           'Ram', 'Niki'])
  
  
# getting the list of frequencies,
# one entry per unique value
li = list(df.height.value_counts())
  
# print the unique value counts
print("No.of.unique values :",
      len(li))


Output:

No.of.unique values : 5
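
Since value_counts() returns frequencies rather than just a count, it can also answer how often each height occurs. The following is a minimal sketch, assuming the height values are held in a standalone Series named heights (a name used only for this illustration).

```python
import pandas as pd

heights = pd.Series([165, 165, 164, 158, 167, 160, 158, 165], name='height')

# Frequency of each unique value, most frequent first by default
counts = heights.value_counts()

# Relative frequencies instead of absolute counts
shares = heights.value_counts(normalize=True)

print(counts)
print(shares)

# The index of the result holds the unique values themselves
print("Students with height 165 :", counts.loc[165])
```

When only the number of distinct values is needed, nunique() is the more direct call; value_counts() is the better fit when the frequencies themselves matter.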


I am Akshaya E, currently a student at NIT, Trichy. I have a keen interest in sharing what I know with people around me. I like to explain things with easy and real-time examples. I am even writing a blog where I teach Python from scratch.
