
Get unique values from a column in Pandas DataFrame


The unique() method returns the distinct values of a column in order of first appearance, so each repeated value appears only once in the result. In this article, we discuss several ways to get the unique values from a column in a Pandas DataFrame.

Creating a Pandas Dataframe with Duplicate Elements

Create a sample Pandas DataFrame from a dictionary of lists, with columns named A, B, C, D, and E, most of which contain duplicate elements.

Python3




# Import pandas package
import pandas as pd
 
# create a dictionary with five fields each
data = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)


Get unique values from a column in Pandas DataFrame

Below are some examples by which we can get the unique values of a column in this dataframe.

  • Get the Unique Values of ‘B’ Column
  • Get the Unique Values of ‘E’ Column
  • Get Number of Unique Values in a Column
  • Using set() to Eliminate Duplicate Values from a Column
  • Using pandas.concat() and Unique() Methods
  • Using Series.drop_duplicates()

Get the Unique Values of ‘B’ Column

In this example, we retrieve the unique values from the ‘B’ column using the unique() method. The resulting unique values are ['B1', 'B2', 'B3', 'B4'].

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Get the unique values of 'B' column
df.B.unique()


Output

array(['B1', 'B2', 'B3', 'B4'], dtype=object)
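As a small sketch of two properties worth knowing, the snippet below uses a hypothetical Series (not part of the sample DataFrame) to show that unique() keeps values in order of first appearance rather than sorting them, and that a missing value is reported as one of the unique values unless it is dropped first.

```python
import numpy as np
import pandas as pd

# Hypothetical column: values out of sorted order, plus one missing entry
s = pd.Series(['B3', 'B1', np.nan, 'B3', 'B1'], name='B')

# unique() preserves first-appearance order and keeps the missing value
values = s.unique()
print(values)

# Drop missing values first if they should not count as a value
values_no_nan = s.dropna().unique()
print(values_no_nan)
```

If a sorted result is needed, the returned values can be passed to sorted() after dropping missing entries.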

Get the Unique Values of ‘E’ Column

In this example, we create a pandas DataFrame from a dictionary and then retrieve the unique values from the ‘E’ column using the unique() method. Every row of ‘E’ holds the same value, so the only unique value is ['E1'].

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Get the unique values of 'E' column
df.E.unique()


Output

array(['E1'], dtype=object)

Get Number of Unique Values in a Column

In this example, we create a pandas DataFrame from a dictionary and then count the number of unique values in the ‘C’ column with nunique(), excluding NaN values (dropna=True is the default). The result is 3, indicating there are three unique values in column ‘C’.

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Get number of unique values in column 'C'
df.C.nunique(dropna=True)


Output

3
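The effect of the dropna parameter is easier to see on data that actually contains a missing value. The following sketch uses a hypothetical Series and a reduced two-column DataFrame (both are illustrative, not from the example above) to compare the two settings and to show that DataFrame.nunique() returns a count per column.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value
s = pd.Series(['C1', 'C2', np.nan, 'C2'])

# Default: missing values are not counted
count_without_nan = s.nunique()

# dropna=False counts the missing value as one extra distinct value
count_with_nan = s.nunique(dropna=False)

print(count_without_nan, count_with_nan)

# Calling nunique() on a DataFrame gives one count per column
df_small = pd.DataFrame({
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
})
per_column = df_small.nunique()
print(per_column)
```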

Eliminate Duplicate Values from a Column using set()

In this example, we create a pandas DataFrame from a dictionary and then use the built-in set() function to extract the unique values from column ‘C’, eliminating duplicates. The resulting set contains 'C1', 'C2', and 'C3'. A set is unordered, so the order in which the values are printed may differ from the output shown here.

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Use set() to eliminate duplicate values in column 'C'
unique_values_set = set(df['C'])
 
# Print the unique values
print(unique_values_set)


Output

{'C1', 'C2', 'C3'}
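Because a set has no defined order, a reproducible result needs an explicit sort. A minimal sketch, assuming the column values are mutually comparable (all strings here) and using a one-column stand-in for the sample DataFrame:

```python
import pandas as pd

# One-column stand-in for the 'C' column of the sample DataFrame
df_c = pd.DataFrame({'C': ['C1', 'C2', 'C3', 'C3', 'C3']})

# sorted() turns the unordered set into a list with a stable order
ordered_unique = sorted(set(df_c['C']))

print(ordered_unique)
```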

Using pandas.concat() and Unique() Methods

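In this example, we create a pandas DataFrame from a dictionary, concatenate all of its columns into a single Series using pd.concat(), and then call unique() on the result. Note that pd.concat() accepts Series or DataFrame objects, not the arrays returned by unique(), so the columns are concatenated first. The result, when printed, displays all unique values from columns ‘A’ to ‘E’.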

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Use pd.concat() to concatenate all columns into one Series, then apply unique().
# pd.concat() only accepts Series/DataFrame objects, so unique() is called last.
unique_values_all_columns = pd.concat([df[col] for col in df.columns]).unique()
 
# Print the unique values
print(unique_values_all_columns)


Output

['A1' 'A2' 'A3' 'A4' 'A5' 'B1' 'B2' 'B3' 'B4' 'C1' 'C2' 'C3' 'D1' 'D2' 'E1']
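When the goal is to keep track of which column each value came from, a per-column mapping may be more convenient than one combined array. The sketch below is one possible way to do that with a dictionary comprehension; the name unique_per_column is illustrative.

```python
import pandas as pd

data = {
    'A': ['A1', 'A2', 'A3', 'A4', 'A5'],
    'B': ['B1', 'B2', 'B3', 'B4', 'B4'],
    'C': ['C1', 'C2', 'C3', 'C3', 'C3'],
    'D': ['D1', 'D2', 'D2', 'D2', 'D2'],
    'E': ['E1', 'E1', 'E1', 'E1', 'E1']}
df = pd.DataFrame(data)

# Map each column name to a plain list of its unique values
unique_per_column = {col: list(df[col].unique()) for col in df.columns}

for col, values in unique_per_column.items():
    print(col, values)
```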

Using Series.drop_duplicates()

In this example, we create a pandas DataFrame from a dictionary and remove duplicates from columns ‘A’ and ‘D’ using the drop_duplicates() method. Because each result is assigned back to its original column, pandas aligns it on the index, and the rows whose duplicates were dropped from ‘D’ are filled with NaN. Column ‘A’ has no duplicates, so it is unchanged.

Python3




# Import pandas package
import pandas as pd
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Use drop_duplicates() to remove duplicates from columns 'A' and 'D'
df['A'] = df['A'].drop_duplicates()
df['D'] = df['D'].drop_duplicates()
 
# Print the DataFrame after removing duplicates from columns 'A' and 'D'
print(df)


Output

    A   B   C   D   E
0  A1  B1  C1  D1  E1
1  A2  B2  C2  D2  E1
2  A3  B3  C3 NaN  E1
3  A4  B4  C3 NaN  E1
4  A5  B4  C3 NaN  E1
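To get only the unique values as a Series, without NaN padding, the result of drop_duplicates() can be kept on its own instead of being assigned back to the DataFrame. The following sketch uses a one-column stand-in for column ‘D’ and shows that the result keeps the original index labels, which reset_index(drop=True) can renumber, and that the keep parameter selects which occurrence survives.

```python
import pandas as pd

# One-column stand-in for the 'D' column of the sample DataFrame
df_d = pd.DataFrame({'D': ['D1', 'D2', 'D2', 'D2', 'D2']})

# Keeps the first occurrence of each value and its original index label
unique_d = df_d['D'].drop_duplicates()
print(unique_d)

# keep='last' retains the last occurrence of each value instead
unique_d_last = df_d['D'].drop_duplicates(keep='last')
print(unique_d_last)

# Renumber the index from zero if the original labels are not needed
unique_d_renumbered = unique_d_last.reset_index(drop=True)
print(unique_d_renumbered)
```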



Last Updated : 01 Dec, 2023