
Python | Pandas DataFrame.set_index()


This article explains the Pandas DataFrame.set_index() method in Python. Python is an excellent language for data analysis, largely because of its ecosystem of data-centric packages, and Pandas is one of the packages that makes importing and analyzing data much easier.

Pandas DataFrame.set_index() Syntax

Syntax:  DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Parameters: 

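  • keys: A column label, an array of the same length as the DataFrame (such as a Series or Index), or a list combining column labels and arrays.
  • drop: If True (the default), removes the columns used as the new index from the DataFrame's columns.
  • append: If True, adds the new index levels to the existing index, producing a MultiIndex, instead of replacing it. Defaults to False.
  • inplace: If True, modifies the DataFrame in place and returns None. If False (the default), returns a new DataFrame.
  • verify_integrity: If True, checks the new index for duplicate values and raises a ValueError if any are found. Defaults to False.

Returns: A DataFrame with the new index, or None if inplace=True.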
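The effect of each parameter is easiest to see on a tiny DataFrame. The sketch below uses a small hypothetical DataFrame (the columns "id", "name", "score" and the "key"/"val" frame are made up for illustration) and shows drop, append, and verify_integrity side by side.

```python
import pandas as pd

# Small hypothetical DataFrame, used only to illustrate the parameters
df = pd.DataFrame({
    "id": [101, 102, 103],
    "name": ["Ann", "Ben", "Cal"],
    "score": [88, 92, 79],
})

# drop=True (default): "id" becomes the index and is removed from the columns
by_id = df.set_index("id")
print(list(by_id.columns))         # ['name', 'score']

# drop=False: "id" becomes the index and is also kept as a regular column
by_id_kept = df.set_index("id", drop=False)
print(list(by_id_kept.columns))    # ['id', 'name', 'score']

# append=True: "id" is added as a second level after the existing RangeIndex
appended = df.set_index("id", append=True)
print(list(appended.index.names))  # [None, 'id']

# verify_integrity=True: duplicate keys in the new index raise a ValueError
dupes = pd.DataFrame({"key": ["x", "x"], "val": [1, 2]})
try:
    dupes.set_index("key", verify_integrity=True)
    duplicate_error = None
except ValueError as err:
    duplicate_error = err
    print("ValueError:", err)
```

Without verify_integrity=True, the duplicate "x" labels would be accepted silently, which is why the check is opt-in.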

What is Pandas DataFrame.set_index()?

The Pandas DataFrame.set_index() method sets the index (row labels) of a DataFrame using one or more existing columns, or arrays such as a Series or Index whose length matches the DataFrame. An index can be specified when a DataFrame is created, but set_index() provides a flexible way to change it later. This is particularly useful when a DataFrame is built by combining two or more DataFrames and the resulting index needs to be replaced.
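As a minimal sketch of the "arrays" case, the code below passes an Index that is not a column of the DataFrame, and then mixes a column label with that array to build a MultiIndex. The city data and the "code" labels are hypothetical. It also shows that, by default, set_index() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Tokyo", "Sydney"],
                   "pop_m": [32.9, 37.2, 5.3]})

# keys can be an array-like of the same length as the DataFrame,
# not only an existing column label (hypothetical codes for illustration)
codes = pd.Index(["DEL", "TYO", "SYD"], name="code")
by_code = df.set_index(codes)

# By default set_index returns a new DataFrame and leaves df unchanged
print(by_code.index.tolist())     # ['DEL', 'TYO', 'SYD']
print(df.index.tolist())          # [0, 1, 2]

# Column labels and arrays can be mixed in one list to build a MultiIndex;
# the "city" column is dropped because drop=True by default
mixed = df.set_index(["city", codes])
print(list(mixed.index.names))    # ['city', 'code']
print(list(mixed.columns))        # ['pop_m']
```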

The first two examples read employee data from a CSV file named employees.csv.

Pandas DataFrame.set_index() Examples

The examples below illustrate common uses of DataFrame.set_index():

  • Pandas Set Index to Column
  • Multiple Index Columns
  • Setting a Single Float Column as Index
  • Setting Three Columns as MultiIndex
  • Pandas Set Index of Specific Column

Pandas Set Index to Column

In this example, the ‘First Name’ column is made the index of the DataFrame, replacing the default integer index.

Python3




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# setting "First Name" as the index column (modifies data in place)
data.set_index("First Name", inplace=True)

# display the first five rows
print(data.head())


Output: [Images: the first rows of the DataFrame before and after the operation]

Before the operation, the index was the default sequence of integers (0, 1, 2, ...); afterwards it is replaced by the values of the ‘First Name’ column.
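The main benefit of a meaningful index is label-based lookup with .loc. Since employees.csv is not reproduced here, the sketch below assumes a small hypothetical DataFrame with a ‘First Name’ column; the ‘Salary’ column and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv; names and "Salary" values are made up
data = pd.DataFrame({
    "First Name": ["Alice", "Bob", "Chen"],
    "Gender": ["Female", "Male", "Male"],
    "Salary": [50000, 62000, 58000],
})

# Make "First Name" the index; with drop=True (default) it leaves the columns
data.set_index("First Name", inplace=True)

# Rows can now be selected by first name instead of by integer position
print(data.loc["Chen", "Salary"])   # 58000
print(list(data.columns))           # ['Gender', 'Salary']
```

If several employees shared a first name, .loc would return all matching rows, so a non-unique column like this works as an index but is not a unique key.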

Pandas Set Index to Multiple Index Columns

In this example, the ‘First Name’ and ‘Gender’ columns are added to the index. Because append=True, they are added alongside the existing default integer index instead of replacing it, and because drop=False, they are also kept as regular columns.

Python3




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# adding "First Name" and "Gender" as extra index levels,
# keeping the existing index (append=True) and the columns (drop=False)
data.set_index(["First Name", "Gender"], inplace=True,
               append=True, drop=False)

# display the first five rows
print(data.head())


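Output: [Image: the first rows of the DataFrame with a three-level index]

The resulting DataFrame has three index levels: the original integer index, ‘First Name’, and ‘Gender’.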
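The three index levels can be confirmed programmatically. The sketch below repeats the same set_index() call on a couple of hypothetical rows (the ‘Team’ column and all values are assumptions, not taken from employees.csv) and inspects the resulting index.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv (values are made up)
data = pd.DataFrame({
    "First Name": ["Alice", "Bob"],
    "Gender": ["Female", "Male"],
    "Team": ["Sales", "Legal"],
})

# Same call as in the example above
data.set_index(["First Name", "Gender"], inplace=True, append=True, drop=False)

print(data.index.nlevels)         # 3
print(list(data.index.names))     # [None, 'First Name', 'Gender']
print(list(data.columns))         # ['First Name', 'Gender', 'Team']
```

The first level name is None because the original default index had no name.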

Pandas Set a Single Float Column as Index

In this example, the code creates a DataFrame named ‘df’ from a nested list of student data, sets the float column ‘Agg_Marks’ as the index, and displays the result, which keeps the columns ‘Name’, ‘Age’, ‘City’, and ‘Country’.

Python3




# importing pandas library
import pandas as pd

# creating and initializing a nested list
students = [['jack', 34, 'Sydeny', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'las vegas', 'US', 47.28]]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# set the float column 'Agg_Marks' as the index of the DataFrame
# using DataFrame.set_index()
df = df.set_index('Agg_Marks')

# Displaying the DataFrame
print(df)


Output :

            Name  Age        City    Country
Agg_Marks                                   
85.96       jack   34      Sydeny  Australia
95.20       Riti   30       Delhi      India
85.25      Vansh   31       Delhi      India
74.21      Nanyu   32       Tokyo      Japan
99.63    Maychan   16    New York         US
47.28       Mike   17  las vegas         US

In the above example, the column ‘Agg_Marks’ becomes the index of the DataFrame. Because append defaults to False, the original index labels 'a' to 'f' are discarded.
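With a float index, rows are selected by their exact float label. The sketch below rebuilds two rows of the student data above to show a .loc lookup and a membership test. Keep in mind that labels are matched by exact equality, so a value produced by floating-point arithmetic may fail to match even if it prints the same.

```python
import pandas as pd

# Two rows taken from the student data above
df = pd.DataFrame(
    [["Riti", 30, 95.20], ["Nanyu", 32, 74.21]],
    columns=["Name", "Age", "Agg_Marks"],
).set_index("Agg_Marks")

print(df.index.dtype)           # float64
print(df.loc[95.20, "Name"])    # Riti
print(74.21 in df.index)        # True
```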

Pandas Set Three Columns as MultiIndex

In this example, the code creates a DataFrame ‘df’ from student data with the columns ‘Name’, ‘Age’, ‘City’, ‘Country’, ‘Agg_Marks’, and ‘ID’. Passing a list of three column names to set_index() builds a multi-level index (MultiIndex) from ‘Name’, ‘City’, and ‘ID’, and the resulting DataFrame is then displayed.

Python3




# importing pandas library
import pandas as pd

# creating and initializing a nested list
students = [['jack', 34, 'Sydeny', 'Australia', 85.96, 400],
            ['Riti', 30, 'Delhi', 'India', 95.20, 750],
            ['Vansh', 31, 'Delhi', 'India', 85.25, 101],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21, 900],
            ['Maychan', 16, 'New York', 'US', 99.63, 420],
            ['Mike', 17, 'las vegas', 'US', 47.28, 555]]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks', 'ID'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Pass a list of 3 columns, i.e. 'Name', 'City' and 'ID',
# to DataFrame.set_index() to set them as a MultiIndex
df = df.set_index(['Name', 'City', 'ID'])

# Displaying the DataFrame
print(df)


Output :

                       Age    Country  Agg_Marks
Name    City       ID                       
jack    Sydeny     400  34  Australia      85.96
Riti    Delhi      750  30      India      95.20
Vansh   Delhi      101  31      India      85.25
Nanyu   Tokyo      900  32      Japan      74.21
Maychan New York   420  16         US      99.63
Mike    las vegas  555  17         US      47.28

In the above example, the columns ‘Name’, ‘City’, and ‘ID’ are set as a MultiIndex of the DataFrame.
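Once the MultiIndex is in place, a full tuple of labels selects a single row, while DataFrame.xs() takes a cross-section on one level. The sketch below uses three rows of the student data above; choosing xs() for the cross-section is one option among several (boolean masks or pd.IndexSlice would also work).

```python
import pandas as pd

students = [['Riti', 30, 'Delhi', 'India', 95.20, 750],
            ['Vansh', 31, 'Delhi', 'India', 85.25, 101],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21, 900]]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks', 'ID'])
df = df.set_index(['Name', 'City', 'ID'])

print(list(df.index.names))                    # ['Name', 'City', 'ID']

# A full tuple of labels selects exactly one row
print(df.loc[('Vansh', 'Delhi', 101), 'Age'])  # 31

# A cross-section on the 'City' level selects every row for that city;
# the 'City' level is dropped from the result
delhi = df.xs('Delhi', level='City')
print(list(delhi.index.names))                 # ['Name', 'ID']
print(delhi['Age'].tolist())                   # [30, 31]
```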

Pandas Set Index of Specific Column

In this example, the code creates a DataFrame, sets the ‘Name’ column as the index with set_index(), and prints the DataFrame before and after the change. The inplace=True parameter applies the change directly to df (the call itself returns None), so no reassignment is needed.

Python3




import pandas as pd
 
# Creating a sample DataFrame
data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
 
df = pd.DataFrame(data)
 
# Displaying the original DataFrame
print("Original DataFrame:")
print(df)
 
# Using set_index() to set 'Name' column as the index
df.set_index('Name', inplace=True)
 
# Displaying the DataFrame after setting the index
print("\nDataFrame after set_index:")
print(df)


Output :

Original DataFrame:
    Name  Age           City
0  Geek1   25       New York
1  Geek2   30  San Francisco
2  Geek3   35    Los Angeles

DataFrame after set_index:
       Age           City
Name                     
Geek1   25       New York
Geek2   30  San Francisco
Geek3   35    Los Angeles
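Because inplace=True makes set_index() return None, its result should never be assigned back to the DataFrame. The sketch below contrasts the two styles on a smaller version of the same data: in-place modification, and the default behavior where the original is left untouched and the new DataFrame must be captured.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Geek1', 'Geek2'], 'Age': [25, 30]})

# With inplace=True, df itself is modified and the call returns None
result = df.set_index('Name', inplace=True)
print(result)                  # None
print(df.index.name)           # Name

# Without inplace, the original is untouched and a new DataFrame is returned,
# so it must be assigned (df = df.set_index(..., inplace=True) would set df to None)
df2 = pd.DataFrame({'Name': ['Geek1', 'Geek2'], 'Age': [25, 30]})
indexed = df2.set_index('Name')
print(df2.index.tolist())      # [0, 1]
print(indexed.index.tolist())  # ['Geek1', 'Geek2']
```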



Last Updated : 03 Dec, 2023