
Adding new column to existing DataFrame in Pandas


Adding new columns to an existing DataFrame is a fundamental task in data analysis with Pandas. It lets you enrich your data with additional information for further analysis and manipulation. This article covers several methods for adding new columns, including simple assignment, the insert() method, and the assign() method.

What is Pandas DataFrame?

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It’s a fundamental data structure in the Python data science ecosystem and provides a powerful way to work with tabular data.

Here are some key features of a Pandas DataFrame:

  • Data representation: Stores data in a table format with rows and columns.
  • Heterogeneous data types: Can hold different data types in different columns (e.g., integers, floats, strings, booleans).
  • Labeling: Each row and column has a label (index and column names).
  • Mutable: Allows data manipulation and modification.
  • Powerful operations: Provides various functions and methods for data analysis, manipulation, and exploration.
  • Extensible: Can be customized and extended with additional functionalities through libraries and user-defined functions.
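The features above can be seen on a very small DataFrame. The following is a minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

# Small illustrative DataFrame (values are invented for this sketch)
df = pd.DataFrame({'Name': ['Jai', 'Princi'],
                   'Height': [5.1, 6.2],
                   'Graduated': [True, False]})

# Each column keeps its own data type
print(df.dtypes)

# Row labels (index) and column labels
print(list(df.index))
print(list(df.columns))

# Shape is (number of rows, number of columns)
print(df.shape)
```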

Adding a new Column to Existing DataFrame in Pandas in Python

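There are multiple ways to add a new column to an existing DataFrame in Pandas. The sections below cover DataFrame.insert(), DataFrame.assign(), a dictionary, a list, DataFrame.loc, and adding several columns at once.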

Creating a Sample Dataframe

Here we are creating a Sample Dataframe:

Python3




import pandas as pd
 
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
 
df = pd.DataFrame(data)
print(df)


Output:

     Name  Height Qualification
0     Jai     5.1           Msc
1  Princi     6.2            MA
2  Gaurav     5.1           Msc
3    Anuj     5.2           Msc

Note that when you add a column from a list, the length of the list must match the number of rows in the DataFrame (the length of its index); otherwise Pandas raises a ValueError.
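To illustrate the length requirement, here is a minimal sketch that tries to add a three-element list to a four-row DataFrame and catches the resulting error. The "Age" values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj']})

error_message = None
try:
    # Only three values for four rows: pandas rejects the assignment
    df['Age'] = [21, 23, 24]
except ValueError as err:
    error_message = str(err)
    print("Could not add column:", error_message)

# The DataFrame is left unchanged after the failed assignment
print(list(df.columns))
```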

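Add a New Column to an Existing DataFrame using DataFrame.insert()

insert() lets you add a column at any position, not just at the end. It modifies the DataFrame in place. Its arguments are the integer position, the column name, and the values (a scalar, a Series, or an array-like such as a list); the optional allow_duplicates argument controls whether a column whose name already exists may be inserted.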

Python3




import pandas as pd
 
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Using DataFrame.insert() to add the "Age" column at position 2
df.insert(2, "Age", [21, 23, 24, 21], allow_duplicates=True)
 
# Observe the result
print(df)


Output: 

     Name  Height  Age Qualification
0     Jai     5.1   21           Msc
1  Princi     6.2   23            MA
2  Gaurav     5.1   24           Msc
3    Anuj     5.2   21           Msc
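As a further sketch of how insert() behaves, the snippet below places a column at the very front and shows that inserting a name that already exists is rejected by default. The "RollNo" column and its values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# Position 0 places the new column first
df.insert(0, 'RollNo', [101, 102])
print(list(df.columns))

# Inserting an existing column name fails unless allow_duplicates=True
try:
    df.insert(1, 'Height', [5.0, 6.0])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
print(duplicate_rejected)
```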

Adding Columns to Pandas DataFrame using DataFrame.assign()

assign() returns a new DataFrame that contains all the original columns plus the new one. The original DataFrame is left unchanged, so the result must be stored in a variable.

Python3




import pandas as pd
 
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
 
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Using 'address' as the column name and equating it to the list
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
 
print(df2)


Output: 

     Name  Height Qualification    address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
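assign() also accepts a callable, which is handy for deriving a column from existing ones. The sketch below assumes, purely for illustration, that "Height" is in feet and derives a hypothetical "HeightCm" column; it also shows that the original DataFrame is untouched.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# The callable receives the DataFrame and returns the new column's values
# (assumption for this sketch: Height is in feet, 1 ft = 30.48 cm)
df2 = df.assign(HeightCm=lambda d: d['Height'] * 30.48)

print(list(df.columns))    # original DataFrame is unchanged
print(list(df2.columns))   # new DataFrame has the extra column
```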

Pandas Add Column to DataFrame using a Dictionary

We can use a Python dictionary to supply the values of a new column. The dictionary's keys are values from an existing column (here, Name), and the corresponding dictionary values become the entries of the new column. Series.map() performs the lookup row by row; assigning the dictionary directly to df['Address'] would not match names to addresses.

Python3

# Import pandas package
import pandas as pd
 
# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
 
# Define a dictionary whose keys are values of
# an existing column (Name) and whose values
# are the entries for our new column.
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Chennai', 'Anuj': 'Patna'}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Look up each Name in the dictionary and
# store the result in the 'Address' column
df['Address'] = df['Name'].map(address)
 
# Observe the output
print(df)


Output: 

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
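When values are looked up from a dictionary with Series.map(), rows whose key is absent from the dictionary receive a missing value. The following minimal sketch uses a hypothetical lookup table with one name deliberately left out, then fills the gap with a placeholder.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav']})

# Hypothetical lookup table; 'Gaurav' is deliberately missing
city_by_name = {'Jai': 'Delhi', 'Princi': 'Bangalore'}

# Names without a dictionary entry become missing values
df['Address'] = df['Name'].map(city_by_name)
missing_flags = df['Address'].isna().tolist()
print(missing_flags)

# Replace the missing entries with a placeholder
df['Address'] = df['Address'].fillna('Unknown')
print(df['Address'].tolist())
```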

Adding a New Column to a Pandas DataFrame using List

In this example, the list address is assigned to a new column “Address” of the sample DataFrame df created earlier. The list must contain exactly one value per row.

Python3




# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
 
# Using 'Address' as the column name
# and equating it to the list
df['Address'] = address
 
print(df)


Output: 

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
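A list is assigned by position, whereas a Series is aligned on its index labels. The sketch below contrasts the two; the column names "FromList" and "FromSeries" are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav']})

# Same values, but the Series index is in reverse order
cities = pd.Series(['Chennai', 'Bangalore', 'Delhi'], index=[2, 1, 0])

# A plain list is assigned position by position
df['FromList'] = cities.tolist()

# A Series is matched to rows by index label, not by position
df['FromSeries'] = cities

print(df)
```

If the order of the values matters more than their labels, converting to a list (or a NumPy array) before assigning avoids the alignment step.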

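Add A New Column To An Existing Pandas DataFrame using DataFrame.loc

This example creates a DataFrame named df with the columns “Name”, “Height”, and “Qualification”, then adds a new column “Address” with the loc indexer. The selector df.loc[:, "Address"] addresses every row of the column labelled "Address"; because that column does not exist yet, the assignment creates it.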

Python3




import pandas as pd
 
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
       'Height': [5.1, 6.2, 5.1, 5.2],
       'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
 
df = pd.DataFrame(data)
 
# Create the list of new column values
address = ["Delhi", "Bangalore", "Chennai", "Patna"]
 
# Add the new column using loc
df.loc[:, "Address"] = address
 
print(df)


Output:

     Name  Height Qualification    Address
0     Jai     5.1           Msc      Delhi
1  Princi     6.2            MA  Bangalore
2  Gaurav     5.1           Msc    Chennai
3    Anuj     5.2           Msc      Patna
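loc can also create a column while setting values for only some of the rows. The following is a minimal sketch; the "Category" column and the 5.1 threshold are hypothetical choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2]})

# Rows where the condition holds get the value; the new column is
# created on the fly and the remaining rows are left as missing (NaN)
df.loc[df['Height'] > 5.1, 'Category'] = 'Tall'

print(df)
```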

Adding More than One Column to an Existing DataFrame

This example expands an existing Pandas DataFrame df with two new columns, “Age” and “State”, by collecting their data lists in a dictionary and unpacking it into assign().

Python3




import pandas as pd
 
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Address': ['Delhi', 'Bangalore', 'Chennai', 'Patna']}
 
df = pd.DataFrame(data)
 
# Define new data for additional columns
age = [22, 25, 23, 24]
state = ['NCT', 'Karnataka', 'Tamil Nadu', 'Bihar']
 
# Add multiple columns by unpacking a dictionary into assign()
new_data = {'Age': age, 'State': state }
df = df.assign(**new_data)
 
print(df)


Output:

     Name  Height Qualification    Address  Age       State
0     Jai     5.1           Msc      Delhi   22         NCT
1  Princi     6.2            MA  Bangalore   25   Karnataka
2  Gaurav     5.1           Msc    Chennai   23  Tamil Nadu
3    Anuj     5.2           Msc      Patna   24       Bihar
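When the new columns already live in a DataFrame of their own, pd.concat() with axis=1 is another option. This is a sketch under the assumption that both frames share the same index; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi'], 'Height': [5.1, 6.2]})

# New columns held in a separate DataFrame that shares df's index
extra = pd.DataFrame({'Age': [22, 25], 'State': ['NCT', 'Karnataka']},
                     index=df.index)

# axis=1 places the frames side by side, aligning rows on the index
combined = pd.concat([df, extra], axis=1)

print(combined)
```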

Conclusion

Understanding how to add new columns to DataFrames is essential for data exploration and manipulation in Pandas. Choosing the appropriate method depends on the specific context and desired outcome. By mastering these techniques, you can effectively manipulate, analyze, and gain valuable insights from your data.



Last Updated : 21 Dec, 2023