Skip to content
Related Articles

Related Articles

Concatenating CSV files using Pandas module
  • Last Updated : 02 Dec, 2020

Using pandas we can perform various operations on CSV files such as appending, updating, concatenating, etc. In this article, we are going to concatenate two CSV files using pandas module.

Suppose we have one .csv file named Employee.csv which contains some records and it is as below:

Employee.csv

There is another .csv file named Updated.csv which contains new records as well as few records from Employee.csv file but with updated information. The file is given below:

Updated.csv

We can see that the first five records in Updated.csv are new and the rest have updated information. For instance, the salaries of Louis and Diane are changed, email_id of Joe is different and so on.

The aim of this article is to add new records and update the information of existing records from Updated.csv file into Employee.csv



Note: No two employees can have same emp_id.

Approach: Whenever it comes down to manipulate data using python we make use of Dataframes. The below approach has been used. 

  • Read Employee.csv and create a dataframe, say, employee_df..
  • Similarly, read Updated.csv and from a dataframe, say, updated_df.
  • Concatenate updated_df to employee_df and remove duplicates using emp_id as primary key.
  • Create a new .csv file named Updated_Employees.csv containing all the old, new as well as updated records.

Example 1:

Python3

filter_none

edit
close

play_arrow

link
brightness_4
code

#import pandas
import pandas as pd
  
# read Employee file
employee_df = pd.read_csv('Employee.csv')
  
# print employee records
print(employee_df)
  
# read Updated file
updated_df = pd.read_csv('Updated.csv')
  
# print updated records
print(updated_df)
  
# form new dataframe by combining both employee_df and updated_df
# concat method appends records of updated_df to employee_df
# drop_duplicates method drop rows having same emp_id keeping 
# only the latest insertions
# resets the index to 0
final_dataframe = pd.concat([employee_df, updated_df]).drop_duplicates(
    subset='emp_id', keep='last').reset_index(drop=True)
  
# print old,new and updates records
print(final_dataframe)
  
# export all records to a new csv file
final_dataframe.to_csv(
    'Updated_Employees.csv', index=False)

chevron_right


Output:

employee_df

updated_df

final_dataframe

Below is the image of Updated_Employee.csv has been provided.

Updated_Employees.csv

Example:



Below are the two CSV files which are going to be concatenated:

gfg3.csv

gfg2.csv

Now executing the below program to concatenate the above CSV files.

Python3

filter_none

edit
close

play_arrow

link
brightness_4
code

#import pandas
import pandas as pd
  
# read Employee file
df1 = pd.read_csv('gfg1.csv')
  
# print employee records
print('\ngfg1.csv:\n', df1)
  
# read Updated file
df2 = pd.read_csv('gfg2.csv')
  
# print updated records
print('\ngfg2.csv:\n', df2)
  
# form new dataframe by combining both employee_df
# and updated_df concat method appends records of
# updated_df to employee_df drop_duplicates method 
# drop rows having same emp_id keeping only the
# latest insertions resets the index to 0
final_df = pd.concat([df1, df2]).drop_duplicates(
    subset='ORGANIZATION').reset_index(drop=True)
  
# print old,new and updates records
print('\ngfg3.csv:\n', final_df)
  
# export all records to a new csv file
final_df.to_csv(
    'gfg3.csv', index=False)

chevron_right


Output:

Below is the image of gfg3.csv:

gfg3.csv


Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course.

My Personal Notes arrow_drop_up
Recommended Articles
Page :