
Python Pandas – DataFrame.copy() function

Last Updated : 26 Nov, 2020

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

There are several ways to copy a DataFrame in pandas. The simplest is to assign the DataFrame object to another variable, but this approach has some drawbacks.

Syntax: DataFrame.copy(deep=True)

When deep=True (the default), a new object is created with a copy of the calling object’s data and indices. Modifications to the data or indices of the copy will not be reflected in the original object.

When deep=False, a new object will be created without copying the calling object’s data or index (only references to the data and index are copied). Any changes to the data of the original will be reflected in the shallow copy (and vice versa).

Step 1) Let us first make a dummy DataFrame to use for our illustration.

Step 2) Assign that DataFrame object to a variable.

Step 3) Make changes in the original DataFrame and see whether they are reflected in the copied variable.

Python3




import pandas as pd

# Create two Series holding planet mass and diameter
s = pd.Series([3, 4, 5], index=['earth', 'mars', 'jupiter'])
k = pd.Series([1, 2, 3], index=['earth', 'mars', 'jupiter'])

# Create DataFrame df from the two Series
df = pd.DataFrame({'mass': s, 'diameter': k})

print(df)


Output:

         mass  diameter
earth       3         1
mars        4         2
jupiter     5         3

Now, let’s assign the DataFrame df to a variable and then make changes to the original:

Python3




# Assign df to variable_copy (no data is copied; both names
# refer to the same object)
variable_copy = df

print(variable_copy)

# Update the mass of earth in the original DataFrame
# (label-based .loc avoids chained-indexing warnings)
df.loc['earth', 'mass'] = 8

print(variable_copy)


Output:

         mass  diameter
earth       3         1
mars        4         2
jupiter     5         3
         mass  diameter
earth       8         1
mars        4         2
jupiter     5         3
Here, we can see that changing a value in the original DataFrame also changes the data seen through the copied variable. This is because assignment does not copy the object at all: both names refer to the same DataFrame. To get an independent copy, we use DataFrame.copy().
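A quick way to confirm this is to compare the two names with Python's is operator. This is a small illustrative check that continues the example above; it is not part of the original walkthrough:

Python3

# Both names refer to the same underlying object,
# so no independent copy has been made
print(variable_copy is df)          # True
print(id(variable_copy) == id(df))  # True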

Let us see this with an example when deep=True (the default):

Python3




# Make a deep copy of df (deep=True is the default)
res = df.copy(deep=True)
print(res)


Output:

         mass  diameter
earth       8         1
mars        4         2
jupiter     5         3
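To confirm that the deep copy is independent, we can change the original DataFrame again and check that res keeps its own data. This is a small illustrative continuation of the example above (df and res are the variables created in the previous snippets):

Python3

# Modify the original DataFrame once more
df.loc['mars', 'mass'] = 10

# The deep copy keeps its own data, so it is unaffected
print(df.loc['mars', 'mass'])   # 10
print(res.loc['mars', 'mass'])  # 4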

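For completeness, here is a sketch of the deep=False case described earlier. The variable name shallow is chosen only for this illustration. Note that with the copy-on-write behaviour of newer pandas versions, writes no longer propagate between an object and its shallow copy, so the result below assumes the classic behaviour this article describes:

Python3

# Shallow copy: data and index are shared with df, not copied
shallow = df.copy(deep=False)

# Change a value through the original DataFrame
df.loc['jupiter', 'mass'] = 99

# Classic behaviour: the change is visible through the shallow copy
# (with copy-on-write enabled, the shallow copy would still show 5)
print(shallow.loc['jupiter', 'mass'])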