How to Convert to Best Data Types Automatically in Pandas?

Prerequisite: Pandas

By default, pandas stores data in NumPy-backed datatypes such as int64, float64, bool and object. When we load or create a Series or DataFrame, pandas infers a datatype for each Series or column and assigns it automatically.



We will use the pandas convert_dtypes() function to convert these default datatypes to the best available datatypes automatically. The main benefit of convert_dtypes() is that the datatypes it produces support pd.NA, the pandas marker for missing values, so integer, boolean and string columns can hold missing values without falling back to float or object. convert_dtypes() is available from pandas 1.0.0 onwards.
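
To make the difference between NaN and pd.NA concrete, here is a minimal sketch. The variable names raw and converted are illustrative and do not come from the examples in this article.

```python
import numpy as np
import pandas as pd

# A missing value forces an otherwise integer column to float64 by default.
raw = pd.Series([1, 2, np.nan])
print(raw.dtype)  # float64

# convert_dtypes() switches to the nullable Int64 type, where the gap is pd.NA.
converted = raw.convert_dtypes()
print(converted.dtype)  # Int64
print(converted.isna().tolist())  # [False, False, True]
```

The values 1 and 2 are stored as integers again after the conversion, and the missing entry is shown as <NA> instead of NaN.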

Syntax:



For Series:

series_name.convert_dtypes()

For DataFrame (the trailing .dtypes lists the resulting datatype of each column):

dataframe_name.convert_dtypes().dtypes
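
The calls above use the default behaviour. convert_dtypes() also accepts keyword arguments such as convert_integer, convert_string and convert_boolean to switch individual conversions off. These are not covered elsewhere in this article, so the following is only a small sketch with illustrative variable names.

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # plain int64 column

# Default call: the column becomes the nullable Int64 type.
default_result = s.convert_dtypes()
print(default_result.dtype)  # Int64

# With convert_integer=False the integer column keeps its NumPy type.
kept_result = s.convert_dtypes(convert_integer=False)
print(kept_result.dtype)  # int64
```

Switching a conversion off can be useful when downstream code expects the original NumPy datatype.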

The following is the implementation for both series and data frame:

Converting the datatype of a series:

Example:




# importing packages
import pandas as pd
  
# creating a series
s = pd.Series(['Geeks', 'for', 'Geeks'])
  
# printing the series
print("SERIES")
print(s)
  
print()
  
# using convert_dtypes() function
print("AFTER DATATYPE CONVERSION")
print(s.convert_dtypes())

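Output: the values are unchanged. On pandas 1.x and 2.x, the dtype line reads object before the conversion and string after it.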

Converting the datatype of a dataframe:

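The datatype of each column changes accordingly. Note that the last line printed by df.dtypes, dtype: object, does not describe the DataFrame itself: df.dtypes returns a Series holding one datatype per column, and that Series is stored with the object datatype. It therefore reads object both before and after the conversion.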

Example:




import pandas as pd
import numpy as np
  
# creating a dataframe
df = pd.DataFrame({"Roll_No.": [1, 2, 3],
                   "Name": ["Raj", "Ritu", "Rohan"],
                   "Result": ["Pass", "Fail", np.nan],
                   "Promoted": [True, False, np.nan],
                   "Marks": [90.33, 30.6, np.nan]})
  
# printing the dataframe
print("PRINTING DATAFRAME")
print(df)  # display(df) only works inside Jupyter/IPython
  
# checking datatype
print()
print("PRINTING DATATYPE")
print(df.dtypes)
  
# converting datatype
print()
print("AFTER CONVERTING DATATYPE")
print(df.convert_dtypes().dtypes)

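Output: on pandas 1.x and 2.x, the columns are int64, object, object, object and float64 before the conversion. After it, Roll_No. is Int64, Name and Result are string, Promoted is boolean and Marks is Float64 (Marks stays float64 on pandas versions before 1.2).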
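
The dtype: object line at the bottom of a printed .dtypes result can be checked directly. The sketch below uses a small hypothetical DataFrame (not the one from the example) to show that .dtypes is itself a Series with the object datatype, regardless of the column types.

```python
import pandas as pd

df_small = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})

# .dtypes returns a Series indexed by column name.
before = df_small.dtypes
after = df_small.convert_dtypes().dtypes

print(type(before).__name__)  # Series
print(before.dtype)           # object
print(after.dtype)            # object
```

The entries of the Series change after the conversion, but the Series that holds them is of object datatype in both cases.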

Creating the DataFrame from Series with explicitly specified datatypes:

Example:




import pandas as pd
import numpy as np
  
# Creating the Data frame through series
# and specifying datatype along with it
df = pd.DataFrame({"Column_1": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
                   # Column_1 datatype is int32
                     
                   "Column_2": pd.Series(["Apple", "Ball", "Cat"], 
                                         dtype=np.dtype("object")),
                   # Column_2 datatype is object
                     
                   "Column_3": pd.Series([True, False, np.nan], 
                                         dtype=np.dtype("object")),
                   # Column_3 datatype is object
                     
                   "Column_4": pd.Series([10, np.nan, 20], 
                                         dtype=np.dtype("float")),
                   # Column_4 datatype is float
                     
                   "Column_5": pd.Series([np.nan, 100.5, 200],
                                         dtype=np.dtype("float"))})
                   # Column_5 datatype is float
  
# printing dataframe
print("PRINTING DATAFRAME")
print(df)  # display(df) only works inside Jupyter/IPython
  
# checking datatype
print()
print("CHECKING DATATYPE")
print(df.dtypes)
  
# convert datatype
print()
print("AFTER DATATYPE CONVERSION")
print(df.convert_dtypes().dtypes)

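Output: on pandas 1.x and 2.x, the columns are int32, object, object, float64 and float64 before the conversion. After it they are Int32, string, boolean, Int64 and Float64 (Column_5 stays float64 on pandas versions before 1.2).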
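
Column_4 and Column_5 both start as float columns, yet they can end up with different datatypes. The sketch below isolates the two columns as standalone Series (the names whole and fractional are illustrative) to show why: a float column whose non-missing values are all whole numbers is converted to a nullable integer type, while one with a fractional value remains a floating type.

```python
import numpy as np
import pandas as pd

# Same values as Column_4: every non-missing value is a whole number.
whole = pd.Series([10, np.nan, 20], dtype="float64")

# Same values as Column_5: 100.5 has a fractional part.
fractional = pd.Series([np.nan, 100.5, 200], dtype="float64")

whole_converted = whole.convert_dtypes()
fractional_converted = fractional.convert_dtypes()

print(whole_converted.dtype)       # Int64
print(fractional_converted.dtype)  # a floating type (Float64 on pandas 1.2+)
```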