
Pandas: Detect Mixed Data Types and Fix it

Last Updated : 06 Oct, 2023

Pandas is a Python library commonly used for analyzing, exploring, and manipulating data sets. When a column of a Pandas data frame does not hold a single type of data, such as only numbers or only strings, but instead holds a mix of both, it is called a mixed data type column.

What are mixed types in Pandas columns?

A Pandas data frame can have many columns. When one of them does not hold a single kind of data but contains, for example, numeric as well as string values, that column has a mixed data type. Pandas stores such a column with the generic object dtype.

For example:

data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])

Here, the Age column contains both a string value ('15') and numeric values (10 and 14.8), so it has a mixed data type.
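
As a minimal sketch of what "mixed" means here, the snippet below rebuilds the same data frame and lists the Python type of every cell in the Age column. The variable names age_dtype and age_types are illustrative and not part of the original example.

```python
import pandas as pd

data_frame = pd.DataFrame([['tom', 10], ['nick', '15'], ['juli', 14.8]],
                          columns=['Name', 'Age'])

# A mixed column is stored with the generic object dtype
age_dtype = data_frame['Age'].dtype
print(age_dtype)

# Collect the Python type name of every cell in the Age column
age_types = [type(value).__name__ for value in data_frame['Age']]
print(age_types)
```

Seeing more than one type name in the list is a quick sign that the column needs cleaning.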

Causes of mixed data types

  • Missing Values (NaN)
  • Inconsistent Formatting
  • Data Entry Errors

Missing Values (NaN):

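NaN is a floating-point value that represents missing or undefined data; it arises, for example, from the 0/0 case. Because NaN is itself a float, a missing value in a column of strings leaves the column holding both strings and a float, and a missing value in an integer column makes Pandas store the whole column as floats by default. Either way, the column no longer has the type you expected, which can lead to incorrect results.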
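
The following sketch illustrates the effect of NaN on a column's type, assuming the default NumPy-backed dtypes. The series names numbers and names are made up for this illustration.

```python
import numpy as np
import pandas as pd

# An integer column that gains a missing value is stored as float64 by default
numbers = pd.Series([1, 2, np.nan])
print(numbers.dtype)

# An object column of strings with a missing value holds both str and float (NaN)
names = pd.Series(['tom', np.nan, 'juli'], dtype=object)
name_types = sorted({type(value).__name__ for value in names})
print(name_types)
```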

Inconsistent Formatting:

Inconsistent formatting arises when cells in the same column are written in different formats, for example one number stored as 15 and another as the string '15'. It is therefore important to convert every cell of the column to one consistent format.
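
One possible way to normalize inconsistent formats is sketched below. The prices column and its values are hypothetical, and the cleaning steps (removing thousands separators and surrounding whitespace) are only examples of what a real data set might need.

```python
import pandas as pd

# Hypothetical column where the same kind of value is written in different formats
prices = pd.Series(['1,200', ' 15 ', 30], dtype=object)

cleaned = (
    prices.astype(str)                        # make every cell a string first
          .str.replace(',', '', regex=False)  # drop thousands separators
          .str.strip()                        # remove surrounding whitespace
)

# Convert the cleaned strings to numbers
prices_numeric = pd.to_numeric(cleaned)
print(prices_numeric.tolist())
```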

Data Entry Errors:

Users often make mistakes while entering data into a column of a Pandas data frame, such as typing a string into a numeric column or leaving a cell empty. Such errors can also lead to mixed data types and therefore need to be fixed.
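
To locate suspicious entries in a column that should be numeric, one option is to try converting every cell and keep the cells that fail. This is a sketch on a hypothetical ages column, not a complete validation routine.

```python
import pandas as pd

# Hypothetical numeric column with one typo ('abc') and one empty cell (None)
ages = pd.Series([10, '15', 'abc', None], dtype=object)

# Cells that cannot be parsed as numbers become NaN
converted = pd.to_numeric(ages, errors='coerce')

# Keep cells that were present in the input but failed to convert
bad_entries = ages[converted.isna() & ages.notna()]
print(bad_entries.tolist())
```

The empty cell is excluded on purpose, so only values that were actually typed in wrongly are reported.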

How to identify mixed types in Pandas columns

You might have used the info() function to check the data types of a Pandas data frame, but info() reports a mixed column only as object and does not reveal which kinds of values it holds. To detect mixed data types, traverse each column of the data frame and get its inferred type using the pd.api.types.infer_dtype() function.

Syntax:

for column in data_frame.columns:
    print(pd.api.types.infer_dtype(data_frame[column]))

Here,

  • data_frame: It is the Pandas data frame for which you want to detect if it has mixed data types or not.

Example:

The data frame used in this example to detect mixed data type is as follows:

Python3




# Python program to detect mixed data types in Pandas data frame
 
# Import the library Pandas
import pandas as pd
   
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
 
# Traverse data frame to detect mixed data types
for column in data_frame.columns:
    print(column,':',pd.api.types.infer_dtype(data_frame[column]))


Output:

Name : string
Age : mixed-integer
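
The same check can be wrapped in a small helper that returns only the problematic columns. The function name find_mixed_columns is hypothetical, and treating the labels 'mixed' and 'mixed-integer' as the ones of interest is an assumption of this sketch.

```python
import pandas as pd

def find_mixed_columns(df):
    """Return names of columns whose inferred type is 'mixed' or 'mixed-integer'."""
    mixed_labels = {'mixed', 'mixed-integer'}
    return [col for col in df.columns
            if pd.api.types.infer_dtype(df[col]) in mixed_labels]

data_frame = pd.DataFrame([['tom', 10], ['nick', '15'], ['juli', 14.8]],
                          columns=['Name', 'Age'])

mixed_columns = find_mixed_columns(data_frame)
print(mixed_columns)
```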

How to deal with mixed types in Pandas columns

To fix mixed data types in a Pandas data frame, convert the entire column to one data type. This can be done using the astype() function or the to_numeric() function.

Using astype() function:

The astype() function casts a Pandas object to a specified data type. This section shows how to fix mixed data types using astype().

Syntax:

data_frame[column] = data_frame[column].astype(int)

Here,

  • data_frame: It is the Pandas data frame for which you want to fix mixed data types.
  • column: It is the name of the column whose mixed data type you want to fix.
  • int: It is the data type to which you want to convert the column. You can also use str, float, etc. here, depending on the data type you need.

Example:

The data frame used in this example to fix mixed data type is as follows:

Python3




# Python program to fix mixed data types using astype() in Pandas data frame
 
# Import the library Pandas
import pandas as pd
   
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
 
# Transforming mixed data types to single data type
# (casting to int drops the fractional part, so 14.8 becomes 14)
data_frame['Age'] = data_frame['Age'].astype(int)
 
# Traverse data frame to detect data types after fix
for column in data_frame.columns:
    print(column,':',pd.api.types.infer_dtype(data_frame[column]))


Output:

Name : string
Age : integer
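
Note that astype(int) works here only because every value can be read as a number. The sketch below, which uses a hypothetical ages column containing the text 'unknown', shows the ValueError that is raised otherwise.

```python
import pandas as pd

# Hypothetical column with a value that cannot be read as an integer
ages = pd.Series([10, '15', 'unknown'], dtype=object)

try:
    ages = ages.astype(int)
    cast_failed = False
except ValueError:
    # The cast is all-or-nothing: one bad cell makes the whole conversion fail
    cast_failed = True

print(cast_failed)
```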

Using to_numeric() function:

The to_numeric() function converts its argument to a numeric data type. This section shows how to fix mixed data types using to_numeric().

Syntax:

data_frame[column] = data_frame[column].apply(lambda x: pd.to_numeric(x, errors='ignore'))

Here,

  • data_frame: It is the Pandas data frame for which you want to fix mixed data types.
  • column: It is the name of the column whose mixed data type you want to fix.

Example:

The data frame used in this example to fix mixed data type is as follows:

Python3




# Python program to fix mixed data types using to_numeric() in Pandas data frame
 
# Import the library Pandas
import pandas as pd
   
# Create the pandas DataFrame
data_frame = pd.DataFrame( [['tom', 10], ['nick', '15'], ['juli', 14.8]], columns=['Name', 'Age'])
 
# Transforming mixed data types to single data type
# (errors='ignore' is deprecated since pandas 2.2)
data_frame['Age'] = data_frame['Age'].apply(lambda x: pd.to_numeric(x, errors='ignore'))
 
# Traverse data frame to detect data types after fix
for column in data_frame.columns:
    print(column,':',pd.api.types.infer_dtype(data_frame[column]))


Output:

Name : string
Age : floating
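
Since errors='ignore' is deprecated as of pandas 2.2, a commonly used alternative is errors='coerce', which turns unparseable values into NaN instead of leaving them unchanged. The sketch below applies it to the whole column at once; the 'unknown' entry is a hypothetical bad value added for illustration.

```python
import pandas as pd

data_frame = pd.DataFrame([['tom', 10], ['nick', '15'], ['juli', 'unknown']],
                          columns=['Name', 'Age'])

# Convert the whole column; values that cannot be parsed become NaN
data_frame['Age'] = pd.to_numeric(data_frame['Age'], errors='coerce')

print(data_frame['Age'].dtype)
print(data_frame['Age'].isna().tolist())
```

The resulting NaN values still have to be handled afterwards, for example by filling or dropping them, depending on the analysis.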

Conclusion

Pandas columns with mixed types can cause problems when analyzing data, but they can be found and resolved using the techniques in this article. By properly cleaning and preparing the data, data scientists and software developers can improve the accuracy and dependability of their analysis.


