In this article, we will see how to select the columns of a DataFrame that have specific data types. This can be done with the DataFrame.select_dtypes() method of the pandas module.
Syntax: DataFrame.select_dtypes(include=None, exclude=None)
Parameters:
include, exclude: A dtype or string, or a list of them, naming the data types to include or exclude. At least one of these parameters must be supplied.
Returns: The subset of the DataFrame that contains the columns whose dtypes are in include and not in exclude.
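The effect of the two parameters can be seen on a small frame. The following is a minimal sketch, not part of the original walkthrough: the column names and values are hypothetical, and the dtypes are set explicitly so the result does not depend on the platform.

```python
import pandas as pd

# Hypothetical toy data; names and values are for illustration only.
df = pd.DataFrame({
    "age": pd.Series([22, 38, 26], dtype="int64"),
    "fare": pd.Series([7.25, 71.28, 7.92], dtype="float64"),
    "survived": pd.Series([False, True, True], dtype="bool"),
})

# Keep only the float64 columns.
only_floats = df.select_dtypes(include=["float64"])

# Keep everything except the float64 columns.
no_floats = df.select_dtypes(exclude=["float64"])

# Several dtypes can be listed at once.
ints_and_floats = df.select_dtypes(include=["int64", "float64"])

print(list(only_floats.columns))
print(list(no_floats.columns))
print(list(ints_and_floats.columns))

# Calling the method with neither argument raises ValueError.
try:
    df.select_dtypes()
except ValueError as err:
    print("ValueError:", err)
```

Note that the method returns a DataFrame, so the original column order is preserved in each result.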
Step-by-step Approach:
- First, import the required module and load the dataset.
Python3
# import required module
import pandas as pd

# assign dataset
df = pd.read_csv("train.csv")
- Then, inspect the data types present in the dataset using the DataFrame.info() method.
Python3
# display description of the dataset
df.info()
Output:
- Next, use DataFrame.select_dtypes() to select the columns of a specific data type.
Python3
# store columns with specific data type
integer_columns = df.select_dtypes(include=['int64']).columns
float_columns = df.select_dtypes(include=['float64']).columns
object_columns = df.select_dtypes(include=['object']).columns
- Finally, display the columns that have each data type.
Python3
# display columns
print('\nint64 columns:\n', integer_columns)
print('\nfloat64 columns:\n', float_columns)
print('\nobject columns:\n', object_columns)
Output:
Below is the complete program based on the above approach:
Python3
# import required module
import pandas as pd

# assign dataset
df = pd.read_csv("train.csv")

# store columns with specific data type
integer_columns = df.select_dtypes(include=['int64']).columns
float_columns = df.select_dtypes(include=['float64']).columns
object_columns = df.select_dtypes(include=['object']).columns

# display columns
print('\nint64 columns:\n', integer_columns)
print('\nfloat64 columns:\n', float_columns)
print('\nobject columns:\n', object_columns)
Output:
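The program above needs a train.csv file on disk. To try the same steps without that file, the sketch below reads a small in-memory CSV instead. The CSV text is a hypothetical stand-in, not the contents of the real train.csv, and only the integer and float selections are shown because the dtype reported for text columns can depend on the pandas version.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv; the rows are made up.
csv_text = """id,name,age,fare
1,Ann,22,7.25
2,Bob,38,71.28
3,Cy,26,7.92
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
df = pd.read_csv(io.StringIO(csv_text))

# dtypes gives a compact per-column view of the types that info() also reports.
print(df.dtypes)

# store columns with specific data type
integer_columns = df.select_dtypes(include=["int64"]).columns
float_columns = df.select_dtypes(include=["float64"]).columns

# display columns
print("\nint64 columns:\n", integer_columns)
print("\nfloat64 columns:\n", float_columns)
```

Here id and age are parsed as int64 and fare as float64, so they end up in the integer and float selections respectively.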
Example:
Here we are going to extract columns from the dataset below:
Python3
# import required module
import pandas as pd
from vega_datasets import data

# assign dataset
df = data.seattle_weather()

# display dataset
df.sample(10)
Output:
Now, we are going to display all the columns having float64 as the data type.
Python3
# import required module
import pandas as pd
from vega_datasets import data

# assign dataset
df = data.seattle_weather()

# display description of dataset
df.info()

# store columns with specific data type
columns = df.select_dtypes(include=['float64']).columns

# display columns
print('\nColumns:\n', columns)
Output:
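If the vega_datasets package is not installed, the same selection can be tried on a hand-built frame. The sketch below uses hypothetical weather-like columns with made-up values, not the real seattle_weather data. It also shows the generic 'number' selector, which pandas documents as a way to pick every numeric column at once.

```python
import pandas as pd

# Hypothetical weather-like data; the values are made up for illustration.
df = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"]),
    "precipitation": pd.Series([0.0, 10.9, 0.8], dtype="float64"),
    "temp_max": pd.Series([12.8, 10.6, 11.7], dtype="float64"),
    "day_index": pd.Series([1, 2, 3], dtype="int64"),
})

# Column labels of the float64 columns, converted to a plain list.
float_columns = list(df.select_dtypes(include=["float64"]).columns)

# 'number' matches all numeric dtypes (here int64 and float64),
# but not the datetime column.
numeric_columns = list(df.select_dtypes(include="number").columns)

print("float64 columns:", float_columns)
print("numeric columns:", numeric_columns)

# The label list can be used to subset the original frame.
print(df[float_columns].head())
```

Converting the result of .columns to a list is optional; it just makes the labels easier to print and reuse.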