How to convert Categorical features to Numerical Features in Python?
Last Updated: 26 Jan, 2022
Many machine learning models cannot work directly with features that hold categorical values. Categorical variables usually have string-type values, so we have to convert those strings to numbers before training. One way to do this is to create a new feature for each category and set it to 1 or 0; another is to replace each category with an integer label. In this article, we are going to see how to convert categorical features to numerical features in Python, using scikit-learn's LabelEncoder for the integer-label approach.
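The first approach mentioned above, creating one new feature per category, is usually called one-hot encoding. A minimal sketch with pandas' get_dummies() follows; the tiny DataFrame is hypothetical and only mimics an "origin"-style column.

```python
import pandas as pd

# Hypothetical toy data standing in for a categorical "origin" column
df = pd.DataFrame({"origin": ["usa", "japan", "europe", "usa"]})

# get_dummies creates one new 0/1 column per category (origin_europe, ...).
# dtype=int keeps the values as integers; newer pandas defaults to bool.
one_hot = pd.get_dummies(df, columns=["origin"], dtype=int)
print(one_hot)
```

Each row gets a 1 in exactly one of the new columns, so no artificial order is introduced between categories, at the cost of one extra column per category.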
Stepwise Implementation
Step 1: Import the necessary packages and modules
Python3
import numpy as np
import pandas as pd
from sklearn import preprocessing
Step 2: Import the CSV file
We will use the pandas read_csv() method to import the CSV file, cluster_mpg.csv, and print its first five rows with head().
Python3
df = pd.read_csv('cluster_mpg.csv')
print(df.head())
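If cluster_mpg.csv is not at hand, the same call can be tried on CSV text held in memory, because read_csv() also accepts a file-like object. The column names and rows below are hypothetical placeholders, not the real file's contents.

```python
import io

import pandas as pd

# Hypothetical stand-in for cluster_mpg.csv: assumed columns, placeholder rows
csv_text = """mpg,cylinders,origin,name
18.0,8,usa,car_a
24.0,4,japan,car_b
26.0,4,europe,car_c
"""

# read_csv accepts any file-like object, not only a path on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```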
Step 3: Get all features with categorical values
We use df.info() to find the categorical features. In its output, categorical features have the Dtype "object".
In the given dataset, the columns "origin" and "name" are of object type.
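Instead of reading the object columns off the df.info() printout, their names can also be collected with select_dtypes(). The sketch below uses a hypothetical miniature frame with the same kinds of columns. The string columns are created with an explicit object dtype, because newer pandas releases may infer a dedicated string dtype for text instead of "object".

```python
import pandas as pd

# Hypothetical miniature frame: two numeric columns plus "origin" and "name"
df = pd.DataFrame({
    "mpg": [18.0, 24.0, 26.0],
    "cylinders": [8, 4, 4],
    # explicit object dtype, matching what older pandas produced for strings
    "origin": pd.Series(["usa", "japan", "europe"], dtype=object),
    "name": pd.Series(["car_a", "car_b", "car_c"], dtype=object),
})

# Prints every column with its Dtype; the string columns show "object"
df.info()

# Keep only the object-typed columns and list their names
categorical_cols = df.select_dtypes(include="object").columns.tolist()
print(categorical_cols)
```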
Step 4: Convert string values of origin column to numerical values
We create a LabelEncoder object with preprocessing.LabelEncoder() and fit it on the "origin" column with its fit() method, which learns the distinct categories in that column.
Python3
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df[ "origin" ])
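Both "origin" and "name" are categorical, so the same idea can be applied to every object column. A minimal sketch, assuming a hypothetical toy frame: one LabelEncoder is fitted per column and kept in a dict so that each column's mapping can be reused later. Note that scikit-learn's documentation describes LabelEncoder as intended for target labels; for input features, its OrdinalEncoder performs the same kind of mapping on several columns at once.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data with two categorical columns
df = pd.DataFrame({
    "mpg": [18.0, 24.0, 26.0, 31.0],
    "origin": ["usa", "japan", "europe", "japan"],
    "name": ["car_b", "car_a", "car_c", "car_a"],
})

# One fitted encoder per column, kept so each mapping can be reused later
encoders = {}
for col in ["origin", "name"]:
    encoder = preprocessing.LabelEncoder()
    # fit_transform learns the sorted categories and returns integer codes
    df[col] = encoder.fit_transform(df[col])
    encoders[col] = encoder

print(df)
```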
Step 5: Get the unique values of the categorical feature
We will use the label_encoder.classes_ attribute for this purpose. According to the scikit-learn documentation, classes_ is an ndarray of shape (n_classes,) that holds the label for each class.
Python3
print(list(label_encoder.classes_))
Output:
['europe', 'japan', 'usa']
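The classes_ array is sorted, and the position of a class in it is the integer code that LabelEncoder assigns. A short sketch that turns this into an explicit lookup table, fitted on a hypothetical list of origin values:

```python
from sklearn import preprocessing

# Hypothetical origin values; fit() stores the sorted unique labels in classes_
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(["usa", "japan", "usa", "europe"])

# A label's index in classes_ is its integer code; str() turns NumPy strings
# into plain Python strings for readable dictionary keys
mapping = {str(label): code for code, label in enumerate(label_encoder.classes_)}
print(mapping)
```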
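Step 6: Transforming the categorical values
The transform() method replaces each string in the column with its index in classes_, so "europe" becomes 0, "japan" becomes 1 and "usa" becomes 2.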
Python3
print(label_encoder.transform(df["origin"]))
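Printing the transformed values does not change the DataFrame. A minimal sketch, again on a hypothetical toy column, that stores the codes in a new column with fit_transform() (fit() and transform() in one call) and recovers the original strings with inverse_transform():

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical toy data standing in for the "origin" column
df = pd.DataFrame({"origin": ["usa", "japan", "europe", "usa"]})

label_encoder = preprocessing.LabelEncoder()
# fit_transform learns the categories and returns their integer codes
df["origin_code"] = label_encoder.fit_transform(df["origin"])

# inverse_transform maps the integer codes back to the original strings
restored = label_encoder.inverse_transform(df["origin_code"])
print(df)
print(restored.tolist())
```

Keeping the fitted encoder around makes it possible to apply the same mapping to new data with transform(). Keep in mind that the codes imply an order (europe < japan < usa) that some models may treat as meaningful; one-hot encoding avoids that.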