Skip to content
Related Articles
Get the best out of our app
GeeksforGeeks App
Open App

Related Articles

Label Encoding in Python

Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article

In machine learning projects, we usually deal with datasets having different categorical columns where some columns have their elements in the ordinal variable category for e.g a column income level having elements as low, medium, or high in this case we can replace these elements with 1,2,3. where 1 represents ‘low’  2  ‘medium’  and 3′ high’. Through this type of encoding, we try to preserve the meaning of the element where higher weights are assigned to the elements having higher priority.

Label Encoding 

Label Encoding is a technique that is used to convert categorical columns into numerical ones so that they can be fitted by machine learning models which only take numerical data. It is an important pre-processing step in a machine-learning project.

Example Of Label Encoding

Suppose we have a column Height in some dataset that has elements as Tall, Medium, and short. To convert this categorical column into a numerical column we will apply label encoding to this column. After applying label encoding, the Height column is converted into a numerical column having elements 0,1, and 2 where 0 is the label for tall, 1 is the label for medium, and 2 is the label for short height.


Example of Label Encoding

We will apply Label Encoding on the iris dataset on the target column which is Species. It contains three species Iris-setosa, Iris-versicolor, Iris-virginica


# Import libraries 
import numpy as np
import pandas as pd
# Import dataset
df = pd.read_csv('../../data/Iris.csv')


array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

After applying Label Encoding with LabelEncoder() our categorical value will replace with the numerical value[int].


# Import label encoder
from sklearn import preprocessing
# label_encoder object knows 
# how to understand word labels.
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'species'.
df['species']= label_encoder.fit_transform(df['species'])


array([0, 1, 2], dtype=int64)

Limitation of label Encoding 

Label encoding converts the categorical data into numerical ones, but it assigns a unique number(starting from 0) to each class of data. This may lead to the generation of priority issues during model training of data sets. A label with a high value may be considered to have high priority than a label having a lower value.

Example For Limitation of Label Encoding 

An attribute having output classes Mexico, Paris, Dubai. On Label Encoding, this column lets Mexico is replaced with 0, Paris is replaced with 1, and Dubai is replaced with 2. 

With this, it can be interpreted that Dubai has high priority than Mexico and Paris while training the model, But actually, there is no such priority relation between these cities here.

My Personal Notes arrow_drop_up
Last Updated : 18 Apr, 2023
Like Article
Save Article
Similar Reads
Related Tutorials