Skip to content
Related Articles
Get the best out of our app
GeeksforGeeks App
Open App
geeksforgeeks
Browser
Continue

Related Articles

Guided Ordinal Encoding Techniques

Improve Article
Save Article
Like Article
Improve Article
Save Article
Like Article

There are specifically two types of guided encoding techniques for categorical features, namely – target guided ordinal encoding & mean guided ordinal encoding.

Tools and Technologies needed:

  1. Understanding of pandas library
  2. Basic knowledge of how a pandas Dataframe work.
  3. Jupyter Notebook or Google Collab or any similar platform.

What is encoding?

 Encoding is the technique we use to convert categorical entry in a dataset to a numerical data. Let say we have a dataset of employees in which there is a column that contains the information about the city location of an employee. Now we want to use this data to form a model which could predict the salary of an employee based upon his/her other details. Obviously, this model doesn’t understand anything about the city name. So how will you make the model know about it? For example, an employee who lives in a metropolitan city earns more than employees of a small city. Someway we need to make the model know about this . Yes, the way you are thinking in your mind is what we will do through code. As obvious we are thinking to rank the city based upon some spec . These ways of converting a categorical data to a numerical data are our target. 

What is target guided encoding technique?

In this technique we will take help of our target variable to encode the categorical data . lets understand by an example,

Employee IdCity Highest QualificationSalary
A100delhiPhd50000
A101delhibsc30000
A102mumbaimsc45000
B101punebsc25000
B102kolkataphd48000
C100punemsc30000
D103kolkatamsc44000

Lets try to encode the city column using the target guided encoding. Here our target variable is salary.

step 1: sort the cities based upon the corresponding salary. Now to do this we will take mean of all the salaries of that particular city.

step 2: Based upon the mean of the salary  the descending order of the city is :

                                                         kolkata>mumbai>delhi>pune

step3: Based upon this order we will rank the cities.

CityRank
kolkata4
mumbai3
delhi2
pune1

(note: you can rank them in the opposite order too)

step 4 : we will use this information to encode the City column of the dataset.

Employee Id CityHighest QualificationSalary
A1002phd50000
A1012bsc30000
A1023msc45000
B1011bsc25000
B1024phd48000
C1001msc30000
D1034msc44000

This is all what target guided encoding is! simple right? Lets now explore about mean guided encoding.

What is mean guided encoding technique?

We will encode the Highest qualification column using the mean guided encoding technique.

step 1: For each highest qualification we will find the mean of all the corresponding salary.

step 2 : Instead of ranking them based upon the mean value , we will encode this mean value corresponding to the respective highest qualification

Highest QualificationMean Salary
Phd49000
Msc39666.67
Bsc27500

step 3 : We will use this to encode the Highest Qualification column

Employee IdCityHighest QualificationSalary
A10024900050000
A10122750030000
A102339666.6745000
B10112750025000
B10244900048000
C100139666.6730000
D103439666.6744000

Hence we are ready with our dataset to prepare our model.

My Personal Notes arrow_drop_up
Last Updated : 27 Sep, 2021
Like Article
Save Article
Similar Reads
Related Tutorials