# Guided Ordinal Encoding Techniques

This article covers two guided encoding techniques for categorical features: target guided ordinal encoding and mean guided ordinal encoding.

**Tools and Technologies needed:**

- Familiarity with the pandas library
- Basic knowledge of how a pandas DataFrame works
- Jupyter Notebook, Google Colab, or a similar platform

**What is encoding?**

Encoding is the process of converting categorical entries in a dataset into numerical data. Suppose we have a dataset of employees with a column that holds the city where each employee is located, and we want to build a model that predicts an employee's salary from their other details. The model cannot work with a city name directly, so how do we pass this information to it? For example, an employee who lives in a metropolitan city may earn more than an employee in a small city, and we need some way to make the model aware of this. The natural idea is to rank the cities according to some criterion and replace each city name with its rank. Techniques like this, which convert categorical data into numerical data, are the subject of this article.


**What is target guided encoding technique?**

In this technique, we use the target variable to encode the categorical data. Let's understand it with an example.

| Employee Id | City | Highest Qualification | Salary |
|---|---|---|---|
| A100 | delhi | phd | 50000 |
| A101 | delhi | bsc | 30000 |
| A102 | mumbai | msc | 45000 |
| B101 | pune | bsc | 25000 |
| B102 | kolkata | phd | 48000 |
| C100 | pune | msc | 30000 |
| D103 | kolkata | msc | 44000 |
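To follow along in code, the example table can be loaded into a pandas DataFrame. This is a minimal sketch: the snake_case column names and the variable name `df` are illustrative choices, not something the example prescribes.

```python
import pandas as pd

# Example employee data from the table above.
# Column names are illustrative assumptions.
df = pd.DataFrame({
    "employee_id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "city": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "highest_qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# The city and highest_qualification columns hold text,
# which is what the encoding steps will replace with numbers.
print(df)
```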

Let's encode the City column using target guided encoding. Here the target variable is Salary.

__Step 1__: Compute the mean salary for each city. This gives delhi 40000, mumbai 45000, pune 27500, and kolkata 46000.

__Step 2__: Sort the cities by mean salary. In descending order:

kolkata > mumbai > delhi > pune

__Step 3__: Assign each city a rank based on this order.

| City | Rank |
|---|---|
| kolkata | 4 |
| mumbai | 3 |
| delhi | 2 |
| pune | 1 |

(Note: you can also rank them in the opposite order, as long as the same order is applied consistently.)
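Steps 1 to 3 can be sketched in pandas as follows. This is one possible way to do it, assuming illustrative column names `city` and `salary`; the lowest mean salary is given rank 1, matching the rank table above.

```python
import pandas as pd

# Only the two columns needed for this step; names are illustrative.
df = pd.DataFrame({
    "city": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Step 1: mean salary per city.
city_mean = df.groupby("city")["salary"].mean()

# Steps 2 and 3: sort by mean salary (ascending) and number the cities,
# so the city with the lowest mean gets rank 1.
ordered_cities = city_mean.sort_values().index
city_rank = {city: rank for rank, city in enumerate(ordered_cities, start=1)}

print(city_rank)
# {'pune': 1, 'delhi': 2, 'mumbai': 3, 'kolkata': 4}
```

If two cities had the same mean salary, this sketch would still give them different ranks, so ties would need a deliberate rule of your own.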

__Step 4__: Use this mapping to encode the City column of the dataset.

| Employee Id | City | Highest Qualification | Salary |
|---|---|---|---|
| A100 | 2 | phd | 50000 |
| A101 | 2 | bsc | 30000 |
| A102 | 3 | msc | 45000 |
| B101 | 1 | bsc | 25000 |
| B102 | 4 | phd | 48000 |
| C100 | 1 | msc | 30000 |
| D103 | 4 | msc | 44000 |
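Step 4 amounts to a dictionary lookup on the column. The sketch below hard-codes the ranks from the rank table and applies them with `Series.map`; the names `city_rank` and `city_encoded` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "employee_id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "city": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
})

# Ranks taken from the rank table (lowest mean salary = 1).
city_rank = {"pune": 1, "delhi": 2, "mumbai": 3, "kolkata": 4}

# Replace each city name with its rank.
df["city_encoded"] = df["city"].map(city_rank)

print(df["city_encoded"].tolist())
# [2, 2, 3, 1, 4, 1, 4]
```

A city that is missing from the dictionary would be mapped to `NaN`, so new data with unseen cities needs a fallback value chosen by you.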

*That is all there is to target guided encoding.* Let's now look at mean guided encoding.

**What is mean guided encoding technique?**

We will encode the Highest Qualification column using the mean guided encoding technique.

__Step 1__: For each highest qualification, compute the mean of the corresponding salaries.

__Step 2__: Instead of ranking the categories by their mean value, use the mean value itself as the encoding for each highest qualification.

| Highest Qualification | Mean Salary |
|---|---|
| phd | 49000 |
| msc | 39666.67 |
| bsc | 27500 |
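The mean salary table above can be reproduced with a single `groupby`. This is a small sketch with illustrative column names; values are rounded to two decimals only for display.

```python
import pandas as pd

df = pd.DataFrame({
    "highest_qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Steps 1 and 2: the mean salary per qualification is the encoding value.
qual_mean = df.groupby("highest_qualification")["salary"].mean()

print(qual_mean.round(2))
```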

__Step 3__: Use these values to encode the Highest Qualification column.

| Employee Id | City | Highest Qualification | Salary |
|---|---|---|---|
| A100 | 2 | 49000 | 50000 |
| A101 | 2 | 27500 | 30000 |
| A102 | 3 | 39666.67 | 45000 |
| B101 | 1 | 27500 | 25000 |
| B102 | 4 | 49000 | 48000 |
| C100 | 1 | 39666.67 | 30000 |
| D103 | 4 | 39666.67 | 44000 |
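One compact way to produce the encoded column is `groupby(...).transform("mean")`, which writes each group's mean back onto every row of that group. This is a sketch with illustrative names, not the only way to do it; mapping a precomputed table of means onto the column would give the same result.

```python
import pandas as pd

df = pd.DataFrame({
    "employee_id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "highest_qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Step 3: replace each qualification with the mean salary of its group.
df["qualification_encoded"] = (
    df.groupby("highest_qualification")["salary"].transform("mean")
)

print(df[["employee_id", "qualification_encoded"]].round(2))
```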

With both categorical columns encoded, the dataset is ready to be used for training a model.