In machine learning, we usually deal with datasets that contain multiple labels in one or more columns. These labels can be words or numbers. To keep the data human-readable, the training data is often labeled in words.
Label Encoding refers to converting the labels into numeric form so that they become machine-readable. Machine learning algorithms can then decide more effectively how those labels should be handled. It is an important pre-processing step for structured datasets in supervised learning.
Suppose we have a column Height in some dataset.
After applying label encoding, the Height column is converted into:
where 0 is the label for tall, 1 is the label for medium, and 2 is the label for short height.
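The mapping described above can be sketched in plain Python. This is a minimal illustration, not the article's original code: the sample values in `heights` and the names `height_to_code` and `encoded_heights` are assumptions made up for this example. Only the tall/medium/short to 0/1/2 mapping comes from the text.

```python
# Mapping taken from the text: tall -> 0, medium -> 1, short -> 2.
height_to_code = {"tall": 0, "medium": 1, "short": 2}

# Hypothetical Height column (sample values invented for illustration).
heights = ["tall", "medium", "short", "medium", "tall"]

# Replace every label with its numeric code.
encoded_heights = [height_to_code[h] for h in heights]

print(encoded_heights)  # [0, 1, 2, 1, 0]
```

An automatic encoder may pick a different numbering than this hand-written one. For instance, scikit-learn's LabelEncoder assigns codes in sorted order of the class names, so it would not reproduce this exact mapping.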
We apply Label Encoding to the target column, Species, of the iris dataset. It contains three species: Iris-setosa, Iris-versicolor and Iris-virginica.
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
After applying Label Encoding:
array([0, 1, 2], dtype=int64)
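One way to obtain an encoding like this is scikit-learn's `LabelEncoder`. The sketch below assumes scikit-learn is installed and uses a small hand-written list of species names rather than the full iris dataset; the variable names are illustrative only and the original article's code may have differed.

```python
from sklearn.preprocessing import LabelEncoder

# Hand-written sample of Species values (an assumption, not the full dataset).
species = [
    "Iris-setosa",
    "Iris-versicolor",
    "Iris-virginica",
    "Iris-setosa",
    "Iris-virginica",
]

encoder = LabelEncoder()

# Learn the distinct classes and replace each label with its integer code.
encoded = encoder.fit_transform(species)

# classes_ holds the learned labels; a label's position is its code.
print(encoder.classes_)
print(encoded)

# The codes can be mapped back to the original labels.
decoded = encoder.inverse_transform(encoded)
print(list(decoded))
```

`LabelEncoder` numbers the classes in sorted order, which is why Iris-setosa, Iris-versicolor and Iris-virginica receive 0, 1 and 2 respectively.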
Limitation of Label Encoding
Label encoding converts the data into machine-readable form, but it assigns a unique number (starting from 0) to each class of data. This can introduce a priority issue during training: a label with a higher value may be treated as having higher priority than a label with a lower value.
For example, consider an attribute with the output classes mexico, paris and dubai. On label encoding this column, suppose mexico is replaced with 0, paris with 1 and dubai with 2.
The model may then interpret dubai as having higher priority than mexico and paris during training, even though there is no such priority relation between these cities.
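The short sketch below illustrates why the integer codes can mislead. It uses the mapping from the example above; the sample column `cities` and the variable names are assumptions made for illustration. Once the cities are numbers, comparisons and averages are computed without complaint, even though they have no real-world meaning.

```python
# Mapping from the example: mexico -> 0, paris -> 1, dubai -> 2.
city_to_code = {"mexico": 0, "paris": 1, "dubai": 2}

# Hypothetical column of city labels (invented for illustration).
cities = ["mexico", "paris", "dubai", "dubai"]
codes = [city_to_code[c] for c in cities]

# The codes impose an order that the cities themselves do not have.
dubai_outranks_paris = city_to_code["dubai"] > city_to_code["paris"]
print(dubai_outranks_paris)  # True

# Arithmetic on the codes also "works", but an average city is meaningless.
mean_code = sum(codes) / len(codes)
print(mean_code)  # 1.25
```

A model that treats the column as an ordinary numeric feature sees exactly these numbers, so it can pick up on the artificial ordering.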
Improved By : deepak_jain