
How to convert an array of indices to one-hot encoded NumPy array

Last Updated : 24 Oct, 2023

One-hot encoding is a popular machine learning technique for transforming categorical data into binary values of 0 and 1. For example, with four classes the index 2 becomes the row [0 0 1 0]. In many situations you need a one-hot encoded NumPy array rather than an array of class indices. The conversion can be done with NumPy's arange() or eye() functions, or with scikit-learn's LabelBinarizer. In this article, we will discuss each of these approaches.

Converting an array of indices to a one-hot encoded NumPy array

  • Using arange function
  • Using LabelBinarizer function
  • Using eye() function

Using the arange function

The arange() function generates an array of evenly spaced values within a specified interval; np.arange(n) returns the integers 0 to n - 1. In this method, arange() supplies the row indices needed to place a 1 in the correct column of each row of the one-hot encoded array.
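To see what np.arange() contributes, the short sketch below prints the row indices it generates for a four-element array and the (row, column) pairs that end up holding a 1. The array values match the example that follows; the variable names rows and pairs are only illustrative.

```python
import numpy as np

arr = np.array([4, 7, 2, 9])

# np.arange(n) returns the evenly spaced integers 0, 1, ..., n - 1
rows = np.arange(arr.size)
print(rows)  # [0 1 2 3]

# Pairing each row index with the class index stored at that position
# gives the (row, column) coordinates that will be set to 1
pairs = list(zip(rows.tolist(), arr.tolist()))
print(pairs)  # [(0, 4), (1, 7), (2, 2), (3, 9)]
```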

Example: The array of indices to be converted to a one-hot encoded NumPy array is as follows:

[4 7 2 9]

This code performs one-hot encoding on a NumPy array called ‘arr’. It first creates an all-zero array, ‘encoded_arr’, with one row per element of ‘arr’ and one column for every possible index from 0 to arr.max() (10 columns here). It then uses np.arange(arr.size) as the row indices and ‘arr’ as the column indices, so exactly one position in each row is set to 1. In the result, row i has a 1 in column arr[i] and 0 everywhere else.

Python3




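import numpy as np

arr = np.array([4, 7, 2, 9])
# One row per index, one column per possible value 0..arr.max()
encoded_arr = np.zeros((arr.size, arr.max() + 1), dtype=int)
# Row i gets a 1 in column arr[i]
encoded_arr[np.arange(arr.size), arr] = 1
print(encoded_arr)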


Output:

[[0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]]
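The same fancy-indexing idea can be wrapped in a small reusable helper. The sketch below is one possible version; the function name one_hot_arange and its num_classes parameter are hypothetical, and it only handles 1-D arrays of non-negative integer indices.

```python
import numpy as np


def one_hot_arange(indices, num_classes=None):
    """Hypothetical helper: one-hot encode a 1-D array of non-negative integer indices."""
    indices = np.asarray(indices)
    if indices.ndim != 1:
        raise ValueError("expected a 1-D array of indices")
    if num_classes is None:
        # Same default as the example above: one column per value 0..max
        num_classes = int(indices.max()) + 1
    if indices.min() < 0 or indices.max() >= num_classes:
        raise ValueError("indices must lie in the range [0, num_classes)")
    encoded = np.zeros((indices.size, num_classes), dtype=int)
    # Row i gets a 1 in column indices[i]
    encoded[np.arange(indices.size), indices] = 1
    return encoded


print(one_hot_arange([4, 7, 2, 9]).shape)  # (4, 10)
print(one_hot_arange([1, 3, 2], num_classes=5))
```

The explicit range check matters because NumPy's fancy indexing treats a negative index such as -1 as "last column", so an invalid index would otherwise produce a wrong encoding silently instead of raising an error.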

Using LabelBinarizer function

The LabelBinarizer class in scikit-learn's sklearn.preprocessing module binarizes labels in a one-vs-all fashion. In this section, we will see how to convert an array of indices to a one-hot encoded NumPy array using LabelBinarizer.

Syntax: sklearn.preprocessing.LabelBinarizer(*, neg_label=0, pos_label=1, sparse_output=False)

Here,

  • neg_label: The value with which negative labels are encoded.
  • pos_label: The value with which positive labels are encoded.
  • sparse_output: A boolean that, when True, makes transform() return the result in sparse CSR format instead of a dense NumPy array.
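A quick way to see these parameters in action is the sketch below, which fits LabelBinarizer on the classes 0 to 3 and encodes the indices [1 3 2]. These values are arbitrary choices for illustration, and the example assumes scikit-learn and SciPy are installed. Sparse output is combined with the default labels because scikit-learn only supports sparse binarization with neg_label=0.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import LabelBinarizer

arr = np.array([1, 3, 2])
classes = np.arange(4)  # one column for each class index 0, 1, 2, 3

# Custom labels: negatives encoded as -1, positives as 1
signed = LabelBinarizer(neg_label=-1, pos_label=1).fit(classes)
print(signed.transform(arr))

# Sparse output: transform() returns a SciPy CSR matrix instead of a dense array
sparse_lb = LabelBinarizer(sparse_output=True).fit(classes)
encoded_sparse = sparse_lb.transform(arr)
print(sparse.issparse(encoded_sparse), encoded_sparse.format)  # True csr
print(encoded_sparse.toarray())
```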

Example: The array of indices to be converted to a one-hot encoded NumPy array is as follows:

[4 7 2 9]

This code demonstrates one-hot encoding using LabelBinarizer from scikit-learn. It first creates a LabelBinarizer and fits it to the range of values from 0 to the maximum value in ‘arr’. Fitting on this full range, rather than on ‘arr’ itself, ensures that every possible index from 0 to arr.max() gets its own column; fitting on ‘arr’ alone would learn only the labels 2, 4, 7 and 9. It then transforms ‘arr’ into the one-hot encoded array ‘encoded_arr’, in which row i has a 1 in column arr[i] and 0 everywhere else.

Python3




import numpy as np
from sklearn.preprocessing import LabelBinarizer

arr = np.array([4, 7, 2, 9])
# Fit on every index 0..arr.max() so each possible index gets its own column
label_binarizer = LabelBinarizer()
label_binarizer.fit(np.arange(arr.max() + 1))
encoded_arr = label_binarizer.transform(arr)
print(encoded_arr)


Output:

[[0 0 0 0 1 0 0 0 0 0]
 [0 0 0 0 0 0 0 1 0 0]
 [0 0 1 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 1]]
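Two behaviours of LabelBinarizer are easy to trip over when it is used for index arrays, and the sketch below makes both visible (it assumes scikit-learn is installed). First, fitting on the data instead of the full index range changes the number of columns. Second, when only two classes are present, LabelBinarizer returns a single column rather than two one-hot columns.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

arr = np.array([4, 7, 2, 9])

# Fitting on arr itself learns only the labels that actually occur in it
lb_data = LabelBinarizer().fit(arr)
print(lb_data.classes_)              # [2 4 7 9]
print(lb_data.transform(arr).shape)  # (4, 4)

# Fitting on 0..max, as in the example above, gives one column per index
lb_range = LabelBinarizer().fit(np.arange(arr.max() + 1))
print(lb_range.transform(arr).shape)  # (4, 10)

# With only two classes, LabelBinarizer returns one column, not two
binary = LabelBinarizer().fit(np.arange(2))
print(binary.transform(np.array([0, 1, 1])).shape)  # (3, 1)
```

If a strict one-column-per-class layout is required even for two classes, the arange() and eye() approaches always produce it.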

Using eye() function

The array of indices to be converted to a one-hot encoded NumPy array is as follows:

[1 3 2]

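In this code, np.eye(4) creates a 4-by-4 identity matrix, whose row k has a 1 in column k and 0 everywhere else. Indexing this matrix with the array of indices selects one row per index, which is exactly the one-hot vector for that index. The argument 4 is the number of classes and must be greater than the largest index (here 3). Note that np.eye() returns floating-point values by default, which is why the output shows 0. and 1.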

Python




import numpy as np

arr = np.array([1, 3, 2])
print(arr)
# Row k of the 4 x 4 identity matrix is the one-hot vector for index k
print(np.eye(4)[arr])


Output:

[1 3 2]
[[0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [0. 0. 1. 0.]]
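Passing dtype=int to np.eye() gives integer 0/1 values like the other two methods. The same row-selection trick also works for index arrays with more than one dimension, adding a trailing axis of length equal to the number of classes. The sketch below assumes a 2-D array of indices and a known class count of 4, both chosen only for illustration.

```python
import numpy as np

indices = np.array([[1, 3],
                    [2, 0]])
num_classes = 4  # assumed known here; indices.max() + 1 is a common default

# Row k of the identity matrix is the one-hot vector for k.
# Indexing with an integer array of any shape appends an axis of size num_classes.
one_hot = np.eye(num_classes, dtype=int)[indices]
print(one_hot.shape)  # (2, 2, 4)
print(one_hot[0, 1])  # [0 0 0 1]
```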

