Active Learning in Data Mining

Last Updated : 24 Mar, 2022

Active learning is an iterative form of supervised learning. It is usually preferred when unlabeled data is abundant but class labels are scarce or expensive to obtain. Instead of labeling all of the data up front, the learning algorithm actively queries an oracle (for example, a human annotator) for the labels it needs. The number of tuples active learning requires to learn a concept is much smaller than the number required by typical supervised learning, so highly accurate models can be built from just a few labeled instances, and the labeling cost is low compared with other learning methodologies.

Active Learning

 

Active learning reaches high accuracy with fewer training examples, so it takes less time to train a model; the learner itself is still trained only on labeled data. Several strategies have been developed for active learning, and one of the most efficient is the pool-based approach.

Example of the Pool-based Approach in Active Learning:

Let D be the data set, L ⊆ D the labeled subset, and U the unlabeled portion of D. L is the initial training set, so the active learner starts by training on L. A query function is then applied to the unlabeled data U to select one or more samples, and their class labels are requested from an oracle. The newly labeled samples are added to the training set L, and the active learner retrains on the enlarged labeled set using standard supervised algorithms. Active learning algorithms are evaluated by building learning curves from the training and testing sets, i.e., by plotting the model's accuracy against the number of labels queried.
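The loop just described can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the data set is a hypothetical 1-D toy problem, the "oracle" is simulated by a function that knows the true rule, the classifier is a simple threshold, and the query function picks the pool point closest to the current decision boundary.

```python
import random

# Toy 1-D data set D: the true (hidden) rule labels x as 1 when x >= 0.6.
random.seed(0)
D = [random.random() for _ in range(200)]

def oracle(x):
    """Simulated expensive labeler; in practice, a human annotator."""
    return 1 if x >= 0.6 else 0

L = {x: oracle(x) for x in D[:5]}   # small initial labeled set L
U = set(D[5:])                      # large unlabeled pool U

def fit_threshold(labeled):
    """Fit a 1-D threshold classifier: midpoint between the classes."""
    zeros = [x for x, y in labeled.items() if y == 0]
    ones = [x for x, y in labeled.items() if y == 1]
    return (max(zeros) + min(ones)) / 2 if zeros and ones else 0.5

for _ in range(20):                 # query budget of 20 labels
    t = fit_threshold(L)
    # Query function: select the pool point closest to the boundary.
    x = min(U, key=lambda v: abs(v - t))
    L[x] = oracle(x)                # request its label from the oracle
    U.remove(x)                     # move it from U into L

t = fit_threshold(L)
accuracy = sum((x >= t) == (oracle(x) == 1) for x in D) / len(D)
print(f"threshold={t:.2f} accuracy={accuracy:.2f}")
```

With only 25 labels (5 initial plus 20 queried) out of 200 points, the learned threshold converges close to the true boundary at 0.6.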

Example of the pool-based approach in Active learning

 

The primary task in active learning is choosing which data tuples to query, and many algorithms and methodologies have been proposed for this. Uncertainty sampling is the most common: the active learner queries the tuples it is least certain how to label. Other strategies aim to reduce the version space, i.e., the subset of all hypotheses that are consistent with the labeled training tuples. It is also necessary to perform error detection on the training tuples to remove noise from the data.
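One common way to score uncertainty is the entropy of the model's predicted class probabilities: a 50/50 prediction has maximum entropy, a confident 99/1 prediction has almost none. The sketch below assumes hypothetical predicted probabilities for five unlabeled tuples (any probabilistic classifier trained on L could supply them) and queries the most uncertain one.

```python
import math

# Hypothetical predicted class probabilities for five unlabeled tuples.
pool_probs = {
    "t1": [0.95, 0.05],
    "t2": [0.55, 0.45],   # the model is least certain here
    "t3": [0.80, 0.20],
    "t4": [0.99, 0.01],
    "t5": [0.65, 0.35],
}

def entropy(probs):
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uncertainty sampling: query the tuple with maximum predictive entropy.
query = max(pool_probs, key=lambda k: entropy(pool_probs[k]))
print(query)   # t2: its probabilities are closest to 50/50
```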

Active Learning Cycle

 

After error detection, tuples are selected so as to maximize the reduction in incorrect predictions by choosing the query that minimizes the expected entropy over U. This approach, however, requires considerably more computation, since every candidate query and label outcome must be evaluated.
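The expected-entropy criterion can be illustrated with made-up numbers. Assume we have two candidate tuples to query; for each possible label the oracle might return, we pretend to retrain and record the resulting average predictive entropy over the remaining pool U (the probabilities and entropies below are hypothetical, not computed from real data). The candidate with the lowest expected post-query entropy is selected.

```python
# candidate: (P(label=0), avg entropy over U if labeled 0, if labeled 1)
candidates = {
    "a": (0.5, 0.40, 0.50),
    "b": (0.9, 0.60, 0.10),
}

def expected_entropy(p0, h0, h1):
    """Expected entropy over U after the query, weighted by label odds."""
    return p0 * h0 + (1 - p0) * h1

best = min(candidates, key=lambda k: expected_entropy(*candidates[k]))
print(best)   # "a": 0.45 expected entropy vs 0.55 for "b"
```

Note that this requires a retraining (or an approximation of one) per candidate per possible label, which is where the extra computational cost comes from.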
