Supervised and Unsupervised learning
Supervised learning: Supervised learning, as the name indicates, involves a supervisor acting as a teacher. We train the machine using well-labeled data, meaning each training example is already tagged with the correct answer. The supervised learning algorithm analyses this training data (a set of training examples) and learns how inputs relate to labels. The machine is then given a new set of examples so that it can predict the correct outcome for them based on what it learned from the labeled data.
For instance, suppose you are given a basket filled with different kinds of fruits. Now the first step is to train the machine with all the different fruits one by one like this:
- If the object is rounded, has a depression at the top, and is red in color, then it is labeled as Apple.
- If the object is a long curving cylinder with a green-yellow color, then it is labeled as Banana.
Now suppose that, after training, the machine is given a new fruit from the basket, say a banana, and asked to identify it.
Since the machine has already learned from the previous data, it now has to apply that knowledge. It first classifies the fruit by its shape and color, confirms the fruit name as BANANA, and puts it in the Banana category. Thus the machine learns from the training data (the basket of fruits) and then applies that knowledge to the test data (the new fruit).
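The fruit example can be sketched in a few lines of Python. This is only an illustration, not a production classifier: the two numeric features (`roundness` and `yellowness`, each a made-up score between 0 and 1), the training values, and the function name `predict` are all assumptions introduced here. The sketch uses a 1-nearest-neighbour rule, which labels a new fruit with the label of the most similar training example.

```python
import math

# Hypothetical labeled training data: (roundness, yellowness) -> fruit name.
# Both features are invented scores between 0 and 1.
TRAINING_DATA = [
    ((0.90, 0.10), "Apple"),
    ((0.95, 0.20), "Apple"),
    ((0.20, 0.90), "Banana"),
    ((0.10, 0.80), "Banana"),
]

def predict(features, training_data=TRAINING_DATA):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label

new_fruit = (0.15, 0.85)   # long and yellow, never seen during training
print(predict(new_fruit))  # prints "Banana"
```

The labels in `TRAINING_DATA` play the role of the teacher: without them, `predict` would have nothing to return.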
Supervised learning is classified into two categories of algorithms:
- Classification: A classification problem is when the output variable is a category, such as “red” or “blue”, or “disease” or “no disease”.
- Regression: A regression problem is when the output variable is a real (continuous) value, such as a price in dollars or a weight.
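To make the regression case concrete, here is a minimal sketch that fits a straight line to labeled data by ordinary least squares and then predicts a real-valued output for an unseen input. The data (weights in kilograms mapped to prices in dollars) and the function name `fit_line` are hypothetical and chosen only for illustration. In a classification problem the same labeled-data idea applies, but the output would be a category instead of a number.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled data: item weight in kg -> price in dollars.
weights = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(weights, prices)

# The prediction is a real number, not a category.
predicted_price = slope * 5.0 + intercept
print(round(predicted_price, 2))  # prints 11.0
```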
Types of Supervised Learning:-
- Logistic Regression
- Naive Bayes Classifiers
- K-NN (k nearest neighbors)
- Decision Trees
- Support Vector Machine
Advantages of supervised learning:
- Supervised learning lets the model use previous experience, in the form of collected labeled data, to produce outputs.
- It helps to optimize performance criteria using that experience.
- It helps to solve various types of real-world computation problems.
- It performs classification and regression tasks.
- It allows estimating or mapping the result for a new sample.
- We have complete control over the number of classes in the training data.
Disadvantages of supervised learning:
- Classifying big data can be challenging.
- Training needs a lot of computation time.
- Supervised learning cannot handle all complex tasks in machine learning.
- It requires a labeled data set.
- It requires a training process.
Unsupervised learning: Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance. Here the task of the machine is to group unsorted information according to similarities, patterns, and differences without any prior training on labeled data.
Unlike supervised learning, no teacher is provided, which means the machine receives no labeled examples to learn from. The machine is therefore left to find the hidden structure in unlabeled data by itself.
For instance, suppose the machine is given a set of images containing both dogs and cats, none of which it has seen before.
The machine has no idea about the features of dogs and cats, so it cannot label the images as ‘dogs’ and ‘cats’. But it can group them according to their similarities, patterns, and differences, i.e., it can split the images into two parts. The first part may contain all the pictures with dogs in them and the second part may contain all the pictures with cats in them. The machine has not learned anything beforehand, which means there is no labeled training data and there are no examples.
It allows the model to work on its own to discover patterns and information that was previously undetected. It mainly deals with unlabelled data.
Unsupervised learning is classified into two categories of algorithms:
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as people that buy X also tend to buy Y.
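The "people that buy X also tend to buy Y" idea is usually quantified with two measures, support and confidence. The sketch below computes both over a handful of hypothetical shopping baskets. The transactions and the function names `support` and `confidence` are assumptions made for illustration, and a full association rule miner such as Apriori would additionally search over many candidate item sets.

```python
# Hypothetical shopping baskets; each set is one transaction.
TRANSACTIONS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Estimate how often the consequent appears given the antecedent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# How often are bread and butter bought together?
print(support({"bread", "butter"}))
# Of the baskets that contain bread, what fraction also contain butter?
print(confidence({"bread"}, {"butter"}))
```

Here three of the five baskets contain both bread and butter, and three of the four baskets with bread also contain butter, so the rule "bread implies butter" has a support of about 0.6 and a confidence of about 0.75.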
Types of Unsupervised Learning:-
- Exclusive (partitioning)
- Hierarchical clustering
- K-means clustering
- Principal Component Analysis
- Singular Value Decomposition
- Independent Component Analysis
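As an illustration of one entry in this list, here is a minimal sketch of K-means clustering in plain Python. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The data points, the function name `k_means`, the fixed iteration count, and the hand-picked initial centroids are assumptions made to keep the example small and deterministic; practical implementations usually choose initial centroids randomly and stop when the assignments no longer change.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(axis) / len(cluster) for axis in zip(*cluster)
                )
    return centroids, clusters

# Hypothetical unlabeled 2-D points forming two loose groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Initial centroids are picked by hand to keep the run deterministic.
centroids, clusters = k_means(POINTS, centroids=[POINTS[0], POINTS[3]])
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between points. What each group means is left for a person to interpret afterwards.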
Supervised vs. Unsupervised Machine Learning:
|Parameters||Supervised machine learning||Unsupervised machine learning|
|Input Data||Algorithms are trained using labeled data.||Algorithms are used against data that is not labeled.|
|Computational Complexity||Often the simpler method.||Often more computationally complex.|
|Accuracy||Typically more accurate, since predictions can be checked against labels.||Often less accurate and harder to verify.|
|No. of classes||No. of classes is known.||No. of classes is not known in advance.|
|Data Analysis||Typically uses offline analysis.||Can use real-time analysis of data.|
|Algorithms used||Linear and Logistic regression, Random forest, Support Vector Machine, Neural Network, etc.||K-Means clustering, Hierarchical clustering, Apriori algorithm, etc.|
|Output||Desired output is given.||Desired output is not given.|
|Training data||Uses labeled training data to infer a model.||No labeled training data is used.|
|Complex model||Cannot learn models as large and complex as unsupervised learning can.||Can learn larger and more complex models than supervised learning.|
|Model||We can test our model against known labels.||We cannot test our model against known labels.|
|Called as||Often associated with classification (and regression).||Often associated with clustering.|
|Example||Example: Optical character recognition.||Example: Find a face in an image.|
Advantages of unsupervised learning:
- It does not require training data to be labeled.
- Dimensionality reduction can be easily accomplished using unsupervised learning.
- Capable of finding previously unknown patterns in data.
- Flexibility: Unsupervised learning is flexible in that it can be applied to a wide variety of problems, including clustering, anomaly detection, and association rule mining.
- Exploration: Unsupervised learning allows for the exploration of data and the discovery of novel and potentially useful patterns that may not be apparent from the outset.
- Low cost: Unsupervised learning is often less expensive than supervised learning because it doesn’t require labeled data, which can be time-consuming and costly to obtain.
Disadvantages of unsupervised learning:
- Difficult to measure accuracy or effectiveness due to the lack of predefined answers during training.
- The results often have lower accuracy.
- The user needs to spend time interpreting and labeling the groups that the algorithm produces.
- Lack of guidance: Unsupervised learning lacks the guidance and feedback provided by labeled data, which can make it difficult to know whether the discovered patterns are relevant or useful.
- Sensitivity to data quality: Unsupervised learning can be sensitive to data quality, including missing values, outliers, and noisy data.
- Scalability: Unsupervised learning can be computationally expensive, particularly for large datasets or complex algorithms, which can limit its scalability.
This article is contributed by Shubham Bansal.