What is Machine Learning?

Last Updated : 25 May, 2023

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Do you get automatic recommendations on Netflix and Amazon Prime about the movies you should watch next? Or maybe you get suggestions for People You May Know on Facebook or LinkedIn? You might also use Siri, Alexa, etc. on your phones. That’s all Machine Learning! This is a technology that is becoming more and more popular. Chances are that Machine Learning is used in many of the technologies around you!

And it is hardly a new concept. Researchers have long been fascinated by the capacity of machines to learn on their own without being programmed in detail by humans. However, this has become much easier to do with the emergence of big data in modern times. Large amounts of data can be used to train much more accurate Machine Learning models that are actually viable in the technical industry. And so, Machine Learning is now a buzzword in the industry despite having existed for a long time.

But are you wondering what Machine Learning is after all? What are its various types, and what are the different Machine Learning algorithms? Read on to find the answers to all your questions!

What is Machine Learning?

Machine Learning, as the name says, is all about machines learning automatically without being explicitly programmed or learning without any direct human intervention. This machine learning process starts with feeding them good quality data and then training the machines by building various machine learning models using the data and different algorithms. The choice of algorithms depends on what type of data we have and what kind of task we are trying to automate.

As for the formal definition of Machine Learning, we can say that a Machine Learning algorithm learns from experience E with respect to some type of task T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

For example, if a Machine Learning algorithm is used to play chess, then the experience E is playing many games of chess, the task T is playing chess against many players, and the performance measure P is the probability that the algorithm will win a game of chess.

What is the difference between Artificial Intelligence and Machine Learning?

Artificial Intelligence and Machine Learning are closely related, and yet they have some differences. Artificial Intelligence is an overarching concept that aims to create machines with human-like intelligence, including critical thinking and reasoning skills. On the other hand, Machine Learning is a subset of Artificial Intelligence that aims to create machines that can learn autonomously from data. Machine Learning is specific, not general, which means it allows a machine to make predictions or take decisions on a specific problem using data.

What are the types of Machine Learning?

Let’s see the different types of Machine Learning now:

1. Supervised Machine Learning

Imagine a teacher supervising a class. The teacher already knows the correct answers but the learning process doesn’t stop until the students learn the answers as well. This is the essence of Supervised Machine Learning Algorithms. Here, the algorithm learns from a training dataset and makes predictions that are compared with the actual output values. If the predictions are not correct, then the algorithm is modified until it is satisfactory. This learning process continues until the algorithm achieves the required level of performance. Then it can provide the desired output values for any new inputs.
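The predict, compare, and correct loop described above can be sketched with a perceptron, one of the simplest supervised algorithms. This is only an illustration: the perceptron, the tiny OR-style dataset, and the names train_perceptron and predict are assumptions chosen for the example, not something the article prescribes.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, learning_rate=0.1, max_epochs=100):
    """Adjust the weights only when a prediction disagrees with its label."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, target in zip(samples, labels):
            error = target - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # required level of performance reached
            break
    return weights, bias


# Made-up training data: the label is 1 when at least one input is 1.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]
weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
```

The "teacher" here is the labels list: training stops as soon as a full pass over the data produces no wrong predictions.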

2. Unsupervised Machine Learning

In this case, there is no teacher for the class and the students are left to learn for themselves! So for Unsupervised Machine Learning Algorithms, there is no specific answer to be learned and there is no teacher. The algorithm doesn’t predict an output for a given input. Instead, it is left unsupervised to explore the data and find its underlying structure, such as groups of similar items.

3. Semi-Supervised Machine Learning

The students learn both from their teacher and by themselves in Semi-Supervised Machine Learning. And you can guess that from the name itself! This is a combination of Supervised and Unsupervised Machine Learning that uses a small amount of labeled data like Supervised Machine Learning and a larger amount of unlabeled data like Unsupervised Machine Learning to train the algorithms. First, the labeled data is used to partially train the Machine Learning Algorithm, and then this partially trained model is used to pseudo-label the unlabeled data. Finally, the Machine Learning Algorithm is fully trained using a combination of labeled and pseudo-labeled data.
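The three steps of pseudo-labeling can be sketched as follows. The nearest-centroid classifier, the one-dimensional data, and the names fit_centroids and predict are assumptions made to keep the illustration short. Any supervised model could take their place.

```python
def fit_centroids(values, labels):
    """Nearest-centroid model: store the mean value of each class."""
    sums, counts = {}, {}
    for value, label in zip(values, labels):
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Pick the class whose mean is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


# Made-up data: two labeled points and six unlabeled ones.
labeled_values = [1.0, 9.0]
labeled_labels = ["small", "large"]
unlabeled_values = [1.5, 2.0, 2.5, 8.0, 8.5, 7.5]

# Step 1: partially train the model on the labeled data only.
partial_model = fit_centroids(labeled_values, labeled_labels)

# Step 2: use the partial model to pseudo-label the unlabeled data.
pseudo_labels = [predict(partial_model, v) for v in unlabeled_values]

# Step 3: train the final model on labeled plus pseudo-labeled data.
final_model = fit_centroids(labeled_values + unlabeled_values,
                            labeled_labels + pseudo_labels)
```

In practice, pseudo-labels can be wrong, so many implementations keep only the pseudo-labels that the partial model is confident about. That filtering step is left out here.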

4. Reinforcement Machine Learning

Well, here are the hypothetical students who learn from their own mistakes over time (that’s like life!). So the Reinforcement Machine Learning Algorithms learn optimal actions through trial and error. This means that the algorithm decides the next action by learning behaviors that are based on its current state and that will maximize the reward in the future. This is done using reward feedback that allows the Reinforcement Algorithm to learn which are the best behaviors that lead to maximum reward. This reward feedback is known as a reinforcement signal.
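A minimal sketch of this trial-and-error loop is tabular Q-learning, shown here on a made-up corridor of five states where only the rightmost state gives a reward. The environment, the hyperparameters, and the names step and q_learning are assumptions for illustration. Q-learning is one common reinforcement algorithm, not the only one.

```python
import random

N_STATES = 5        # states 0..4 in a corridor; state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Move one cell and return (next_state, reward)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)            # explore
            else:
                best_value = max(q[state])              # exploit, ties at random
                action = rng.choice(
                    [a for a in ACTIONS if q[state][a] == best_value])
            next_state, reward = step(state, action)
            # The reinforcement signal nudges the estimate toward
            # "reward now plus discounted best future value".
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
```

After training, greedy_policy holds the preferred action for each non-goal state. In this corridor the learned behavior is to move right everywhere, since that is the only way to reach the reward.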

What are some popular Machine Learning algorithms?

Let’s look at some of the popular Machine Learning algorithms that are based on specific types of Machine Learning.

Supervised Machine Learning

Supervised Machine Learning includes Regression and Classification algorithms. Some of the more popular algorithms in these categories are:

1. Linear Regression Algorithm

The Linear Regression Algorithm models the relation between one or more independent variables and a dependent variable. It shows how the dependent variable changes when an independent variable is changed. The independent variables are called the explanatory variables and the dependent variable is called the factor of interest. An example of Linear Regression Algorithm usage is predicting property prices in an area according to the size of the property, number of rooms, etc.
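For a single explanatory variable, the best-fitting line can be computed directly with the least-squares formulas. The sketch below uses made-up property sizes and prices, and the function name fit_line is an assumption for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that follows price = 2 * size + 50 exactly.
sizes = [50, 70, 90, 110]       # property size in square metres
prices = [150, 190, 230, 270]   # price in thousands

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept   # price estimate for size 100
```

With several explanatory variables (size, number of rooms, and so on) the same idea applies, but the coefficients are usually found with matrix methods or gradient descent.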

2. Logistic Regression Algorithm

The Logistic Regression Algorithm predicts discrete values whereas the Linear Regression Algorithm predicts continuous values. This makes Logistic Regression a better option for binary classification. An event in Logistic Regression is classified as 1 if it occurs and as 0 otherwise. The algorithm predicts the probability that the event occurs based on the given predictor variables, and this probability is then compared with a threshold (commonly 0.5) to pick the class. An example of Logistic Regression Algorithm usage is in medicine, to predict whether a tumor is malignant or not based on its size.
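Here is a rough sketch of logistic regression with one predictor, trained by plain gradient descent. The tumor sizes and labels are made up, the feature is centered by subtracting its mean (a common preprocessing step, assumed here for faster convergence), and the names train_logistic and probability_malignant are illustrative only.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit weight and bias by gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


# Made-up data: tumor size in cm and whether it was malignant (1) or not (0).
sizes_cm = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
malignant = [0, 0, 0, 1, 1, 1]

mean_size = sum(sizes_cm) / len(sizes_cm)
centered = [s - mean_size for s in sizes_cm]
weight, bias = train_logistic(centered, malignant)


def probability_malignant(size_cm):
    return sigmoid(weight * (size_cm - mean_size) + bias)
```

A size is then classified as 1 when probability_malignant(size) is above 0.5 and as 0 otherwise.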

3. Naive Bayes Classifier Algorithm

The Naive Bayes Classifier Algorithm is used to classify text such as a web page, a document, or an email, among other things. This algorithm is based on the Bayes Theorem of Probability, with the “naive” assumption that the features are independent of each other given the category, and it assigns each item to the most probable of the available categories. A classic example of Naive Bayes Classifier Algorithm usage is Email Spam Filtering, where an email is classified as Spam or Not Spam.
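A toy spam filter along these lines might look like the sketch below. It assumes a bag-of-words model with add-one (Laplace) smoothing and four made-up training emails. The names train_naive_bayes and classify are illustrative, and real spam filters are far more elaborate.

```python
import math
from collections import Counter


def train_naive_bayes(documents, labels):
    """Count how often each word appears in each class."""
    word_counts = {}                  # label -> Counter of words
    class_counts = Counter(labels)    # label -> number of documents
    vocabulary = set()
    for text, label in zip(documents, labels):
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(class_counts[label] / total_docs)   # prior
        total_words = sum(counts.values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


emails = ["win money now", "free money offer",
          "meeting schedule today", "project meeting notes"]
labels = ["spam", "spam", "not spam", "not spam"]
model = train_naive_bayes(emails, labels)
```

With this toy data, classify("free money", model) picks "spam" and classify("meeting today", model) picks "not spam", because those words were seen more often in the respective class.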

Unsupervised Machine Learning

Unsupervised Machine Learning mainly includes Clustering and Association Rule algorithms. Some of the more popular algorithms in these categories are:

1. K Means Clustering Algorithm

Let’s imagine that you want to search for the name “Harry” on Wikipedia. Now, “Harry” can refer to Harry Potter, Prince Harry, or any other popular Harry on Wikipedia! Web pages that talk about the same ideas could be grouped together using a clustering algorithm such as K Means, which is a popular algorithm for cluster analysis. The K Means Clustering Algorithm partitions a given data set into K clusters, where K is chosen in advance. Each data point is assigned to the cluster whose center (the mean of its points) is nearest, so the output contains K clusters with the input data partitioned among them.
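The assign-then-update loop of K Means can be sketched in a few lines. To keep the result reproducible, this version starts from evenly spaced input points, which is an assumption of the sketch. Real implementations usually pick the starting centers at random or with a scheme such as k-means++. The points are made up.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=10):
    """Partition points into k clusters; returns (labels, centroids)."""
    # Assumed deterministic start: take evenly spaced points as centroids.
    spacing = len(points) // k
    centroids = [points[i * spacing] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k),
                      key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return labels, centroids


# Made-up 2D points forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
labels, centroids = k_means(points, k=2)
```

Here labels gives the cluster index of each point and centroids gives the final cluster centers.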

2. Apriori Algorithm

The Apriori Algorithm uses the if-then format to create association rules. This means that if a certain event 1 occurs, then there is a high probability that a certain event 2 also occurs. For example: IF someone buys a car, THEN there is a high chance they buy car insurance as well. The Apriori Algorithm finds such rules by first identifying sets of items that frequently occur together, and then measuring how often the THEN part occurs when the IF part does, for example the fraction of car buyers who also bought car insurance. A similar idea can be applied to search auto-complete: when a word is typed, association rules can suggest the words that are usually typed after that word.
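The sketch below shows the core of the idea on four made-up purchases: grow frequent itemsets level by level, pruning any candidate that has an infrequent subset, and then compute the confidence of the rule "car implies insurance". The function name apriori and the min_support value are assumptions for the example, and rule generation is reduced to a single hand-picked rule.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support]
    frequent = {}
    size = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == size}
        # Prune step: every smaller subset must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(sub) in frequent
                          for sub in combinations(c, size - 1))
                   and support(c) >= min_support]
    return frequent


# Made-up purchase histories.
transactions = [{"car", "insurance"}, {"car", "insurance", "gps"},
                {"car"}, {"bike", "helmet"}]
frequent = apriori(transactions, min_support=0.5)

# Confidence of the rule IF car THEN insurance.
confidence = (frequent[frozenset({"car", "insurance"})]
              / frequent[frozenset({"car"})])
```

In this data two of the three car buyers also bought insurance, so the rule holds for about two thirds of car purchases.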

What is Deep Learning?

Deep Learning is a subset of Machine Learning. It is based on learning by example, just like humans do, using Artificial Neural Networks. These Artificial Neural Networks are created to mimic the neurons in the human brain so that Deep Learning algorithms can learn much more efficiently. Deep Learning is so popular now because of its wide range of applications in modern technology. From self-driving cars to image, speech recognition, and natural language processing, Deep Learning is used to achieve results that were not possible before.

What are Artificial Neural Networks?

Artificial Neural Networks are modeled after the neurons in the human brain. They contain artificial neurons which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network. A layer can have only a dozen units or millions of units, depending on the complexity of the system. Commonly, Artificial Neural Networks have an input layer, an output layer, and one or more hidden layers. The input layer receives data from the outside world which the neural network needs to analyze or learn about. Then this data passes through one or multiple hidden layers that transform the input into data that is valuable for the output layer. Finally, the output layer provides the response of the Artificial Neural Network to the input data provided.

In the majority of neural networks, units are interconnected from one layer to another. Each of these connections has a weight that determines the influence of one unit on another unit. As the data passes from one layer to the next, each unit combines its weighted inputs into an output, which eventually results in an output from the output layer. The network learns by adjusting these weights during training.
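To make the flow of data through the layers concrete, here is a minimal forward pass through a tiny network with two inputs, two hidden units, and one output unit. The weights, biases, and the sigmoid activation are arbitrary choices for illustration, and training (adjusting the weights) is not shown.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Pass the inputs through each layer in turn.

    Each layer is a (weights, biases) pair, where weights holds one list
    of incoming connection weights per unit in that layer.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(unit_weights, activations))
                    + bias)
            for unit_weights, bias in zip(weights, biases)
        ]
    return activations


# Hand-picked weights: a hidden layer with two units, an output layer with one.
hidden_layer = ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
output_layer = ([[1.0, -1.0]], [0.0])

result = forward([1.0, 1.0], [hidden_layer, output_layer])
```

The single number in result is the network’s response to the input [1.0, 1.0]. Changing any weight changes how strongly one unit influences the next.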

What is Machine Learning used for?

Machine Learning is used in almost all modern technologies and this is only going to increase in the future. In fact, there are applications of Machine Learning in various fields ranging from smartphone technology to healthcare to social media, and so on.

Smartphones use personal voice assistants like Siri, Alexa, Cortana, etc. These personal assistants are an example of ML-based speech recognition that uses Natural Language Processing to interact with the users and formulate a response accordingly. Machine Learning is also used in social media. Let’s take Facebook’s ‘People you may know’ as an example. It is mind-boggling how social media platforms can guess the people you might be familiar with in real life. And they are right most of the time! This is done by using Machine Learning algorithms that analyze your profile, your interests, your current friends, their friends, and various other factors to work out the people you might potentially know.

Machine Learning is also very important in healthcare, as it can be used to help diagnose a variety of medical problems. For example, Machine Learning is used in oncology to train algorithms that can identify cancerous tissue at the microscopic level with accuracy comparable to that of trained physicians. Another famous application of Machine Learning is Google Maps. The Google Maps algorithm automatically picks the best route from one point to another by relying on travel-time projections for different timeframes and keeping in mind various factors like traffic jams, roadblocks, etc. In this way, you can see that the applications of Machine Learning are vast. If anything, they are only increasing, and Machine Learning may one day be used in almost all fields of study!

Machine Learning Problem Categories:

1. Classification:

  • Classification is a type of supervised learning technique used to predict the class or category of a new observation based on labeled training data.
  • Classification groups a given dataset in such a way that, depending on the value of the target or output attribute, each item in the dataset can be assigned to a class.
  • This technique helps in identifying data behavior patterns. It is, in short, a discrimination mechanism.
  • Examples:
      • Email spam filtering: Classifying emails as either spam or not spam based on the contents of the email and previously labeled examples.
      • Image classification: Identifying objects in an image and labeling them according to their category (e.g., dog, cat, bird, etc.).
      • Fraud detection: Identifying whether a transaction is fraudulent or legitimate based on historical data and other factors.

2. Clustering:

  • Clustering is an unsupervised learning technique used to group similar observations into clusters based on their features or attributes.
  • The goal of clustering is to identify natural groupings within the data that can be used to gain insights or make predictions.
  • In short, clustering is a grouping analysis that does not start with a specific target in mind (good/bad, will buy/will not buy).
  • For example, clustering can be used to group patients based on their health status.
  • Examples:
      • Customer segmentation: Grouping customers with similar purchase patterns together to better understand their behavior.
      • Image segmentation: Separating an image into different regions based on their similarity or dissimilarity.
      • Anomaly detection: Identifying unusual patterns or behaviors in data, such as identifying a defective product in a manufacturing line.

3. Regression:

  • Regression is a type of supervised learning in which an algorithm learns to predict a continuous numerical value or output based on a set of input features.
  • The goal of regression is to create a model that accurately predicts the output value of new, unseen data.
  • Examples:
      • Housing price prediction: Predicting the price of a house based on features such as its location, size, and number of rooms.
      • Stock market prediction: Predicting the future price of a stock based on its historical performance and other factors.
      • Medical prognosis: Predicting a numerical measure of a patient’s future health status based on their medical history and other factors.

4. Simulation:

  • Simulation is a technique used to generate synthetic data based on a set of assumptions and rules.
  • Simulations can be used to model complex systems or processes that are difficult or expensive to study in the real world.
  • Examples:
      • Traffic simulation: Simulating traffic patterns with a computer model to better understand how traffic flows through a city and to optimize traffic flow.
      • Climate modeling: Simulating the Earth’s climate to better understand the effects of climate change and to predict future climate patterns.
      • Financial risk modeling: Simulating different financial scenarios to better understand risk and to make informed investment decisions.
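As a rough illustration of financial risk modeling, the Monte Carlo sketch below generates many synthetic yearly returns and estimates how often an investment loses money. The normal distribution, the 5% average return, the 10% volatility, and the name simulate_loss_probability are all assumptions made up for the example.

```python
import random


def simulate_loss_probability(mean_return=0.05, volatility=0.10,
                              trials=10_000, seed=42):
    """Estimate the chance of a negative yearly return by simulation."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    losses = sum(1 for _ in range(trials)
                 if rng.gauss(mean_return, volatility) < 0)
    return losses / trials


loss_probability = simulate_loss_probability()
```

Under these assumptions the estimate should land close to 0.31, the probability that a normal variable falls more than half a standard deviation below its mean. The value of simulation is that the same code still works when the rules become too complicated for a formula.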

5. Optimization:

  • Optimization is a technique used to find the best solution to a problem, given a set of constraints and objectives. 
  • Optimization problems can involve maximizing or minimizing an objective function subject to certain constraints. 
  • For example, in supply chain optimization, the goal is to optimize the logistics of a supply chain to reduce costs and improve efficiency. 
  • Portfolio optimization is another example in which the goal is to optimize the allocation of investments to maximize returns while minimizing risk.
  • Resource allocation is yet another example in which the goal is to optimize the allocation of resources, such as personnel, time, and equipment, to maximize efficiency and minimize costs.
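A very small resource allocation problem can be solved by simply trying every option. In the sketch below, ten workers are split between two tasks, and the output formula with its diminishing returns is entirely made up for illustration. Real optimization problems are usually too large for exhaustive search and need dedicated solvers.

```python
import math

TOTAL_WORKERS = 10   # constraint: workers on task A plus task B equals 10


def total_output(workers_on_a):
    """Objective: hypothetical output with diminishing returns per task."""
    workers_on_b = TOTAL_WORKERS - workers_on_a
    return 3 * math.sqrt(workers_on_a) + 2 * math.sqrt(workers_on_b)


# Exhaustive search over every allocation that satisfies the constraint.
best_a = max(range(TOTAL_WORKERS + 1), key=total_output)
best_b = TOTAL_WORKERS - best_a
best_value = total_output(best_a)
```

Here total_output is the objective function being maximized, and the fixed total of ten workers is the constraint.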

Machine Learning Life Cycle:

The machine learning life cycle involves seven major steps, which are given below:

  1.  Gathering Data
  2.  Data Preparation
  3.  Data Wrangling
  4.  Analyze Data
  5.  Train the Model
  6.  Test the Model
  7.  Deployment

1. Gathering Data:

  • In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the quality of the output. In general, the more good-quality data we have, the more accurate the predictions will be.

This step includes the below tasks:

  •  Identify various data sources
  •  Collect data
  •  Integrate the data obtained from different sources

2. Data Preparation:

In this step, first, we put all data together, and then randomize the ordering of data.

This step can be further divided into two processes:

 Data exploration:

  • It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data.
  • A better understanding of the data leads to a more effective outcome. In this step, we look for correlations, general trends, and outliers.

 Data pre-processing:

  • The next step is preprocessing the data for analysis.

3. Data Wrangling:

  • Data wrangling is the process of cleaning raw data and converting it into a usable format.
  • It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step.
  • It is one of the most important steps of the complete process. Cleaning the data is required to address quality issues.
  • Not all of the data we have collected is necessarily useful, so some of it may be discarded.

   In real-world applications, collected data may have various issues, including:

  • Missing Values
  • Duplicate data
  • Invalid data
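The three issues listed above can be handled in a few lines of plain Python. In this sketch the records, the valid age range of 0 to 120, and the choice to fill missing ages with the mean are all assumptions for illustration. The right handling always depends on the dataset.

```python
def clean(records):
    """Drop duplicates and invalid rows, then fill missing ages."""
    # Duplicate data: keep only the first copy of each record.
    seen = set()
    unique = []
    for record in records:
        key = (record["name"], record["age"])
        if key not in seen:
            seen.add(key)
            unique.append(record)

    # Invalid data: drop ages outside an assumed plausible range.
    valid = [r for r in unique
             if r["age"] is None or 0 <= r["age"] <= 120]

    # Missing values: replace None with the mean of the known ages.
    known = [r["age"] for r in valid if r["age"] is not None]
    mean_age = sum(known) / len(known) if known else None
    return [{"name": r["name"],
             "age": r["age"] if r["age"] is not None else mean_age}
            for r in valid]


raw = [
    {"name": "Asha", "age": 30},
    {"name": "Asha", "age": 30},    # duplicate
    {"name": "Ben", "age": None},   # missing value
    {"name": "Chen", "age": -5},    # invalid value
    {"name": "Dara", "age": 40},
]
cleaned = clean(raw)
```

After cleaning, three records remain, and the missing age has been filled with 35.0, the mean of the two known valid ages.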

4. Analyze Data:

  • Now the cleaned and prepared data is passed on to the analysis step. This step involves:
      • Selecting analytical techniques
      • Building models
      • Reviewing the results
  • The aim of this step is to build a machine learning model to analyze the data using various analytical techniques and review the outcome.
  • It starts with determining the type of problem, where we select a machine learning technique such as Classification, Regression, Cluster analysis, or Association. We then build the model using the prepared data and evaluate it.

5. Train the Model:

  • The next step is to train the model. In this step, we train our model to improve its performance so that it gives a better outcome for the problem.
  • We use datasets to train the model using various machine learning algorithms.

6. Test the Model:

  • Once our machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing it with a test dataset that it has not seen during training.
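Here is a small sketch of holding out a test set and measuring accuracy on it. The data, the 25% test fraction, and the simple threshold model (the midpoint between the two class means) are assumptions chosen to keep the example short.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and cut them into a training and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(rows):
    """Toy model: the midpoint between the mean value of each class."""
    zeros = [value for value, label in rows if label == 0]
    ones = [value for value, label in rows if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def accuracy(predictions, targets):
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Made-up (value, label) rows: small values are class 0, large are class 1.
rows = [(1, 0), (2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1), (10, 1)]

train_rows, test_rows = train_test_split(rows)
threshold = fit_threshold(train_rows)
predictions = [1 if value > threshold else 0 for value, _ in test_rows]
test_accuracy = accuracy(predictions, [label for _, label in test_rows])
```

The key point is that test_rows plays no part in fitting the threshold, so test_accuracy reflects how the model handles data it has not seen.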

7. Deployment:

  • The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.

Are Machine Learning Algorithms totally objective?

Machine Learning Algorithms are trained using data sets. And unfortunately, sometimes the data may be biased and so the ML algorithms are not totally objective. This is because the data may include human biases, historical inequalities, or different metrics of judgement based on gender, race, nationality, sexual orientation, etc. For example, Amazon found out that its Machine Learning based recruiting algorithm was biased against women. The recruiting algorithm had been trained to rate candidates’ resumes by studying the resumes that were submitted to Amazon over the previous 10 years. Most of those resumes came from men, reflecting the male dominance of the technology industry, so the algorithm learned to favor male candidates.

This means that some Machine Learning Algorithms used in the real world may not be objective due to biased data. However, companies are working on reducing this bias. One way to do this is to preprocess the data so that the bias is reduced before the ML algorithm is trained on the data. Another way is to post-process the ML algorithm after it is trained on the data so that it satisfies a fairness constraint that is decided beforehand.

Which Cloud Computing Platforms offer Machine Learning?

There are many Cloud Computing Platforms that offer Machine Learning services to other companies. The most popular among them are:

1. Amazon Web Services (AWS)

Some of the products that Amazon Web Services provides include Amazon SageMaker for creating and training machine learning models, Amazon Forecast to increase the forecast accuracy, Amazon Translate for language translation using natural language processing, Amazon Polly for converting text into life-like speech, etc.

2. Microsoft Azure

Some of the products that Microsoft Azure provides include Microsoft Azure Machine Learning for creating and deploying machine learning models, Cognitive Services for providing smart cognitive services, Azure Databricks for Apache Spark-based analytics, Bot Service for smart and intelligent bot services, etc.

3. Google Cloud

Some of the products that Google Cloud provides include Google Cloud AutoML for training an AutoML machine learning model, Vision AI for cloud vision, Speech-to-Text for converting speech to text, Text-to-Speech for converting text to speech, Natural Language for natural language processing, etc.

4. IBM Watson Cloud

Some of the products that IBM Watson Cloud provides include IBM Watson Studio for building machine learning and artificial intelligence models, Speech-to-Text for converting speech to text, Text-to-Speech for converting text to speech, Assistant for creating and managing virtual assistants, Natural Language Understanding for natural language processing, etc.

Advantages of Machine Learning:

There are several advantages of using machine learning, including:

  1. Improved accuracy: Machine learning algorithms can analyze large amounts of data and identify patterns that may not be apparent to humans. This can lead to more accurate predictions and decisions.
  2. Automation: Machine learning models can automate tasks that would otherwise be done by humans, freeing up time and resources.
  3. Real-time performance: Machine learning models can analyze data in real time, allowing for quick decision making.
  4. Scalability: Machine learning models can be easily scaled up or down to handle changes in the amount of data.
  5. Cost-effectiveness: Machine learning can reduce the need for human labor, which can lead to cost savings over time.
  6. Ability to learn from experience: Machine learning models can improve over time as they are exposed to more data, which enables them to learn from their mistakes and improve their performance.
  7. Better predictions: Machine learning models can often make predictions with greater accuracy than traditional statistical models, particularly on large and complex datasets.
  8. Predictive Maintenance: Machine learning models can help identify patterns in sensor data that are indicative of equipment failure, allowing for preventative maintenance to be scheduled before an issue occurs.

Disadvantages of Machine Learning:

While there are many advantages to using machine learning, there are also some potential disadvantages to consider, including:

  1. Complexity: Machine learning algorithms can be complex and difficult to understand, which can make it difficult for non-experts to use or interpret the results.
  2. Data requirements: Machine learning algorithms require large amounts of data to train and be accurate, which can be difficult to collect and preprocess.
  3. Biased data: Machine learning models are only as good as the data they are trained on, and if the data is biased, the model will also be biased.
  4. Overfitting: Machine learning algorithms can be overfit to the training data, which means they will not perform well on new, unseen data.
  5. Limited interpretability: Some machine learning models, particularly deep learning models, can be difficult to interpret, making it hard to understand how they reached a particular decision.
  6. Lack of transparency: Some machine learning models are considered black boxes, meaning it is difficult or impossible to understand how they arrived at a particular decision.
  7. Privacy concerns: Machine learning models can process sensitive data that could be used to discriminate or make privacy-intrusive decisions if not used responsibly.
  8. Requirements of experts: Machine learning requires experts such as data scientists, engineers, statisticians who can develop, train and deploy models which can be costly.

