Bayes’ Theorem in Data Mining

Last Updated : 04 Jul, 2021

Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event. In other words, Bayes’ Theorem is an extension of conditional probability.

Conditional probability gives the probability of X given H, denoted by P(X | H). Bayes’ Theorem states that if we know the conditional probability P(X | H), then we can find P(H | X), provided that P(X) and P(H) are already known to us.

Bayes’ Theorem is named after Thomas Bayes, who first used conditional probability to provide an algorithm that uses evidence to calculate limits on an unknown parameter. Bayes’ Theorem involves two types of probabilities:

  1. Prior Probability [P(H)]
  2. Posterior Probability [P(H | X)]

where

  • X is a data tuple.
  • H is some hypothesis.

1. Prior Probability

Prior Probability is the probability of an event before new data is collected. It is the best rational assessment of the probability of an outcome, based on current knowledge of the event before any observation is made.

2. Posterior Probability

When new data or information is collected, the Prior Probability of an event is revised to produce a more accurate measure of a possible outcome. This revised probability becomes the Posterior Probability and is calculated using Bayes’ Theorem. So, the Posterior Probability P(H | X) is the probability that hypothesis H holds given that the data X has been observed.

For example 

Suppose three bags are labeled A, B, and C. One bag has a red ball in it, while the other two do not. The prior probability that the red ball is in bag B is one-third, or 0.333. But when bag C is opened and found to contain no red ball, the posterior probability that the red ball is in bag A, and likewise in bag B, becomes 0.5, as each bag now has one out of two chances.
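
The update in this example can be checked numerically. The following is a minimal sketch, assuming that bag C is opened without any knowledge of where the ball is, so that finding C empty has likelihood 1 if the ball is in A or B and 0 if it is in C. The variable names are illustrative only.

```python
from fractions import Fraction

# Prior: the red ball is equally likely to be in any of the three bags.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}

# Likelihood of the observation "bag C is empty" under each hypothesis
# (assumption: bag C was opened without knowing its contents).
likelihood_c_empty = {"A": Fraction(1), "B": Fraction(1), "C": Fraction(0)}

# Marginal probability of the observation (total probability).
evidence = sum(likelihood_c_empty[bag] * prior[bag] for bag in prior)

# Posterior for each bag: likelihood * prior / evidence.
posterior = {bag: likelihood_c_empty[bag] * prior[bag] / evidence for bag in prior}

for bag, p in posterior.items():
    print(bag, p)  # A 1/2, B 1/2, C 0
```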


Bayes’ Theorem can be represented mathematically by the equation given below:

P(H | X) = P(X | H) * P(H) / P(X)

where

  • H and X are events, and P(X) ≠ 0.
  • P(H | X) – Conditional probability of H given that X occurs.
  • P(X | H) – Conditional probability of X given that H occurs.
  • P(H) and P(X) – Probabilities of H and X occurring on their own, without reference to each other. These are called marginal probabilities.
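
The formula translates directly into a small function. This is only a sketch: the name `bayes_posterior` and the input values are hypothetical and chosen for illustration, not taken from a real dataset.

```python
def bayes_posterior(p_x_given_h, p_h, p_x):
    """Return P(H | X) = P(X | H) * P(H) / P(X)."""
    if p_x == 0:
        # The theorem requires P(X) != 0.
        raise ValueError("P(X) must be non-zero")
    return p_x_given_h * p_h / p_x


# Hypothetical inputs: P(X | H) = 0.9, P(H) = 0.2, P(X) = 0.3
p_h_given_x = bayes_posterior(0.9, 0.2, 0.3)
print(round(p_h_given_x, 3))  # 0.6
```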

Formula Derivation of Bayes’ Theorem

According to conditional probability, we know that
P(X|H) = P(X and H)/P(H)

P(X and H) = P(X|H) * P(H) ---------- [1]

P(H|X) = P(H and X)/P(X)
       = P(X and H)/P(X) [Order does not matter in Joint Probability]

P(X and H) = P(H|X) * P(X) --------- [2]

Now from equations [1] and [2],
P(X|H) * P(H) = P(H|X) * P(X)

⇒ P(H|X) = P(X|H) * P(H)/P(X)

It means that if we know P(X|H), then we can find out P(H|X),
given the condition that P(X) and P(H) are already known to us.


Now, let X1, X2, X3, …, Xk be a set of mutually exclusive and exhaustive events with probabilities P(Xi), i = 1, 2, 3, …, k, and let H be any event with P(H) > 0. Then

P(Xi|H) = P(Xi and H) / P(H)
        = P(H|Xi)*P(Xi) / ∑j [P(H|Xj)*P(Xj)],   j = 1, 2, 3, …, k
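
The generalized form can be sketched as a function that takes the priors P(Xi) and the likelihoods P(H|Xi) and returns every posterior P(Xi|H) at once. The function name and the numbers below are hypothetical, and the sketch assumes the Xi are mutually exclusive and cover all cases, so the priors sum to 1.

```python
def posterior_over_partition(priors, likelihoods):
    """Return [P(Xi | H)] given [P(Xi)] and [P(H | Xi)]."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Denominator: total probability P(H) = sum of P(H | Xj) * P(Xj).
    p_h = sum(l * p for l, p in zip(likelihoods, priors))
    if p_h == 0:
        raise ValueError("P(H) must be greater than zero")
    return [l * p / p_h for l, p in zip(likelihoods, priors)]


# Hypothetical values for three events X1, X2, X3.
priors = [0.5, 0.3, 0.2]        # P(X1), P(X2), P(X3)
likelihoods = [0.1, 0.4, 0.5]   # P(H | X1), P(H | X2), P(H | X3)

posteriors = posterior_over_partition(priors, likelihoods)
print([round(p, 3) for p in posteriors])
```

Because every posterior is divided by the same total probability, the returned values always sum to 1.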


Tree representation of Bayes’ Theorem

To find Reverse Probabilities : Bayes' Theorem
P(X1|H) = P(H|X1)*P(X1) / P(H)

- P(X1) and P(H) are called marginal probabilities.
- P(X1) and P(H|X1) are already given.

Therefore, P(H) can be calculated as given below :
P(H) = P(H|X1)*P(X1) + P(H|X2)*P(X2) + P(H|X3)*P(X3)
       (This is also known as Total Probability)

Similarly, for the complementary event H' :
P(X1|H') = P(H'|X1)*P(X1) / P(H')

Now, P(H') can be calculated as
P(H') = P(H'|X1)*P(X1) + P(H'|X2)*P(X2) + P(H'|X3)*P(X3)

Applications of Bayes’ Theorem 

Bayes’ Theorem has many real-world applications. Some of them are given below:

  • It can be used as a building block and starting point for more complex methodologies, for example, the popular Bayesian networks.
  • It is used in classification problems and other probability-related questions.
  • It underlies Bayesian inference, a particular approach to statistical inference.
  • In genetics, Bayes’ Theorem can be used to calculate the probability of an individual having a specific genotype.


1.  SpamAssassin works as a mail filter that users train to identify spam. It considers patterns in the words of emails that users have marked as spam. For example, it may have learned that the word “release” appears in 30% of the emails marked as spam, while only 0.8% of non-spam emails include the word “release”. Suppose that 40% of all emails received by the user are spam. Find the probability that an email is spam if the word “release” appears in it.

Solution :

P(Release | Spam) = 0.30
P(Release | Non Spam) = 0.008
P(Spam) = 0.40
        => P(Non Spam) = 1 - 0.40 = 0.60
P(Spam | Release) = ?

Now, using Bayes’ Theorem:
P(Spam | Release) = P(Release | Spam) * P(Spam) / P(Release)
                  = 0.30 * 0.40 / (0.30 * 0.40 + 0.008 * 0.60)
                  = 0.12 / 0.1248
                  = 0.962 (approximately)
Hence, the required probability is approximately 0.962.
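
The same calculation can be written as a short script. This sketch uses only the figures from the problem statement, with P(Non Spam) taken as 1 - P(Spam) = 0.60. The variable names are illustrative.

```python
# Figures from the problem statement.
p_release_given_spam = 0.30       # P(Release | Spam)
p_release_given_non_spam = 0.008  # P(Release | Non Spam)
p_spam = 0.40                     # P(Spam)
p_non_spam = 1 - p_spam           # P(Non Spam) = 0.60

# Total probability of seeing the word "release" in any email.
p_release = (p_release_given_spam * p_spam
             + p_release_given_non_spam * p_non_spam)

# Bayes' Theorem: P(Spam | Release).
p_spam_given_release = p_release_given_spam * p_spam / p_release

print(f"P(Release) = {p_release:.4f}")                    # 0.1248
print(f"P(Spam | Release) = {p_spam_given_release:.4f}")  # 0.9615
```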

2.  Bag1 contains 4 white and 8 black balls, and Bag2 contains 5 white and 3 black balls. One ball is drawn at random from one of the bags, and it turns out to be black. Find the probability that the ball was drawn from Bag1.


Let E1, E2 and A be the three events where,
E1 = Event of selecting Bag1
E2 = Event of selecting Bag2
A = Event of drawing black ball

P(E1) = P(E2) = 1/2
P(drawing a black ball from Bag1) = P(A|E1) = 8/12 = 2/3
P(drawing a black ball from Bag2) = P(A|E2) = 3/8 

By using Bayes' Theorem, the probability that the ball was drawn from Bag1, given that it is black, is
P(E1|A) = P(A|E1) * P(E1) / [P(A|E1) * P(E1) + P(A|E2) * P(E2)]
                [P(A|E1) * P(E1) + P(A|E2) * P(E2) = Total Probability]
        = (2/3 * 1/2) / (2/3 * 1/2 + 3/8 * 1/2)
        = 16/25
Hence, the probability that the ball is drawn from Bag1 is 16/25
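
As a sanity check, the result can be compared against a simulation. The sketch below computes the exact value with fractions and then estimates it by repeatedly picking a bag and a ball at random. The function name, trial count, and seed are arbitrary choices made for illustration.

```python
import random
from fractions import Fraction

# Exact value from Bayes' Theorem, using the fractions in the solution.
p_e1 = p_e2 = Fraction(1, 2)
p_black_given_e1 = Fraction(8, 12)
p_black_given_e2 = Fraction(3, 8)
exact = (p_black_given_e1 * p_e1) / (
    p_black_given_e1 * p_e1 + p_black_given_e2 * p_e2
)


def simulate(trials, seed=0):
    """Estimate P(Bag1 | black) as the share of black draws that came from Bag1."""
    rng = random.Random(seed)
    bags = {1: ["W"] * 4 + ["B"] * 8, 2: ["W"] * 5 + ["B"] * 3}
    black_draws = 0
    black_from_bag1 = 0
    for _ in range(trials):
        bag = rng.choice([1, 2])       # each bag is chosen with probability 1/2
        ball = rng.choice(bags[bag])   # one ball drawn uniformly from that bag
        if ball == "B":
            black_draws += 1
            if bag == 1:
                black_from_bag1 += 1
    return black_from_bag1 / black_draws


estimate = simulate(200_000)
print("exact:", exact, "=", float(exact))
print("simulated:", round(estimate, 3))
```

The simulated value should land close to 16/25 = 0.64, and it gets closer as the number of trials grows.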

