In **natural language processing** and **machine learning**, the **Naïve Bayes** approach is a potent and popular method for classifying text documents. It assigns documents to predetermined classes based on the likelihood of their words occurring in each class, using Bayes' theorem. This article implements document classification with Naïve Bayes in Python.

## Text Classification using Naive Bayes

Naïve Bayes is a probabilistic classification technique whose probability models rest on strong, if naïve, independence assumptions. These assumptions are simple, yet they form the algorithm's foundation. The "naive" label comes from the independence assumption itself, which frequently does not hold in real data.

The Naive Bayes algorithm builds its probability models on Bayes' theorem, named after Thomas Bayes. These models are then trained in a supervised learning setting.

### Naive Bayes Algorithm

The Naive Bayes algorithm is a probabilistic classification method that bases its predictions on the Bayes theorem. Based on observable data, the Bayes theorem determines a hypothesis’s probability. When using Naive Bayes, an instance’s features serve as the evidence, while the class to which the instance belongs serves as the hypothesis.

Bayes' theorem, on which the algorithm relies, is broken down as follows:

### Bayes Theorem

$$P(C \mid F) = \frac{P(F \mid C)\, P(C)}{P(F)}$$

where:

- P(C|F): Probability of the instance belonging to a specific class given its features.
- P(F|C): Probability of observing the features given the class.
- P(C): Prior probability of the class.
- P(F): Probability of observing the features.

The assumption of feature independence is what gives Naive Bayes its "naive" quality. It also makes the algorithm computationally efficient, because the joint probability of the features reduces to a product of per-feature probabilities.

Naive Bayes makes its predictions by using Bayes' theorem to combine the observed data (features) with prior information (prior probabilities) under the assumption of feature independence. Despite its simplicity, it is effective in a variety of classification tasks, particularly in text classification and natural language processing.
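The combination of prior and evidence can be written out explicitly. As a sketch, with the features of a document written as words $w_1, \dots, w_n$ (a notation assumed here, not used in the original text), the independence assumption turns Bayes' theorem into a product of per-word terms. The denominator is the same for every class, so it can be dropped when classes are only being compared:

$$P(C \mid w_1, \dots, w_n) = \frac{P(C)\, P(w_1, \dots, w_n \mid C)}{P(w_1, \dots, w_n)} \propto P(C) \prod_{i=1}^{n} P(w_i \mid C)$$

$$\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(w_i \mid C)$$

The predicted class $\hat{C}$ is simply the one with the largest product.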

### When to use Naive Bayes

There are several instances in which Naive Bayes can be applied with great effectiveness. Here are some of those scenarios:

- **Text Classification**: Naive Bayes excels in text-based tasks such as spam filtering, sentiment analysis, and document categorization due to its simplicity and efficiency with high-dimensional data.
- **Limited Training Data**: Naive Bayes can perform well with limited training data, making it valuable when dealing with small datasets or situations where collecting extensive labeled data is challenging.
- **Simple and Quick Prototyping**: When a quick and simple solution is needed for prototyping or baseline performance, Naive Bayes is a suitable choice due to its ease of implementation.

## Implementation to classify text documents using Naive Bayes

### Importing Libraries

```python
# importing libraries
import prettytable
```

The snippet imports the `prettytable` library, which renders tabular data as a neatly formatted text table. It is a third-party package, so it has to be installed (for example with `pip install prettytable`) before the import will succeed. In this article it is used to display the table of word counts per document.

### Classification using Naive Bayes

```python
print('\n *-----* Classification using Naïve bayes *-----* \n')
total_documents = int(input("Enter the Total Number of documents: "))

# doc_class[i] holds [list of tokens, class label] for document i
doc_class = []
keywords = []
for i in range(total_documents):
    text = input(f"\nEnter the text of Doc-{i+1} : ").lower()
    clas = input(f"Enter the class of Doc-{i+1} : ")
    doc_class.append([text.split(), clas])
    keywords.extend(text.split())

# Sorted list of the unique terms across all documents
keywords = sorted(set(keywords))

to_find = input(
    "\nEnter the Text to classify using Naive Bayes: ").lower().split()

# probability_table[i][k] = number of times keywords[k] occurs in document i
probability_table = []
for i in range(total_documents):
    probability_table.append([doc_class[i][0].count(k) for k in keywords])
print('\n')
```

**Output:**

```
 *-----* Classification using Naïve bayes *-----*

Enter the Total Number of documents: 3

Enter the text of Doc-1 : I watched the movie.
Enter the class of Doc-1 : +

Enter the text of Doc-2 : I hated the movie.
Enter the class of Doc-2 : -

Enter the text of Doc-3 : poor acting.
Enter the class of Doc-3 : +

Enter the Text to classify using Naive Bayes: I hated the acting.
```

This code sets up a basic Naive Bayes text classifier. It prompts for the total number of documents and then collects the text and class of each one. The unique terms (keywords) found across all documents are gathered into a sorted vocabulary, and a table is built that counts how many times each keyword appears in each document. The text to classify is also read and tokenized here; its class scores are computed in the following steps. Two details are worth noting: tokens are split on whitespace only, so punctuation stays attached (for example `movie.`), and `probability_table`, despite its name, holds raw counts at this stage.
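The counting step can also be tried without typing input by hand. The sketch below is an illustration only: it hard-codes the three example documents and uses `collections.Counter`, and the names `documents`, `tokenized`, `vocab`, and `count_table` are assumptions that do not appear in the original code.

```python
from collections import Counter

# Hypothetical stand-in for the interactive input: the three example documents.
documents = [
    ("I watched the movie.", "+"),
    ("I hated the movie.", "-"),
    ("poor acting.", "+"),
]

# Lowercase and split on whitespace, as the tutorial code does.
tokenized = [(text.lower().split(), label) for text, label in documents]

# Sorted vocabulary of the unique terms across all documents.
vocab = sorted({word for words, _ in tokenized for word in words})

# One row of term counts per document, in vocabulary order.
count_table = []
for words, _ in tokenized:
    counts = Counter(words)
    count_table.append([counts[term] for term in vocab])

print(vocab)
for row, (_, label) in zip(count_table, tokenized):
    print(row, label)
```

A `Counter` returns 0 for missing keys, so terms absent from a document need no special handling.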

### Probability of Documents

```python
import prettytable

# Temporarily add the two label columns so keywords can serve as table headers
keywords.insert(0, 'Document ID')
keywords.append("Class")
Prob_Table = prettytable.PrettyTable()
Prob_Table.field_names = keywords
Prob_Table.title = 'Probability of Documents'
for x, row in enumerate(probability_table):
    row.insert(0, x + 1)         # document ID
    row.append(doc_class[x][1])  # class label
    Prob_Table.add_row(row)
print(Prob_Table)
print('\n')

# Drop the document IDs again; each row is now [counts..., class label]
for row in probability_table:
    row.pop(0)

totalpluswords = 0  # word occurrences in '+' documents
totalnegwords = 0   # word occurrences in all other documents
totalplus = 0       # number of '+' documents
totalneg = 0        # number of other documents
vocabulary = len(keywords) - 2  # exclude 'Document ID' and 'Class'
for row in probability_table:
    if row[-1] == "+":
        totalplus += 1
        totalpluswords += sum(row[:-1])
    else:
        totalneg += 1
        totalnegwords += sum(row[:-1])

# Restore keywords to the plain vocabulary
keywords.pop(0)
keywords.pop()
```

**Output:**

```
+---------------------------------------------------------------------------+
|                         Probability of Documents                          |
+-------------+---------+-------+---+--------+------+-----+---------+-------+
| Document ID | acting. | hated | i | movie. | poor | the | watched | Class |
+-------------+---------+-------+---+--------+------+-----+---------+-------+
|      1      |    0    |   0   | 1 |   1    |  0   |  1  |    1    |   +   |
|      2      |    0    |   1   | 1 |   1    |  0   |  1  |    0    |   -   |
|      3      |    1    |   0   | 0 |   0    |  1   |  0  |    0    |   +   |
+-------------+---------+-------+---+--------+------+-----+---------+-------+
```

This code displays the count table using the `prettytable` package. 'Document ID' is placed at the start of the keyword list and 'Class' at the end so that the list can serve as the table's field names, and a title is set on the `PrettyTable` object. Each row is then added with its document ID and class label alongside the counts from `probability_table`. After printing the table, the code counts the documents and the total word occurrences for each class ('+' and '-'); any label other than '+' is treated as negative. It also computes the vocabulary size and removes the two helper labels from the keyword list so that it holds only the vocabulary again.
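The per-class totals can be generalized beyond two fixed labels. The following is a minimal sketch under the assumption that each row holds term counts followed by its class label, as in the table above; the names `rows`, `doc_totals`, `word_totals`, and `priors` are illustrative and not taken from the original code.

```python
from collections import defaultdict

# Hypothetical rows mirroring the example table: term counts, then the class label.
rows = [
    [0, 0, 1, 1, 0, 1, 1, "+"],
    [0, 1, 1, 1, 0, 1, 0, "-"],
    [1, 0, 0, 0, 1, 0, 0, "+"],
]

doc_totals = defaultdict(int)   # number of documents per class
word_totals = defaultdict(int)  # number of word occurrences per class

for *counts, label in rows:
    doc_totals[label] += 1
    word_totals[label] += sum(counts)

vocabulary_size = len(rows[0]) - 1  # every column except the label

# Class priors: the share of training documents in each class.
priors = {label: n / len(rows) for label, n in doc_totals.items()}

print(dict(doc_totals), dict(word_totals), vocabulary_size)
```

Keeping the totals in dictionaries keyed by label means a third class would need no extra variables.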

### Positive Class

```python
# For positive class
temp = []
for i in to_find:
    count = 0
    # Words outside the training vocabulary keep a count of 0
    if i in keywords:
        x = keywords.index(i)
        for j in probability_table:
            if j[-1] == "+":
                count = count + j[x]
    temp.append(count)

# Laplace (add-one) smoothing, rounded to 4 decimal places
for i in range(len(temp)):
    temp[i] = format((temp[i]+1)/(vocabulary+totalpluswords), ".4f")
print()
temp = [float(f) for f in temp]
print("Probabilities of Each word to be in '+' class are: ")
for i, p in zip(to_find, temp):
    print(f"P({i}/+) = {p}")
print()

# Class prior multiplied by the per-word probabilities
pplus = float(format((totalplus)/(totalplus+totalneg), ".8f"))
for i in temp:
    pplus = pplus * i
pplus = format(pplus, ".8f")
print("probability of Given text to be in '+' class is :", pplus)
print()
```

**Output:**

```
Probabilities of Each word to be in '+' class are:
P(i/+) = 0.1538
P(hated/+) = 0.0769
P(the/+) = 0.1538
P(acting./+) = 0.1538

probability of Given text to be in '+' class is : 0.00018651
```

For each word of the input text, this code estimates the probability of that word given the positive class ('+'), printed as `P(word/+)`. It iterates over the words in `to_find`, sums each word's occurrences in the positive documents from the count table, and applies Laplace (add-one) smoothing to obtain the conditional probabilities, which are then printed. Lastly, it multiplies the class prior by these word probabilities to score the input text for the positive class and prints the result. Laplace smoothing guarantees a non-zero probability for words that never appear in the positive documents.
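The smoothing step is easy to check by hand. As a small sketch, the helper below (a hypothetical function named `laplace_smoothed`, not part of the original code) applies the same add-one formula with exact fractions, using the totals of the three example documents: 7 vocabulary terms and 6 word occurrences in the '+' documents.

```python
from fractions import Fraction

def laplace_smoothed(word_count, class_word_total, vocabulary_size):
    """Add-one estimate of P(word | class); the names are illustrative."""
    return Fraction(word_count + 1, class_word_total + vocabulary_size)

# Totals from the example: 7 vocabulary terms, 6 word occurrences in '+' documents.
p_i = laplace_smoothed(1, 6, 7)      # "i" appears once in the '+' documents
p_hated = laplace_smoothed(0, 6, 7)  # "hated" never appears in the '+' documents

print(p_i, round(float(p_i), 4))          # 2/13 0.1538
print(p_hated, round(float(p_hated), 4))  # 1/13 0.0769
```

Without the added 1, the estimate for "hated" would be zero and would wipe out the whole product for the '+' class.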

### Negative class

```python
# For Negative class
temp = []
for i in to_find:
    count = 0
    # Words outside the training vocabulary keep a count of 0
    if i in keywords:
        x = keywords.index(i)
        for j in probability_table:
            if j[-1] == "-":
                count = count + j[x]
    temp.append(count)

# Laplace (add-one) smoothing, rounded to 4 decimal places
for i in range(len(temp)):
    temp[i] = format((temp[i]+1)/(vocabulary+totalnegwords), ".4f")
print()
temp = [float(f) for f in temp]
print("Probabilities of Each word to be in '-' class are: ")
for i, p in zip(to_find, temp):
    print(f"P({i}/-) = {p}")
print()

# Class prior multiplied by the per-word probabilities
pneg = float(format((totalneg)/(totalplus+totalneg), ".8f"))
for i in temp:
    pneg = pneg * i
pneg = format(pneg, ".8f")
print("probability of Given text to be in '-' class is :", pneg)
print('\n')
```

**Output:**

```
Probabilities of Each word to be in '-' class are:
P(i/-) = 0.1818
P(hated/-) = 0.1818
P(the/-) = 0.1818
P(acting./-) = 0.0909

probability of Given text to be in '-' class is : 0.00018206
```

This code estimates the probability of each word in the input text given the negative class ('-'). It iterates over the words in `to_find`, sums each word's occurrences in the negative documents from the count table, and computes the conditional probabilities with Laplace smoothing, exactly as the positive-class computation does. The per-word probabilities are printed, and the class prior is then multiplied by them to score the input text for the negative class. In both computations, Laplace smoothing guarantees non-zero probabilities for words that were not seen in the class.
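Multiplying many small probabilities, as both class computations do, can underflow to zero on longer texts. A common workaround, sketched here as an alternative and not part of the original code, is to add logarithms instead of multiplying. The function name `log_score` is an assumption, and the per-word values are the rounded probabilities from the article's final output.

```python
import math

def log_score(prior, word_probs):
    """Sum of logs, equivalent to log(prior * product of word_probs)."""
    return math.log(prior) + sum(math.log(p) for p in word_probs)

# Per-word probabilities as printed in the article's final output.
plus_score = log_score(2 / 3, [0.1538, 0.0769, 0.1538, 0.1538])
neg_score = log_score(1 / 3, [0.1818, 0.1818, 0.1818, 0.0909])

# The logarithm is monotonic, so comparing log scores picks the same class.
predicted = "+" if plus_score > neg_score else "-"
print(predicted, plus_score, neg_score)
```

Because only the ordering of the scores matters for the prediction, the log version gives the same answer as the product while staying numerically stable.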

### Prediction

```python
# pplus and pneg are formatted strings, so compare them as numbers
if float(pplus) > float(pneg):
    print(
        f"Using Naive Bayes Classification, We can clearly say that the given text belongs to '+' class with probability {pplus}")
else:
    print(
        f"Using Naive Bayes Classification, We can clearly say that the given text belongs to '-' class with probability {pneg}")
print('\n')
```

**Output:**

```
Probabilities of Each word to be in '+' class are:
P(i/+) = 0.1538
P(hated/+) = 0.0769
P(the/+) = 0.1538
P(acting./+) = 0.1538

probability of Given text to be in '+' class is : 0.00018651

Probabilities of Each word to be in '-' class are:
P(i/-) = 0.1818
P(hated/-) = 0.1818
P(the/-) = 0.1818
P(acting./-) = 0.0909

probability of Given text to be in '-' class is : 0.00018206

Using Naive Bayes Classification, We can clearly say that the given text belongs to '+' class with probability 0.00018651
```

This code makes the final decision by comparing the two class scores. If the score for the positive class (`pplus`) is higher than the score for the negative class (`pneg`), it prints a positive prediction together with that score; otherwise it prints a negative prediction with `pneg`. Each score is the class prior multiplied by the word probabilities, so it is proportional to the class posterior and is not a normalized probability. In this example the margin is narrow: 0.00018651 for '+' against 0.00018206 for '-'.
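The printed values can be turned into probabilities that sum to 1 by dividing each score by their total. This is a sketch of that normalization step, which the original code does not perform; the helper name `normalize` is an assumption, and the inputs are the two scores from the output above.

```python
def normalize(scores):
    """Rescale non-negative class scores so that they sum to 1."""
    total = sum(scores.values())
    return {label: value / total for label, value in scores.items()}

# Unnormalized scores taken from the article's output.
posteriors = normalize({"+": 0.00018651, "-": 0.00018206})
print({label: round(p, 4) for label, p in posteriors.items()})
```

On these inputs the split is roughly 0.51 to 0.49, which shows how close the decision is for this tiny training set.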

### Also Check:

- ML | Naive Bayes Scratch Implementation using Python
- Naive Bayes Classifier in R Programming
- Applying Multinomial Naive Bayes to NLP Problems

## Frequently Asked Questions (FAQs)

**1. What is Naive Bayes classification in the context of text documents?**

Text documents are categorized into predetermined classes using the probabilistic Naive Bayes classification method. It is especially useful for text-based applications like sentiment analysis, spam detection, and document classification since it applies Bayes’ theorem under the naive assumption of feature independence.

**2. How does Naive Bayes handle the issue of feature independence in text classification?**

Naive Bayes assumes feature independence: given the class label, every feature (word) is treated as independent of all other features. Even with this simplification, Naive Bayes is frequently effective in text classification, particularly when the words are close to conditionally independent given the class.

**3. Can Naive Bayes be used for real-time text classification?**

Yes. Naive Bayes is computationally efficient, so it can process and classify new text instances quickly, which makes it a good choice for real-time applications that need fast decisions.

**4. Is Naive Bayes suitable for large datasets of text documents?**

Yes. Naive Bayes is effective on large text document datasets, and its speed and simplicity make it a good option for handling large amounts of textual data.

**5. How does Naive Bayes handle the presence of irrelevant words in text documents?**

Naive Bayes is sensitive to irrelevant words. Although it often works well in noisy settings, too many irrelevant features can reduce its accuracy. Feature selection or preprocessing can lessen this sensitivity.
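As a hedged illustration of such preprocessing, the sketch below lowercases the text, strips punctuation from each token, and drops a few stop words before the text would reach the classifier. The function name `preprocess` and the tiny `STOP_WORDS` set are assumptions made for this example; a real project would normally use a much larger list.

```python
import string

# A tiny, hypothetical stop-word list; real projects usually use a larger one.
STOP_WORDS = {"i", "the", "a", "an", "is"}

def preprocess(text):
    """Lowercase, strip punctuation from each token, and drop stop words."""
    tokens = []
    for raw in text.lower().split():
        token = raw.strip(string.punctuation)
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens

print(preprocess("I hated the acting."))  # ['hated', 'acting']
```

Stripping punctuation also merges tokens such as `acting.` and `acting`, which the whitespace-only split in the implementation above treats as different words.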