
Naive Bayes vs. SVM for Text Classification

Text classification is a fundamental task in natural language processing (NLP), with applications ranging from spam detection to sentiment analysis and document categorization.

Two popular machine learning algorithms for text classification are Naive Bayes classifier (NB) and Support Vector Machines (SVM). Both approaches have their strengths and weaknesses, making them suitable for different types of text classification tasks. In this article, we’ll explore and compare Naive Bayes and SVM for text classification, highlighting their key differences, advantages, and limitations.



Naive Bayes Classifier (NB)

The Naive Bayes (NB) classifier is a probabilistic machine learning model widely used for text classification. Despite its seemingly simplistic name, its effectiveness stems from a solid theoretical foundation: it applies Bayes’ theorem, computing the probability that a text belongs to a particular class from the individual probabilities of its constituent words appearing in that class, under the “naive” assumption that those words are independent given the class. It is particularly effective with high-dimensional text data and can handle large datasets efficiently. Its simplicity, speed, and ability to work well with limited data make it a popular choice, especially when computational resources are a consideration in real-world applications.
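In symbols, for a document d containing words w_1, …, w_n, Naive Bayes scores each class c using Bayes’ theorem together with the conditional-independence assumption (a standard statement of the multinomial model, shown here for reference):

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(w_i \mid c)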

Support Vector Machines (SVM)

Support Vector Machines are a powerful supervised learning algorithm that excels at distinguishing between text categories, making it valuable for tasks like sentiment analysis, topic labeling, and spam detection. At its heart, SVM aims to find the optimal hyperplane: a decision boundary in a high-dimensional space that cleanly separates the different text classes. Imagine plotting each text document as a point based on its extracted features (e.g., word presence or frequency). SVM seeks the hyperplane that maximizes the margin between the classes, so the separation remains clear even for unseen data.
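In the linearly separable case, the hyperplane is defined by a weight vector w and a bias b, and training amounts to maximizing the margin 2/‖w‖. This is the standard formulation below, shown for reference (practical solvers also add slack variables to handle non-separable data):

\min_{w,\, b} \; \tfrac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \;\; \text{for all training points } (x_i, y_i)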



The SVM model is trained on labeled data, where each document belongs to a specific category, and it learns the hyperplane that best separates these categories in the feature space. At prediction time, new documents are mapped into the same feature space and assigned to a class according to which side of the hyperplane their feature vectors fall on.


While SVMs work with linear hyperplanes by default, the ‘kernel trick’ allows them to handle non-linear relationships between features. This is crucial for text, where complex semantic relationships exist between words.
SVMs often achieve high accuracy on text classification tasks, even with relatively small datasets. They also handle the sparse feature vectors typical of text well, where most features are absent from any individual document.
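As a quick illustration of the kernel trick, here is a toy sketch on synthetic, XOR-style data that is not linearly separable (the dataset and scores are purely illustrative and not part of the experiment later in this article):

import numpy as np
from sklearn.svm import SVC

# Toy XOR-style data: not separable by a single straight line
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# A linear kernel struggles here, while an RBF kernel can bend the boundary
linear_svm = SVC(kernel='linear').fit(X, y)
rbf_svm = SVC(kernel='rbf').fit(X, y)
print("Linear kernel training accuracy:", linear_svm.score(X, y))
print("RBF kernel training accuracy:", rbf_svm.score(X, y))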

Naive Bayes and SVM for Text Classification

Advantages

  Naive Bayes:
  • Simple and easy to implement.
  • Computationally efficient.
  • Works well with small datasets.

  Support Vector Machine:
  • Effective in high-dimensional spaces.
  • Robust to overfitting.
  • Flexible choice of kernel functions.
  • Can capture complex relationships.

Efficiency

  Naive Bayes:
  • Fast training and prediction.

  Support Vector Machine:
  • Training can be computationally expensive.
  • Slower to train, but fast at prediction time.

Performance

  Naive Bayes:
  • Good for simple classification tasks.
  • Can handle noisy data well.

  Support Vector Machine:
  • Better performance on complex tasks.
  • Sensitive to noisy data, especially when it affects the position of the decision boundary.

Scalability

  Naive Bayes:
  • Scales well with large datasets and many features.

  Support Vector Machine:
  • Less scalable to large datasets.
  • Memory-intensive for large feature spaces.

Interpretability

  Naive Bayes:
  • Provides straightforward interpretability.
  • Directly calculates class probabilities.

  Support Vector Machine:
  • Less interpretable.
  • Decision boundaries are harder to interpret.
  • Provides little insight into feature importance.

Robustness

  Naive Bayes:
  • Sensitive to the feature distribution.
  • Can be sensitive to violations of the independence assumption.

  Support Vector Machine:
  • More robust to outliers and noise.

Limitations

  Naive Bayes:
  • Dependent features violate the independence assumption, hurting performance.
  • Its simplicity compromises accuracy when relationships between features are intricate.
  • Deviations from the assumed feature distribution impair performance.

  Support Vector Machine:
  • Training demands significant computational resources for large datasets.
  • Success relies on careful tuning of the kernel and regularization parameters.
  • Lacks interpretability, especially in text classification with very many features.

Naive Bayes and SVM: Python Implementation

Let’s perform text classification with Naive Bayes and Support Vector Machines (SVM) using Python and scikit-learn. We’ll use the popular 20 Newsgroups dataset, which consists of newsgroup documents categorized into 20 different topics.

Step 1 Importing Libraries:

We import the necessary libraries:




from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.metrics import classification_report

Step 2 Loading Dataset:

We specify the categories of newsgroups we want to include in our dataset. Then, we load the training and testing subsets of the 20 Newsgroups dataset containing documents from these categories.




# Load the 20 Newsgroups dataset
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
 
# Sample data from the dataset
print("Sample Document:", newsgroups_train.data[0])
print("Label:", newsgroups_train.target_names[newsgroups_train.target[0]])

Output:

Sample Document: From: sd345@city.ac.uk (Michael Collier)
Subject: Converting images to HP LaserJet III?
Nntp-Posting-Host: hampton
Organization: The City University
Lines: 14
Does anyone know of a good way (standard PC application/PD utility) to
convert tif/img/tga files into LaserJet III format. We would also like to
do the same, converting to HPGL (HP plotter) files.
Please email any response.
Thank you,
- Michael.
Label: comp.graphics

Step 3 Feature Extraction:

We initialize a TF-IDF vectorizer and use it to transform the text data into TF-IDF feature vectors. X_train and X_test contain the feature vectors for the training and testing data, respectively. y_train and y_test contain the corresponding target labels.




tfidf_vectorizer = TfidfVectorizer()
X_train = tfidf_vectorizer.fit_transform(newsgroups_train.data)
X_test = tfidf_vectorizer.transform(newsgroups_test.data)
y_train = newsgroups_train.target
y_test = newsgroups_test.target
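To sanity-check the vectorization, you can inspect the resulting sparse matrices. This check is optional and not part of the original walkthrough; get_feature_names_out requires scikit-learn 1.0 or later:

# The TF-IDF matrices are sparse: documents x vocabulary terms
print("Training matrix shape:", X_train.shape)
print("Vocabulary size:", len(tfidf_vectorizer.get_feature_names_out()))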

Step 4 Training Classifiers:

We instantiate Multinomial Naïve Bayes and SVM classifiers and train them using the training data (X_train, y_train).




# Train Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)
 
# Train SVM classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)
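For large, sparse text collections, scikit-learn’s LinearSVC is typically much faster to train than SVC(kernel='linear') and usually gives similar results. A hedged alternative, should training time become an issue:

from sklearn.svm import LinearSVC

# Optional: a faster linear SVM for large sparse text data
fast_svm = LinearSVC()
fast_svm.fit(X_train, y_train)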

Step 5 Model Evaluation and Prediction:

We use the trained classifiers to make predictions on the testing data. We print classification reports containing various evaluation metrics such as precision, recall, and F1-score for both Naïve Bayes and SVM classifiers using the classification_report function.




# Evaluate classifiers
nb_predictions = nb_classifier.predict(X_test)
svm_predictions = svm_classifier.predict(X_test)
 
# Print classification reports
print("Naïve Bayes Classification Report:")
print(classification_report(y_test, nb_predictions, target_names=newsgroups_test.target_names))
 
print("\nSVM Classification Report:")
print(classification_report(y_test, svm_predictions, target_names=newsgroups_test.target_names))

Output:

Naïve Bayes Classification Report:

                        precision    recall  f1-score   support

           alt.atheism       0.97      0.60      0.74       319
         comp.graphics       0.96      0.89      0.92       389
               sci.med       0.97      0.81      0.88       396
soc.religion.christian       0.65      0.99      0.78       398

              accuracy                           0.83      1502
             macro avg       0.89      0.82      0.83      1502
          weighted avg       0.88      0.83      0.84      1502

SVM Classification Report:

                        precision    recall  f1-score   support

           alt.atheism       0.96      0.83      0.89       319
         comp.graphics       0.90      0.96      0.93       389
               sci.med       0.94      0.91      0.93       396
soc.religion.christian       0.89      0.96      0.93       398

              accuracy                           0.92      1502
             macro avg       0.93      0.92      0.92      1502
          weighted avg       0.92      0.92      0.92      1502


The output presents classification reports for the Naive Bayes and SVM classifiers on the selected 20 Newsgroups categories. Both perform well, but SVM achieves higher accuracy (0.92 vs. 0.83) and higher F1-scores across categories. Naive Bayes lags most noticeably on alt.atheism, where its recall drops to 0.60.
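To tie this back to the interpretability point from the comparison above, Naive Bayes lets you read off the most indicative terms per class directly from its learned log probabilities. A minimal sketch reusing the fitted nb_classifier and tfidf_vectorizer from the steps above (feature_log_prob_ is a standard MultinomialNB attribute; the exact words printed will depend on the data):

import numpy as np

feature_names = tfidf_vectorizer.get_feature_names_out()
for class_idx, class_name in enumerate(newsgroups_train.target_names):
    # feature_log_prob_[class_idx] holds log P(term | class)
    top_terms = np.argsort(nb_classifier.feature_log_prob_[class_idx])[-5:]
    print(class_name, "->", [feature_names[i] for i in top_terms])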

Conclusion

Both Naive Bayes and SVM are popular choices for text classification tasks, each with its own set of advantages and limitations. Naive Bayes is simple, efficient, and performs well under certain conditions, particularly with small datasets and when the feature independence assumption holds true.

On the other hand, SVMs offer better performance in complex classification tasks with high-dimensional feature spaces, albeit with higher computational complexity and less interpretability.

The choice between Naïve Bayes and SVM ultimately depends on the specific characteristics of the dataset, the complexity of the classification task, and computational considerations.

