
Maximum Margin Separating Hyperplane in Scikit Learn

Last Updated : 30 Jan, 2023

In scikit-learn, the SVM (support vector machine) classes provide a way to find the MMSH. The SVM model is a supervised learning algorithm that can be used for both classification and regression tasks. When used for classification, the SVM model finds the MMSH that separates the different classes of data points. The SVM algorithm works by mapping the data points to a higher-dimensional space in which a linear boundary can separate the classes; it then finds the optimal hyperplane in that space and projects it back to the original space.
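As a minimal sketch of what this looks like in code (the make_blobs toy dataset is an assumption for illustration, not one of the article's examples), we can fit SVC with a linear kernel and read back the learned hyperplane w·x + b = 0:

Python3

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, so a clean linear boundary exists
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# SVC with a linear kernel finds the maximum margin hyperplane
clf = SVC(kernel="linear")
clf.fit(X, y)

print("w (coef_):", clf.coef_)            # normal vector of the hyperplane
print("b (intercept_):", clf.intercept_)  # offset of the hyperplane
print("number of support vectors:", len(clf.support_vectors_))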

In scikit-learn, the SVM classes accept several kernel functions, which map the data points to a higher-dimensional space. The most commonly used kernels are the linear kernel, the polynomial kernel, and the radial basis function (RBF) kernel. The linear kernel is appropriate when the data is linearly separable in its original feature space; the polynomial and RBF kernels are used when it is not, with the RBF kernel being a common default for complex, non-linear class boundaries.
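As a rough illustration (the make_moons dataset and its parameters are assumptions chosen for demonstration), the sketch below compares the three kernels on data that is not linearly separable; the non-linear kernels would be expected to score higher here:

Python3

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaved half-circles: not separable by a straight line
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Compare the three common kernels on the same data
for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print("{} kernel test accuracy: {:.2f}".format(
        kernel, clf.score(X_test, y_test)))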

What is the Maximum Margin Separating Hyperplane?

Maximum Margin Separating Hyperplane (MMSH) is a concept in machine learning that refers to a line (in 2D), a plane (in 3D), or a hyperplane (in higher dimensions) that separates different classes of data points with the largest possible margin. The margin is the distance between the hyperplane and the closest data points from each class, and the goal of MMSH is to find the hyperplane that maximizes this distance.
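For a linear SVM with weight vector w and bias b, the margin boundaries sit at w·x + b = ±1, so the margin width is 2 / ||w||; maximizing the margin therefore amounts to minimizing ||w||. A minimal sketch (on assumed, separable toy data) that fits a linear SVM and computes this width:

Python3

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1000)
clf.fit(X, y)

# The margin boundaries sit at w.x + b = +/-1, so their distance is 2 / ||w||
w = clf.coef_[0]
print("margin width: {:.3f}".format(2 / np.linalg.norm(w)))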

Example 1

The LinearSVC class has a number of hyperparameters that you can adjust to control the behavior of the model. For example, the C hyperparameter controls the regularization strength: a smaller C applies stronger regularization, while a larger C lets the model fit the training data more closely, which can lead to overfitting. The sketch below illustrates this before the full example.
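A small illustrative sketch (the synthetic dataset and the values of C are assumptions, not part of the article's example) of how C shifts the balance between fitting the training set and generalizing to the test set:

Python3

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Smaller C = stronger regularization; larger C tracks the training data more closely
for C in (0.01, 1.0, 100.0):
    clf = LinearSVC(C=C, random_state=42, max_iter=10000).fit(X_train, y_train)
    print("C={:<6} train acc={:.2f}  test acc={:.2f}".format(
        C, clf.score(X_train, y_train), clf.score(X_test, y_test)))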

Python3




from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
  
  
def load_data():
    # Generate synthetic data (replace this with your own data loading)
    X, y = make_classification(n_samples=1000,
                               n_features=4, random_state=42)
  
    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
  
    return X_train, X_test, y_train, y_test
  
  
# Load the data and split it into training and test sets
X_train, X_test, y_train, y_test = load_data()
  
# Create the model
model = LinearSVC()
  
# Fit the model to the training data
model.fit(X_train, y_train)
  
# Evaluate the model on the test data
accuracy = model.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))


Output:

The script prints the model's mean accuracy on the test set.

Example 2

In this example, we use the make_classification function to generate synthetic data with 4 features and 1000 samples. We then split the data into training and test sets and fit the model to the training data using the fit method. Finally, we evaluate the model on the test data using the score method, which returns the mean accuracy of the model.

Python3




from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
  
  
# Generate some synthetic data
X, y = make_classification(n_samples=1000,
                           n_features=4, random_state=42)
  
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
  
# Create the model
model = LinearSVC(random_state=42)
  
# Fit the model to the training data
model.fit(X_train, y_train)
  
# Evaluate the model on the test data
accuracy = model.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))


Output:

As before, the script prints the test accuracy of the fitted model.

Example 3

The SVM algorithm also has a regularization parameter, called C, which controls the trade-off between maximizing the margin and minimizing the misclassification error. A smaller value of C will result in a larger margin but may allow for more misclassifications, while a larger value of C will result in a smaller margin but fewer misclassifications.
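This trade-off can be observed directly by computing the margin width, 2 / ||w||, at different values of C. The sketch below (on an assumed toy dataset of overlapping clusters) uses SVC with a linear kernel rather than LinearSVC, since SVC exposes its support vectors:

Python3

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so the soft-margin trade-off actually matters
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2 / np.linalg.norm(clf.coef_[0])
    print("C={:<6} margin width={:.3f}  support vectors={}".format(
        C, width, len(clf.support_vectors_)))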

Here, we use the load_iris function to load the iris dataset and split it into training and test sets. We then use GridSearchCV to perform a grid search over a range of values for the C hyperparameter. Finally, we evaluate the model on the test data using the score method, as before.

Python3




from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
  
  
# Load the iris dataset
X, y = load_iris(return_X_y=True)
  
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
  
# Create a parameter grid for grid search
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100]}
  
# Create the model (a higher max_iter helps LinearSVC converge on iris)
model = LinearSVC(random_state=42, max_iter=10000)
  
# Create the grid search object
grid_search = GridSearchCV(model, param_grid, cv=5)
  
# Fit the grid search object to the training data
grid_search.fit(X_train, y_train)
  
# Print the best parameters and score
print("Best parameters:", grid_search.best_params_)
print("Best score: {:.2f}".format(grid_search.best_score_))
  
# Evaluate the model on the test data
accuracy = grid_search.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))


Output:

The script prints the best value of C found by the grid search, the best cross-validation score, and the final test accuracy.

In summary, the Maximum Margin Separating Hyperplane (MMSH) is the hyperplane that separates different classes of data points with the largest possible margin, and scikit-learn's SVM classes provide methods for finding it. The SVM algorithm works by mapping the data points to a higher-dimensional space in which a linear boundary can separate the classes, and then projecting that boundary back to the original space. The regularization parameter C controls the trade-off between maximizing the margin and minimizing the misclassification error, and several kernel functions are available for mapping the data points to the higher-dimensional space.


