ML – Decision Function

Last Updated : 18 May, 2022

The decision function is a method of classifier classes such as SVC and LogisticRegression in the scikit-learn machine learning framework. It returns a NumPy array in which each element is a signed score for one sample of x_test: the sign tells on which side of the separating hyperplane the classifier places the sample, and the magnitude grows with the sample's distance from the hyperplane. The score therefore also indicates how confidently the classifier predicts each sample as positive (large positive value) or negative (large-magnitude negative value).

Math behind the Decision Function method:

Let's consider the SVM for a linearly separable binary classification problem.

Cost Function:

$\min_{\theta} C \sum_{i=1}^{m}\left[y^{(i)} \operatorname{cost}_{1}\left(\theta^{T} x^{(i)}\right)+\left(1-y^{(i)}\right) \operatorname{cost}_{0}\left(\theta^{T} x^{(i)}\right)\right]+\frac{1}{2} \sum_{j=1}^{n} \theta_{j}^{2}$

Hypothesis for this linearly separable binary classification problem:

$h_{\theta}(x)=\theta^{T} x$

The optimization algorithm minimizes the cost function to find the values of the model parameters $\theta$ for which the training samples satisfy:

$\begin{array}{ll} \theta^{T} x^{(i)} \geq 1 & \text { if } y^{(i)}=1 \\ \theta^{T} x^{(i)} \leq-1 & \text { if } y^{(i)}=0 \end{array}$

What actually happens when we pass a data instance to the decision function method?

The sample is substituted into the hypothesis, whose parameters were found by minimizing the cost function, and the method returns the value of the hypothesis. For the training samples of a linearly separable problem this value is at least 1 when the actual output is 1 and at most -1 when the actual output is 0. For an unseen sample it can be any real number: positive when the classifier predicts 1 and negative when it predicts 0. The returned value thus represents on which side of the hyperplane the sample lies and how far from it.

Code: create our own data set and plot the input.

python3

# Create a simple data set
# Binary-Class Classification.
 
# Import Required Modules.
import matplotlib.pyplot as plt
import numpy as np
 
# Input Feature X.
x = np.array([[2, 1.5], [-2, -1], [-1, -1], [2, 1],
              [1, 5], [0.5, 0.5], [-2, 0.5]])
 
# Target labels y (one class label per row of x).
y = np.array([0, 0, 1, 1, 1, 1, 0])
 
# Training set Feature x_train.
x_train = np.array([[2, 1.5], [-2, -1], [-1, -1], [2, 1]])
 
# Training set Target Variable y_train.
y_train = np.array([0, 0, 1, 1])
 
# Test set Feature x_test.
x_test = np.array([[1, 5], [0.5, 0.5], [-2, 0.5]])
 
# Test set Target Variable y_test
y_test = np.array([1, 1, 0])
 
# Plot the obtained data
plt.scatter(x[:, 0], x[:, 1], c = y)
plt.xlabel('Feature 1 --->')
plt.ylabel('Feature 2 --->')
plt.title('Created Data')
plt.show()

Output: [scatter plot titled 'Created Data', Feature 1 against Feature 2, points colored by class]

Code: train our model

python3

# Import the SVC class from sklearn.
from sklearn.svm import SVC

# SVC() with no arguments uses the RBF kernel by default.
clf = SVC()
 
# Train the model on the training set.
clf.fit(x_train, y_train) 
 
# Predict on Test set
predict = clf.predict(x_test)
print('Predicted Values from Classifier:', predict)
print('Actual Output is:', y_test)
print('Accuracy of the model is:', clf.score(x_test, y_test))

Output:

Predicted Values from Classifier: [0 1 0]
Actual Output is: [1 1 0]
Accuracy of the model is: 0.6666666666666666
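
To tie a fitted model back to the hypothesis $h_{\theta}(x)=\theta^{T} x$ from the math section, the following is a minimal sketch that refits the same training data with kernel="linear". This is an assumption made for illustration; the classifier above keeps the default RBF kernel, for which no explicit weight vector exists. scikit-learn stores the bias separately in intercept_, whereas the hypothesis above is assumed to absorb it into $\theta$. The names linear_clf, theta, bias, manual_scores and library_scores are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Same training and test data as in the article.
x_train = np.array([[2, 1.5], [-2, -1], [-1, -1], [2, 1]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([[1, 5], [0.5, 0.5], [-2, 0.5]])

# Assumption: a linear kernel, so that coef_ (the weight vector) is defined.
linear_clf = SVC(kernel="linear")
linear_clf.fit(x_train, y_train)

theta = linear_clf.coef_       # shape (1, n_features)
bias = linear_clf.intercept_   # shape (1,)

# Evaluate theta^T x + b by hand for every test sample.
manual_scores = (x_test @ theta.T + bias).ravel()

# The same quantity as returned by the library.
library_scores = linear_clf.decision_function(x_test)

print('Manual scores :', manual_scores)
print('Library scores:', library_scores)
print('Scores agree  :', np.allclose(manual_scores, library_scores))
```

With a linear kernel the two arrays should agree up to floating-point error, which shows that the decision function is the hypothesis evaluated at each sample.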

Code: decision function method

python3

# Use the decision_function method of the fitted SVC classifier.
decision_scores = clf.decision_function(x_test)
print('Output of Decision Function is:', decision_scores)
print('Prediction for x_test from classifier is:', predict)

Output:

Output of Decision Function is: [-0.04274893  0.29143233 -0.13001369]
Prediction for x_test from classifier is: [0 1 0]
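
The link between the two printed arrays can be checked directly. The sketch below assumes the same data and a default SVC as in the article, and rebuilds the predicted labels from the sign of the scores alone: for a binary scikit-learn classifier, a positive score corresponds to classes_[1] and a negative score to classes_[0]. The name labels_from_sign is illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Same training and test data as in the article.
x_train = np.array([[2, 1.5], [-2, -1], [-1, -1], [2, 1]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([[1, 5], [0.5, 0.5], [-2, 0.5]])

clf = SVC()
clf.fit(x_train, y_train)

scores = clf.decision_function(x_test)

# Positive score -> classes_[1], negative score -> classes_[0].
labels_from_sign = clf.classes_[(scores > 0).astype(int)]

print('Labels from sign of score:', labels_from_sign)
print('Labels from predict()    :', clf.predict(x_test))
print('Identical:', np.array_equal(labels_from_sign, clf.predict(x_test)))
```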

From the decision function output, we can conclude that the sign of each value shows on which side of the decision boundary the classifier places the corresponding x_test sample: negative values are predicted as class 0 and positive values as class 1. The magnitude shows how far the sample is from the boundary, and so how confidently it is predicted as positive (large positive value) or negative (large-magnitude negative value). Here the second sample (0.29) is the most confident prediction, while the first (-0.04) lies close to the boundary and is the one that is misclassified.

Code: Decision Boundary

python3

# To Plot the Decision Boundary.
arr1 = np.arange(x[:, 0].min()-1, x[:, 0].max()+1, 0.01)
arr2 = np.arange(x[:, 1].min()-1, x[:, 1].max()+1, 0.01)
 
xx, yy = np.meshgrid(arr1, arr2)
input_array = np.array([xx.ravel(), yy.ravel()]).T
labels = clf.predict(input_array)
 
plt.figure(figsize =(10, 7))
plt.contourf(xx, yy, labels.reshape(xx.shape), alpha = 0.1)
plt.scatter(x_test[:, 0], x_test[:, 1], c = y_test.ravel(), alpha = 1)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Decision Boundary')
plt.show()

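The plot produced by the code above visualizes this conclusion: the shaded regions show the class predicted on each side of the decision boundary, with the x_test points drawn on top. The advantage of the decision function output is that it lets us set a decision threshold ourselves and predict new outputs for x_test, so that we reach the desired precision or recall when the project is precision-oriented or recall-oriented, respectively.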
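
Setting a custom decision threshold can be sketched as follows, assuming the same data and default SVC as in the article. The helper predict_with_threshold and the three threshold values are hypothetical choices for illustration, not part of scikit-learn. Raising the threshold makes the classifier more reluctant to predict class 1, which tends to favor precision; lowering it tends to favor recall.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score
from sklearn.svm import SVC

# Same training and test data as in the article.
x_train = np.array([[2, 1.5], [-2, -1], [-1, -1], [2, 1]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([[1, 5], [0.5, 0.5], [-2, 0.5]])
y_test = np.array([1, 1, 0])

clf = SVC()
clf.fit(x_train, y_train)
scores = clf.decision_function(x_test)


def predict_with_threshold(scores, threshold):
    """Hypothetical helper: label 1 if the score exceeds the threshold, else 0."""
    return (scores > threshold).astype(int)


# Illustrative thresholds; 0.0 reproduces the behaviour of clf.predict here.
for threshold in (-0.1, 0.0, 0.1):
    y_pred = predict_with_threshold(scores, threshold)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    print(f'threshold={threshold:+.1f}  predictions={y_pred}  '
          f'precision={precision:.2f}  recall={recall:.2f}')
```

In practice the threshold would be chosen on a validation set rather than on the test set, which is too small here to give meaningful precision or recall estimates.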