
Orthogonal distance regression using SciPy

Regression involves determining the relationship between a dependent variable and one or more independent variables. It generally involves finding the best-fit line that minimizes the sum of squared errors over all points. Based on how this fit is performed, regression algorithms are classified as linear regression, ridge regression, lasso regression, polynomial regression, and so on. In this article, let us discuss orthogonal distance regression and see its practical implementation in SciPy.

Orthogonal regression

Unlike ordinary regression, where errors are measured vertically with respect to the fitted line, orthogonal distance regression calculates the orthogonal (perpendicular) distance of each point from the fitted line. This takes into account measurement errors in both the independent and dependent variables, along the x and y axes. Measuring the perpendicular distance in this way makes the model more robust: orthogonal distance regression minimizes the sum of squared perpendicular distances, rather than the sum of squared vertical distances used in ordinary least squares.
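The difference between the two residuals can be illustrated directly. The sketch below, using a hypothetical line y = m*x + c and a single sample point, compares the vertical residual used by ordinary least squares with the perpendicular distance used by orthogonal regression:

```python
import numpy as np

# Hypothetical line y = m*x + c and a sample point (x0, y0)
m, c = 2.0, 1.0
x0, y0 = 3.0, 5.0

# Vertical residual (what ordinary least squares measures)
vertical = abs(y0 - (m * x0 + c))

# Perpendicular distance from the point to the line
# (what orthogonal distance regression measures)
perpendicular = abs(m * x0 - y0 + c) / np.sqrt(m**2 + 1)

print(vertical)       # 2.0
print(perpendicular)  # ~0.894
```

The perpendicular distance is never larger than the vertical residual; the two coincide only when the line is horizontal.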



Orthogonal regression is generally applied when both Y and X are susceptible to error, and it can also be applied to transformable non-linear models. Orthogonal regression assumes that there is a linear relationship between the true values of the dependent and independent variables, and that the observed values of Y and X are those true values with a small error added. Given n pairs of measured values, orthogonal regression finds the line that minimizes

    Σᵢ ( εᵢ² / σ²_ε + μᵢ² / σ²_μ )

Here, εᵢ and μᵢ are the errors in the measured values of y and x, and σ²_ε and σ²_μ denote the variances of those errors.
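As a numerical sketch of this objective, the weighted sum of squares can be computed directly for hypothetical residuals and assumed error variances (all values below are made up for illustration):

```python
import numpy as np

# Hypothetical residuals in y (epsilon) and in x (mu) for three points
eps = np.array([0.1, -0.2, 0.05])
mu = np.array([0.03, 0.01, -0.04])

# Assumed error variances for y and x measurements
sigma_eps2, sigma_mu2 = 0.04, 0.01

# Weighted orthogonal sum of squares that the fit minimizes
loss = np.sum(eps**2 / sigma_eps2 + mu**2 / sigma_mu2)
print(loss)  # 1.5725
```

The fitting algorithm searches for the line parameters that make this quantity as small as possible over all n points.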

Orthogonal distance regression is implemented using ODRPACK, a FORTRAN-77 based library. The scipy.odr package provides an object-oriented interface to ODRPACK.

Approach

- Import the required packages (numpy, matplotlib, and scipy.odr).
- Create a sample feature array and a target array.
- Define the model function to fit (a linear function here).
- Wrap the function in an odr.Model object and the arrays in an odr.Data object.
- Create an odr.ODR instance with the data, the model, and an initial parameter guess beta0, then call run().
- Print the fit results with pprint().

Code:




# import the necessary python packages
import numpy as np
import matplotlib.pyplot as plt
 
# the odr module from the scipy package
# is used to perform ODR regression
from scipy import odr
 
# Create a sample feature array and a target array
feature = np.arange(1, 11)
# shuffle the created array
np.random.shuffle(feature)
# create a target array of random numbers
target = np.array([0.65, -.75, 0.90, -0.5, 0.14,
                   0.84, 0.99, -0.95, 0.41, -0.28])
 
# Define the model function (linear in our case)
# to fit the data with; ODR estimates the
# parameters m (slope) and c (intercept)
def target_function(p, x):
    m, c = p
    return m*x + c
 
# wrap the model function in an odr Model object
odr_model = odr.Model(target_function)
 
# Create a Data object using sample data created.
data = odr.Data(feature, target)
 
# Set up ODR with the model, the data, and
# an initial guess for the parameters
orthogonal_distance_reg = odr.ODR(data, odr_model,
                                  beta0=[0.2, 1.])
 
# Run the regression.
out = orthogonal_distance_reg.run()
 
# print the results
out.pprint()

Output:

Beta:                  [-0.01059931  0.2032962 ]
Beta Std Error:        [0.08421527 0.52254163]
Beta Covariance:       [[ 0.01212265 -0.06667458]
                        [-0.06667458  0.46672142]]
Residual Variance:     0.5850379776588954
Inverse Condition #:   0.06924525890982118
Reason(s) for Halting:
  Sum of squares convergence

The ODR run returns the estimated beta values (slope and intercept), their standard errors, and the covariance of the beta estimates, which can be used to construct the fitted regression line. Note that because the feature array is shuffled against a fixed target array, the printed values will vary between runs.
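The returned out.beta array can be plugged straight back into the model function to evaluate the fitted line at new points. A minimal sketch with a hypothetical dataset (values chosen to lie near y = 2x, since the shuffled example above gives run-dependent output):

```python
import numpy as np
from scipy import odr

# Hypothetical data lying near the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def linear(p, x):
    m, c = p
    return m * x + c

# Fit the line with ODR and unpack the estimates
out = odr.ODR(odr.Data(x, y), odr.Model(linear), beta0=[1.0, 0.0]).run()
m, c = out.beta

# Evaluate the fitted line at new x values
y_pred = linear(out.beta, np.array([6.0, 7.0]))
print(m, c)   # slope close to 2, intercept close to 0
print(y_pred)
```

The same out object also exposes out.sd_beta (standard errors) and out.cov_beta (covariance), matching the fields printed by pprint().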

