
Implementation of Polynomial Regression

Polynomial Regression is a form of linear regression in which the relationship between the independent variable x and dependent variable y is modelled as an nth-degree polynomial. Polynomial regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, denoted E(y | x). In this article, we’ll go in-depth about polynomial regression.

What is a Polynomial Regression? 

Why Polynomial Regression?

Polynomial regression is a type of regression analysis used in statistics and machine learning when the relationship between the independent variable (input) and the dependent variable (output) is not linear. While simple linear regression models the relationship as a straight line, polynomial regression allows for more flexibility by fitting a polynomial equation to the data.

When the relationship between the variables is better represented by a curve rather than a straight line, polynomial regression can capture the non-linear patterns in the data.



How does a Polynomial Regression work?

If we look closely, moving from linear regression to polynomial regression only requires adding higher-order terms of the independent features to the feature space. This is sometimes loosely described as feature engineering, although the two are not exactly the same thing.

When the relationship is non-linear, a polynomial regression model introduces higher-degree polynomial terms.
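The idea of adding higher-order terms to the feature space can be sketched in a few lines. This is a minimal illustration, not part of the original walkthrough: the helper name `polynomial_features` and the sample values are hypothetical, and NumPy's `np.vander` is used to build the columns 1, x, x^2, ..., x^degree.

```python
import numpy as np

def polynomial_features(x, degree):
    """Return the columns [1, x, x^2, ..., x^degree] for a 1-D input x."""
    x = np.asarray(x, dtype=float)
    # increasing=True orders the columns from power 0 up to power `degree`
    return np.vander(x, N=degree + 1, increasing=True)

# Hypothetical sample inputs
x = np.array([1.0, 2.0, 3.0])
X_poly = polynomial_features(x, degree=2)
print(X_poly)
```

Each row holds one observation and each column one power of x, so an ordinary linear regression fitted on these columns is a polynomial regression in x.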

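The general form of a polynomial regression equation of degree n is:

y = a + b_1 x + b_2 x^2 + ... + b_n x^n + e

where y is the dependent variable, x is the independent variable, a is the intercept, b_1, ..., b_n are the coefficients of the polynomial terms, and e is the error term.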

The basic goal of regression analysis is to model the expected value of a dependent variable y in terms of the value of an independent variable x. In simple linear regression, we used the following equation – 

y = a + bx + e

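Here y is the dependent variable, a is the y-intercept, b is the slope and e is the error term. In many cases, this linear model will not work. For example, if we analyze the yield of a chemical synthesis in terms of the temperature at which the synthesis takes place, a quadratic model may be more appropriate:

y = a + b_1 x + b_2 x^2 + e

Here, y is the dependent variable, a is the y-intercept, b_1 and b_2 are the coefficients of the linear and quadratic terms, and e is the error term.

In general, we can extend the model to degree n:

y = a + b_1 x + b_2 x^2 + ... + b_n x^n + e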

Since the regression function is linear in the unknown coefficients, these models are linear from the point of view of estimation. The coefficients can therefore be estimated with the least squares technique, and the fitted equation can then be used to compute the response value (y).
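Because the model is linear in its coefficients, the fit reduces to an ordinary least squares problem on a matrix whose columns are powers of x. The sketch below illustrates this under stated assumptions: the data are hypothetical and noise-free, generated from y = 2 + 0.5x + 3x^2, and `np.linalg.lstsq` is used as the solver.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2 + 0.5x + 3x^2
x = np.linspace(-2.0, 2.0, 9)
y = 2.0 + 0.5 * x + 3.0 * x**2

# Design matrix with columns 1, x, x^2: the model is linear in (a, b_1, b_2)
A = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares estimate of the coefficients
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)
```

With noise-free data the estimated coefficients match the ones used to generate y, up to floating-point error. With noisy data they would only approximate them.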

By including higher-degree terms (quadratic, cubic, etc.), the model can capture the non-linear patterns in the data.

  1. The choice of the polynomial degree (n) is a crucial aspect of polynomial regression. A higher degree allows the model to fit the training data more closely, but it may also lead to overfitting, especially if the degree is too high. Therefore, the degree should be chosen based on the complexity of the underlying relationship in the data.
  2. The polynomial regression model is trained to find the coefficients that minimize the difference between the predicted values and the actual values in the training data.
  3. Once the model is trained, it can be used to make predictions on new, unseen data. The polynomial equation captures the non-linear patterns observed in the training data, allowing the model to generalize to non-linear relationships.

Polynomial Regression Real-Life Example

Let’s consider a real-life example to illustrate the application of polynomial regression. Suppose you are working in the field of finance, and you are analyzing the relationship between an employee's years of experience and their salary (in dollars). You suspect that the relationship might not be linear and that a higher-degree polynomial might better capture the salary progression over time.

Years of Experience    Salary (in dollars)
1                      50,000
2                      55,000
3                      65,000
4                      80,000
5                      110,000
6                      150,000
7                      200,000

Now, let’s apply polynomial regression to model the relationship between years of experience and salary. We’ll use a quadratic polynomial (degree 2) for this example.

The quadratic polynomial regression equation is:

Salary = a + b_1 × Experience + b_2 × Experience^2 + e

To find the coefficients that minimize the difference between the predicted salaries and the actual salaries in the dataset, we can use the method of least squares. The objective is to minimize the sum of squared differences between the predicted values and the actual values.
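As a rough sketch of this least squares fit, the snippet below fits a degree-2 polynomial to the seven rows of the table above. The use of `np.polyfit` and the variable names `a`, `b1`, `b2` and `r_squared` are choices made for this illustration, not something the example prescribes.

```python
import numpy as np

# Data from the years-of-experience / salary table
experience = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
salary = np.array([50000, 55000, 65000, 80000, 110000, 150000, 200000], dtype=float)

# np.polyfit returns coefficients from the highest degree down: [b_2, b_1, a]
b2, b1, a = np.polyfit(experience, salary, deg=2)
predicted = a + b1 * experience + b2 * experience**2

# Coefficient of determination as a simple goodness-of-fit summary
ss_res = np.sum((salary - predicted) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
print(f"a={a:.1f}, b_1={b1:.1f}, b_2={b2:.1f}, R^2={r_squared:.4f}")
```

A positive coefficient on the squared term reflects the upward curvature in the table: each extra year adds more salary than the previous one.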

Polynomial Regression implementations using Python

The dataset used for this analysis is read from data.csv. Start by importing the required libraries and loading the dataset.

Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
 
# Importing the dataset
datas = pd.read_csv('data.csv')
datas

                    

Output:

First Five rows of the dataset

The feature variable X contains the column at index 1 of the dataset (kept as a 2D array), and the target variable y contains the column at index 2. Column indices are zero-based.

X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values

                    

Now let’s fit a linear regression model on the data at hand.

# Features and the target variables
X = datas.iloc[:, 1:2].values
y = datas.iloc[:, 2].values
 
# Fitting Linear Regression to the dataset
from sklearn.linear_model import LinearRegression
lin = LinearRegression()
 
lin.fit(X, y)

                    

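Next, fit the polynomial regression model: expand X into degree-4 polynomial features, then fit a linear regression on the expanded features and y.

# Fitting Polynomial Regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
 
# Expand X into the columns 1, x, x^2, x^3, x^4
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)
 
# Fit an ordinary linear regression on the expanded features
lin2 = LinearRegression()
lin2.fit(X_poly, y)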

                    

In this step, we visualise the linear regression results by drawing the fitted line over a scatter plot of the data.

# Visualising the Linear Regression results
plt.scatter(X, y, color='blue')
 
plt.plot(X, lin.predict(X), color='red')
plt.title('Linear Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
 
plt.show()

                    

Output:

Scatter plot of feature and the target variable.

Visualize the Polynomial Regression results using a scatter plot.

# Visualising the Polynomial Regression results
plt.scatter(X, y, color='blue')
 
plt.plot(X, lin2.predict(poly.fit_transform(X)),
         color='red')
plt.title('Polynomial Regression')
plt.xlabel('Temperature')
plt.ylabel('Pressure')
 
plt.show()

                    

Output:

Implementation of Polynomial Regression

Predict new results with both Linear and Polynomial Regression. Note that the input must be passed as a 2D NumPy array.

# Predicting a new result with Linear Regression
# after converting predict variable to 2D array
pred = 110.0
predarray = np.array([[pred]])
lin.predict(predarray)

                    

Output:

array([0.20675333])

# Predicting a new result with Polynomial Regression
# after converting predict variable to 2D array
pred2 = 110.0
pred2array = np.array([[pred2]])
lin2.predict(poly.fit_transform(pred2array))

                    

Output:

array([0.43295877])

Overfitting Vs Under-fitting

One problem we face with polynomial regression is overfitting. As we increase the order of the polynomial to achieve better and better performance on the training data, the model starts to fit the noise in that data and performs poorly on new data points. A degree that is too low has the opposite problem: the model underfits and misses the curvature in the data.

For this reason, when using polynomial regression we often penalize the weights of the model to reduce overfitting. Regularization techniques such as Lasso regression and Ridge regression are used whenever the model may overfit the data at hand.
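The effect of penalizing the weights can be sketched by fitting the same high-degree polynomial with and without a Ridge penalty. Everything in this example is an assumption made for illustration: the synthetic data, the degree of 9, the `alpha=1.0` penalty strength and the helper `build` are not taken from the walkthrough above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical noisy data drawn around a quadratic trend
rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 15).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 + rng.normal(scale=0.5, size=15)

def build(model):
    # Degree-9 features are deliberately too flexible for 15 points
    return make_pipeline(
        PolynomialFeatures(degree=9, include_bias=False),
        StandardScaler(),
        model,
    )

ols = build(LinearRegression()).fit(X, y)
ridge = build(Ridge(alpha=1.0)).fit(X, y)

# The final step of each pipeline holds the fitted coefficients
ols_norm = np.linalg.norm(ols.steps[-1][1].coef_)
ridge_norm = np.linalg.norm(ridge.steps[-1][1].coef_)
print(f"Unpenalized coefficient norm: {ols_norm:.2f}")
print(f"Ridge coefficient norm:       {ridge_norm:.2f}")
```

The Ridge penalty shrinks the coefficients, which limits how sharply the fitted curve can bend between data points. Lasso (`sklearn.linear_model.Lasso`) could be swapped in the same way; it can additionally set some coefficients exactly to zero.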

Bias Vs Variance Tradeoff

The bias-variance tradeoff generalizes the problem of underfitting and overfitting. A polynomial of too low a degree has high bias and underfits, while a polynomial of too high a degree has high variance and overfits. Keeping this tradeoff in mind helps us select an appropriate degree for the polynomial we fit to our data. For example, a sign that the degree has become too high is that, beyond a certain degree, the gap between the training and the validation metrics starts increasing.
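One way to look for that widening gap is to fit several degrees and compare training and validation error. The sketch below assumes synthetic data around a quadratic trend, a simple hold-out split (every third point), and mean squared error as the metric; all of these are illustrative choices.

```python
import numpy as np

# Hypothetical noisy data around a quadratic trend
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(-3.0, 3.0, 60))
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

# Hold out every third point for validation
val_mask = np.arange(x.size) % 3 == 0
x_train, y_train = x[~val_mask], y[~val_mask]
x_val, y_val = x[val_mask], y[val_mask]

degrees = range(1, 9)
train_mse, val_mse = [], []
for d in degrees:
    coeffs = np.polyfit(x_train, y_train, deg=d)
    train_mse.append(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    val_mse.append(np.mean((np.polyval(coeffs, x_val) - y_val) ** 2))

# Pick the degree with the lowest validation error
best_degree = list(degrees)[int(np.argmin(val_mse))]
for d, tr, va in zip(degrees, train_mse, val_mse):
    print(f"degree={d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
print("Degree with lowest validation MSE:", best_degree)
```

Training error can only go down as the degree grows, so it is the validation error that indicates when extra flexibility has stopped helping.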

Application of Polynomial Regression

Polynomial regression has many use cases because much real-world data is non-linear in nature. When we fit a non-linear model, or a curvilinear regression line, to such data, the results are often better than what standard linear regression can achieve. Some of the use cases of polynomial regression are as stated below:

Advantages & Disadvantages of using Polynomial Regression

Advantages of using Polynomial Regression

Disadvantages of using Polynomial Regression 

Conclusion

Polynomial regression, a versatile tool, finds applications in diverse domains. While addressing non-linear relationships, it requires careful consideration of overfitting and model complexity.

Frequently Asked Questions(FAQs)

1. What is a real-life example of polynomial regression?

Predicting car fuel efficiency from engine power, where the relationship between the two is non-linear.

2. What is a polynomial regression pipeline?

A sequence of data processing steps, like polynomial feature creation and linear regression, streamlining polynomial regression modeling.
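Such a pipeline might look like the following sketch, which chains scikit-learn's `PolynomialFeatures` and `LinearRegression` with `make_pipeline`. The data are hypothetical and noise-free, generated from y = 1 + 2x + 3x^2, and the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
X = np.arange(0.0, 6.0).reshape(-1, 1)
y = 1.0 + 2.0 * X.ravel() + 3.0 * X.ravel() ** 2

# The pipeline expands the features and fits the regression in one object
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

# New inputs go through the same feature expansion automatically
prediction = model.predict(np.array([[6.5]]))
print(prediction)
```

Bundling both steps means new inputs never have to be transformed by hand before calling `predict`.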

3. How is polynomial regression done in Excel?

Excel can perform polynomial regression using the “LINEST” function or the “Trendline” feature in a scatter plot.

4. What is an example of polynomial regression?

Modeling stock prices over time using a quadratic polynomial to capture potential non-linear trends in the data.

5. What is the purpose of polynomial regression?

It captures non-linear relationships in data, providing a flexible model for complex patterns beyond linear regression’s scope.


