ML | Locally weighted Linear Regression

Prerequisite: ML | Linear Regression

Linear regression is a supervised learning algorithm used for computing linear relationships between input (X) and output (Y).

The steps involved in ordinary linear regression are:

Training phase: Compute \theta to minimize the cost.
 J(\theta) = $\sum_{i=1}^{m} (\theta^Tx^{(i)} - y^{(i)})^2

Predict output: for given query point x,
 return: \theta^Tx




As evident from the image below, this algorithm cannot be used for making predictions when there exists a non-linear relationship between X and Y. In such cases, locally weighted linear regression is used.

Locally Weighted Linear Regression:

Locally weighted linear regression is a non-parametric algorithm, that is, the model does not learn a fixed set of parameters as is done in ordinary linear regression. Rather parameters \theta are computed individually for each query point x. While computing \theta, a higher “preference” is given to the points in the training set lying in the vicinity of x than the points lying far away from x.

The modified cost function is:  J(\theta) = $\sum_{i=1}^{m} w^{(i)}(\theta^Tx^{(i)} - y^{(i)})^2

where, w^{(i)} is a non-negative “weight” associated with training point x^{(i)}.
For x^{(i)}s lying closer to the query point x, the value of w^{(i)} is large, while for x^{(i)}s lying far away from x the value of w^{(i)} is small.
 
A typical choice of w^{(i)} is:  w^{(i)} = exp(\frac{-(x^{(i)} - x)^2}{2\tau^2})
where, \tau is called the bandwidth parameter and controls the rate at which w^{(i)} falls with distance from x

Clearly, if |x^{(i)} - x| is small w^{(i)} is close to 1 and if |x^{(i)} - x| is large w^{(i)} is close to 0.

Thus, the training-set-points lying closer to the query point x contribute more to the cost J(\theta) than the points lying far away from x.

For example –

Consider a query point x = 5.0 and let x^{(1)} and x^{(2) be two points in the training set such that x^{(1)} = 4.9 and x^{(2)} = 3.0.
Using the formula  w^{(i)} = exp(\frac{-(x^{(i)} - x)^2}{2\tau^2}) with \tau = 0.5:
 w^{(1)} = exp(\frac{-(4.9 - 5.0)^2}{2(0.5)^2}) = 0.9802
 w^{(2)} = exp(\frac{-(3.0 - 5.0)^2}{2(0.5)^2}) = 0.000335
 So, \ J(\theta) = 0.9802*(\theta^Tx^{(1)} - y^{(1)}) + 0.000335*(\theta^Tx^{(2)} - y^{(2)})
Thus, the weights fall exponentially as the distance between x and  x^{(i)} increases and so does the contribution of error in prediction for x^{(i)} to the cost.

Consequently, while computing \theta, we focus more on reducing (\theta^Tx^{(i)} - y^{(i)})^2 for the points lying closer to the query point (having larger value of w^{(i)}).

Steps involved in locally weighted linear regression are:

Compute \theta to minimize the cost.  J(\theta) = $\sum_{i=1}^{m} w^{(i)}(\theta^Tx^{(i)} - y^{(i)})^2
Predict Output: for given query point x,
 return: \theta^Tx

Points to remember:

  • Locally weighted linear regression is a supervised learning algorithm.
  • It a non-parametric algorithm.
  • There exists No training phase. All the work is done during the testing phase/while making predictions.


My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.