Least Squares Regression Line

Given a set of points of the form (X, Y), the task is to find the least squares regression line that fits them.
 

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (the dependent variable), say Y, and one or more explanatory variables (the independent variables), say X. 
Regression Line: If the data show a linear relationship between X and Y, the straight line that best describes that relationship is the regression line. "Best" here means least squares: among all straight lines, it is the one that minimizes the sum of the squared vertical distances between the data points and the line. It need not pass through any of the points.
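To make the least squares criterion concrete, here is a minimal sketch that scores a candidate line Y = a + b*X by its sum of squared vertical distances to the data. The function name sumSquaredResiduals is illustrative only, and the sketch assumes the points are stored as std::vector<double>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum of squared vertical distances between the points (x[i], y[i])
// and the candidate line Y = a + b*X.
// The function name is illustrative, not part of any library.
double sumSquaredResiduals(const std::vector<double>& x,
                           const std::vector<double>& y,
                           double a, double b)
{
    // Both coordinate lists must describe the same points
    assert(x.size() == y.size());

    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Vertical distance from the point to the line
        double residual = y[i] - (a + b * x[i]);
        total += residual * residual;
    }
    return total;
}
```

For X = [95, 85, 80, 70, 60] and Y = [90, 80, 70, 65, 60], the line Y = -5 + 1*X passes exactly through three of the five points, yet sumSquaredResiduals(x, y, -5, 1) is 50. The line Y = 5.685 + 0.863*X passes through none of them and scores about 36.3, which is why it is the better fit in the least squares sense.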

Examples: 

Input: X = [95, 85, 80, 70, 60] 
Y = [90, 80, 70, 65, 60] 
Output: Y = 5.685 + 0.863*X 
Explanation: 
For the data above, the regression line obtained is Y = 5.685 + 0.863*X.

[Figure: the data points plotted together with the regression line]

The graph shows that the regression line passes as close as possible to the points overall, in the least squares sense.

Input: X = [100, 95, 85, 80, 70, 60] 
Y = [90, 95, 80, 70, 65, 60] 
Output: Y = 4.007 + 0.890*X 
 

Approach: 

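The regression line has the form Y = a + b*X, where the slope b and the intercept a are given by:
$b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^{2} - \left(\sum x_i\right)^{2}}$
$a = \bar{y} - b\,\bar{x}$

where $\bar{x}$ and $\bar{y}$ are the means of the x and y values, respectively, and n is the number of points.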
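As a check on these formulas, the arithmetic for the first example (X = [95, 85, 80, 70, 60], Y = [90, 80, 70, 65, 60]) can be worked by hand. The intermediate sums below are computed from that data.

```latex
\begin{align*}
n &= 5, \quad \sum x_i = 390, \quad \sum y_i = 365, \quad
     \sum x_i y_i = 29100, \quad \sum x_i^{2} = 31150 \\
b &= \frac{5 \cdot 29100 - 390 \cdot 365}{5 \cdot 31150 - 390^{2}}
   = \frac{145500 - 142350}{155750 - 152100}
   = \frac{3150}{3650} \approx 0.863 \\
a &= \bar{y} - b\,\bar{x} = 73 - \frac{3150}{3650} \cdot 78 \approx 5.685
\end{align*}
```

This agrees with the stated output Y = 5.685 + 0.863*X.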
 

  1. To find the regression line, we need to find a and b.
  2. Calculate b first, since a depends on it. It is given by 
     $b = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^{2} - \left(\sum x_i\right)^{2}}$
  3. Calculate a, which is given by $a = \frac{\sum y_i}{n} - b \cdot \frac{\sum x_i}{n}$
  4. Substitute the values of a and b into the equation of the regression line, Y = a + b*X.
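The steps above can be sketched compactly in C++ as follows. This is one possible arrangement rather than the article's own program: the names Line and fitLeastSquares are illustrative, the data are assumed to be stored as std::vector<double>, and all sums are kept in double so that the means are not truncated.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Coefficients of the line Y = a + b*X.
// The struct and function names are illustrative assumptions.
struct Line {
    double a; // intercept
    double b; // slope
};

Line fitLeastSquares(const std::vector<double>& x,
                     const std::vector<double>& y)
{
    assert(x.size() == y.size());
    const double n = static_cast<double>(x.size());

    // Accumulate the four sums that appear in the formula for b
    double sx = 0.0, sy = 0.0, sxy = 0.0, sx2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxy += x[i] * y[i];
        sx2 += x[i] * x[i];
    }

    // The denominator is zero when there are fewer than two points
    // or when all x values are equal; this exact comparison is only
    // a minimal guard, not a robust numerical test
    const double denom = n * sx2 - sx * sx;
    if (denom == 0.0) {
        throw std::invalid_argument("slope is undefined for this data");
    }

    // Step 2: slope, then step 3: intercept from the two means
    const double b = (n * sxy - sx * sy) / denom;
    const double a = sy / n - b * (sx / n);
    return Line{a, b};
}
```

Keeping the sums and means in double matters for the second example, whose means (490/6 and 460/6) are not whole numbers. With that data the sketch yields a ≈ 4.007 and b ≈ 0.890.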

Below is the implementation of the above approach.

C++


// C++ program to find the
// regression line
#include <cstdio>   // printf
#include <iostream> // cout
#include <numeric>  // accumulate
using namespace std;
  
// Function to calculate b
double calculateB(int x[], int y[], int n)
{
      
    // sum of array x
    int sx = accumulate(x, x + n, 0);
  
    // sum of array y
    int sy = accumulate(y, y + n, 0);
  
    // for sum of product of x and y
    int sxsy = 0;
  
    // sum of square of x
    int sx2 = 0;
    for(int i = 0; i < n; i++) 
    {
        sxsy += x[i] * y[i];
         sx2 += x[i] * x[i];
    }
    double b = (double)(n * sxsy - sx * sy) /
                       (n * sx2 - sx * sx);
  
    return b;
}
  
// Function to find the
// least squares regression line
void leastRegLine(int X[], int Y[], int n)
{
  
    // Finding b
    double b = calculateB(X, Y, n);
  
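    // Means must be computed in floating point; integer division
    // would truncate them whenever the sums are not multiples of n
    double meanX = accumulate(X, X + n, 0) / (double)n;
    double meanY = accumulate(Y, Y + n, 0) / (double)n;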
  
    // Calculating a
    double a = meanY - b * meanX;
  
    // Printing regression line
    cout << ("Regression line:") << endl;
    cout << ("Y = ");
    printf("%.3f + ", a);
    printf("%.3f*X\n", b);
}
  
// Driver code
int main()
{
      
    // Statistical data
    int X[] = { 95, 85, 80, 70, 60 };
    int Y[] = { 90, 80, 70, 65, 60 };
      
    int n = sizeof(X) / sizeof(X[0]);
      
    leastRegLine(X, Y, n);
}
  
// This code is contributed by PrinciRaj1992 



Java


// Java program to find the
// regression line
  
import java.util.Arrays;
  
public class GFG {
  
    // Function to calculate b
    private static double calculateB(
        int[] x, int[] y)
    {
        int n = x.length;
  
        // sum of array x
        int sx = Arrays.stream(x).sum();
  
        // sum of array y
        int sy = Arrays.stream(y).sum();
  
        // for sum of product of x and y
        int sxsy = 0;
  
        // sum of square of x
        int sx2 = 0;
        for (int i = 0; i < n; i++) {
            sxsy += x[i] * y[i];
            sx2 += x[i] * x[i];
        }
        double b = (double)(n * sxsy - sx * sy)
                   / (n * sx2 - sx * sx);
  
        return b;
    }
  
    // Function to find the
    // least squares regression line
    public static void leastRegLine(
        int X[], int Y[])
    {
  
        // Finding b
        double b = calculateB(X, Y);
  
        int n = X.length;
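        // Means must be computed in floating point; integer division
        // would truncate them whenever the sums are not multiples of n
        double meanX = (double)Arrays.stream(X).sum() / n;
        double meanY = (double)Arrays.stream(Y).sum() / n;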
  
        // calculating a
        double a = meanY - b * meanX;
  
        // Printing regression line
        System.out.println("Regression line:");
        System.out.print("Y = ");
        System.out.printf("%.3f", a);
        System.out.print(" + ");
        System.out.printf("%.3f", b);
        System.out.print("*X");
    }
  
    // Driver code
    public static void main(String[] args)
    {
        // statistical data
        int X[] = { 95, 85, 80, 70, 60 };
        int Y[] = { 90, 80, 70, 65, 60 };
  
        leastRegLine(X, Y);
    }
}



C#


// C# program to find the
// regression line
using System;
using System.Linq;
  
class GFG{
  
// Function to calculate b
private static double calculateB(int[] x, 
                                 int[] y)
{
    int n = x.Length;
  
    // Sum of array x
    int sx = x.Sum();
  
    // Sum of array y
    int sy = y.Sum();
  
    // For sum of product of x and y
    int sxsy = 0;
  
    // Sum of square of x
    int sx2 = 0;
    for(int i = 0; i < n; i++)
    {
        sxsy += x[i] * y[i];
         sx2 += x[i] * x[i];
    }
    double b = (double)(n * sxsy - sx * sy) / 
                       (n * sx2 - sx * sx);
  
    return b;
}
  
// Function to find the
// least squares regression line
public static void leastRegLine(int []X, int []Y)
{
      
    // Finding b
    double b = calculateB(X, Y);
  
    int n = X.Length;
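    // Means must be computed in floating point; integer division
    // would truncate them whenever the sums are not multiples of n
    double meanX = (double)X.Sum() / n;
    double meanY = (double)Y.Sum() / n;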
  
    // Calculating a
    double a = meanY - b * meanX;
  
    // Printing regression line
    Console.WriteLine("Regression line:");
    Console.Write("Y = ");
    Console.Write("{0:F3}", a);
    Console.Write(" + ");
    Console.Write("{0:F3}", b);
    Console.Write("*X");
}
  
// Driver code
public static void Main(String[] args)
{
      
    // Statistical data
    int []X = { 95, 85, 80, 70, 60 };
    int []Y = { 90, 80, 70, 65, 60 };
  
    leastRegLine(X, Y);
}
}
  
// This code is contributed by gauravrajput1 



Output: 

Regression line:
Y = 5.685 + 0.863*X

 
