Least Square Regression Line
  • Difficulty Level : Medium
  • Last Updated : 17 Dec, 2020

Given a set of coordinates in the form of (X, Y), the task is to find the least squares regression line that fits them.
 

In statistics, Linear Regression is a linear approach to model the relationship between a scalar response (or dependent variable), say Y, and one or more explanatory variables (or independent variables), say X. 
Regression Line: If our data shows a linear relationship between X and Y, then the straight line which best describes the relationship is the regression line. The least squares regression line is the straight line that minimizes the sum of the squared vertical distances between the data points and the line.
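To make the least squares criterion concrete, here is a minimal sketch that compares the sum of squared errors for the line reported in the first example of this article against two slightly altered lines. The helper name `sse` and the two altered lines are assumptions chosen for illustration; the coefficients 5.685 and 0.863 are the rounded values from the example.

```python
# Data from the first example in this article
X = [95, 85, 80, 70, 60]
Y = [90, 80, 70, 65, 60]


def sse(a, b, xs, ys):
    """Sum of squared vertical distances between the points and Y = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


fitted = sse(5.685, 0.863, X, Y)      # line reported in the example (rounded)
shifted_up = sse(6.685, 0.863, X, Y)  # hypothetical: intercept raised by 1
steeper = sse(5.685, 0.913, X, Y)     # hypothetical: slope raised by 0.05

print(f"fitted line : {fitted:.3f}")
print(f"shifted up  : {shifted_up:.3f}")
print(f"steeper     : {steeper:.3f}")
```

The fitted line gives the smallest of the three values, which is what "least squares" refers to.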

Examples: 
 

Input: X = [95, 85, 80, 70, 60] 
Y = [90, 80, 70, 65, 60] 
Output: Y = 5.685 + 0.863*X 
Explanation: 
For the data above, the regression line obtained is Y = 5.685 + 0.863*X. 

[Figure: regression line]

The graph shows that the regression line passes as close to the points as possible: it minimizes the sum of the squared vertical distances from the points to the line.

Input: X = [100, 95, 85, 80, 70, 60] 
Y = [90, 95, 80, 70, 65, 60] 
Output: Y = 4.007 + 0.890*X 
 

Approach: 
 

A regression line is given as Y = a + b*X, where the slope b and the intercept a are given by: 
b = (nΣ(xi·yi) - Σ(xi)·Σ(yi)) ÷ (nΣ(xi²) - (Σxi)²)
a = ȳ - b·x̄ 
where x̄ and ȳ are the means of x and y respectively, and n is the number of data points. 
 

  1. To find the regression line, we need to find a and b.
  2. Calculate b first, since a depends on it: 
     b = (n*\sum(xi*yi) - \sum(xi)*\sum(yi))/(n*\sum(xi^{2}) - (\sum xi)^{2})
  3. Calculate a, which is given by a = (\sum yi)/n - b * (\sum xi)/n
  4. Put the values of a and b in the equation of the regression line.
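As a check on these steps, the arithmetic for the first example (X = [95, 85, 80, 70, 60], Y = [90, 80, 70, 65, 60], n = 5) can be written out as follows. The intermediate sums are worked out here for illustration and do not appear in the original article.

```latex
\begin{aligned}
\sum x_i &= 390, \quad \sum y_i = 365, \quad \sum x_i y_i = 29100, \quad \sum x_i^2 = 31150 \\
b &= \frac{5 \cdot 29100 - 390 \cdot 365}{5 \cdot 31150 - 390^2} = \frac{3150}{3650} \approx 0.863 \\
a &= \frac{365}{5} - b \cdot \frac{390}{5} = 73 - \frac{3150}{3650} \cdot 78 \approx 5.685
\end{aligned}
```

This agrees with the line Y = 5.685 + 0.863*X given in the first example.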

Below is the implementation of the above approach.
 

C++

// C++ program to find the
// regression line
#include <cstdio>
#include <iostream>
#include <numeric>
using namespace std;
 
// Function to calculate b
double calculateB(int x[], int y[], int n)
{
     
    // sum of array x
    int sx = accumulate(x, x + n, 0);
 
    // sum of array y
    int sy = accumulate(y, y + n, 0);
 
    // for sum of product of x and y
    int sxsy = 0;
 
    // sum of square of x
    int sx2 = 0;
    for(int i = 0; i < n; i++)
    {
        sxsy += x[i] * y[i];
         sx2 += x[i] * x[i];
    }
    double b = (double)(n * sxsy - sx * sy) /
                       (n * sx2 - sx * sx);
 
    return b;
}
 
// Function to find the
// least squares regression line
void leastRegLine( int X[], int Y[], int n)
{
 
    // Finding b
    double b = calculateB(X, Y, n);
 
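    // Means are computed in floating point; integer division
    // would truncate them when the sums are not divisible by n
    double meanX = (double)accumulate(X, X + n, 0) / n;
    double meanY = (double)accumulate(Y, Y + n, 0) / n;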
 
    // Calculating a
    double a = meanY - b * meanX;
 
    // Printing regression line
    cout << ("Regression line:") << endl;
    cout << ("Y = ");
    printf("%.3f + ", a);
    printf("%.3f *X", b);
}
 
// Driver code
int main()
{
     
    // Statistical data
    int X[] = { 95, 85, 80, 70, 60 };
    int Y[] = { 90, 80, 70, 65, 60 };
     
    int n = sizeof(X) / sizeof(X[0]);
     
    leastRegLine(X, Y, n);
}
 
// This code is contributed by PrinciRaj1992



Java

// Java program to find the
// regression line
 
import java.util.Arrays;
 
public class GFG {
 
    // Function to calculate b
    private static double calculateB(
        int[] x, int[] y)
    {
        int n = x.length;
 
        // sum of array x
        int sx = Arrays.stream(x).sum();
 
        // sum of array y
        int sy = Arrays.stream(y).sum();
 
        // for sum of product of x and y
        int sxsy = 0;
 
        // sum of square of x
        int sx2 = 0;
        for (int i = 0; i < n; i++) {
            sxsy += x[i] * y[i];
            sx2 += x[i] * x[i];
        }
        double b = (double)(n * sxsy - sx * sy)
                   / (n * sx2 - sx * sx);
 
        return b;
    }
 
    // Function to find the
    // least squares regression line
    public static void leastRegLine(
        int X[], int Y[])
    {
 
        // Finding b
        double b = calculateB(X, Y);
 
        int n = X.length;
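        // Means are computed in floating point; integer division
        // would truncate them when the sums are not divisible by n
        double meanX = (double) Arrays.stream(X).sum() / n;
        double meanY = (double) Arrays.stream(Y).sum() / n;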
 
        // calculating a
        double a = meanY - b * meanX;
 
        // Printing regression line
        System.out.println("Regression line:");
        System.out.print("Y = ");
        System.out.printf("%.3f", a);
        System.out.print(" + ");
        System.out.printf("%.3f", b);
        System.out.print("*X");
    }
 
    // Driver code
    public static void main(String[] args)
    {
        // statistical data
        int X[] = { 95, 85, 80, 70, 60 };
        int Y[] = { 90, 80, 70, 65, 60 };
 
        leastRegLine(X, Y);
    }
}



Python3

# Python program to find the
# regression line
 
# Function to calculate b
def calculateB(x, y, n):
   
    # sum of array x
    sx = sum(x)
 
    # sum of array y
    sy = sum(y)
     
    # for sum of product of x and y
    sxsy = 0
 
    # sum of square of x
    sx2 = 0
 
    for i in range(n):
        sxsy += x[i] * y[i]
        sx2 += x[i] * x[i]
    b = (n * sxsy - sx * sy)/(n * sx2 - sx * sx)
    return b
 
# Function to find the
# least squares regression line
def leastRegLine(X,Y,n):
     
    # Finding b
    b = calculateB(X, Y, n)
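    # Keep the means as floats; truncating them with int()
    # gives a wrong intercept when the sums are not divisible by n
    meanX = sum(X) / n
    meanY = sum(Y) / n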
 
    # Calculating a
    a = meanY - b * meanX
 
    # Printing regression line
    print("Regression line:")
    print("Y = ", '%.3f'%a, " + ", '%.3f'%b, "*X", sep="")
 
# Driver code
 
# Statistical data
X = [95, 85, 80, 70, 60 ]
Y = [90, 80, 70, 65, 60 ]
n = len(X)
leastRegLine(X, Y, n)
 
# This code is contributed by avanitrachhadiya2155



C#

// C# program to find the
// regression line
using System;
using System.Linq;
 
class GFG{
 
// Function to calculate b
private static double calculateB(int[] x,
                                 int[] y)
{
    int n = x.Length;
 
    // Sum of array x
    int sx = x.Sum();
 
    // Sum of array y
    int sy = y.Sum();
 
    // For sum of product of x and y
    int sxsy = 0;
 
    // Sum of square of x
    int sx2 = 0;
    for(int i = 0; i < n; i++)
    {
        sxsy += x[i] * y[i];
         sx2 += x[i] * x[i];
    }
    double b = (double)(n * sxsy - sx * sy) /
                       (n * sx2 - sx * sx);
 
    return b;
}
 
// Function to find the
// least squares regression line
public static void leastRegLine(int []X, int []Y)
{
     
    // Finding b
    double b = calculateB(X, Y);
 
    int n = X.Length;
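    // Means are computed in floating point; integer division
    // would truncate them when the sums are not divisible by n
    double meanX = (double)X.Sum() / n;
    double meanY = (double)Y.Sum() / n;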
 
    // Calculating a
    double a = meanY - b * meanX;
 
    // Printing regression line
    Console.WriteLine("Regression line:");
    Console.Write("Y = ");
    Console.Write("{0:F3}",a );
    Console.Write(" + ");
    Console.Write("{0:F3}", b);
    Console.Write("*X");
}
 
// Driver code
public static void Main(String[] args)
{
     
    // Statistical data
    int []X = { 95, 85, 80, 70, 60 };
    int []Y = { 90, 80, 70, 65, 60 };
 
    leastRegLine(X, Y);
}
}
 
// This code is contributed by gauravrajput1



Output: 

Regression line:
Y = 5.685 + 0.863*X
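
The second example can be checked in the same way. The sketch below is an alternative formulation rather than the article's code: it uses the mean-deviation form of the slope, b = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)², which is algebraically equivalent to the formula in the Approach section, and it keeps the means in floating point. The function name `fit_line` and its input checks are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")

    # Floating-point means (no integer truncation)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x; an exact-zero check is enough
    # for integer data like the examples in this article
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")

    # Sum of products of deviations of x and y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line([100, 95, 85, 80, 70, 60], [90, 95, 80, 70, 65, 60])
print(f"Y = {a:.3f} + {b:.3f}*X")  # prints: Y = 4.007 + 0.890*X
```

In this data set the sums of X and Y (490 and 460) are not divisible by n = 6, so the means have to be kept as floating-point values to obtain the intercept 4.007.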

 
