Represent a given set of points by the best possible straight line

Given n points (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n) with n >= 2, find the values of m and c such that the straight line y = mx + c best represents the points.

Examples:

Input : n = 5
        x_1 = 1, x_2 = 2, x_3 = 3, 
        x_4 = 4, x_5 = 5
        y_1 = 14, y_2 = 27, y_3 = 40, 
        y_4 = 55, y_5 = 68   
Output : m = 13.6
        c = 0 
For any pair (x_i, y_i) from the given data,
these values of m and c should make the point
fit the equation of a straight line, y = mx + c,
as closely as possible. Take x_1 = 1 and y_1 = 14
and substitute the output values of m and c
into the equation y = mx + c:
L.H.S.: y = 14, R.H.S.: mx + c = 13.6 * 1 + 0 = 13.6
So, the two sides are approximately equal.
Now, take x_3 = 3 and y_3 = 40:
L.H.S.: y = 40, R.H.S.: mx + c = 13.6 * 3 + 0 = 40.8
Again the two sides are approximately equal, and
the same holds for the remaining points.

Input : n = 6
        x_1 = 1, x_2 = 2, x_3 = 3, 
        x_4 = 4, x_5 = 5, x_6 = 6
        y_1 = 1200, y_2 = 900, y_3 = 600, 
        y_4 = 200, y_5 = 110, y_6 = 50
Output : m = -243.43
        c = 1362
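
The check described for the first example can be sketched in C++ as follows. The helper names predict and maxAbsoluteGap are hypothetical and chosen only for this illustration; the sketch assumes the points are passed as two vectors of equal length.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: value of the line y = m*x + c at a given x.
double predict(double m, double c, double x) {
    return m * x + c;
}

// Hypothetical helper: largest absolute gap |y_i - (m*x_i + c)| over all points.
double maxAbsoluteGap(const std::vector<double>& x,
                      const std::vector<double>& y,
                      double m, double c) {
    assert(x.size() == y.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        worst = std::max(worst, std::fabs(y[i] - predict(m, c, x[i])));
    }
    return worst;
}
```

For the first example with m = 13.6 and c = 0, the largest gap is about 0.8, reached at x_3 = 3 (40 against 40.8).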

Approach

To fit a straight line to a set of points, we need to find the values of two unknowns, m and c. Each point gives one equation in these two unknowns, so two cases are possible depending on the value of n:
Case 1: When n = 2, there are two equations and two unknowns, so there is a unique solution (provided x_1 and x_2 differ).
Case 2: When n > 2, there may or may not exist values of m and c that satisfy all n equations, but we can still find the values of m and c that fit a straight line to the given points as closely as possible.
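
For the n = 2 case, the unique line can be computed directly from the two points. The following is a minimal sketch, assuming the two x-coordinates differ; the Line struct and the function name lineThroughTwoPoints are hypothetical names used only here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type for illustration: slope m and intercept c.
struct Line {
    double m;
    double c;
};

// Exact line through (x1, y1) and (x2, y2).
// Assumes x1 != x2; otherwise the line is vertical and has no slope m.
Line lineThroughTwoPoints(double x1, double y1, double x2, double y2) {
    assert(x1 != x2);
    Line line;
    line.m = (y2 - y1) / (x2 - x1);  // slope from the two equations
    line.c = y1 - line.m * x1;       // intercept from either point
    return line;
}
```

For instance, the points (1, 14) and (2, 27) give m = 13 and c = 1, and both points satisfy y = mx + c exactly.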

So, if we have n pairs of x and y, we can form n equations for a straight line from them, as follows:

f_1 = mx_1 + c,
f_2 = mx_2 + c,
f_3 = mx_3 + c,
...
f_n = mx_n + c,
where f_i is the value obtained by
substituting x_i into mx + c.

Ideally, each f_i would be the same as y_i. When that is not possible, we can still bring the f_i as close as possible to the y_i overall by defining a new quantity, U = Σ(y_i - f_i)^2, summed over all i from 1 to n, and choosing m and c so that U is minimum.
Note: (y_i - f_i)^2 is used in place of (y_i - f_i) because we want to count both the cases where f_i is greater and the cases where y_i is greater. If the term were not squared, differences of opposite sign would cancel each other to an extent, and a poor fit could still produce a small sum. Squaring the term prevents this.
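
The effect of squaring can be illustrated with a small C++ sketch. The function names sumSignedError and sumSquaredError are hypothetical; the second one computes U as defined above for a candidate pair (m, c).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum of the raw differences (y_i - f_i), without squaring.
double sumSignedError(const std::vector<double>& x,
                      const std::vector<double>& y,
                      double m, double c) {
    assert(x.size() == y.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        total += y[i] - (m * x[i] + c);
    }
    return total;
}

// Hypothetical helper: U = sum of (y_i - f_i)^2 for the line y = m*x + c.
double sumSquaredError(const std::vector<double>& x,
                       const std::vector<double>& y,
                       double m, double c) {
    assert(x.size() == y.size());
    double total = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double diff = y[i] - (m * x[i] + c);
        total += diff * diff;
    }
    return total;
}
```

With the data of the first example, the horizontal line m = 0, c = 40.8 (the mean of the y values) has a signed sum of about 0, the same as the fitted line m = 13.6, c = 0. The squared sum tells them apart: U is about 1850.8 for the horizontal line and about 1.2 for the fitted line.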

Now, for U to be minimum, its partial derivatives with respect to m and c must both be zero:

\frac{\partial U}{\partial m} = 0 and  
\frac{\partial U}{\partial c} = 0. 
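
As a short derivation sketch, substituting f_i = m x_i + c into U and differentiating term by term with the chain rule gives the two conditions explicitly:

```latex
U = \sum_{i=1}^{n} (y_i - m x_i - c)^2

\frac{\partial U}{\partial m} = -2 \sum_{i=1}^{n} x_i \, (y_i - m x_i - c) = 0

\frac{\partial U}{\partial c} = -2 \sum_{i=1}^{n} (y_i - m x_i - c) = 0
```

Dividing each condition by -2 and separating the sums leaves two linear equations in m and c.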

Expanding these two conditions gives the following pair of equations:

Σy = nc + mΣx, and
Σxy = cΣx + mΣx^2, which can be rearranged as
m = (n * Σxy - ΣxΣy) / (n * Σx^2 - (Σx)^2), and
c = (Σy - mΣx) / n.
The denominator in the formula for m is non-zero as long as the x_i are not all equal.
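
As a worked check, the formulas can be applied by hand to the first example (x = 1, 2, 3, 4, 5 and y = 14, 27, 40, 55, 68):

```latex
n = 5, \quad \sum x = 15, \quad \sum y = 204, \quad \sum xy = 748, \quad \sum x^2 = 55

m = \frac{5 \cdot 748 - 15 \cdot 204}{5 \cdot 55 - 15^2}
  = \frac{3740 - 3060}{275 - 225}
  = \frac{680}{50} = 13.6

c = \frac{204 - 13.6 \cdot 15}{5} = \frac{204 - 204}{5} = 0
```

These agree with the output m = 13.6, c = 0 given for that example.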

These formulas give the values of m and c in both cases, so a given set of points can be represented by the best possible straight line.
The following code implements the above algorithm:

C

// C Program to find m and c for a straight line given,
// x and y
#include<stdio.h>

// function to calculate m and c that best fit points
// represented by x[] and y[]
void bestApproximate(int x[], int y[], int n)
{
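    int i;
    // double keeps more precision than float for large sums
    double m, c, sum_x = 0, sum_y = 0, sum_xy = 0, sum_x2 = 0;
    for (i = 0; i < n; i++)
    {
        sum_x += x[i];
        sum_y += y[i];
        // convert before multiplying so the int product cannot overflow
        sum_xy += (double)x[i] * y[i];
        sum_x2 += (double)x[i] * x[i];
    }

    // the denominator is zero when all x[i] are equal (vertical line)
    m = (n * sum_xy - sum_x * sum_y)/(n * sum_x2 - (sum_x * sum_x));
    c = (sum_y - m * sum_x)/n;

    printf("m = %f", m);
    printf("\nc = %f\n", c);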
}

// Driver main function
int main()
{
    int x[] = {1, 2, 3, 4, 5};
    int y[] = {14, 27, 40, 55, 68};
    int n = sizeof(x)/sizeof(x[0]);
    bestApproximate(x, y, n);
    return 0;
}

C++

// C++ Program to find m and c for a straight line given,
// x and y
#include<iostream>
#include<cmath>
using namespace std;

// function to calculate m and c that best fit points
// represented by x[] and y[]
void bestApproximate(int x[], int y[], int n)
{
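    // double keeps more precision than float for large sums
    double m, c, sum_x = 0, sum_y = 0, sum_xy = 0, sum_x2 = 0;
    for (int i = 0; i < n; i++)
    {
        sum_x += x[i];
        sum_y += y[i];
        // convert before multiplying so the int product cannot overflow
        sum_xy += static_cast<double>(x[i]) * y[i];
        sum_x2 += pow(x[i], 2);
    }

    // the denominator is zero when all x[i] are equal (vertical line)
    m = (n * sum_xy - sum_x * sum_y)/(n * sum_x2 - pow(sum_x, 2));
    c = (sum_y - m * sum_x)/n;

    cout << "m = " << m;
    cout << "\nc = " << c << "\n";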
}

// Driver main function
int main()
{   
    int x[] = {1, 2, 3, 4, 5};
    int y[] = {14, 27, 40, 55, 68};
    int n = sizeof(x)/sizeof(x[0]);
    bestApproximate(x, y, n);
    return 0;
}

Java

// Java Program to find m and c for a straight line given,
// x and y
import static java.lang.Math.pow;

public class A
{
    // function to calculate m and c that best fit points
    // represented by x[] and y[]
    static void bestApproximate(int x[], int y[])
    {
        int n = x.length;
        double m, c, sum_x = 0, sum_y = 0,
        sum_xy = 0, sum_x2 = 0;
        for (int i = 0; i < n; i++)
        {
            sum_x += x[i];
            sum_y += y[i];
            // convert before multiplying so the int product cannot overflow
            sum_xy += (double) x[i] * y[i];
            sum_x2 += pow(x[i], 2);
        }

        m = (n * sum_xy - sum_x * sum_y)/(n * sum_x2 - pow(sum_x, 2));
        c = (sum_y - m * sum_x)/n;

        System.out.println("m = " + m);
        System.out.println("c = " + c);
    }

    // Driver main function
    public static void main(String args[])
    {
        int x[] = {1, 2, 3, 4, 5};
        int y[] = {14, 27, 40, 55, 68};
        bestApproximate(x, y);        
    }        
}


Output:

m=13.6
c=0.0

Analysis of the above code:
Auxiliary Space : O(1)
Time Complexity : O(n). There is a single loop that iterates n times and performs a constant number of operations in each iteration.
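
The listings above divide by n * Σx^2 - (Σx)^2 without checking it, and that quantity is zero when all x values are equal. The following C++ variant is a sketch of one way to guard that case and to return the result instead of printing it. The Line struct, the function name fitLine and the boolean return convention are assumptions made for this illustration, not part of the original listings.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration: slope m and intercept c.
struct Line {
    double m;
    double c;
};

// Least-squares fit of y = m*x + c using the same sums as the listings above.
// Returns false (leaving 'out' untouched) when there are fewer than two points
// or when the denominator is zero, which happens when all x values are equal.
bool fitLine(const std::vector<double>& x,
             const std::vector<double>& y,
             Line& out) {
    assert(x.size() == y.size());
    if (x.size() < 2) {
        return false;
    }
    const double n = static_cast<double>(x.size());
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumX2 = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
        sumXY += x[i] * y[i];
        sumX2 += x[i] * x[i];
    }
    const double denom = n * sumX2 - sumX * sumX;
    // Exact comparison is adequate for integer-valued inputs; nearly equal
    // x values would need a tolerance instead (an assumption of this sketch).
    if (denom == 0.0) {
        return false;
    }
    out.m = (n * sumXY - sumX * sumY) / denom;
    out.c = (sumY - out.m * sumX) / n;
    return true;
}
```

Called with the second example (x = 1..6, y = 1200, 900, 600, 200, 110, 50), this yields m of about -243.43 and c of about 1362. Called with points that all share the same x, it returns false. The running time is still O(n) with O(1) auxiliary space.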

Reference:
1. Higher Engineering Mathematics by B.S. Grewal.

This article is contributed by Mrigendra Singh.
