Calculating Sum Of Squared Deviations In R

Last Updated : 07 Feb, 2024

Statistics plays an important role in data handling and analysis. Many statistical concepts help us understand the nature of data, and one of them is the Sum of Squared Deviations (SSD). It is a fundamental quantity in statistics that measures the variability in a dataset.

In this article, we will see how to calculate SSD both by hand and in the R programming language.

Understanding Sum of Squared Deviations

The Sum of Squared Deviations (SSD) measures how far the data points deviate from the mean of the dataset, which tells us about the dispersion of the data. The mathematical formula for SSD is:

$$SSD = \sum_{i=1}^{n} (x_i - x_{\text{mean}})^2$$

where:

  • SSD: sum of squared deviations
  • n: number of data points in the dataset
  • $x_i$: the i-th data point, i.e. an individual observation in the dataset.
  • $x_{\text{mean}}$: the mean of the dataset, i.e. the average of the data points:
    $$x_{\text{mean}} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
  • Deviation from the mean ($x_i - x_{\text{mean}}$): how far a data point lies from the mean, and in which direction (positive above the mean, negative below it).
  • Squared deviation ($(x_i - x_{\text{mean}})^2$): squaring makes every term non-negative, so positive and negative deviations cannot cancel each other out when summed. Without squaring, the deviations from the mean always sum to zero.
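
The formula translates almost directly into R. The following is a minimal sketch, not a built-in function: the helper name `ssd` and the sample vector are made up for illustration. It also checks the standard identity that SSD equals (n - 1) times the sample variance returned by `var()`.

```r
# Hypothetical helper (not part of base R): sum of squared deviations
ssd <- function(x) {
  sum((x - mean(x))^2)
}

# Illustrative data, chosen only for this sketch
x <- c(3, 5, 7, 9)

# Direct application of the formula
ssd_direct <- ssd(x)

# Same quantity via the sample variance: SSD = (n - 1) * var(x)
ssd_from_var <- (length(x) - 1) * var(x)

print(ssd_direct)
print(all.equal(ssd_direct, ssd_from_var))
```

Because `var()` divides the SSD by n - 1, either quantity can be recovered from the other as long as n is known.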

Interpreting SSD

  • High SSD: the data points are widely spread around the mean, so the dataset is highly variable.
  • Low SSD: the data points lie close to the mean, which suggests that the dataset is consistent. An SSD of 0 means that every value is identical.
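
As a quick illustration of the two cases, the sketch below compares two made-up vectors that share the same mean and length but differ in spread. The vectors and the helper name `ssd` are assumptions for this example only.

```r
# Hypothetical helper for this sketch
ssd <- function(x) sum((x - mean(x))^2)

# Two illustrative datasets with the same mean (10) and the same length
tight  <- c(9, 10, 10, 10, 11)   # values close to the mean
spread <- c(2, 6, 10, 14, 18)    # values far from the mean

ssd_tight  <- ssd(tight)
ssd_spread <- ssd(spread)

print(ssd_tight)   # 2
print(ssd_spread)  # 160
```

SSD grows with the number of observations and depends on the units of measurement, so such comparisons are only meaningful between datasets of the same size and scale.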

Need to calculate SSD

SSD is calculated for several reasons:

  • Quantifying variability: SSD measures how far the data points deviate from the mean, and it is the numerator of the sample variance.
  • Assessing data spread: comparing the SSD of datasets of the same size and units shows which one is more spread out.
  • Identifying outliers: a point far from the mean contributes a large squared deviation, so unusually large terms flag potential outliers that could distort predictions and reduce accuracy.
  • Evaluating model fit: in regression, the residual sum of squares is compared with the total SSD of the response to judge how well a model fits.
  • Statistical hypothesis testing: sums of squares underlie tests such as ANOVA, which help decide whether to reject a null hypothesis or fail to reject it.
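
To make the model-fit point concrete, here is a small sketch that rebuilds R-squared from two sums of squares. The choice of model (`mpg ~ wt` on the built-in `mtcars` data) is an assumption made purely for illustration.

```r
# Fit a simple linear regression on the built-in mtcars data
# (the model mpg ~ wt is chosen only as an example)
fit <- lm(mpg ~ wt, data = mtcars)

# Total sum of squares: the SSD of the response around its mean
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Residual sum of squares: squared deviations from the fitted values
ss_residual <- sum(residuals(fit)^2)

# R-squared is the share of the total SSD explained by the model
r_squared_manual <- 1 - ss_residual / ss_total

print(r_squared_manual)
print(all.equal(r_squared_manual, summary(fit)$r.squared))
```

The smaller the residual sum of squares is relative to the total SSD, the closer R-squared gets to 1.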

Verification of SSD using R and mathematics

Consider a simplified dataset: [2,4,4,4,5]. We will calculate SSD mathematically as well as using R language. Firstly, we need to calculate the mean.

  • $x_{\text{mean}} = (2 + 4 + 4 + 4 + 5) / 5 = 19 / 5 = 3.8$
  • $SSD = (2 - 3.8)^2 + (4 - 3.8)^2 + (4 - 3.8)^2 + (4 - 3.8)^2 + (5 - 3.8)^2 = 3.24 + 0.04 + 0.04 + 0.04 + 1.44 = 4.8$

We can also verify it using R:

R




# Manually create the dataset
example_data <- c(2, 4, 4, 4, 5)
 
# Calculate the mean
mean_example <- mean(example_data)
 
# Calculate the sum of squared deviations
ssd_example <- sum((example_data - mean_example)^2)
 
# Print the result
print(paste("Sum of Squared Deviations (SSD) for example data:", ssd_example))


Output:

[1] "Sum of Squared Deviations (SSD) for example data: 4.8"
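
To see where the 4.8 comes from term by term, the following sketch lays out each deviation and its square in a small table. The data frame and its column names are illustrative and not part of the original example.

```r
# Same example data as above
example_data <- c(2, 4, 4, 4, 5)

# Deviation of each point from the mean
deviations <- example_data - mean(example_data)

# One row per data point: value, deviation, squared deviation
breakdown <- data.frame(
  x         = example_data,
  deviation = deviations,
  squared   = deviations^2
)
print(breakdown)

# The raw deviations cancel out (up to rounding); their squares do not
print(round(sum(breakdown$deviation), 10))
print(sum(breakdown$squared))
```

The `squared` column holds the individual terms of the hand calculation, and its total is the SSD.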

Calculating SSD of Temperature

In this example, we will calculate the SSD of a fictional dataset in more than one way. The dataset represents the daily temperature of a certain city over a year (365 days).

R




# Creating a fictional dataset for daily temperatures
set.seed(123)  # For reproducibility
days <- 1:365  # Day numbers for a year
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5) 
temperature_data <- data.frame(Day = days, Temperature = temperatures)
 
# Displaying the dataset
print(head(temperature_data))


Output:

 Day Temperature
1   1    72.19762
2   2    73.84911
3   3    82.79354
4   4    75.35254
5   5    75.64644
6   6    83.57532

Calculating SSD using formula

The code below applies the formula directly. The mean could be computed inline, but here we store it in a separate variable so that it can be printed as well.

R




# Calculate the mean of daily temperatures
mean_temperature <- mean(temperature_data$Temperature)
 
# Calculate the Sum of Squared Deviations (SSD)
ssd_temperature <- sum((temperature_data$Temperature - mean_temperature)^2)
 
# Print the results
print(paste("Mean Daily Temperature:", mean_temperature))
print(paste("Sum of Squared Deviations (SSD) for Daily Temperature:", ssd_temperature))


Output:

[1] "Mean Daily Temperature: 75.1593605854571"
[1] "Sum of Squared Deviations (SSD) for Daily Temperature: 8520.02456165882"
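
An algebraically equivalent shortcut expands the square: SSD = sum(x^2) - n * mean(x)^2. The sketch below regenerates the same simulated temperatures and compares both forms. The variable names are illustrative, and the shortcut is shown as an alternative, not as the method used in this article.

```r
# Recreate the simulated temperatures used above
set.seed(123)
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5)
n <- length(temperatures)

# Definition: sum of squared deviations from the mean
ssd_definition <- sum((temperatures - mean(temperatures))^2)

# Shortcut form: sum of squares minus n times the squared mean
ssd_shortcut <- sum(temperatures^2) - n * mean(temperatures)^2

print(ssd_definition)
print(ssd_shortcut)
print(all.equal(ssd_definition, ssd_shortcut))
```

The shortcut subtracts two large, nearly equal numbers, so it can lose precision when the mean is large relative to the spread. The definition-based form is usually the safer choice.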

Calculating SSD using Matrix Algebra

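We can also calculate SSD using matrix algebra: the product of the transposed deviation vector with the deviation vector itself equals the sum of its squared elements. The result is the same value, up to floating-point rounding in the last digit.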

R




ssd_matrix <- t(temperature_data$Temperature - mean(temperature_data$Temperature)) %*%
                (temperature_data$Temperature - mean(temperature_data$Temperature))
 
print(paste("Matrix Algebra SSD:", ssd_matrix))


Output:

[1] "Matrix Algebra SSD: 8520.02456165883"
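
Base R also provides `crossprod()`, which computes t(x) %*% x in a single call. The sketch below is one possible variant that regenerates the same simulated temperatures, centers them once, and uses `drop()` to turn the resulting 1 x 1 matrix into a plain number. The variable names are illustrative.

```r
# Recreate the simulated temperatures used above
set.seed(123)
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5)

# Center the data once instead of repeating the subtraction
centered <- temperatures - mean(temperatures)

# crossprod(centered) is equivalent to t(centered) %*% centered;
# drop() removes the 1 x 1 matrix dimensions
ssd_crossprod <- drop(crossprod(centered))

# Formula-based value for comparison
ssd_formula <- sum(centered^2)

print(ssd_crossprod)
print(all.equal(ssd_crossprod, ssd_formula))
```

Comparing with `all.equal()` rather than `==` allows for the tiny rounding differences between the two computations.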

Calculating SSD of mtcars dataset

We can also calculate SSD for mtcars, a well-known built-in dataset in R. It contains fuel consumption and design and performance measurements for 32 car models.

R




# Load the mtcars dataset
data(mtcars)
 
# Calculate the mean of the mpg column
mean_value <- mean(mtcars$mpg)
 
# Calculate the sum of squared deviations
ssd <- sum((mtcars$mpg - mean_value)^2)
 
# Print the result
print(paste("Sum of Squared Deviations (SSD):", ssd))


Output:

[1] "Sum of Squared Deviations (SSD): 1126.0471875"
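
The same calculation can be applied to several columns at once. This is a sketch using `sapply()`; the selection of columns (`mpg`, `hp`, `wt`) and the helper name `ssd` are assumptions for illustration.

```r
# Hypothetical helper for this sketch
ssd <- function(x) sum((x - mean(x))^2)

# Apply the helper to a few numeric columns of the built-in mtcars data
ssd_by_column <- sapply(mtcars[, c("mpg", "hp", "wt")], ssd)

print(ssd_by_column)
```

The columns are measured in different units, so their SSD values are not directly comparable with each other.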

Here, we calculated the SSD for the mpg column of the dataset. We can also plot each car's squared deviation against its mpg value as a scatter plot using the ggplot2 package.

R




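# Load ggplot2 (install once with install.packages("ggplot2"))
library(ggplot2)

# Scatter plot of each car's squared deviation from the mean mpg.
# The dashed red line marks the total SSD, i.e. the sum of all plotted values.
# Note: the linewidth argument requires ggplot2 3.4.0 or later.
ggplot(mtcars, aes(x = mpg, y = (mpg - mean_value)^2)) +
  geom_point(color = "blue", size = 3) +
  geom_hline(yintercept = ssd, linetype = "dashed", color = "red", linewidth = 1) +
  labs(title = "Sum of Squared Deviations in mtcars Dataset",
       x = "mpg",
       y = "Squared Deviations from Mean") +
  theme_minimal()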


Output:

[Figure: scatter plot of squared deviations from the mean against mpg. Caption: Calculating Sum Of Squared Deviations In R]

Conclusion

In this article, we calculated SSD for several datasets in R, using both the direct formula and matrix algebra, and verified the result by hand on a small example.


