
Calculating Sum Of Squared Deviations In R

Statistics plays an important role in data handling and analysis. Many concepts help us understand the nature of data. One of them is the Sum of Squared Deviations (SSD), a fundamental quantity in statistics that measures the variability in a dataset.

In this article, we will see how to calculate SSD by hand and in the R programming language.



Understanding Sum of Squared Deviations

The Sum of Squared Deviations measures how far the data points lie from their mean, which tells us how dispersed the data are. The mathematical formula for SSD is:

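$$\text{SSD} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $n$ is the number of data points, $x_i$ is the $i$-th data point, and $\bar{x}$ is the mean of the data points.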



Interpreting SSD
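SSD is never negative. It is 0 only when every value equals the mean, and it grows as the values lie farther from the mean. The minimal sketch below illustrates this. It computes SSD for three small made-up vectors that all have a mean of 10 but differ in spread. The vectors are illustrative only.

```r
# Three made-up vectors, each with mean 10 but a different spread
vectors <- list(
  tight    = c(9, 10, 10, 11),
  spread   = c(4, 8, 12, 16),
  constant = c(10, 10, 10, 10)
)

# Apply the SSD formula to each vector
ssd_values <- sapply(vectors, function(x) sum((x - mean(x))^2))

# tight:    (-1)^2 + 0^2 + 0^2 + 1^2     = 2
# spread:   (-6)^2 + (-2)^2 + 2^2 + 6^2  = 80
# constant: every deviation is 0         = 0
print(ssd_values)
```

Because SSD adds one squared term per observation, it never decreases when a data point is added. SSD values are therefore most meaningful when compared across datasets of the same size.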

Need to calculate SSD

There are many objectives behind calculating SSD. They are mentioned below:
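One common reason is that SSD is the building block of the sample variance and the standard deviation. R's `var()` divides the SSD by n - 1, and `sd()` takes the square root of that variance. The short sketch below checks this relationship on the small dataset [2, 4, 4, 4, 5], which is used again later in this article. The variable names are illustrative.

```r
x <- c(2, 4, 4, 4, 5)
n <- length(x)

# Sum of squared deviations from the mean (4.8, up to floating-point rounding)
ssd_x <- sum((x - mean(x))^2)

# Sample variance = SSD / (n - 1), which is what var() computes
var_from_ssd <- ssd_x / (n - 1)

# Standard deviation = square root of the sample variance
sd_from_ssd <- sqrt(var_from_ssd)

print(all.equal(var_from_ssd, var(x)))
print(all.equal(sd_from_ssd, sd(x)))
```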

Verification of SSD using R and mathematics

Consider a small dataset: [2, 4, 4, 4, 5]. We will calculate its SSD by hand and then in R. First, we need to calculate the mean.
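Working through the formula by hand for this dataset, with $n = 5$:

$$
\begin{aligned}
\bar{x} &= \frac{2 + 4 + 4 + 4 + 5}{5} = \frac{19}{5} = 3.8 \\
\text{SSD} &= (2 - 3.8)^2 + 3\,(4 - 3.8)^2 + (5 - 3.8)^2 \\
&= 3.24 + 3 \times 0.04 + 1.44 \\
&= 4.8
\end{aligned}
$$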

We can also verify it using R:




# Manually create the dataset
example_data <- c(2, 4, 4, 4, 5)
 
# Calculate the mean
mean_example <- mean(example_data)
 
# Calculate the sum of squared deviations
ssd_example <- sum((example_data - mean_example)^2)
 
# Print the result
print(paste("Sum of Squared Deviations (SSD) for example data:", ssd_example))

Output:

[1] "Sum of Squared Deviations (SSD) for example data: 4.8"
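The sketch below lays out each term of the formula in a data frame, to show what the vectorized expression `(example_data - mean_example)^2` computes. It also shows why the deviations are squared. The raw deviations from the mean always sum to zero (up to floating-point rounding), so on their own they cannot measure spread. The data frame name `steps` is illustrative.

```r
example_data <- c(2, 4, 4, 4, 5)
mean_example <- mean(example_data)

# One row per data point: value, deviation from the mean, squared deviation
steps <- data.frame(
  x         = example_data,
  deviation = example_data - mean_example,
  squared   = (example_data - mean_example)^2
)
print(steps)

# Raw deviations cancel out: their sum is 0 (up to rounding)
print(sum(steps$deviation))

# Summing the squared column gives the SSD (4.8 for this data)
ssd_steps <- sum(steps$squared)
print(ssd_steps)
```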

Calculating SSD of Temperature

In this example, we will calculate the SSD of a fictional dataset in several ways. The dataset represents the daily temperature of a city over one year (365 days).




# Creating a fictional dataset for daily temperatures
set.seed(123)  # For reproducibility
days <- 1:365  # Day numbers for a year
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5) 
temperature_data <- data.frame(Day = days, Temperature = temperatures)
 
# Displaying the dataset
print(head(temperature_data))

Output:

 Day Temperature
1   1    72.19762
2   2    73.84911
3   3    82.79354
4   4    75.35254
5   5    75.64644
6   6    83.57532

Calculating SSD using the formula

To calculate SSD with the formula, we first compute the mean temperature. We then sum the squared deviations of each day's temperature from that mean. SSD could be computed in a single expression, but here we also print the mean.




# Calculate the mean of daily temperatures
mean_temperature <- mean(temperature_data$Temperature)
 
# Calculate the Sum of Squared Deviations (SSD)
ssd_temperature <- sum((temperature_data$Temperature - mean_temperature)^2)
 
# Print the results
print(paste("Mean Daily Temperature:", mean_temperature))
print(paste("Sum of Squared Deviations (SSD) for Daily Temperature:", ssd_temperature))

Output:

[1] "Mean Daily Temperature: 75.1593605854571"
[1] "Sum of Squared Deviations (SSD) for Daily Temperature: 8520.02456165882"
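Real measurements often contain missing values, and a single `NA` makes both `mean()` and `sum()` return `NA`. The minimal sketch below wraps the formula in a small helper with an optional `na.rm` argument. The name `sum_sq_dev` is hypothetical and is not a base R function. The temperature vector is recreated with the same seed as above so that the block runs on its own.

```r
# Hypothetical helper (not part of base R): SSD of a numeric vector,
# optionally dropping missing values before computing the mean
sum_sq_dev <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sum((x - mean(x))^2)
}

# Recreate the fictional temperature data from above
set.seed(123)
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5)

ssd_temp <- sum_sq_dev(temperatures)

# A single missing reading makes the plain calculation return NA
temps_with_na <- c(temperatures, NA)
ssd_na      <- sum_sq_dev(temps_with_na)                # NA
ssd_dropped <- sum_sq_dev(temps_with_na, na.rm = TRUE)  # same as ssd_temp

print(c(ssd_temp, ssd_na, ssd_dropped))
```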

Calculating SSD using Matrix Algebra

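SSD can also be calculated with matrix algebra. If $d$ is the vector of deviations from the mean, SSD is the product $d^\top d$, that is, the dot product of $d$ with itself. This gives the same value as the formula, apart from a floating-point rounding difference in the last digit.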




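# Deviations of each day's temperature from the mean temperature
deviations <- temperature_data$Temperature - mean(temperature_data$Temperature)

# t(d) %*% d is the dot product of d with itself (returned as a 1 x 1 matrix)
ssd_matrix <- t(deviations) %*% deviations

print(paste("Matrix Algebra SSD:", ssd_matrix))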

Output:

[1] "Matrix Algebra SSD: 8520.02456165883"
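Base R's `crossprod(x)` computes `t(x) %*% x` directly, so the matrix-algebra version can be written more compactly. Wrapping the result in `drop()` turns the 1 x 1 matrix into a plain number. The sketch below recreates the fictional temperature data with the same seed so that it is self-contained.

```r
set.seed(123)
temperatures <- rep(75, 365) + rnorm(365, mean = 0, sd = 5)

# Deviations from the mean as a plain numeric vector
deviations <- temperatures - mean(temperatures)

# t(d) %*% d gives a 1 x 1 matrix
ssd_matmul <- t(deviations) %*% deviations

# crossprod(d) computes the same product; drop() returns a plain number
ssd_cross <- drop(crossprod(deviations))

# Agrees with the formula-based version, up to rounding
ssd_formula <- sum(deviations^2)
print(all.equal(ssd_cross, ssd_formula))
```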

Calculating SSD of the mtcars dataset

We can also calculate the SSD for a column of mtcars, a well-known dataset built into R. It contains fuel consumption (mpg) and 10 aspects of design and performance for 32 automobiles.




# Load the mtcars dataset
data(mtcars)
 
# Calculate the mean of the mpg column
mean_value <- mean(mtcars$mpg)
 
# Calculate the sum of squared deviations
ssd <- sum((mtcars$mpg - mean_value)^2)
 
# Print the result
print(paste("Sum of Squared Deviations (SSD):", ssd))

Output:

[1] "Sum of Squared Deviations (SSD): 1126.0471875"
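Because every column of mtcars is numeric, `sapply()` can apply the same calculation to all columns at once. The sketch below does this. Each SSD is in the squared units of its own column (for example, `disp` is measured in cubic inches). The values therefore describe the spread within each column rather than giving a fair comparison across columns.

```r
data(mtcars)

# SSD of each column; sapply() returns a named numeric vector
ssd_by_column <- sapply(mtcars, function(col) sum((col - mean(col))^2))

# Sorted from largest to smallest, rounded for readability
print(round(sort(ssd_by_column, decreasing = TRUE), 2))
```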

Here, we calculated the SSD of the mpg column. We can also use the ggplot2 library to plot each car's squared deviation from the mean on a scatter plot.




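# Load ggplot2 (install it once with install.packages("ggplot2"))
library(ggplot2)

# Scatter plot of each car's squared deviation from the mean mpg;
# mean_value and ssd come from the previous code block.
# The dashed red line marks the total SSD.
# Note: linewidth requires ggplot2 3.4.0 or later (older versions use size).
ggplot(mtcars, aes(x = mpg, y = (mpg - mean_value)^2)) +
  geom_point(color = "blue", size = 3) +
  geom_hline(yintercept = ssd, linetype = "dashed", color = "red", linewidth = 1) +
  labs(title = "Sum of Squared Deviations in mtcars Dataset",
       x = "mpg",
       y = "Squared Deviations from Mean") +
  theme_minimal()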

Output:

[Figure: Sum of Squared Deviations in mtcars Dataset (x-axis: mpg, y-axis: Squared Deviations from Mean)]

Conclusion

In this article, we calculated SSD for several datasets in R using both the direct formula and matrix algebra. We also verified the result by hand on a small example.

