
Compute Summary Statistics In R

Summary statistics give a concise overview of a dataset's characteristics: its central tendency, dispersion, and distribution. The R programming language, together with its many packages, offers several ways to compute summary statistics efficiently. This article covers the following techniques:

1. Descriptive Statistics Functions in Base R

2. Using External Packages

3. Grouping Data for Summary Statistics

4. Summarising Multiple Variables

5. Additional Statistical Summary Functions


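Step 1: Install and Load the Required Packages

# Install once (skip this line if the packages are already installed)
install.packages(c("dplyr", "data.table", "e1071"))

# Load the packages for the current session
library(e1071)       # skewness(), kurtosis()
library(dplyr)       # group_by(), summarise(), %>%
library(data.table)  # data.table syntax for fast grouped summaries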

Step 2: Load the Dataset

# Load the mtcars dataset
data(mtcars)
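Before computing any statistics, it can help to check what the dataset contains. The following is a minimal sketch using only base R; dims is a name introduced here for illustration.

```r
# Load the built-in mtcars data and inspect its shape and column types
data(mtcars)

dims <- dim(mtcars)   # c(number of rows, number of columns)
cat("Rows:", dims[1], " Columns:", dims[2], "\n")

# str() lists every column with its type and first few values
str(mtcars)

# head() shows the first six rows
print(head(mtcars))
```

All eleven columns of mtcars are numeric, so every function in the following steps can be applied to any of them.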

Step 3: Summary Statistics of Ungrouped Data

First, compute summary statistics for a single variable (mpg) across all 32 cars, without any grouping. We'll use base R functions such as summary(), mean(), median(), min(), max(), quantile(), sd(), and var().

# Summary statistics for ungrouped data
cat("Summary statistics for mpg variable:\n")
summary(mtcars$mpg)
cat("\nMean of mpg:", mean(mtcars$mpg), "\n")
cat("Median of mpg:", median(mtcars$mpg), "\n")
cat("Minimum value of mpg:", min(mtcars$mpg), "\n")
cat("Maximum value of mpg:", max(mtcars$mpg), "\n")
cat("Quantiles of mpg:", quantile(mtcars$mpg), "\n")
cat("Standard deviation of mpg:", sd(mtcars$mpg), "\n")
cat("Variance of mpg:", var(mtcars$mpg), "\n")

Output:

Summary statistics for mpg variable:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90 

Mean of mpg: 20.09062 
Median of mpg: 19.2 
Minimum value of mpg: 10.4 
Maximum value of mpg: 33.9 
Quantiles of mpg: 10.4 15.425 19.2 22.8 33.9 
Standard deviation of mpg: 6.026948 
Variance of mpg: 36.3241 
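The same base R functions can be applied to several columns at once instead of one call per statistic. This is a minimal sketch using sapply(); the choice of columns (mpg, hp, wt) and the names cols and stats_table are illustrative, not part of the original example.

```r
# Compute several statistics for multiple columns in one pass.
# The column selection is illustrative; any numeric columns work.
cols <- c("mpg", "hp", "wt")

stats_table <- sapply(mtcars[cols], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), min = min(x), max = max(x))
})

# sapply() returns a matrix: one row per statistic, one column per variable
print(round(stats_table, 3))

# summary() on a data frame prints the six-number summary for every column
print(summary(mtcars[cols]))
```

sapply() simplifies the list of equal-length named vectors into a matrix, which keeps the result compact and easy to transpose with t() if you prefer one row per variable.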

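Step 4: Summary Statistics of Grouped Data by One Variable

With dplyr, group_by() splits the rows by the number of cylinders (cyl), and summarise() then computes one row of statistics for each group.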

# Group by one variable (cylinders) and compute summary statistics
mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg = sd(mpg)
  )

Output:

# A tibble: 3 × 4
    cyl mean_mpg median_mpg sd_mpg
  <dbl>    <dbl>      <dbl>  <dbl>
1     4     26.7       26     4.51
2     6     19.7       19.7   1.45
3     8     15.1       15.2   2.56
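The data.table package loaded in Step 1 can produce the same grouped summary. The sketch below assumes data.table is installed; mtcars_dt and cyl_stats are hypothetical names chosen for this example.

```r
library(data.table)

# Convert a copy of mtcars to a data.table (mtcars itself is not modified)
mtcars_dt <- as.data.table(mtcars)

# j computes the statistics; keyby groups by cyl and sorts the result by cyl
cyl_stats <- mtcars_dt[, .(
  mean_mpg   = mean(mpg),
  median_mpg = median(mpg),
  sd_mpg     = sd(mpg)
), keyby = cyl]

print(cyl_stats)
```

The numbers match the dplyr result above; which syntax to use is mostly a matter of preference, although data.table is often chosen for very large tables.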

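Summarising Multiple Variables

summarise() can also compute statistics for several variables in a single call. Without group_by(), it returns one row for the whole dataset.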

# Summarise multiple variables
mtcars %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_disp = mean(disp),
    sd_hp = sd(hp),
    var_wt = var(wt)
  )

Output:

  mean_mpg mean_disp    sd_hp   var_wt
1 20.09062  230.7219 68.56287 0.957379
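To group by more than one variable, pass several columns to group_by(). This sketch groups by cylinders and transmission type (am: 0 = automatic, 1 = manual); the result name cyl_am_stats is illustrative, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# One row per combination of cylinder count and transmission type
cyl_am_stats <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(
    n        = n(),         # number of cars in each group
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg),
    .groups  = "drop"       # return an ungrouped tibble
  )

print(cyl_am_stats)
```

Including n() is a useful habit: some groups are small (only two 8-cylinder manual cars), so their standard deviations deserve less trust.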

Step 5: Additional Summary Functions

Beyond the basics, a few more functions describe the shape and spread of a distribution: skewness() and kurtosis() from the e1071 package, and IQR() (interquartile range) from base R.

# Additional statistical summary functions
# skewness() and kurtosis() come from e1071 (default type = 3); IQR() is base R
print("Computing skewness for the mpg variable...")
skewness(mtcars$mpg)

print("Computing kurtosis for the mpg variable...")
kurtosis(mtcars$mpg)

print("Computing interquartile range (IQR) for the mpg variable...")
IQR(mtcars$mpg)

Output:

[1] "Computing skewness for the mpg variable..."
[1] 0.610655

[1] "Computing kurtosis for the mpg variable..."
[1] -0.372766

[1] "Computing interquartile range (IQR) for the mpg variable..."
[1] 7.375
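To see what these functions measure, here is a minimal sketch that recomputes them by hand. It assumes e1071's default type = 3 definitions, which per the e1071 documentation are skewness = m3 / s^3 and excess kurtosis = m4 / s^4 - 3, where m3 and m4 are central moments with an n denominator and s is the sample standard deviation. The variable names are illustrative.

```r
# Reproduce e1071's default (type = 3) skewness and kurtosis by hand
x <- mtcars$mpg
n <- length(x)
m <- mean(x)
s <- sd(x)                  # sample standard deviation (n - 1 denominator)

m3 <- sum((x - m)^3) / n    # third central moment
m4 <- sum((x - m)^4) / n    # fourth central moment

skew_b1 <- m3 / s^3         # type 3 skewness
kurt_b2 <- m4 / s^4 - 3     # type 3 excess kurtosis

cat("Skewness:", round(skew_b1, 6), "\n")
cat("Kurtosis:", round(kurt_b2, 6), "\n")

# IQR() is the 75th percentile minus the 25th percentile
iqr_manual <- unname(quantile(x, 0.75) - quantile(x, 0.25))
cat("IQR:", iqr_manual, "\n")
```

A positive skewness means a longer right tail (a few very fuel-efficient cars), and a negative excess kurtosis means the distribution is somewhat flatter than a normal distribution. Other packages may use type 1 or type 2 definitions, so small differences between tools are expected.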

Computing summary statistics is an essential first step in understanding a dataset. For ungrouped data, base R functions such as summary(), mean(), sd(), and quantile() cover the basics; for grouped data, dplyr's group_by() and summarise() (or data.table's by syntax) compute the same statistics per group; and e1071 adds shape measures such as skewness and kurtosis. Together these give analysts a quick picture of their data and a solid starting point for decision-making and further analysis.
