Compute Summary Statistics In R

Last Updated : 30 Apr, 2024

Summary statistics give a concise overview of a dataset: its central tendency, its dispersion, and the shape of its distribution. R, together with its ecosystem of packages, offers several ways to compute them efficiently. This article covers the following techniques:

1. Descriptive Statistics Functions in Base R

Base R provides several built-in functions for computing summary statistics, including summary(), mean(), median(), min(), max(), quantile(), sd(), and var().
These functions offer basic summary statistics such as minimum, maximum, median, mean, standard deviation, and variance for a given dataset.

2. Using External Packages

R offers various external packages that extend the functionality of base R for computing summary statistics.
The psych package provides the describe() function, which offers a more comprehensive summary of the dataset, including measures like skewness and kurtosis and, optionally, the interquartile range (by setting IQR = TRUE).
Packages like dplyr and data.table offer functions for computing summary statistics for grouped data and performing complex data manipulation tasks efficiently.
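As a rough illustration of the grouped summaries mentioned above, the following sketch uses data.table on the built-in mtcars dataset. It assumes the data.table package is installed. The result column names (mean_mpg, median_mpg, sd_mpg) are illustrative choices, not part of any API.

```r
# Sketch only: assumes the data.table package is installed
library(data.table)

# Convert the built-in mtcars data frame to a data.table
dt <- as.data.table(mtcars)

# Compute per-cylinder statistics for mpg;
# keyby groups by cyl and sorts the result by it
by_cyl <- dt[, .(mean_mpg   = mean(mpg),
                 median_mpg = median(mpg),
                 sd_mpg     = sd(mpg)),
             keyby = cyl]

print(by_cyl)
```

The `dt[i, j, by]` form keeps the grouping and the summary in a single expression, which is one reason data.table is often chosen for large tables.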

3. Grouping Data for Summary Statistics

Grouping data allows us to compute summary statistics for subsets of the dataset based on one or more grouping variables.
We can use the group_by() function from the dplyr package to group data by one or multiple variables and then compute summary statistics for each group using functions like summarise().

4. Summarising Multiple Variables

Sometimes, we may want to summarise multiple variables simultaneously.
The summarise() function from the dplyr package allows us to compute summary statistics for multiple variables at once, such as mean, median, standard deviation, etc.

5. Additional Statistical Summary Functions

R offers additional functions for computing useful summary statistics beyond the basic measures provided by base R functions.
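IQR() is part of base R (the stats package) and computes the interquartile range. skewness() and kurtosis() are not in base R; they come from add-on packages such as e1071, which is used in the walkthrough below. Together, these measures provide deeper insight into the shape of the distribution.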

Compute Summary Statistics In R

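Step 1: Install and Load the Required Packages

# Install the packages (only needed once)
install.packages(c("dplyr", "data.table", "e1071"))

# Load the packages for this session
library(e1071)       # skewness() and kurtosis()
library(dplyr)       # group_by(), summarise(), and the %>% pipe
library(data.table)  # fast grouped summaries on large tables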

Step 2: Load the Dataset

# Load the mtcars dataset
data(mtcars)

Step 3: Summary Statistics of Ungrouped Data

First, compute summary statistics for the mpg variable across the whole (ungrouped) dataset, using base R functions such as summary(), mean(), and median().

# Summary statistics for ungrouped data
cat("Summary statistics for mpg variable:\n")
summary(mtcars$mpg)
cat("\nMean of mpg:", mean(mtcars$mpg), "\n")
cat("Median of mpg:", median(mtcars$mpg), "\n")
cat("Minimum value of mpg:", min(mtcars$mpg), "\n")
cat("Maximum value of mpg:", max(mtcars$mpg), "\n")
cat("Quantiles of mpg:", quantile(mtcars$mpg), "\n")
cat("Standard deviation of mpg:", sd(mtcars$mpg), "\n")
cat("Variance of mpg:", var(mtcars$mpg), "\n")

Output:

Summary statistics for mpg variable:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90 

Mean of mpg: 20.09062
Median of mpg: 19.2
Minimum value of mpg: 10.4
Maximum value of mpg: 33.9
Quantiles of mpg: 10.4 15.425 19.2 22.8 33.9
Standard deviation of mpg: 6.026948
Variance of mpg: 36.3241
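The same base R functions can be applied to several columns at once. The sketch below is one possible approach using sapply(). The selection of columns and the name stats_table are illustrative assumptions, not part of the original walkthrough.

```r
# Columns to summarise (an illustrative selection from mtcars)
vars <- c("mpg", "disp", "hp", "wt")

# For each column, return a named vector of statistics;
# sapply() binds the vectors into a matrix (statistics x variables)
stats_table <- sapply(mtcars[vars], function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x),
    IQR    = IQR(x))
})

# Round for readability when printing
print(round(stats_table, 2))
```

Each column of the resulting matrix corresponds to one variable and each row to one statistic, so a single value can be looked up with, for example, stats_table["sd", "hp"].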

Step 4: Summary Statistics of Grouped Data by One Variable

# Group by one variable (cylinders) and compute summary statistics
mtcars %>%
  group_by(cyl) %>%
  summarise(
    mean_mpg = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg = sd(mpg)
  )

Output:

# A tibble: 3 × 4
    cyl mean_mpg median_mpg sd_mpg
  <dbl>    <dbl>      <dbl>  <dbl>
1     4     26.7       26     4.51
2     6     19.7       19.7   1.45
3     8     15.1       15.2   2.56
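Grouping is not limited to a single variable. As a minimal sketch, assuming dplyr is installed, the example below groups mtcars by both cyl and am (transmission type) before summarising. The result name by_cyl_am and the summary column names are illustrative.

```r
# Sketch only: assumes the dplyr package is installed
library(dplyr)

# Group by number of cylinders and transmission type,
# then compute the group size, mean, and standard deviation of mpg
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(
    n        = n(),
    mean_mpg = mean(mpg),
    sd_mpg   = sd(mpg),
    .groups  = "drop"   # return an ungrouped tibble
  )

print(by_cyl_am)
```

Setting .groups = "drop" avoids carrying leftover grouping into later steps, which can otherwise lead to surprising results when the output is summarised again.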

Summarising Multiple Variables

# Summarise multiple variables
mtcars %>%
  summarise(
    mean_mpg = mean(mpg),
    mean_disp = mean(disp),
    sd_hp = sd(hp),
    var_wt = var(wt)
  )

Output:

  mean_mpg mean_disp    sd_hp   var_wt
1 20.09062  230.7219 68.56287 0.957379
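When the same statistics are needed for many columns, writing each one out by hand becomes repetitive. The sketch below shows one way to shorten this with across(), assuming dplyr 1.0.0 or later is installed. The name multi_stats and the naming pattern passed to .names are illustrative choices.

```r
# Sketch only: assumes dplyr >= 1.0.0 is installed
library(dplyr)

# Apply mean() and sd() to each selected column;
# .names controls how the output columns are labelled
multi_stats <- mtcars %>%
  summarise(across(
    c(mpg, disp, hp, wt),
    list(mean = mean, sd = sd),
    .names = "{.fn}_{.col}"
  ))

print(multi_stats)
```

This yields one column per combination of function and variable, such as mean_mpg and sd_hp.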

Step 5: Additional Summary Functions

Beyond the basic measures, we can compute skewness and kurtosis with the skewness() and kurtosis() functions from the e1071 package, and the interquartile range with base R's IQR().

# Additional statistical summary functions
print("Computing skewness for the mpg variable...")
skewness(mtcars$mpg)

print("Computing kurtosis for the mpg variable...")
kurtosis(mtcars$mpg)

print("Computing interquartile range (IQR) for the mpg variable...")
IQR(mtcars$mpg)

Output:

[1] "Computing skewness for the mpg variable..."
[1] 0.610655

[1] "Computing kurtosis for the mpg variable..."
[1] -0.372766

[1] "Computing interquartile range (IQR) for the mpg variable..."
[1] 7.375
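To see what these numbers mean, the measures can be computed by hand in base R. The sketch below assumes the e1071 defaults (type = 3), where the third and fourth central moments are divided by powers of the sample standard deviation. The variable names are illustrative. Under that assumption, the results should agree with the output above.

```r
x <- mtcars$mpg
m <- mean(x)
s <- sd(x)  # sample standard deviation (n - 1 denominator)

# Skewness: third central moment divided by s^3
skew_manual <- mean((x - m)^3) / s^3

# Excess kurtosis: fourth central moment divided by s^4, minus 3
kurt_manual <- mean((x - m)^4) / s^4 - 3

# Interquartile range: distance between the 75th and 25th percentiles
iqr_manual <- unname(diff(quantile(x, c(0.25, 0.75))))

print(c(skewness = skew_manual, kurtosis = kurt_manual, IQR = iqr_manual))
```

A positive skewness indicates a longer right tail, and a negative excess kurtosis indicates lighter tails than a normal distribution. Other packages (or other type values in e1071) use slightly different estimators, so small differences between tools are expected.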

Computing summary statistics is a first step toward understanding the characteristics of a dataset. Whether the data is ungrouped or grouped, base R and packages such as dplyr, data.table, and e1071 provide the tools to compute these statistics efficiently. With these techniques, analysts can quickly check the centre, spread, and shape of their data before moving on to further analysis.
