
Survey Package in R

The “survey” package in R is a powerful tool for analyzing complex survey data. It provides functions and methods for handling survey design features, such as stratification, clustering, and weighting. This package is particularly useful when working with data collected from complex survey designs, like those from large-scale social surveys or health studies. Below, I’ll provide a brief explanation of survey analysis theory and examples using the “survey” package.

Survey Analysis Theory

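In the R programming language, the survey package accounts for the main features of a complex survey design: stratification, clustering, and sampling weights that reflect unequal probabilities of selection. Point estimates use the weights, and standard errors are computed from the design rather than under the assumption of a simple random sample.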
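As a minimal sketch of the theory behind the weights, here are the standard textbook design-based estimators (background only, not a description of the package's internal code). If unit i is selected with probability π_i, its sampling weight is w_i = 1/π_i, and the population total and mean of a variable y are estimated from the sample S by:

```latex
w_i = \frac{1}{\pi_i}, \qquad
\hat{Y} = \sum_{i \in S} w_i \, y_i, \qquad
\hat{\bar{Y}} = \frac{\sum_{i \in S} w_i \, y_i}{\sum_{i \in S} w_i}
```

The weighted mean on the right is the point estimate that svymean() reports. For example, apistrat (used in Example 1 below) samples 100 of the 4421 elementary schools in the population, so each sampled elementary school has π_i = 100/4421 and w_i = 44.21.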



Applications of survey package

The “survey” package is widely used for analyzing data collected from complex surveys. It handles stratification, clustering, and unequal probabilities of selection, so that estimates, tables, and regression models reflect how the sample was actually drawn. The examples below use the api datasets (California school data) that ship with the package.
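The sketch below, using only the api datasets that ship with the package, shows how the same svymean() call depends on the design specification: the stratified design used in Example 1, a one-stage cluster design for apiclus1 (a sample of whole school districts, identified by dnum), and, for comparison, a design that keeps the weights but ignores the strata and the finite population correction. The object names strat_design, clus_design and naive_design are illustrative. The stratified and weights-only designs give the same point estimate, since that depends only on the weights, but different standard errors. apiclus1 is a different sample, so its estimate differs as well.

```r
library(survey)
data(api)

# Stratified sample: strata = school type, with finite population correction
strat_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                          data = apistrat, fpc = ~fpc)

# One-stage cluster sample: whole school districts (dnum) were sampled
clus_design <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1,
                         fpc = ~fpc)

# For comparison only: same weights as strat_design, strata and fpc ignored
naive_design <- svydesign(id = ~1, weights = ~pw, data = apistrat)

est_strat <- svymean(~api00, design = strat_design)
est_clus  <- svymean(~api00, design = clus_design)
est_naive <- svymean(~api00, design = naive_design)

est_strat
est_clus
est_naive
```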

Example 1: Loading and Handling Survey Data




# Load the "survey" package
library(survey)
  
# Load a sample survey dataset included with the package
data(api)
  
# Create a survey design object
api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, 
                        fpc = ~fpc)
  
# Calculate the weighted mean of enrollment and its design-based standard error
svymean(~enroll, design = api_design) 

Output:



         mean     SE
enroll 595.28 18.509

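library(survey): This line loads the “survey” package, which provides the functions used for complex survey analysis.

data(api): This loads the example datasets included with the package. apistrat is a stratified random sample of 200 California schools (100 elementary, 50 middle and 50 high schools) drawn from the population file apipop.

svydesign(…): This creates the survey design object. id = ~1 means there is no clustering (each school is its own sampling unit), strata = ~stype declares the strata (school type: E, M or H), weights = ~pw supplies the sampling weights, and fpc = ~fpc supplies the stratum population sizes used for the finite population correction.

design = api_design: The design argument of svymean() specifies the survey design object created above, api_design. svymean() uses it to apply the survey weights and to compute a standard error that reflects the stratification and the finite population correction.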
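To see what svymean() computes, the following sketch (which re-creates api_design so it runs on its own; est and manual are illustrative names) reproduces the point estimate by hand as the weighted mean sum(w * y) / sum(w), confirms that the apistrat weights add up to the 6194 schools in the population file apipop, and adds a design-based 95% confidence interval with confint(). The standard error is not reproduced by hand; it comes from the package's design-based variance calculation.

```r
library(survey)
data(api)

# Same stratified design as in Example 1
api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                        data = apistrat, fpc = ~fpc)

# Design-based estimate of mean enrollment
est <- svymean(~enroll, design = api_design)

# The point estimate is the weighted mean sum(w * y) / sum(w)
manual <- sum(apistrat$pw * apistrat$enroll) / sum(apistrat$pw)
all.equal(unname(coef(est)), manual)

# The weights add up to the population size (6194 schools in apipop)
sum(apistrat$pw)

# Design-based 95% confidence interval for the mean
confint(est)
```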




# Create a survey table
svytable(~stype + meals, design = api_design)

Output:

     meals
stype      0      1      2      3      4      5      6      7      8      9     10
    E   0.00  44.21   0.00   0.00  88.42   0.00  88.42 132.63   0.00  44.21  88.42
    H  15.10  45.30   0.00  15.10   0.00  30.20  15.10   0.00  30.20  30.20   0.00
    M   0.00   0.00  20.36   0.00  40.72   0.00  20.36   0.00   0.00   0.00   0.00
     meals
stype     11     12     13     14     15     17     18     19     20     21     23
    E  44.21   0.00  44.21 132.63  44.21   0.00  44.21   0.00  88.42   0.00   0.00
    H   0.00  15.10  15.10   0.00  15.10   0.00  15.10  30.20  45.30  30.20  60.40
    M   0.00   0.00  40.72   0.00   0.00  20.36  20.36  20.36   0.00  20.36   0.00
     meals
stype     24     25     26     28     29     31     32     33     34     35     36
    E  88.42 132.63  44.21  44.21   0.00  44.21   0.00  88.42  88.42  44.21  88.42
    H   0.00   0.00   0.00  15.10  15.10  30.20   0.00  15.10  15.10  15.10  30.20
    M  81.44   0.00   0.00   0.00  20.36  20.36  20.36  20.36   0.00   0.00  40.72
     meals
stype     37     38     39     40     41     42     43     44     45     46     47
    E   0.00 132.63  88.42  44.21  44.21  88.42  44.21   0.00 132.63  44.21  44.21
    H  15.10  15.10  15.10   0.00   0.00   0.00   0.00  15.10   0.00   0.00  15.10
    M   0.00  20.36   0.00   0.00   0.00   0.00   0.00  20.36  20.36  20.36  40.72
     meals
stype     48     49     50     51     52     54     56     57     58     59     60
    E  44.21  44.21   0.00  88.42   0.00  88.42  44.21   0.00  44.21   0.00   0.00
    H   0.00   0.00  15.10  15.10   0.00   0.00   0.00   0.00   0.00   0.00   0.00
    M   0.00   0.00   0.00  20.36  40.72  20.36  20.36  20.36   0.00  20.36  40.72
     meals
stype     61     63     64     66     67     69     71     72     73     74     75
    E  44.21   0.00  44.21   0.00  88.42 132.63  44.21  88.42   0.00 132.63 132.63
    H   0.00  15.10   0.00  15.10   0.00   0.00   0.00  15.10   0.00   0.00   0.00
    M   0.00   0.00  40.72  40.72  20.36  20.36   0.00  20.36  20.36   0.00  20.36
     meals
stype     76     77     78     79     80     82     83     85     86     88     89
    E  88.42  44.21  88.42   0.00  44.21  44.21 132.63   0.00   0.00  44.21   0.00
    H   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00  15.10   0.00  15.10
    M   0.00  20.36  20.36  20.36   0.00   0.00   0.00  20.36   0.00   0.00   0.00
     meals
stype     91     92     93     95     96     97     98     99    100
    E   0.00  44.21  44.21  88.42  44.21  44.21 221.05  44.21 132.63
    H   0.00   0.00   0.00   0.00   0.00   0.00  15.10   0.00  15.10
    M  20.36   0.00   0.00   0.00   0.00   0.00   0.00  20.36   0.00

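svytable(…): This function builds a weighted contingency table. Each cell is the sum of the sampling weights of the sampled schools that fall in it, so it estimates how many schools in the population have that combination of values. For example, every sampled elementary school carries a weight of 44.21, so a cell containing two elementary schools shows 88.42.

~stype + meals: This formula lists the variables to cross-tabulate: “stype” (school type: E = elementary, H = high, M = middle) and “meals” (percentage of students eligible for subsidized meals). Because meals is a percentage with many distinct values, the table is wide and sparse, and R prints it in blocks of columns.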
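Because meals has so many distinct values, a more readable table groups it into bands first. The sketch below is one way to do that, assuming a hypothetical new variable named meals_band with band edges at 25, 50 and 75 percent (an arbitrary choice for illustration). It adds the banded variable to the design with update(), tabulates it, checks that each row sums to the stratum's population size, and runs svychisq(), the package's design-based (Rao-Scott adjusted) test of association.

```r
library(survey)
data(api)

api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                        data = apistrat, fpc = ~fpc)

# Hypothetical banding: group the 0-100 meals percentage into four bands
api_design <- update(api_design,
                     meals_band = cut(meals, breaks = c(0, 25, 50, 75, 100),
                                      include.lowest = TRUE))

# Weighted counts: each cell estimates a number of schools in the population
tab <- svytable(~stype + meals_band, design = api_design)
print(round(tab, 1))

# Row totals recover the stratum population sizes (E, H, M)
rowSums(tab)

# Design-based test of association between school type and meals band
svychisq(~stype + meals_band, design = api_design)
```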




# Fit a weighted linear regression model
model <- svyglm(api00 ~ meals + mobility, design = api_design)
  
# Summarize the regression results
summary(model)

Output:

Call:
svyglm(formula = api00 ~ meals + mobility, design = api_design)

Survey design:
svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat,
    fpc = ~fpc)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 821.2318     9.9265  82.731   <2e-16 ***
meals        -3.4068     0.1717 -19.847   <2e-16 ***
mobility      0.3105     0.3887   0.799    0.425
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 5217.241)

Number of Fisher Scoring iterations: 2

svyglm(…): This function fits a generalized linear model to complex survey data; with the default gaussian family it is a weighted linear regression. In this case, it predicts “api00” (the school's Academic Performance Index in 2000) from “meals” (percentage of students eligible for subsidized meals) and “mobility” (percentage of students for whom this is their first year at the school). The coefficients use the sampling weights, and the standard errors account for the stratified design.

The summary shows how the predictors relate to api00 after accounting for the survey design and weights. The meals coefficient (-3.41, p < 2e-16) means that each additional percentage point of students eligible for subsidized meals is associated with an API score about 3.4 points lower, holding mobility fixed. The mobility coefficient (0.31, p = 0.425) is not statistically significant, so these data give no clear evidence of a mobility effect once meals is in the model.
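To take the regression a step further, the sketch below (which re-creates api_design and model so it runs on its own) reports design-based 95% confidence intervals with confint(), places the coefficients next to an ordinary lm() fit that ignores the weights and the design, and uses regTermTest() for a design-based Wald test of the mobility term. The name naive is illustrative; the lm() fit is shown only for contrast, not as an appropriate analysis of survey data.

```r
library(survey)
data(api)

api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                        data = apistrat, fpc = ~fpc)
model <- svyglm(api00 ~ meals + mobility, design = api_design)

# Design-based 95% confidence intervals for the coefficients
confint(model)

# For contrast only: ordinary least squares ignores the weights and the design
naive <- lm(api00 ~ meals + mobility, data = apistrat)
cbind(svyglm = coef(model), lm = coef(naive))

# Design-based Wald test of whether the mobility coefficient is zero
regTermTest(model, ~mobility)
```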

Make Predictions

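To make predictions from a fitted model, call predict(). For an svyglm object it dispatches to the predict.svyglm method, which returns the predicted values together with their standard errors. Here's an example of how to make predictions.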




# Predict the outcome variable (api00) for every school in apistrat
predictions <- predict(model, newdata = apistrat)

# Print the first few predictions
head(predictions)

# Create a data frame with new input values (meals and mobility are percentages)
new_data <- data.frame(meals = 2.5, mobility = 0.8)

# Predict for the new data; use a new name so that `predictions`
# still holds the apistrat predictions used in the plot below
new_predictions <- predict(model, newdata = new_data)

# Print the prediction and its standard error
print("Predicted API00:")
print(new_predictions)

Output:

       1        2        3        4        5        6 
712.2228 495.4380 608.4748 542.5033 742.2811 801.1104 
[1] "Predicted API00:"
    link    SE
1 812.96 9.504

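In this code, predict() is applied to the fitted model with newdata = apistrat, giving a predicted api00 for each sampled school; head() shows the first six. The second call predicts api00 for a school with meals = 2.5 and mobility = 0.8 and prints the prediction (link) with its standard error (SE). With the gaussian family the link is the identity, so the link-scale value is the predicted api00 itself: 821.2318 - 3.4068 × 2.5 + 0.3105 × 0.8 ≈ 812.96.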
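The sketch below extends this to several schools at once. The three rows of new_schools are hypothetical covariate values chosen for illustration, and fit, se and X are illustrative names. It turns the standard errors into approximate 95% confidence intervals (fit ± 1.96 × SE) and checks that the predictions equal the model matrix times the coefficients. These are intervals for the mean api00 at those covariate values, not prediction intervals for an individual school.

```r
library(survey)
data(api)

api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                        data = apistrat, fpc = ~fpc)
model <- svyglm(api00 ~ meals + mobility, design = api_design)

# Hypothetical schools; meals and mobility are percentages (0-100)
new_schools <- data.frame(meals = c(2.5, 50, 90), mobility = c(0.8, 15, 30))

pred <- predict(model, newdata = new_schools)
fit <- as.numeric(pred)      # predicted api00 (identity link)
se  <- as.numeric(SE(pred))  # design-based standard errors

# Approximate 95% confidence intervals for the mean api00 at these values
data.frame(new_schools,
           fit   = round(fit, 2),
           lower = round(fit - 1.96 * se, 2),
           upper = round(fit + 1.96 * se, 2))

# The prediction is the linear predictor: model matrix times coefficients
X <- model.matrix(~ meals + mobility, data = new_schools)
all.equal(fit, as.numeric(X %*% coef(model)))
```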

Create Visualizations

Visualizations help in understanding the results of a survey analysis. Let's create a scatterplot of the actual vs. predicted api00 values.




# Load the "ggplot2" library for visualization
library(ggplot2)

# Combine actual and predicted api00 for the sampled schools into a data frame;
# as.numeric() keeps only the predicted values from the svystat object
prediction_data <- data.frame(
  Actual = apistrat$api00,
  Predicted = as.numeric(predict(model, newdata = apistrat))
)

# Scatterplot of actual vs. predicted values; the blue line is y = x,
# where a prediction would exactly equal the actual value
ggplot(prediction_data, aes(x = Actual, y = Predicted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "blue") +
  labs(x = "Actual API00", y = "Predicted API00", title = "Actual vs. Predicted Values")

Output:

[Scatterplot: Actual vs. Predicted Values]

The code fits a weighted linear regression of “api00” on “meals” and “mobility”, predicts api00 for the sampled schools, and plots the predictions against the actual values. Points close to the line y = x (where the predicted value equals the actual value) indicate accurate predictions; points far from it are schools the model fits poorly. This visual check helps assess the model's performance and shows where it could be improved.
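A numeric summary can complement the visual check. The sketch below is one option, not part of the original walkthrough: it assumes a hypothetical new design variable named sq_err, stores each sampled school's squared prediction error in the design with update(), and uses svymean() to estimate the population mean squared error. Its square root (a root mean squared error) is on the same scale as api00. This is an in-sample measure, computed on the same schools used to fit the model.

```r
library(survey)
data(api)

api_design <- svydesign(id = ~1, strata = ~stype, weights = ~pw,
                        data = apistrat, fpc = ~fpc)
model <- svyglm(api00 ~ meals + mobility, design = api_design)

# Fitted values for the sampled schools (same row order as apistrat)
fitted_api00 <- as.numeric(predict(model, newdata = apistrat))

# Add each school's squared prediction error as a design variable
api_design <- update(api_design, sq_err = (api00 - fitted_api00)^2)

# Design-based estimate of the population mean squared error, and its square root
mse <- svymean(~sq_err, design = api_design)
mse
sqrt(coef(mse))
```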

