How to Use the coeftest() Function in R

In the R programming language, the coeftest() function from the lmtest package performs a t test (or z test) on each estimated coefficient of a fitted model. It is used after fitting a model with functions such as lm() (linear regression) or glm() (generalized linear models), or with any other function whose result provides coefficient estimates through coef() and their covariance matrix through vcov().
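As a minimal sketch of the point above, the same call works on a logistic regression fitted with glm(). This assumes the lmtest package is installed and uses R's built-in mtcars data purely for illustration. For glm objects, coeftest() uses a normal approximation by default, so the columns are labelled z value and Pr(>|z|) instead of t value and Pr(>|t|).

```r
library(lmtest)

# Logistic regression on the built-in mtcars data (illustrative choice):
# probability of a manual transmission (am = 1) as a function of weight
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)

# coeftest() only needs coef() and vcov() for the model;
# for glm objects it performs a z test by default
logit_test <- coeftest(logit_model)
logit_test
```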

Syntax: coeftest(x, vcov. = NULL, df = NULL, ...)
where:

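x: The fitted model object (for example, the result of lm() or glm())
vcov.: The covariance matrix of the estimated coefficients, given either as a matrix or as a function that computes one from x. The default, NULL, uses vcov(x).
df: The degrees of freedom for the test. A finite positive value gives a t test; otherwise a z test (normal approximation) is performed. By default, the model's residual degrees of freedom are used where available (for glm objects the default is a z test).
...: Further arguments passed to methods (and to vcov. when it is a function).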
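To see what the df argument changes, here is a small sketch that runs the same linear model through coeftest() twice: once with the default degrees of freedom and once with df = Inf. It assumes lmtest is installed and again uses the built-in mtcars data only as a convenient example. The estimates and standard errors are identical; only the reference distribution (t versus normal) and therefore the p-values differ.

```r
library(lmtest)

# Illustrative linear model on the built-in mtcars data
m <- lm(mpg ~ wt + hp, data = mtcars)

# Default: df comes from the model's residual degrees of freedom -> t test
t_version <- coeftest(m)

# A non-finite df switches to a z test (normal approximation)
z_version <- coeftest(m, df = Inf)

t_version
z_version
```

Because the normal distribution has thinner tails than any t distribution, the z-test p-values are always a little smaller for the same statistic; with small samples this difference can matter.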

In this article, we will learn how to use the coeftest() function. First, install the required packages: lmtest, which provides coeftest() along with other tools for testing linear regression models, and sandwich, which provides heteroskedasticity-consistent (robust) covariance matrix estimators that can be passed to coeftest().

Install the packages once with install.packages(), then load them into the R session with the library() function:

install.packages("lmtest")
install.packages("sandwich")

# Load the required libraries
library(lmtest)
library(sandwich)
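Since sandwich is loaded mainly to supply robust covariance matrices, here is a minimal sketch of passing one to coeftest() through the vcov. argument. It assumes both packages are installed, uses the built-in mtcars data for illustration, and picks the HC1 estimator as one common choice among the several that vcovHC() offers.

```r
library(lmtest)
library(sandwich)

# Illustrative linear model on the built-in mtcars data
m <- lm(mpg ~ wt + hp, data = mtcars)

# Heteroskedasticity-consistent covariance matrix (HC1 variant)
robust_vcov <- vcovHC(m, type = "HC1")

# Same coefficient estimates; standard errors, t values and
# p-values are computed from the robust covariance matrix instead
robust_test <- coeftest(m, vcov. = robust_vcov)
robust_test
```

Note that a linear model still gets a t test here: swapping in a robust covariance matrix changes the standard errors, not the reference distribution.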

Example: Using coeftest() on a Linear Regression Model

In this example, we create a data frame describing the relationship between a car's fuel efficiency and its engine size and weight. We'll use a hypothetical dataset with the following variables:

mpg: Miles per gallon (fuel efficiency)
engine_size: Engine size in liters
weight: Weight of the car in pounds

# Create a hypothetical dataset
car_data <- data.frame(
  mpg = c(21, 23, 18, 25, 19, 22, 20, 24, 17, 26),
  engine_size = c(2.0, 2.2, 2.5, 1.8, 2.3, 2.1, 2.4, 1.9, 2.2, 2.6),
  weight = c(3000, 3200, 3500, 2800, 3300, 3100, 3400, 2900, 3600, 2700)
)
# View the data frame
car_data

Output:

   mpg engine_size weight
1   21         2.0   3000
2   23         2.2   3200
3   18         2.5   3500
4   25         1.8   2800
5   19         2.3   3300
6   22         2.1   3100
7   20         2.4   3400
8   24         1.9   2900
9   17         2.2   3600
10  26         2.6   2700

Next, we fit a multiple linear regression model to the data with the lm() function.

# Fit a multiple linear regression model
car_model <- lm(mpg ~ engine_size + weight, data = car_data)

We can then use the coeftest() function to perform a t-test for each fitted regression coefficient in the model.

# Load the lmtest package
library(lmtest)

# Perform t-test for each coefficient in the model
coeftest(car_model)

Output:

t test of coefficients:
              Estimate Std. Error t value  Pr(>|t|)    
(Intercept) 50.2325103  4.5374997 11.0705  1.09e-05 ***
engine_size  0.6687243  1.5965455  0.4189 0.6878736    
weight      -0.0095885  0.0013615 -7.0424 0.0002038 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
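With the default vcov. = NULL, coeftest() uses the model's own vcov(), so for an lm fit its table should coincide with the coefficient matrix printed by summary(). The quick check below recreates car_data so that it runs on its own, and assumes lmtest is installed.

```r
library(lmtest)

# Recreate the hypothetical dataset and model from above
car_data <- data.frame(
  mpg = c(21, 23, 18, 25, 19, 22, 20, 24, 17, 26),
  engine_size = c(2.0, 2.2, 2.5, 1.8, 2.3, 2.1, 2.4, 1.9, 2.2, 2.6),
  weight = c(3000, 3200, 3500, 2800, 3300, 3100, 3400, 2900, 3600, 2700)
)
car_model <- lm(mpg ~ engine_size + weight, data = car_data)

# Strip the "coeftest" class to compare plain matrices
ct <- unclass(coeftest(car_model))
summary_table <- coef(summary(car_model))

# Both contain the same estimates, SEs, t values and p-values
all.equal(ct[, 1:4], summary_table, check.attributes = FALSE)  # TRUE
```

The value of coeftest() over summary() therefore lies in the vcov. and df arguments, which let you rerun the same tests under a different covariance matrix or reference distribution without refitting.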

Interpretation of the coeftest() Output

The coeftest() output has four columns.

The first column, Estimate, gives the estimated coefficient for the intercept and each predictor variable.
The second column, Std. Error, gives the standard error of each coefficient.
The third column, t value, gives the test statistic (Estimate divided by Std. Error). It is labelled z value when a z test is performed, for example for glm() models or when df = Inf.
The fourth column, Pr(>|t|) (Pr(>|z|) for a z test), gives the two-sided p-value for the hypothesis test.
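To make the four columns concrete, the following base-R sketch recomputes them by hand for car_model (the data are recreated so the block runs on its own). It mirrors what coeftest() does by default for an lm fit: standard errors from the square roots of the diagonal of vcov(), the t statistic as estimate over standard error, and a two-sided p-value from the t distribution with the residual degrees of freedom.

```r
# Recreate the hypothetical dataset and model from above
car_data <- data.frame(
  mpg = c(21, 23, 18, 25, 19, 22, 20, 24, 17, 26),
  engine_size = c(2.0, 2.2, 2.5, 1.8, 2.3, 2.1, 2.4, 1.9, 2.2, 2.6),
  weight = c(3000, 3200, 3500, 2800, 3300, 3100, 3400, 2900, 3600, 2700)
)
car_model <- lm(mpg ~ engine_size + weight, data = car_data)

est    <- coef(car_model)                # column 1: Estimate
se     <- sqrt(diag(vcov(car_model)))    # column 2: Std. Error
t_stat <- est / se                       # column 3: t value
rdf    <- df.residual(car_model)         # n - p = 10 - 3 = 7
p_val  <- 2 * pt(abs(t_stat), df = rdf, lower.tail = FALSE)  # column 4

manual_table <- cbind(Estimate = est, `Std. Error` = se,
                      `t value` = t_stat, `Pr(>|t|)` = p_val)
manual_table
```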

The t statistic and corresponding p-value for each test are:

Intercept: t = 11.0705, p = 1.09e-05 (p < 0.001)
engine_size: t = 0.4189, p = 0.6878736
weight: t = -7.0424, p = 0.0002038

Each p-value tests the null hypothesis that the corresponding coefficient is equal to zero, given the other predictors in the model. A small p-value (typically less than 0.05) leads us to reject the null hypothesis, suggesting that the predictor is statistically significant.

The p-value for the weight coefficient (0.0002038) is less than the significance level (e.g., 0.05), so we reject the null hypothesis that this coefficient is zero and conclude that weight has a statistically significant association with fuel efficiency. Holding engine size fixed, each additional pound of weight is associated with about 0.0096 fewer miles per gallon (roughly 0.96 mpg per 100 lb).
Conversely, for the engine_size variable, the p-value (0.6878736) is greater than the significance level, so we fail to reject the null hypothesis for this coefficient. There is not enough evidence of a relationship between engine size and fuel efficiency once weight is accounted for.
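The same conclusions can be read from confidence intervals: for these t tests, a coefficient is significant at the 5% level exactly when its 95% confidence interval (built from the same t distribution) excludes zero. A minimal base-R check using confint(), with the data recreated so the block runs on its own:

```r
# Recreate the hypothetical dataset and model from above
car_data <- data.frame(
  mpg = c(21, 23, 18, 25, 19, 22, 20, 24, 17, 26),
  engine_size = c(2.0, 2.2, 2.5, 1.8, 2.3, 2.1, 2.4, 1.9, 2.2, 2.6),
  weight = c(3000, 3200, 3500, 2800, 3300, 3100, 3400, 2900, 3600, 2700)
)
car_model <- lm(mpg ~ engine_size + weight, data = car_data)

# 95% confidence intervals based on the t distribution with 7 residual df
ci <- confint(car_model, level = 0.95)
ci

# TRUE where the whole interval lies on one side of zero
excludes_zero <- ci[, 1] > 0 | ci[, 2] < 0
excludes_zero
```

Here the interval for weight lies entirely below zero, while the interval for engine_size straddles zero, matching the p-values above.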

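To visualize the results, we can plot each coefficient estimate as a point with its 95% confidence interval drawn as an error bar. Because the sample is small, the interval uses the critical value of the t distribution with the model's residual degrees of freedom rather than the large-sample value 1.96.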

# Extract coefficients and their standard errors
coef_estimates <- coef(car_model)
std_errors <- sqrt(diag(vcov(car_model)))

# Critical value from the t distribution with the model's residual df
# (7 here); 1.96 is only a large-sample approximation
t_crit <- qt(0.975, df = df.residual(car_model))

# Create a data frame for plotting
plot_data <- data.frame(
  Coefficient = names(coef_estimates),
  Estimate = coef_estimates,
  Lower = coef_estimates - t_crit * std_errors,
  Upper = coef_estimates + t_crit * std_errors
)

# Plot coefficients and confidence intervals (install ggplot2 first if needed)
library(ggplot2)

ggplot(plot_data, aes(x = Coefficient, y = Estimate, ymin = Lower, ymax = Upper)) +
  geom_point(size = 3) +
  geom_errorbar(width = 0.2) +
  labs(title = "Coefficients and Confidence Intervals",
       x = "Coefficient",
       y = "Estimate") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Output:

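[Figure: Coefficients and Confidence Intervals, each coefficient estimate shown as a point with an error bar (x-axis: Coefficient, y-axis: Estimate)]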

Conclusion

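The coeftest() function from the lmtest package tests whether each coefficient of a fitted model differs from zero, reporting the estimate, standard error, test statistic, and p-value in a single table. Because the covariance matrix can be swapped through the vcov. argument (for example, with a robust estimator from the sandwich package), it is a convenient way to assess each predictor's significance under different assumptions.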
