How to Calculate the P-Value of an F-Statistic in R

Last Updated : 28 Mar, 2022

An F-test is a statistical test that produces an F-statistic, which follows an F distribution under the null hypothesis. This article shows how to compute the p-value of an F-statistic in the R programming language.

Finding P-value of an F statistic in R

R provides the pf() function, which returns probabilities from the F distribution and can be used to find the p-value associated with an F-statistic. The function has the following syntax:

Syntax: pf(q, df1, df2, lower.tail = FALSE)

Parameters:

  • q: The value of the F-statistic
  • df1: The numerator degrees of freedom
  • df2: The denominator degrees of freedom
  • lower.tail = TRUE: Returns the lower-tail probability, P(F ≤ q). This is the default.
  • lower.tail = FALSE: Returns the upper-tail probability, P(F > q), which is the p-value of the F-test.
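
As a minimal sketch of how the two lower.tail settings relate, the following computes both tails for the same statistic and checks that they sum to 1. The values 3.2, 3, and 20 are arbitrary illustrative inputs, not taken from any dataset.

```r
# Arbitrary illustrative inputs (assumptions, not from a real test)
f_stat <- 3.2
df1 <- 3
df2 <- 20

# Lower tail: P(F <= f_stat)
lower <- pf(f_stat, df1, df2, lower.tail = TRUE)

# Upper tail: P(F > f_stat), the p-value of the F-test
upper <- pf(f_stat, df1, df2, lower.tail = FALSE)

# The two tails are complementary, so this prints TRUE
print(isTRUE(all.equal(lower + upper, 1)))
```

Requesting the upper tail directly with lower.tail = FALSE is usually preferable to computing 1 - pf(...), because it avoids loss of precision when the p-value is very small.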

Example:

Consider an example with the following values:

  • fstat: 7
  • df1: 4
  • df2: 5
  • lower.tail = FALSE

R




pf(7, 4, 5, lower.tail = FALSE)


Output:

 

Hence, the p-value associated with the F-statistic is approximately 0.028. The F-test is also used to test the overall significance of a regression model.
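
One way to act on the p-value from the example above is to compare it with a significance level. The sketch below assumes a level of 0.05 (an assumption, not stated in the example) and also shows the equivalent critical-value comparison using qf(), the quantile function of the F distribution.

```r
alpha <- 0.05  # assumed significance level (not part of the original example)

f_stat <- 7
df1 <- 4
df2 <- 5

# p-value approach: reject the null hypothesis if the p-value is below alpha
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)
reject_by_p <- p_value < alpha

# Critical-value approach: reject if the statistic exceeds the upper-tail quantile
critical <- qf(alpha, df1, df2, lower.tail = FALSE)
reject_by_critical <- f_stat > critical

# Both approaches always lead to the same decision
print(c(reject_by_p, reject_by_critical))
```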

Computing p-value from F-statistic for a regression model

Consider a dataset that records the total distance traveled, the total emissions generated, and the mileage obtained:

R




# Create a dataset
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                   emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                   mileage = c(15, 12, 16, 19, 21))
  
# Display the dataset
dataset


Output:

Now we can fit a linear regression model to this data, using distance and emission as the predictor variables and mileage as the response variable. R provides the lm() function for fitting linear models. It has the following syntax:

Syntax: lm(formula, data)

Parameters:

  • formula: The formula describing the linear model, for example response ~ predictor1 + predictor2.
  • data: The data frame that contains the variables used in the formula.

To print the summary of the linear model, we can use the summary() function. This function has the following syntax:

Syntax: summary(model)

Parameters: model: A fitted model object, such as the one returned by lm()

The complete source code is given below:

R




# Create a dataset
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                   emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                   mileage = c(15, 12, 16, 19, 21))
  
# Fit a regression model
model <- lm(mileage ~ distance + emission, data = dataset)
  
# Display the output of the model
summary(model)


Output:

The F-statistic for the overall regression model is 1.321, with 2 numerator and 2 denominator degrees of freedom. The corresponding p-value is 0.4309.
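
The degrees of freedom reported above can be recovered from the data. As a sketch, with n observations and k predictors, the numerator has k degrees of freedom and the denominator has n - k - 1. The variable names n, k, num_df, and den_df are introduced here for illustration only.

```r
# Recreate the dataset and model from the article
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
model <- lm(mileage ~ distance + emission, data = dataset)

n <- nrow(dataset)             # number of observations
k <- length(coef(model)) - 1   # number of predictors (intercept excluded)

num_df <- k          # numerator degrees of freedom
den_df <- n - k - 1  # denominator (residual) degrees of freedom

# df.residual() reports the residual degrees of freedom of the fitted model
print(c(num_df, den_df, df.residual(model)))
```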

We can reproduce the p-value reported by summary() with the following code:

R




# Compute the p-value
pf(1.321, 2, 2, lower.tail = FALSE)


Output:

 

As the output shows, the result matches the p-value reported by summary(), up to rounding of the F-statistic.
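
Rather than copying the F-statistic by hand, it can be read from the model summary. The sketch below relies on the fstatistic element of the object returned by summary(), a named vector with elements value, numdf, and dendf. The variable names fstat and p_value are illustrative.

```r
# Recreate the dataset and model from the article
dataset <- data.frame(distance = c(112, 217, 92, 98, 104),
                      emission = c(4.5, 9.8, 12.1, 3.2, 7.6),
                      mileage = c(15, 12, 16, 19, 21))
model <- lm(mileage ~ distance + emission, data = dataset)

# Named numeric vector: value (the F-statistic), numdf, dendf
fstat <- summary(model)$fstatistic

# Upper-tail probability of the F distribution at the observed statistic
p_value <- pf(fstat[["value"]], fstat[["numdf"]], fstat[["dendf"]],
              lower.tail = FALSE)

print(p_value)
```

Because this uses the unrounded F-statistic, the result agrees with the p-value printed by summary() without the small discrepancy that comes from typing in 1.321.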


