Find the Regression Output in R

In the R programming language, we can obtain and interpret regression output using various functions, depending on the type of regression analysis you are conducting. The two most common types of regression analysis are linear regression and logistic regression. This article shows how to find and read the regression output for both.

Linear regression is used to understand and model the relationship between a dependent variable (Y) and one or more independent variables (X1, X2, etc.).



The goal is to find the best-fitting linear equation that describes how changes in the independent variables are associated with changes in the dependent variable.

Linear Equation

In simple linear regression, with one independent variable, the linear equation takes the form:



Y = β0 + β1 * X + ε

where β0 is the intercept, β1 is the slope coefficient, and ε is the random error term.

Parameter Estimation

The goal is to estimate the values of β0 and β1 that minimize the sum of squared differences between the observed values of Y and the predicted values of Y (the “least squares” criterion).

Model Interpretation

The β1 coefficient represents the change in the dependent variable for a one-unit change in the independent variable, assuming all other variables are held constant.

The β0 intercept represents the value of the dependent variable when the independent variable is zero (sometimes it may not have a meaningful interpretation).

Model Assumptions

Linear regression assumes that the relationship between variables is linear.

Assumptions also include homoscedasticity (constant variance of errors), independence of errors, and normally distributed errors.

Model Evaluation

Model performance is often assessed using metrics like R-squared (the proportion of variance explained by the model), Mean Squared Error (MSE), or Root Mean Squared Error (RMSE).

Residual analysis (plotting the differences between observed and predicted values) helps identify potential issues.

Types of Linear Regression

  1. Simple Linear Regression: Involves one independent variable.
  2. Multiple Linear Regression: Involves two or more independent variables.
  3. Polynomial Regression: Uses polynomial equations to fit curves rather than straight lines (see the sketch after this list).

Linear regression is used in various fields for tasks such as predicting sales based on advertising expenditure, estimating house prices based on features, and understanding the relationship between variables in scientific research.

Implementation in R

In R, you can use the lm() function to fit linear regression models. The summary() function provides detailed output with coefficients, p-values, and goodness-of-fit statistics.

Linear Regression Output

For linear regression, you typically use the lm() function to fit a linear model, and then you can use the summary() function to obtain detailed regression output. Here’s a step-by-step guide:

Fit the Linear Model:

Use the lm() function to fit a linear regression model. For example:

# Create a simple linear regression model
model <- lm(Y ~ X1 + X2, data = your_data)

Replace Y, X1, X2, and your_data with your specific response variable, predictor variables, and data.

View Regression Summary:

To view the regression summary, use the summary() function on the fitted model:

# View the regression summary
summary(model)

This will display detailed information about the linear regression model, including coefficients, p-values, R-squared, and more.

Logistic Regression Output

For logistic regression, you use the glm() function (generalized linear model) to fit the model, and you can use the summary() function as well. Here’s how to do it:

Fit the Logistic Model:

Use the glm() function to fit a logistic regression model. For example:

# Create a logistic regression model
model <- glm(Y ~ X1 + X2, data = your_data, family = binomial)

Replace Y, X1, X2, and your_data with your specific response variable, predictor variables, and data. The family argument specifies the error distribution; binomial is the appropriate choice for logistic regression.

View Regression Summary:

To view the regression summary, use the summary() function on the fitted model:

# View the regression summary
summary(model)

This will display detailed information about the logistic regression model, including coefficients, p-values, and goodness-of-fit statistics.

Interpret Regression Output for Linear Regression




# Create a sample dataframe for linear and logistic regression
set.seed(123)
# number of samples
n <- 100
 
# Linear regression variables
Advertising <- rnorm(n, mean = 50, sd = 10)
Price <- rnorm(n, mean = 100, sd = 20)
Sales <- 30 + 2 * Advertising - 3 * Price + rnorm(n, mean = 0, sd = 5)
 
# Logistic regression variables
Age <- rnorm(n, mean = 40, sd = 10)
Gender <- sample(c("Male", "Female"), n, replace = TRUE)
Outcome <- rbinom(n, size = 1, prob = 0.7)
 
# Create the dataframe
sample_data <- data.frame(Advertising, Price, Sales, Age, Gender, Outcome)
head(sample_data)

Output:


  Advertising     Price     Sales      Age Gender Outcome
1    44.39524  85.79187 -127.5911 32.84758   Male       0
2    47.69823 105.13767 -183.4545 32.47311   Male       1
3    65.58708  95.06616 -125.3500 30.61461 Female       0
4    50.70508  93.04915 -145.0213 29.47487 Female       1
5    51.29288  80.96763 -112.3888 35.62840   Male       1
6    67.15065  99.09945 -135.3783 43.31179   Male       1

Create the Linear Regression Model




# Fit a linear regression model
linear_model <- lm(Sales ~ Advertising + Price, data = sample_data)
  
# View the regression summary
summary(linear_model)

Output:

Call:
lm(formula = Sales ~ Advertising + Price, data = sample_data)

Residuals:
    Min      1Q  Median      3Q     Max
-9.3651 -3.3037 -0.6222  3.1068 10.3991

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 33.40933    3.72226    8.976 2.18e-14 ***
Advertising  1.93341    0.05243   36.873  < 2e-16 ***
Price       -2.99405    0.02475 -120.978  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.756 on 97 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.994
F-statistic: 8239 on 2 and 97 DF, p-value: < 2.2e-16

We use the lm() function to fit a linear regression model, where Sales is our dependent variable and Advertising and Price are independent variables.

The summary() function provides detailed output, including coefficients (intercept, Advertising, and Price), standard errors, t-values, p-values, and R-squared.

Call:
lm(formula = Sales ~ Advertising + Price, data = sample_data)

The Call section restates the model we fit: we are examining how two factors, Advertising and Price, affect Sales. Sales is the response variable we are trying to predict, Advertising and Price are the predictors we think may influence it, and the data come from the sample_data dataframe.

Residuals:
    Min      1Q  Median      3Q     Max
-9.3651 -3.3037 -0.6222  3.1068 10.3991

Coefficients:
             Estimate Std. Error  t value Pr(>|t|)
(Intercept) 33.40933    3.72226    8.976 2.18e-14 ***
Advertising  1.93341    0.05243   36.873  < 2e-16 ***
Price       -2.99405    0.02475 -120.978  < 2e-16 ***

The Residuals section summarizes the prediction errors, which range from about -9.37 to 10.40 and are roughly centered on zero. The model estimates three coefficients:

  1. Intercept (33.41): the predicted Sales when Advertising and Price are both zero.
  2. Advertising (1.93): each one-unit increase in Advertising is associated with an increase of about 1.93 units in Sales, holding Price constant.
  3. Price (-2.99): each one-unit increase in Price is associated with a decrease of about 2.99 units in Sales, holding Advertising constant.

All three coefficients are highly significant (p < 0.001), and the Advertising and Price estimates are close to the true values of 2 and -3 used to simulate the data.

Residual standard error: 4.756 on 97 degrees of freedom
Multiple R-squared: 0.9941, Adjusted R-squared: 0.994
F-statistic: 8239 on 2 and 97 DF, p-value: < 2.2e-16

The residual standard error (4.756) estimates the typical size of a prediction error and is close to the noise standard deviation of 5 used to simulate the data. The Multiple R-squared of 0.9941 means the model explains about 99.4% of the variance in Sales, and the highly significant F-statistic (p < 2.2e-16) indicates that the model as a whole fits far better than an intercept-only model.

Interpret Regression Output for Logistic Regression

We reuse the sample_data dataframe created above; it already contains the binary Outcome variable along with the Age and Gender predictors used in the logistic model.

Create the Logistic Regression Model




# Fit a logistic regression model
logistic_model <- glm(Outcome ~ Age + Gender, data = sample_data, family = binomial)
  
# View the regression summary
summary(logistic_model)

Output:

Call:
glm(formula = Outcome ~ Age + Gender, family = binomial, data = sample_data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.26053    0.96378   2.345    0.019 *
Age         -0.02311    0.02229  -1.037    0.300
GenderMale  -0.72437    0.47008  -1.541    0.123
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 120.43 on 99 degrees of freedom
Residual deviance: 116.49 on 97 degrees of freedom
AIC: 122.49

Number of Fisher Scoring iterations: 4

We use the glm() function for logistic regression. Here, Outcome is the binary response variable, and we predict it based on Age and Gender. The family = binomial argument specifies logistic regression. The coefficients are reported on the log-odds scale: the negative estimates for Age and GenderMale would suggest lower odds of Outcome = 1 for older and male subjects, but neither predictor is statistically significant (p = 0.300 and p = 0.123), which makes sense because Outcome was simulated independently of both.

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 120.43 on 99 degrees of freedom
Residual deviance: 116.49 on 97 degrees of freedom
AIC: 122.49

Number of Fisher Scoring iterations: 4

The null deviance (120.43 on 99 degrees of freedom) describes how well an intercept-only model fits, while the residual deviance (116.49 on 97 degrees of freedom) describes the fit after adding Age and Gender; the small reduction confirms these predictors add little explanatory power here. The AIC (122.49) is mainly useful for comparing candidate models, with lower values preferred, and the model converged after 4 Fisher scoring iterations.

In both examples, the output provides information to interpret the relationship between variables, assess the significance of predictors, and understand the model’s performance. You can use these results to make predictions, conduct hypothesis tests, and draw conclusions about the relationships in your data.

