# Factor Analysis in R Programming

Last Updated : 12 Jun, 2023

Factor Analysis (FA) is a statistical method used to analyze the underlying structure of a set of variables. It is a data-reduction technique that seeks to explain the correlations among many variables in terms of a smaller number of unobservable (latent) variables, known as factors. In the R programming language, the psych package provides a variety of functions for performing factor analysis.

### Factor analysis involves several steps:

1. Data preparation: The data are usually standardized (i.e., scaled) to make sure that the variables are on a common scale and have equal weight in the analysis.
2. Factor extraction: The factors are identified based on their ability to explain the variance in the data. There are several methods for extracting factors, including principal components analysis (PCA), maximum likelihood estimation (MLE), and minimum residuals (minres).
3. Factor Rotation: The factors are usually rotated to make their interpretation easier. The most common method of rotation is Varimax rotation, which tries to maximize the variance of the factor loadings.
4. Factor interpretation: The final step involves interpreting the factors and their loadings (i.e., the correlation between each variable and each factor). The loadings represent the degree to which each variable is associated with each factor.
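The four steps above can be sketched end to end in a few lines. This is a minimal illustration assuming the psych package is installed; the rest of the article walks through each step in detail.

```r
# Minimal sketch of the four steps, assuming the psych package is installed
library(psych)

X   <- scale(iris[, 1:4])                       # 1. Standardize the variables
ev  <- eigen(cor(X))$values                     # 2. Eigenvalues guide how many factors to extract
fit <- fa(X, nfactors = 2, rotate = "varimax")  # 2-3. Extract and rotate two factors
fit$loadings                                    # 4. Inspect the loadings for interpretation
```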

First, we need to load the data that we want to analyze. For this example, we will use the iris dataset that comes with R. This dataset contains measurements of the sepal length, sepal width, petal length, and petal width of three different species of iris flowers.

## R

```r
# Load the dataset
data(iris)

# View the first few rows of the dataset
head(iris)
```

Output:

First six rows of the iris dataset

### Data Preparation

Before conducting factor analysis, we need to prepare the data by scaling the variables to have a mean of zero and a standard deviation of one. This is important because factor analysis is sensitive to differences in scale between variables.

## R

```r
# Scale the data
iris_scaled <- scale(iris[, 1:4])
```
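As a quick sanity check, each column of the scaled matrix should now have a mean of approximately zero and a standard deviation of exactly one. The following sketch repeats the scaling step so it is self-contained:

```r
# Verify that scale() produced columns with mean ~0 and sd 1
iris_scaled <- scale(iris[, 1:4])
round(colMeans(iris_scaled), 10)   # all approximately 0
apply(iris_scaled, 2, sd)          # all exactly 1
```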

### Determining the Number of Factors

The next step is to determine the number of factors to extract from the data. This can be done using a variety of methods, such as the Kaiser criterion, scree plot, or parallel analysis. In this example, we will use the Kaiser criterion, which suggests extracting factors with eigenvalues greater than one.
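One way to apply these criteria directly is to compute the eigenvalues of the correlation matrix for the Kaiser criterion, and to run parallel analysis with the psych package's fa.parallel() function, which also draws a scree plot:

```r
# Kaiser criterion: count eigenvalues of the correlation matrix greater than 1
iris_scaled <- scale(iris[, 1:4])
ev <- eigen(cor(iris_scaled))$values
ev
sum(ev > 1)   # number of factors suggested by the Kaiser criterion

# Parallel analysis: compares observed eigenvalues against those of
# random data and draws a scree plot
library(psych)
fa.parallel(iris_scaled, fa = "fa")
```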

## R

```r
# Perform factor analysis
library(psych)

fa <- fa(r = iris_scaled,
         nfactors = 4,
         rotate = "varimax")
summary(fa)
```

Output:

```
Factor analysis with Call: fa(r = iris_scaled, nfactors = 4, rotate = "varimax")

Test of the hypothesis that 4 factors are sufficient.
The degrees of freedom for the model is -4  and the objective function was  0
The number of observations was  150  with Chi Square =  0  with prob <  NA

The root mean square of the residuals (RMSA) is  0
The df corrected root mean square of the residuals is  NA

Tucker Lewis Index of factoring reliability =  1.009
```

The output of the summary() function reports the fit of the requested four-factor solution: the model degrees of freedom, a chi-square test of the hypothesis that 4 factors are sufficient, the root mean square of the residuals, and the Tucker Lewis Index. Because we asked for four factors from only four variables, the model is saturated (negative degrees of freedom), which is why the chi-square statistic is 0 and no p-value is reported. The detailed factor loadings and the variance explained by each factor are examined below.

## Interpreting the Results of Factor Analysis

Once the factor analysis is complete, we can interpret the results by examining the factor loadings, which represent the correlations between the observed variables and the extracted factors. In general, loadings with an absolute value greater than 0.4 or 0.5 are considered meaningful.
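A convenient way to apply such a threshold is the cutoff argument of the loadings print method, which hides loadings below a given absolute value. The sketch below refits a two-factor model on the scaled data so that it is self-contained:

```r
# Print loadings, hiding any with absolute value below 0.4
library(psych)
fit <- fa(scale(iris[, 1:4]), nfactors = 2, rotate = "varimax")
print(fit$loadings, cutoff = 0.4)
```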

## R

```r
# View the factor loadings
fa$loadings
```

Output:

```
Loadings:
             MR1    MR2    MR3    MR4
Sepal.Length  0.997
Sepal.Width  -0.108  0.757
Petal.Length  0.861 -0.413  0.288
Petal.Width   0.801 -0.317  0.492

                 MR1   MR2   MR3   MR4
Proportion Var 0.597 0.211 0.083 0.000
Cumulative Var 0.597 0.808 0.891 0.891
```

The output of the loadings function shows the factor loadings for each variable and factor. We can interpret these loadings to identify the underlying factors that explain the correlations among the observed variables. In this example, the first factor (MR1) loads strongly on sepal length, petal length, and petal width, while the second factor (MR2) is dominated by sepal width.

## Validating the Results of Factor Analysis

Finally, it is important to validate the results of the factor analysis by checking the assumptions of the technique, such as normality and linearity. Additionally, it is important to examine the factor structure for different subsets of the data to ensure that the results are consistent and stable.

## R

```r
# Examine the factor structure for
# different subsets of the data
subset1 <- subset(iris[, 1:4],
                  iris$Sepal.Length < mean(iris$Sepal.Length))
fa1 <- fa(subset1, nfactors = 4)
print(fa1)
```

Output:

```
Factor Analysis using method =  minres
Call: fa(r = subset1, nfactors = 4)
MR1  MR2   MR3 MR4   h2    u2 com
Sepal.Length  0.66 0.61 -0.12   0 0.82 0.178 2.1
Sepal.Width  -0.68 0.61  0.11   0 0.85 0.150 2.0
Petal.Length  1.00 0.00  0.00   0 1.00 0.005 1.0
Petal.Width   0.97 0.01  0.16   0 0.97 0.031 1.1

MR1  MR2  MR3  MR4
Proportion Var        0.71 0.18 0.01 0.00
Cumulative Var        0.71 0.90 0.91 0.91
Proportion Explained  0.78 0.20 0.01 0.00
Cumulative Proportion 0.78 0.99 1.00 1.00

Mean item complexity =  1.5
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are  6  and the objective function was
4.57 with Chi Square of  351.02
The degrees of freedom for the model are -4  and the objective function was  0

The root mean square of the residuals (RMSR) is  0
The df corrected root mean square of the residuals is  NA

The harmonic number of observations is  80 with the empirical chi square  0  with prob <  NA
The total number of observations was  80  with Likelihood Chi Square =  0  with prob <  NA

Tucker Lewis Index of factoring reliability =  1.018
Fit based upon off diagonal values = 1
MR1  MR2   MR3 MR4
Correlation of (regression) scores with factors   1.00 0.91  0.69   0
Multiple R square of scores with factors          1.00 0.82  0.47   0
Minimum correlation of possible factor scores     0.99 0.64 -0.05  -1
```

## R

```r
subset2 <- subset(iris[, 1:4],
                  iris$Sepal.Length >= mean(iris$Sepal.Length))
fa2 <- fa(subset2, nfactors = 4)
print(fa2)
```

Output:

```
Factor Analysis using method =  minres
Call: fa(r = subset2, nfactors = 4)
MR1   MR2   MR3 MR4   h2    u2 com
Sepal.Length 0.76 -0.37  0.26   0 0.78 0.222 1.7
Sepal.Width  0.50  0.36  0.34   0 0.49 0.507 2.6
Petal.Length 0.95 -0.23 -0.22   0 1.00 0.005 1.2
Petal.Width  0.82  0.39 -0.20   0 0.86 0.144 1.6

MR1  MR2  MR3  MR4
Proportion Var        0.60 0.12 0.07 0.00
Cumulative Var        0.60 0.71 0.78 0.78
Proportion Explained  0.76 0.15 0.09 0.00
Cumulative Proportion 0.76 0.91 1.00 1.00

Mean item complexity =  1.8
Test of the hypothesis that 4 factors are sufficient.

The degrees of freedom for the null model are  6  and the objective function was
1.97 with Chi Square of  131.96
The degrees of freedom for the model are -4  and the objective function was  0

The root mean square of the residuals (RMSR) is  0
The df corrected root mean square of the residuals is  NA

The harmonic number of observations is  70 with the empirical chi square  0  with prob <  NA
The total number of observations was  70  with Likelihood Chi Square =  0  with prob <  NA

Tucker Lewis Index of factoring reliability =  1.05
Fit based upon off diagonal values = 1
MR1  MR2  MR3 MR4
Correlation of (regression) scores with factors   0.98 0.86 0.75   0
Multiple R square of scores with factors          0.96 0.75 0.57   0
Minimum correlation of possible factor scores     0.92 0.49 0.14  -1
```
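Beyond checking stability across subsets, the psych package also provides diagnostics for whether the data are suitable for factor analysis in the first place: KMO() computes the Kaiser-Meyer-Olkin measure of sampling adequacy, and cortest.bartlett() tests whether the correlation matrix differs from the identity matrix.

```r
# Sampling-adequacy diagnostics from the psych package
library(psych)
iris_scaled <- scale(iris[, 1:4])

# Kaiser-Meyer-Olkin measure of sampling adequacy
KMO(iris_scaled)

# Bartlett's test of sphericity on the correlation matrix
cortest.bartlett(cor(iris_scaled), n = nrow(iris_scaled))
```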

## R

```r
# Display the variance explained by each factor
print(fa$Vaccounted)
```

Output:

```
                            MR1       MR2        MR3          MR4
Proportion Var        0.7213402 0.1454084 0.02454873 1.000000e-30
Cumulative Var        0.7213402 0.8667486 0.89129733 8.912973e-01
Proportion Explained  0.8093149 0.1631424 0.02754269 1.121960e-30
Cumulative Proportion 0.8093149 0.9724573 1.00000000 1.000000e+00
```
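When fa() is given raw data (rather than a correlation matrix), it also estimates factor scores for each observation, by the regression method by default. The sketch below refits a two-factor model so it is self-contained:

```r
# Factor scores: one row per observation, one column per factor
library(psych)
fit <- fa(scale(iris[, 1:4]), nfactors = 2, rotate = "varimax")
head(fit$scores)
```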

### Factor Analysis Using the factanal() Function

The factanal() function performs maximum-likelihood factor analysis on a data set. It takes several arguments, described below.

Syntax:

factanal(x, factors, rotation, scores, covmat)

where,

• x – The data set to be analyzed.
• factors – The number of factors to extract.
• rotation – The rotation method to use. Popular rotation methods include varimax, oblimin, and promax.
• scores – Whether to compute factor scores for each observation.
• covmat – A covariance matrix to use instead of the default correlation matrix.
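The scores argument is illustrated in the sketch below, which requests regression-method factor scores; the resulting scores matrix has one row per observation and one column per factor.

```r
# Request regression-method factor scores from factanal()
data(mtcars)
fa_scores <- factanal(mtcars, factors = 3,
                      rotation = "varimax",
                      scores = "regression")
head(fa_scores$scores)
```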

The output of the factanal() function includes several pieces of information:

• Uniquenesses: The share of each variable's variance that is not accounted for by the factors.
• Loadings: The correlations between each variable and each factor; small loadings are suppressed when printing.
• SS loadings and Proportion Var: The variance explained by each factor.
• A chi-square test of the hypothesis that the chosen number of factors is sufficient.

Here is an example code snippet that demonstrates how to use factanal() function in R:

## R

```r
# Install the required package (run once)
install.packages("psych")

# Load the psych package for
# data analysis and visualization
library(psych)

# Load the mtcars dataset
data(mtcars)

# Perform factor analysis on the mtcars dataset
factor_analysis <- factanal(mtcars,
                            factors = 3,
                            rotation = "varimax")

# Print the results
print(factor_analysis)
```

Output:

```
Call:
factanal(x = mtcars, factors = 3, rotation = "varimax")

Uniquenesses:
mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
0.135 0.055 0.090 0.127 0.290 0.060 0.051 0.223 0.208 0.125 0.158

Factor1 Factor2 Factor3
mpg   0.643  -0.478  -0.473
cyl  -0.618   0.703   0.261
disp -0.719   0.537   0.323
hp   -0.291   0.725   0.513
drat  0.804  -0.241
wt   -0.778   0.248   0.524
qsec -0.177  -0.946  -0.151
vs    0.295  -0.805  -0.204
am    0.880
gear  0.908           0.224
carb  0.114   0.559   0.719

Factor1 Factor2 Factor3
Proportion Var   0.398   0.320   0.143
Cumulative Var   0.398   0.718   0.862

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 30.53 on 25 degrees of freedom.
The p-value is 0.205
```

In this example, we load the psych package, which provides functions for data analysis and visualization, and the mtcars dataset, which contains information about different car models. We then use the factanal() function (from base R's stats package) to perform factor analysis on the mtcars dataset, specifying that we want to extract three factors and use the varimax rotation method. Finally, we print the results of the factor analysis.

### Conclusion

In conclusion, factor analysis is a useful statistical technique for identifying underlying factors or latent variables that explain the correlations among a set of observed variables. In R programming, the psych package provides a range of functions for conducting factor analysis, which can be used to extract meaningful insights from complex datasets.