Confirmatory Factor Analysis in R

Confirmatory Factor Analysis (CFA) is a statistical technique for testing whether a hypothesized factor structure accounts for the relationships among a set of observed variables. Whether we are studying why people behave the way they do or what drives customer choices, CFA works like a detective, checking whether the clues in the data support a proposed hidden structure. In this article, we will discuss how to perform Confirmatory Factor Analysis in the R programming language.

What is Confirmatory Factor Analysis?

Confirmatory Factor Analysis (CFA) is a statistical method for testing how well a predefined set of unobserved factors explains the correlations among observed variables. It's like a puzzle solver: the analyst states in advance which pieces (variables) belong to which bigger pattern (factor), and CFA checks whether the data agree. Confirmatory Factor Analysis is often used in fields like psychology, education, and marketing to test theories about what a set of measures actually captures.

Features of Confirmatory Factor Analysis

Understanding Relationships: Confirmatory Factor Analysis shows how observed variables relate to the latent factors they are meant to measure.
Testing Theories: It confirms if observed data matches theoretical expectations.
Identifying Hidden Factors: It uncovers underlying constructs not directly observable.
Validating Measures: CFA ensures measurement scales accurately capture intended concepts.
Model Evaluation: It provides fit statistics to assess model adequacy.

Implementation of Confirmatory Factor Analysis in R

We will use the HolzingerSwineford1939 dataset, which ships with the lavaan package and contains cognitive test scores of 301 schoolchildren, to demonstrate Confirmatory Factor Analysis.

Step 1: Load the required packages

#install.packages("lavaan")

# Load required package
library(lavaan)

Step 2: Load the dataset and preview it

# Load the dataset and preview the first rows
data(HolzingerSwineford1939)
head(HolzingerSwineford1939)

Output:

  id sex ageyr agemo  school grade       x1   x2    x3       x4   x5        x6       x7   x8       x9
1  1   1    13     1 Pasteur     7 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
2  2   2    13     7 Pasteur     7 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
3  3   2    13     1 Pasteur     7 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
4  4   1    13     2 Pasteur     7 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
5  5   2    12     2 Pasteur     7 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
6  6   2    14     1 Pasteur     7 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000
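
Since head() only previews the first rows, it can help to confirm the dimensions and column types before fitting anything. The following is a minimal sketch using base R; the `indicators` and `n_missing` names are introduced here for illustration and are not part of lavaan.

```r
library(lavaan)

# Load the example dataset that ships with lavaan
data(HolzingerSwineford1939)

# Names of the nine test-score columns (helper vector introduced for this sketch)
indicators <- paste0("x", 1:9)

# Dimensions: rows are children, columns are variables
dim(HolzingerSwineford1939)

# Column types of the nine indicators
str(HolzingerSwineford1939[, indicators])

# Count missing values per indicator
n_missing <- colSums(is.na(HolzingerSwineford1939[, indicators]))
n_missing
```

Checking for missing values up front is useful because the number of observations actually used in the analysis depends on how incomplete rows are handled.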

Step 3: Specify the CFA Model

# Specify the CFA model
model <- '
    visual =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed =~ x7 + x8 + x9
'

The model specifies the relationships between three latent constructs (visual, textual, and speed) and their respective observed indicators (x1 to x9). Each line uses the '=~' operator, which reads "is measured by": visual is measured by x1, x2, and x3, and so on. This lets researchers test a hypothesis about the underlying structure of the observed data and evaluate how well the proposed model fits that data.
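
The three-line syntax is shorthand for a larger set of parameters. As a rough sketch, lavaan's lavaanify() function can expand the model string into a parameter table without fitting anything. The auto.* arguments used below are chosen to mirror what cfa() does for a model like this one, and `par_table` and `n_free` are names introduced here for illustration.

```r
library(lavaan)

model <- '
    visual =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed =~ x7 + x8 + x9
'

# Expand the model syntax into a full parameter table (no data needed).
# The auto.* arguments are set to mirror what cfa() applies to this model.
par_table <- lavaanify(model,
                       auto.fix.first = TRUE,  # fix each factor's first loading to 1
                       auto.var       = TRUE,  # add residual and factor variances
                       auto.cov.lv.x  = TRUE)  # let the latent factors covary

# One row per parameter; free == 0 marks a fixed parameter
par_table[, c("lhs", "op", "rhs", "free", "ustart")]

# Number of freely estimated parameters
n_free <- sum(par_table$free > 0)
n_free
```

If the expansion matches what cfa() does, the count of free parameters should agree with the "Number of model parameters" line that summary() reports for the fitted model.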

Step 4: Run the CFA and check the summary

# Run CFA
cfa_result <- cfa(model, data = HolzingerSwineford1939)

# Summarize the results
summary(cfa_result)

Output:

lavaan 0.6.17 ended normally after 35 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                                85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual =~                                           
    x1                1.000                           
    x2                0.554    0.100    5.554    0.000
    x3                0.729    0.109    6.685    0.000
  textual =~                                          
    x4                1.000                           
    x5                1.113    0.065   17.014    0.000
    x6                0.926    0.055   16.703    0.000
  speed =~                                            
    x7                1.000                           
    x8                1.180    0.165    7.152    0.000
    x9                1.082    0.151    7.155    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  visual ~~                                           
    textual           0.408    0.074    5.552    0.000
    speed             0.262    0.056    4.660    0.000
  textual ~~                                          
    speed             0.173    0.049    3.518    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .x1                0.549    0.114    4.833    0.000
   .x2                1.134    0.102   11.146    0.000
   .x3                0.844    0.091    9.317    0.000
   .x4                0.371    0.048    7.779    0.000
   .x5                0.446    0.058    7.642    0.000
   .x6                0.356    0.043    8.277    0.000
   .x7                0.799    0.081    9.823    0.000
   .x8                0.488    0.074    6.573    0.000
   .x9                0.566    0.071    8.003    0.000
    visual            0.809    0.145    5.564    0.000
    textual           0.979    0.112    8.737    0.000
    speed             0.384    0.086    4.451    0.000
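
The default summary reports only the chi-square test of model fit. Because that test is sensitive to sample size, other fit indices are usually consulted alongside it. A minimal sketch of how they might be retrieved with lavaan's fitMeasures() function follows; the selection of indices is an illustrative choice, not a recommendation from the original analysis.

```r
library(lavaan)

model <- '
    visual =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed =~ x7 + x8 + x9
'
cfa_result <- cfa(model, data = HolzingerSwineford1939)

# Extract a handful of commonly reported fit indices by name
fit_indices <- fitMeasures(cfa_result,
                           c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
round(fit_indices, 3)

# Alternatively, print the fit indices as part of the summary
summary(cfa_result, fit.measures = TRUE)
```

The chisq and df entries should reproduce the test statistic and degrees of freedom shown in the output above, which is a quick check that the same model has been fitted.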

In the above code:

The CFA model defines the relationship between the observed variables and the latent factors.
'visual', 'textual', and 'speed' are latent factors.
'x1' to 'x9' are observed variables representing the cognitive test scores.
The '=~' operator reads "is measured by" and links each latent factor to its observed indicators.
Each latent factor is measured by three observed variables.
The cfa() function fits the Confirmatory Factor Analysis using the specified model and the dataset. By default it fixes the first loading of each factor to 1 to set the factor's scale, which is why x1, x4, and x7 show an estimate of 1.000 with no standard error.
The summary() function prints the results, including the chi-square test of model fit, factor loadings, standard errors, and variance and covariance estimates. Additional fit indices are printed only when requested, for example with the fit.measures = TRUE argument.
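
Since the first loading of each factor is fixed to 1, the unstandardized loadings are expressed relative to that indicator and are hard to compare across factors. One way to put them on a common scale, sketched here with lavaan's standardizedSolution() function, is to look at standardized loadings; `std_est` and `std_loadings` are names introduced for this example.

```r
library(lavaan)

model <- '
    visual =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed =~ x7 + x8 + x9
'
cfa_result <- cfa(model, data = HolzingerSwineford1939)

# Standardized estimates with standard errors and p-values
std_est <- standardizedSolution(cfa_result)

# Keep only the factor loadings (rows with the =~ operator)
std_loadings <- std_est[std_est$op == "=~",
                        c("lhs", "rhs", "est.std", "se", "pvalue")]
std_loadings
```

A standardized loading can be read as the correlation between an indicator and its factor in a model like this one, where each indicator loads on a single factor.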

Conclusion

Confirmatory Factor Analysis (CFA) is a valuable tool for testing hypothesized hidden structures within observed variables. Here we covered its use across various fields and its practical implementation in R using the 'lavaan' package with the 'HolzingerSwineford1939' dataset: specifying a three-factor model, fitting it with cfa(), and reading the resulting estimates with summary().
