Introduction to Factor Analysis

Factor analysis is a statistical method used to analyze the relationships among a set of observed variables by explaining the correlations or covariances between them in terms of a smaller number of unobserved variables called factors.

What is Factor Analysis?

Factor analysis, a method within the realm of statistics and part of the general linear model (GLM), condenses a large number of variables into a smaller set of factors. In doing so, it captures the maximum shared variance among the variables and expresses it as factor scores that can be used in further analysis. Factor analysis rests on several assumptions: linearity in relationships, absence of multicollinearity among variables, inclusion of relevant variables in the analysis, and genuine correlations between variables and factors. While multiple extraction methods exist, principal component analysis is the most prevalent approach in practice.
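As an illustration only (not part of the original discussion), the minimal sketch below fits a two-factor model to the four iris measurements with scikit-learn's FactorAnalysis; the dataset, the choice of two factors, and the library are all assumptions made for the example.

```python
# Minimal, illustrative sketch: fit a two-factor model with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 4 observed variables, 150 cases
X_std = StandardScaler().fit_transform(X)   # standardize before factoring

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X_std)            # factor scores, one row per case

print(fa.components_.shape)                 # (2, 4): factor-to-variable weights
print(scores[:3])                           # scores of the first three cases
```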

What is a factor?

In the context of factor analysis, a “factor” refers to an underlying, unobserved variable or latent construct that represents a common source of variation among a set of observed variables. These observed variables, also known as indicators or manifest variables, are the measurable variables that are directly observed or measured in a study.

Why do we need factor analysis?

Factor analysis serves several purposes and objectives in statistical analysis:

  1. Dimensionality Reduction: Factor analysis helps in reducing the number of variables under consideration by identifying a smaller number of underlying factors that explain the correlations or covariances among the observed variables. This simplification can make the data more manageable and easier to interpret.
  2. Identifying Latent Constructs: It allows researchers to identify latent constructs or underlying factors that may not be directly observable but are inferred from patterns in the observed data. These latent constructs can represent theoretical concepts, such as personality traits, attitudes, or socioeconomic status.
  3. Data Summarization: By condensing the information from multiple variables into a smaller set of factors, factor analysis provides a more concise summary of the data while retaining as much relevant information as possible.
  4. Hypothesis Testing: Factor analysis can be used to test hypotheses about the underlying structure of the data. For example, researchers may have theoretical expectations about how variables should be related to each other, and factor analysis can help evaluate whether these expectations are supported by the data.
  5. Variable Selection: It aids in identifying which variables are most important or relevant for explaining the underlying factors. This can help in prioritizing variables for further analysis or for developing more parsimonious models.
  6. Improving Predictive Models: Factor analysis can be used as a preprocessing step to improve the performance of predictive models by reducing multicollinearity among predictors and capturing the shared variance among variables more efficiently (see the sketch after this list).
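To make points 1 and 6 concrete, here is a hedged sketch of factor analysis used as a dimensionality-reduction step ahead of a classifier; the breast-cancer dataset, the choice of five factors, and the logistic-regression model are illustrative assumptions, not prescriptions.

```python
# Illustrative only: replace 30 correlated predictors with 5 factor scores
# before classification; dataset, factor count, and classifier are assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(
    StandardScaler(),                       # put variables on comparable scales
    FactorAnalysis(n_components=5, random_state=0),
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(model, X, y, cv=5).mean())
```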

Terminology related to factor analysis

  1. Factor Loadings:
    • Factor loadings represent the correlations between the observed variables and the underlying factors in factor analysis. They indicate the strength and direction of the relationship between each variable and each factor.
      • Squaring a standardized factor loading gives the proportion of variance in the variable explained by that factor; summed across factors, these squared loadings give the “communality.”
  2. Communality:
    • Communality is the sum of the squared factor loadings for a given variable across all factors. It measures the proportion of variance in a variable that is explained by all the factors jointly.
      • Communality can be interpreted as the reliability of the variable in the context of the factors being considered.
  3. Spurious Solutions:
    • If the communality of a variable exceeds 1.0, it indicates a spurious solution, which may result from factors such as a small sample size or extracting too many or too few factors.
  4. Uniqueness of a Variable:
    • Uniqueness of a variable is the variance of the variable minus its communality. It reflects the proportion of variance in a variable that is not accounted for by the factors.
  5. Eigenvalues/Characteristic Roots:
    • Eigenvalues measure the amount of variation in the total sample accounted for by each factor. They indicate the importance of each factor in explaining the variance in the variables.
      • A higher eigenvalue suggests a more important factor in explaining the data.
  6. Extraction Sums of Squared Loadings:
    • These are the sums of squared loadings associated with each extracted factor. They indicate how much variance in the variables is accounted for by each factor.
  7. Factor Scores:
    • Factor scores represent the scores of each case (row) on each factor (column) in the factor analysis. They are computed by multiplying each case’s standardized score on each variable by the corresponding factor loading and summing these products. The code sketch after this list shows where these quantities appear in practice.
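The short sketch below shows where the terms above surface in code, using the third-party factor_analyzer package (pip install factor_analyzer) on the iris data; both the package and the dataset are assumptions made purely for illustration.

```python
# Where the terminology appears in code (illustrative; assumes factor_analyzer).
import pandas as pd
from factor_analyzer import FactorAnalyzer
from sklearn.datasets import load_iris

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)

print(fa.loadings_)             # factor loadings: variable-factor correlations
print(fa.get_communalities())   # communality: sum of squared loadings per variable
print(fa.get_uniquenesses())    # uniqueness: 1 - communality
print(fa.get_eigenvalues())     # eigenvalues of the correlation matrix
print(fa.transform(X)[:5])      # factor scores for the first five cases
```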

Types of Factor Analysis

There are two main types of factor analysis:

  1. Exploratory Factor Analysis (EFA):
    • EFA is used to uncover the underlying structure of a set of observed variables without imposing preconceived notions about how many factors there are or how the variables are related to each factor. It explores complex interrelationships among items and aims to group items that are part of unified concepts or constructs.
      • Researchers do not make a priori assumptions about the relationships among factors, allowing the data to reveal the structure organically.
      • EFA helps in identifying the number of factors needed to account for the variance in the observed variables and understanding the relationships between variables and factors.
  2. Confirmatory Factor Analysis (CFA):
    • CFA is a more structured approach that tests specific hypotheses about the relationships between observed variables and latent factors based on prior theoretical knowledge or expectations. It uses structural equation modeling techniques to test a measurement model, wherein the observed variables are assumed to load onto specific factors.
      • CFA assesses the fit of the hypothesized model to the actual data, examining how well the observed variables align with the proposed factor structure.
      • This method allows for the evaluation of relationships between observed variables and unobserved factors, and it can accommodate measurement error.
      • Researchers hypothesize the relationships between variables and factors before conducting the analysis, and the model is tested against empirical data to determine its validity.

In summary, while EFA is more exploratory and flexible, allowing the data to dictate the factor structure, CFA is more confirmatory, testing specific hypotheses about how the observed variables are related to latent factors. Both methods are valuable tools in understanding the underlying structure of data and have their respective strengths and applications.
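As a rough illustration of the exploratory workflow, the sketch below uses the eigenvalues of the correlation matrix (Kaiser's greater-than-one rule) to choose the number of factors and then refits with a varimax rotation. The factor_analyzer package, the wine dataset, and the retention rule are assumptions for the example; CFA is typically carried out with dedicated structural equation modeling software and is not shown here.

```python
# A rough EFA workflow (assumptions: factor_analyzer package, wine dataset,
# Kaiser's eigenvalue-greater-than-one rule for retaining factors).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from sklearn.datasets import load_wine

wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)

# Step 1: inspect the eigenvalues of the correlation matrix
probe = FactorAnalyzer(rotation=None)
probe.fit(X)
eigenvalues, _ = probe.get_eigenvalues()
n_factors = int(np.sum(eigenvalues > 1))     # Kaiser criterion
print("factors retained:", n_factors)

# Step 2: refit with that number of factors and a varimax rotation
efa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
efa.fit(X)
print(efa.loadings_.round(2))
```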

Types of factor extraction methods

  1. Principal Component Analysis (PCA):
    • PCA is a widely used method for factor extraction.
    • It aims to extract factors that account for the maximum possible variance in the observed variables.
    • Factor weights are computed to extract successive factors until no further meaningful variance can be extracted.
    • After extraction, the factor model is often rotated for further analysis to enhance interpretability.
  2. Canonical Factor Analysis:
    • Also known as Rao’s canonical factoring, this method computes a similar model to PCA but uses the principal axis method.
    • It seeks factors that have the highest canonical correlation with the observed variables.
    • Canonical factor analysis is not affected by arbitrary rescaling of the data, making it robust to certain data transformations.
  3. Common Factor Analysis:
    • Also referred to as Principal Factor Analysis (PFA) or Principal Axis Factoring (PAF).
    • This method aims to identify the fewest factors necessary to account for the common variance (correlation) among a set of variables.
    • Unlike PCA, common factor analysis focuses on capturing shared variance rather than overall variance; the two approaches are contrasted in the sketch below.
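The following sketch contrasts two of these extraction methods on the same data. It is illustrative only: the iris data, the two-factor choice, and the use of scikit-learn's PCA alongside factor_analyzer's principal axis factoring are assumptions for the example.

```python
# Contrast of two extraction methods on the same data (illustrative only):
# PCA models total variance, principal axis factoring models shared variance.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_std)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

paf = FactorAnalyzer(n_factors=2, method="principal", rotation=None)
paf.fit(X)
print("PAF loadings:\n", paf.loadings_.round(2))
```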

Assumptions of factor analysis:

  1. Linearity: The relationships between variables and factors are assumed to be linear.
  2. Multivariate Normality: The variables in the dataset should follow a multivariate normal distribution.
  3. No Multicollinearity: Variables should not be highly correlated with each other, as high multicollinearity can affect the stability and reliability of the factor analysis results.
  4. Adequate Sample Size: Factor analysis generally requires a sufficient sample size to produce reliable results. The adequacy of the sample size can depend on factors such as the complexity of the model and the ratio of variables to cases.
  5. Homoscedasticity: The variance of the variables should be roughly equal across different levels of the factors.
  6. Uniqueness: Each variable should have unique variance that is not explained by the factors. This assumption is particularly important in common factor analysis.
  7. Independent Observations: The observations in the dataset should be independent of each other.
  8. Linearity of Factor Scores: The relationship between the observed variables and the latent factors is assumed to be linear, even though the observed variables may not be linearly related to each other.
  9. Interval or Ratio Scale: Factor analysis typically assumes that the variables are measured on interval or ratio scales, as opposed to nominal or ordinal scales.

Violation of these assumptions can lead to biased parameter estimates and inaccurate interpretations of the results. Therefore, it’s important to assess the data for these assumptions before conducting factor analysis and to consider potential remedies or alternative methods if the assumptions are not met.
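Two common factorability checks that complement these assumptions are Bartlett's test of sphericity (are the variables correlated enough to factor at all?) and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy. The sketch below shows both; the factor_analyzer package and the wine dataset are assumptions made for illustration.

```python
# Pre-checks before running factor analysis (assumes the factor_analyzer package).
import pandas as pd
from factor_analyzer.factor_analyzer import (
    calculate_bartlett_sphericity,
    calculate_kmo,
)
from sklearn.datasets import load_wine

wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)

chi_square, p_value = calculate_bartlett_sphericity(X)
print(f"Bartlett's sphericity: chi2={chi_square:.1f}, p={p_value:.4f}")  # want p < 0.05

kmo_per_variable, kmo_total = calculate_kmo(X)
print(f"Overall KMO: {kmo_total:.2f}")  # above roughly 0.6 is commonly considered adequate
```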


