Principal component analysis (PCA) in R summarizes a dataset through linear combinations of its original attributes. Each principal component is a linear combination of the original predictors, obtained through an orthogonal transformation. It is a useful technique for exploratory data analysis (EDA) because it lets you better visualize the variation present in a dataset with many variables.
The first principal component (PC1) captures the maximum variance in the dataset and gives the direction of greatest variability. The second principal component (PC2) captures as much of the remaining variance as possible while being uncorrelated with PC1, so the correlation between PC1 and PC2 is zero. All succeeding principal components follow the same concept: each captures the largest share of the variance still remaining without being correlated with any previous principal component.
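The two properties described above (variance in decreasing order, zero correlation between components) can be checked numerically. The following is a minimal sketch on simulated data; the data frame `toy` and its columns are hypothetical and are not part of the original walkthrough.

```r
# Hypothetical toy data: two correlated variables
set.seed(42)
x <- rnorm(200)
y <- 0.8 * x + rnorm(200, sd = 0.5)
toy <- data.frame(x = x, y = y)

# PCA on centered and standardized variables
pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Variance captured by each component, largest first
pc_var <- pca_toy$sdev^2
print(pc_var)

# Correlation between PC1 and PC2 scores: zero up to rounding error
pc_cor <- cor(pca_toy$x[, 1], pca_toy$x[, 2])
print(round(pc_cor, 10))

# The loading vectors are orthonormal, so this is the identity matrix
print(round(crossprod(pca_toy$rotation), 10))
```

The scores live in `pca_toy$x` and the loadings in `pca_toy$rotation`, which are the two pieces used in the later steps.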
mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. It is built into R as part of the datasets package, so no additional package such as dplyr is required to use it.
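A quick way to confirm the shape of the dataset before running PCA is sketched below. The variable names `n_rows` and `n_cols` are introduced here only for illustration.

```r
# mtcars is available in a standard R session via the datasets package
data(mtcars)

n_rows <- nrow(mtcars)  # number of automobiles
n_cols <- ncol(mtcars)  # mpg plus the 10 design and performance variables

print(c(rows = n_rows, cols = n_cols))

# Column types and the first few values of each variable
str(mtcars)
```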
Performing PCA using the mtcars dataset
We perform principal component analysis on mtcars, which consists of 32 car models and 11 numeric variables (fuel consumption plus the 10 design and performance aspects).
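One way to run this step is with `prcomp()` from base R, as sketched here. Centering and scaling are assumptions of this sketch, chosen because the mtcars variables are measured in very different units; the object name `pca` is likewise illustrative.

```r
# PCA on all mtcars variables, standardized to mean 0 and unit variance
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Standard deviation, proportion and cumulative proportion of variance
print(summary(pca))

# Scores of the first few cars on PC1 and PC2
print(head(pca$x[, 1:2]))

# Loadings of each original variable on PC1 and PC2
print(pca$rotation[, 1:2])
```

Without `scale. = TRUE`, variables with large numeric ranges such as disp and hp would dominate the first component.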
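- Biplot
The resulting principal components are plotted as a biplot, which shows each observation's scores on the first two principal components together with one arrow per variable. A scale value of 0 ensures that the arrows are scaled to represent the loadings.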
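A biplot of this kind can be drawn with base R's `biplot()`, as in the sketch below. The PCA is refitted so the snippet runs on its own; the `cex` value is an arbitrary choice to keep the car labels readable.

```r
# Refit the PCA so this snippet is self-contained
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# scale = 0: arrows are drawn from the raw loadings in pca$rotation
biplot(pca, scale = 0, cex = 0.6)

# The loadings that the arrows for PC1 and PC2 are based on
arrow_loadings <- pca$rotation[, 1:2]
print(arrow_loadings)
```

Variables whose arrows point in similar directions are positively correlated in the space of the first two components, while arrows pointing in opposite directions indicate negative correlation.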
- Variance explained for each principal component
The scree plot shows the proportion of variance explained by each principal component. The first two principal components account for the largest proportions of variance, as seen in the plot, while each later component contributes comparatively little.
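The proportion of variance explained can be computed from the component standard deviations, as in this sketch. The name `pve` is an illustrative choice, and the PCA is refitted so the snippet stands alone.

```r
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Variance of each component divided by the total variance
pve <- pca$sdev^2 / sum(pca$sdev^2)
print(round(pve, 4))

# Scree plot: one point per principal component
plot(pve, type = "b",
     xlab = "Principal component",
     ylab = "Proportion of variance explained",
     ylim = c(0, 1))
```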
- Cumulative proportion of variance
This plot shows the cumulative proportion of variance against the number of principal components. The curve rises steeply over the first two principal components, which together account for most of the variance, and then flattens as it approaches 1.
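The cumulative curve is the running sum of the per-component proportions. A minimal sketch, again refitting the PCA so it runs independently (`cum_pve` is an illustrative name):

```r
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)
pve <- pca$sdev^2 / sum(pca$sdev^2)

# Running total of the proportion of variance explained
cum_pve <- cumsum(pve)
print(round(cum_pve, 4))

plot(cum_pve, type = "b",
     xlab = "Principal component",
     ylab = "Cumulative proportion of variance explained",
     ylim = c(0, 1))
```

A common use of this plot is to pick the smallest number of components whose cumulative proportion passes a chosen threshold.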
- Decision tree model
A decision tree model was built to predict disp from the other variables in the dataset using the anova method, which fits a regression tree. The tree is then plotted to display its splits and the predicted value at each node.
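The text does not name the package used for the tree. The sketch below assumes `rpart`, whose `method = "anova"` option matches the description, and uses base plotting rather than any add-on plotting package. The object name `tree_fit` is illustrative.

```r
# Assumption: the rpart package (a recommended package shipped with most R installs)
library(rpart)

# Regression tree predicting disp from all other mtcars variables
tree_fit <- rpart(disp ~ ., data = mtcars, method = "anova")

# Text summary of the splits and node means
print(tree_fit)

# Draw the tree and label each node with its split and predicted value
plot(tree_fit, uniform = TRUE, margin = 0.1)
text(tree_fit, use.n = TRUE, cex = 0.8)
```

With only 32 rows and the default control settings the tree stays shallow; `rpart.control()` can relax limits such as `minsplit` if a deeper tree is wanted.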