statistiXL Features

Principal Component Analysis

As with Factor Analysis, Principal Component Analysis is a technique that attempts to reduce complex data sets consisting of many different variables to a smaller set of new variables that still describe much of the variation in the original data. These new variables, called Principal Components, are chosen to be uncorrelated with one another (whereas the original, untransformed variables may have been correlated), and each in turn accounts for as much as possible of the variance remaining in the original data set. The more important Principal Components are selected based on their eigenvalues, and ideally far fewer Principal Components (e.g. one or two) are required than there were original variables. These fewer Principal Components can then be further analysed by Regression Analysis or ANOVA/MANOVA. Thus, the role of Principal Component Analysis is to reduce a large number of variables to fewer, simpler ones. Principal Component Analysis is an alternative to Factor Analysis (both seek a simpler structure for a set of variables), but Principal Components are linear combinations of the original variables, whereas in Factor Analysis the original variables are modelled as linear combinations of Factors.
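The idea can be illustrated outside statistiXL with a minimal Python sketch. It assumes NumPy is available; the function name principal_components and the simulated data are hypothetical and are not part of statistiXL. The sketch extracts eigenvalues and eigenvectors from the correlation or covariance matrix and shows that the resulting component scores are uncorrelated.

```python
import numpy as np

def principal_components(data, use_correlation=True):
    """Eigenvalues (largest first) and matching eigenvectors (as columns)
    of the correlation or covariance matrix of `data`.
    Rows of `data` are cases, columns are variables."""
    data = np.asarray(data, dtype=float)
    if use_correlation:
        matrix = np.corrcoef(data, rowvar=False)
    else:
        matrix = np.cov(data, rowvar=False)
    # eigh is intended for symmetric matrices; it returns ascending eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

# Simulated data (an assumption for illustration): the first two variables
# are strongly correlated, the third is unrelated noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2.0 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

eigenvalues, eigenvectors = principal_components(data, use_correlation=True)

# Scores are linear combinations of the standardised original variables.
standardised = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
scores = standardised @ eigenvectors

# Off-diagonal entries are (numerically) zero: the components are
# uncorrelated, and each variance equals the corresponding eigenvalue.
score_covariance = np.cov(scores, rowvar=False)
```

Because two of the three simulated variables carry almost the same information, most of the total variance ends up in the first two components, which is the reduction described above.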

statistiXL provides a number of options for Principal Component Analysis. Either the correlation matrix or the covariance matrix between variables can be selected as the basis for analysis. Either all Principal Components can be extracted, or a subset can be chosen using limits such as the number of components to extract, the percent of variance to explain or a cut-off value for the eigenvalues. Scree plots can be produced to help determine visually the appropriate number of Principal Components to extract.
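The three kinds of extraction limit can be sketched in plain Python as follows. This is only an illustration: the function select_components, its parameter names, the example eigenvalues and the rule of applying the most restrictive limit are all assumptions, and statistiXL may define or combine its limits differently.

```python
def select_components(eigenvalues, max_components=None,
                      min_percent_variance=None, min_eigenvalue=None):
    """Number of components to keep under the limits supplied.
    When several limits are given, the most restrictive one applies
    (an assumption made for this sketch)."""
    ordered = sorted(eigenvalues, reverse=True)
    keep = len(ordered)

    # Limit 1: a fixed number of components.
    if max_components is not None:
        keep = min(keep, max_components)

    # Limit 2: the fewest components whose cumulative percent of
    # variance reaches the requested level.
    if min_percent_variance is not None:
        total = sum(ordered)
        running = 0.0
        for count, value in enumerate(ordered, start=1):
            running += value
            if 100.0 * running / total >= min_percent_variance:
                keep = min(keep, count)
                break

    # Limit 3: only components whose eigenvalue reaches a cut-off.
    if min_eigenvalue is not None:
        keep = min(keep, sum(1 for v in ordered if v >= min_eigenvalue))

    return keep

# Hypothetical eigenvalues from a four-variable correlation matrix.
eigenvalues = [2.5, 1.2, 0.2, 0.1]

by_number = select_components(eigenvalues, max_components=1)
by_variance = select_components(eigenvalues, min_percent_variance=90)
by_eigenvalue = select_components(eigenvalues, min_eigenvalue=1.0)
```

A scree plot shows the same eigenvalues plotted against component number, so the point at which the curve flattens gives a visual counterpart to these numerical limits.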

Results are presented in tabular and graphical form. Descriptive statistics and the correlation or covariance matrix are displayed, if these options were selected. The eigenvalues are then tabulated, along with the percent and cumulative percent of the total variance in the original data set that each extracted Principal Component explains. The Component Loadings are then listed, followed by the Principal Component score coefficients (eigenvectors). The case-wise Principal Component scores for each extracted component are listed, if this option was selected. Optional graphical output includes a Scree Plot and Bivariate Scatterplots of the various pair-wise combinations of extracted Principal Components.
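To make the tabulated quantities concrete, the following Python sketch computes them for a small made-up data set. It assumes NumPy is available, and the data values are hypothetical. It also assumes the common convention that a loading is the eigenvector element scaled by the square root of its eigenvalue; statistiXL's exact scaling is not stated here and may differ.

```python
import numpy as np

# Hypothetical data: 6 cases (rows) by 3 variables (columns).
data = np.array([
    [2.5, 2.4, 1.2],
    [0.5, 0.7, 0.3],
    [2.2, 2.9, 1.1],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 1.6],
    [2.3, 2.7, 0.8],
])

# Analysis based on the correlation matrix.
correlation = np.corrcoef(data, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(correlation)
order = np.argsort(eigenvalues)[::-1]          # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]          # score coefficients

# Percent and cumulative percent of total variance per component.
percent_variance = 100.0 * eigenvalues / eigenvalues.sum()
cumulative_percent = np.cumsum(percent_variance)

# Component loadings under the assumed convention: eigenvector columns
# scaled by the square root of the eigenvalue (clipped at zero to guard
# against tiny negative values from rounding).
loadings = eigenvectors * np.sqrt(np.clip(eigenvalues, 0.0, None))

# Case-wise scores: standardised data multiplied by the coefficients.
standardised = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
scores = standardised @ eigenvectors

for k in range(len(eigenvalues)):
    print(f"PC{k + 1}: eigenvalue={eigenvalues[k]:.3f}, "
          f"percent={percent_variance[k]:.1f}, "
          f"cumulative={cumulative_percent[k]:.1f}")
```

Under this convention, the squared loadings in each column sum to that component's eigenvalue, and plotting any two columns of scores against each other gives the kind of bivariate scatterplot mentioned above.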

The help file of statistiXL provides an introduction to Principal Component Analysis, and gives an example of Principal Component Analysis.