Difficulties verifying outputted PCA scores
I have had some difficulty verifying the PCA scores that the program outputs: when I put the original raw data into the equations for the principal components, I cannot reproduce the values generated by the program. Therefore, I must presume that I am an idiot and have missed some elementary algebraic step. Nevertheless, after re-checking my calculations, I arrive at the same discrepancy.
To illustrate my procedure, I will use the example data provided in the help menu (I am using statistiXL v1.6). I do this simply because everyone here has access to that data set.
The first line of data (which I will assume is "Case 1" when it is transformed via PCA), is:
WDIM=15.50; CIRCUM=59.69; FBEYE=21.10; EYEHD=10.30; EARHD=13.40; JAW=12.40
and the average values of each variable over the 60 cases given are
WDIM*=15.5; CIRCUM*=57.575; FBEYE*=19.807; EYEHD*=10.513; EARHD*=13.575; JAW*=11.873
Let us say that we applied PCA to the correlation matrix; then, the component score coefficients for PC1 are as listed in the help menu:
a1=0.511 (WDIM); a2=0.561 (CIRCUM); a3=0.462 (FBEYE); a4=0.144 (EYEHD); a5=0.110 (EARHD); a6=0.421 (JAW)
and, for PC2:
b1=-0.008 (WDIM); b2=0.087 (CIRCUM); b3=-0.147 (FBEYE); b4=0.664 (EYEHD); b5=0.644 (EARHD); b6=-0.339 (JAW)
Then, for Case 1, the first principal component score is calculated as
PC 1 = a1(WDIM-WDIM*)+a2(CIRCUM-CIRCUM*)+a3(FBEYE-FBEYE*)+a4(EYEHD-EYEHD*)+a5(EARHD-EARHD*)+a6(JAW-JAW*)
PC 1 = 0.511*(15.50-15.5)+0.561*(59.69-57.575)+0.462*(21.10-19.807)+0.144*(10.30-10.513)+0.110*(13.40-13.575)+0.421*(12.40-11.873)
PC 1 = 1.955826, which matches the value of 1.952 listed in the casewise scores output (to within the rounding error in the coefficients a1 through a6, of course). So far, so good.
Similarly, the second principal component score of Case 1 is
PC 2 = b1(WDIM-WDIM*)+b2(CIRCUM-CIRCUM*)+b3(FBEYE-FBEYE*)+b4(EYEHD-EYEHD*)+b5(EARHD-EARHD*)+b6(JAW-JAW*)
PC 2 = -0.008*(15.50-15.5)+0.087*(59.69-57.575)-0.147*(21.10-19.807)+0.664*(10.30-10.513)+0.644*(13.40-13.575)-0.339*(12.40-11.873)
PC 2 = -0.43885, which does not match the value of -0.760 listed in the outputted casewise scores for "PCA 2" under Case 1. Given that the PC coefficients are provided to three decimal places, this discrepancy could not be the result of rounding errors.
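For anyone who wants to check my arithmetic, here is a minimal Python sketch of the by-hand calculation. The numbers are copied from the values quoted above. The variable and function names are my own and have nothing to do with statistiXL. It does exactly what my equations do (subtract the mean, multiply by the coefficient, and sum) and nothing more, so it reproduces my procedure rather than whatever the program does internally.

```python
# Values copied from this post; all names here are illustrative only.
# Order of variables: WDIM, CIRCUM, FBEYE, EYEHD, EARHD, JAW
case1 = [15.50, 59.69, 21.10, 10.30, 13.40, 12.40]
means = [15.5, 57.575, 19.807, 10.513, 13.575, 11.873]
pc1_coeffs = [0.511, 0.561, 0.462, 0.144, 0.110, 0.421]
pc2_coeffs = [-0.008, 0.087, -0.147, 0.664, 0.644, -0.339]


def hand_score(coeffs, values, centers):
    """Sum of coefficient * (value - mean), as in the equations above."""
    return sum(a * (x - m) for a, x, m in zip(coeffs, values, centers))


pc1 = hand_score(pc1_coeffs, case1, means)
pc2 = hand_score(pc2_coeffs, case1, means)

print(f"PC 1 = {pc1:.6f}")
print(f"PC 2 = {pc2:.6f}")
```

Running this gives the same two numbers as my hand calculation, so the discrepancy in PC 2 is not a slip of the calculator.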
Is this an isolated typographical error in the help menu, or am I missing something? I am experiencing similar and multiple discrepancies in my own data set, where the outputted casewise scores also do not match the "by hand" calculations.
The algebra seems exceedingly simple and transparent, yet the discrepancy has survived my repeated attempts to find a flaw. Any thoughts?
Albert Loui, Ph.D.
Lawrence Livermore National Laboratory, U.S.A.