I'm trying out the cluster analysis tool after running principal component analysis, extracting the most highly correlated variables, and standardizing their units ((x - mean) / standard deviation). I am then using Ward's method with squared Euclidean distances and looking at the resulting dendrogram. I understand that the distances are somehow related to SSE values, but what specific units should be assigned to the x-axis? Keep in mind that the variables are standardized prior to cluster analysis.




    Your question is a bit tricky!

    If you cluster on a single variable using squared Euclidean distance, then the distance has the square of that variable's units, and the dendrogram scale has those squared units too (although the branch points are calculated from the ESS, the error sum of squares). If you cluster on more than one variable, the units of the distance become undefinable: the distance is a sum of squared differences, one per variable, and that sum is not meaningful when the variables have different units. So, in general, I think it is not possible to assign any unit to the distance scale.
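
    As a rough illustration of the point above, here is a small Python sketch. The data and the helper name `squared_euclidean` are made up for the example and do not come from any particular clustering package. It shows that a single-variable squared distance scales with the square of the unit, while a mixed-unit sum changes arbitrarily when one variable is re-expressed in another unit.

```python
def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical observations of (length, mass); values chosen for illustration.
p_m_kg = (1.5, 2.0)      # length in metres, mass in kilograms
q_m_kg = (2.0, 5.0)

# The same two observations with length re-expressed in centimetres.
p_cm_kg = (150.0, 2.0)
q_cm_kg = (200.0, 5.0)

# One variable only (length): the result carries squared length units.
d_m = squared_euclidean(p_m_kg[:1], q_m_kg[:1])      # in m^2
d_cm = squared_euclidean(p_cm_kg[:1], q_cm_kg[:1])   # in cm^2
print(d_m, d_cm, d_cm / d_m)   # ratio is 100**2, the squared conversion factor

# Two variables: m^2 + kg^2 versus cm^2 + kg^2. Neither sum has a unit,
# and the number depends entirely on which length unit was picked.
d_mixed_m = squared_euclidean(p_m_kg, q_m_kg)
d_mixed_cm = squared_euclidean(p_cm_kg, q_cm_kg)
print(d_mixed_m, d_mixed_cm)
```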

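    But since you have used PCA to extract the most correlated principal components and have then standardized them, I presume they are dimensionless numbers anyway: (x - mean) / standard deviation divides one quantity by another in the same units, so the units cancel.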
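
    A quick way to convince yourself that standardized values carry no units is to standardize the same measurements recorded in two different units and compare. This is only a sketch with invented numbers; it assumes the sample standard deviation (`statistics.stdev`), which may or may not be the one your software uses.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)   # assumption: sample (n - 1) standard deviation
    return [(v - m) / s for v in values]

# Hypothetical lengths, first in metres and then in centimetres.
lengths_m = [1.0, 2.0, 4.0, 7.0]
lengths_cm = [100.0 * v for v in lengths_m]

z_m = standardize(lengths_m)
z_cm = standardize(lengths_cm)

# The z-scores agree up to floating-point rounding, whatever the input unit.
max_diff = max(abs(a - b) for a, b in zip(z_m, z_cm))
print(max_diff < 1e-9)

# So a squared distance between standardized values is a pure number.
sq_dist_m = (z_m[0] - z_m[1]) ** 2
sq_dist_cm = (z_cm[0] - z_cm[1]) ** 2
print(abs(sq_dist_m - sq_dist_cm) < 1e-9)
```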

    I therefore don't think you can assign any meaningful units to the X-axis of the clustering tree.

    I hope this helps.

    Philip Withers
