After deciding how many clusters to use, I want to quickly identify and tag the cluster membership of each case. I hope there is an easy way, because I often use data sets with well over 1000 cases.

To do this in SPSS, I open a dialog box, specify the number of clusters, and check "save cluster membership"; SPSS then outputs a new variable indicating cluster membership.

How can I identify and tag cluster membership with statistiXL?


  • Are you referring to Discriminant Analysis? If so, statistiXL will include a list of cases and their assigned groups as part of the output.


  • No, not really. In discriminant analysis, I believe we assign group membership ourselves and then estimate a model that classifies cases into our predetermined groups.

    In cluster analysis I am trying to do two things. (1) Determine the number of clusters to use by finding a good balance between within-cluster homogeneity and between-cluster heterogeneity. (2) Having decided on the number of clusters into which to divide the cases, tag each case with its cluster membership.

    Once each case is tagged with its cluster membership, I would use cross-tabs and ANOVA to develop descriptive statistics for each cluster.

    No problem accomplishing step 1, but I must be missing something on step 2. Surely we are not depending on the plots to identify cluster membership for each case. With 1000 or so cases to sort through I was hoping for a utility that would make step 2 quick and painless.

    Sorry for the misunderstanding (Discriminant Analysis can also be used to classify cases for which group membership is unknown, based on an analysis of samples where membership is known).

    If you are dealing with Hierarchical Clustering and the dendrograms it produces, then statistiXL doesn't currently support assigning cases to groups based on a specified level of relationship (if that is your requirement).

    Please let me know if I've still got the wrong end of the stick!
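
For anyone who needs the membership column in the meantime, one possible workaround outside statistiXL is to cut the hierarchical tree in Python with SciPy and paste the resulting labels back into the spreadsheet as a new column. This is only a minimal sketch under assumptions not stated in the thread: the cases are numeric rows, Ward linkage is used (swap in whichever method matches the analysis from step 1), and the helper name `tag_clusters` and the sample data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def tag_clusters(data, n_clusters, method="ward"):
    """Return one cluster label (numbered from 1) per row of `data`.

    Hypothetical helper: builds a hierarchical tree and cuts it so that
    at most `n_clusters` flat clusters remain.
    """
    tree = linkage(data, method=method)  # agglomerative clustering tree
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Made-up example data: 1000 cases in two well-separated groups.
rng = np.random.default_rng(0)
cases = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(500, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(500, 2)),
])

# One label per case, in the same order as the rows of `cases`.
membership = tag_clusters(cases, n_clusters=2)
```

Because `membership` keeps the row order of the input, it can be used directly as the grouping variable for cross-tabs and ANOVA. To cut at a specified level of relationship instead of a fixed number of clusters, `fcluster` also accepts `criterion="distance"` with `t` set to the cut height.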


