Clustering results

I must start with an obvious issue - I am not a statistician.

That said, I am trying to cluster a set of companies based on a few performance characteristics. I was hoping for a classification scheme that would assign each company to a cluster using a centroid-type method, as a competing product does. I had hoped to see a list of companies with their cluster assignments, but I don't. I am obviously missing something. Can you offer guidance, please?


After more reading, I now understand that the software does not do k-means clustering, so I guess it will not meet my needs. Rats.


  • Hi seabc

    I hope this addresses your question. Please let me know if we have misunderstood you! The current version of statistiXL does not support k-means clustering, but this is an option that we are considering adding in the future. However, we are not sure that k-means clustering is necessarily what you would want to do anyway.

    The advantage (and disadvantage) of k-means clustering is that you predefine how many clusters you want – you may or may not have such an expectation. Maybe you want three groups, “good”, “average” and “bad”, but maybe you don’t want to restrict the analysis to an arbitrary number of groups from the very start of the analysis.

    Other clustering procedures will accomplish a grouping of companies based on your performance characteristics, with no a priori expectation of how many groups you want. For example, statistiXL’s example 2 for clustering shows the grouping of countries based on quantitative characteristics – might this be a good approach for your companies?

    Alternatively, there are non-clustering methods that might be useful e.g. discriminant analysis. If you want a priori groupings of the companies, then try a discriminant grouping analysis.

    We hope that these suggestions assist you in accomplishing a useful analysis of your data – there is usually more than one way to skin a cat, and this is usually true of statistical analyses as well. In developing the early versions of statistiXL, we tried to focus on the more generally useful approaches, and decided that k-means clustering was not as generally useful as agglomerative hierarchical clustering.


    Alan (on behalf of Philip)
  • I have to reinforce that I am not a statistician.

    My purpose in clustering is to segregate a large group of companies into smaller sets of like companies so that my firm can develop, sell, and deliver solutions to meet the needs of each distinct subset. The advantage of k-means clustering has appeared to be that I can identify demonstrably distinct groups, and that these groups are recognizable and understandable by others in my firm. In the past, I have not had much of a preconception about the "right" number of groups - I have tested several different numbers, and have tried to balance modelling "precision" with understandability.

    I can appreciate that agglomerative hierarchical clustering provides results that are, in many ways, richer than those produced by k-means. But I haven't yet figured out how to communicate the results to others. It appears to me, as a former math major with little statistics experience, that using hierarchical clustering results is a lot easier for people who can recognize and navigate through a series of interacting functions than it is for people who want (and maybe feel that they need) straightforward categorization schemes. So I really need to figure out how to summarize and communicate results. (Speaking of which, I am particularly nervous when I imagine trying to interpret the results I find for a set of 1500 companies.) If you have any guidance on how to communicate agglomerative hierarchical clustering results, I would very much appreciate it.

    In the meantime, I will start educating myself about discriminant analysis, as you have suggested.
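
One possible way to turn a hierarchical clustering into the kind of straightforward company-to-group list asked for above is to stop merging once a chosen number of groups remains. The sketch below is only an illustration in plain Python, not statistiXL's procedure: the company names, the two performance characteristics, the average-linkage rule, and the function names are all assumptions made for the example. It also assumes the characteristics are on comparable scales. It is written for clarity rather than speed, so a set of 1500 companies would call for a dedicated statistics package.

```python
from itertools import combinations
from math import dist

def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between two clusters of row indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        # Find the pair of current clusters with the smallest average distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(points, clusters[p[0]], clusters[p[1]]),
        )
        clusters[x] += clusters[y]  # x < y, so deleting y leaves x in place
        del clusters[y]
    return clusters

def assignments(names, points, n_clusters):
    """Return a {company name: cluster label} mapping for a given cut."""
    labels = {}
    for label, members in enumerate(agglomerate(points, n_clusters), start=1):
        for i in members:
            labels[names[i]] = label
    return labels

# Hypothetical companies with two made-up characteristics
# (e.g. growth % and margin %); not real data.
names = ["Acme", "Birch", "Cobalt", "Delta", "Ember", "Flint"]
points = [(2.0, 5.0), (2.5, 5.5), (10.0, 20.0), (11.0, 19.0), (25.0, 8.0), (24.0, 9.0)]

if __name__ == "__main__":
    # Try several numbers of groups, as one might when balancing
    # modelling "precision" against understandability.
    for k in (2, 3):
        print(k, "groups:", assignments(names, points, k))
```

Cutting the same hierarchy at several group counts gives one flat list per count, which may be easier to show to colleagues than the full tree.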
