statistiXL Features

Discriminant Analysis

Discriminant analysis is a technique used to determine which of a number of measured variables are important in distinguishing between objects belonging to known groups. For example, a biologist could measure different morphological characteristics (e.g. limb lengths, skull sizes, etc.) of a range of species and use discriminant analysis to determine which of the measured traits are most useful in predicting species membership. The analysis typically has one of two objectives: 1) to identify the relative contributions of the variables in maximally discriminating between the groups, or 2) to determine mathematical functions based on the measured variables that can then be used to classify new data into the original groups.

statistiXL provides modules for both grouping and classification discriminant analysis. Both analyses provide discriminant functions that best separate the known groups on the basis of the measured variables. Excessive collinearity between variables can be checked via estimates of tolerance. In addition, the classification module can classify cases of unknown group membership using these previously determined functions. The effectiveness of the functions is estimated by reclassifying the original data (i.e. the cases from known groups) to determine the proportion that are correctly classified. statistiXL also provides an improved estimate of the error rate via the holdout method, in which each case to be classified is in turn excluded from the dataset when calculating the discriminant functions used for that particular classification.
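The difference between reclassifying the original data and the holdout method can be illustrated with a short Python sketch. This is not statistiXL's own code: the function names, the restriction to two measured variables, the use of equal prior probabilities, and the small data set are all assumptions made for illustration. Each case is assigned to the group whose centroid is nearest in Mahalanobis distance under the pooled within-group covariance matrix, which is one common way of expressing linear discriminant classification.

```python
def mean_vec(rows):
    """Centroid of a list of [x, y] cases."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def pooled_inverse(groups):
    """Inverse of the pooled within-group covariance matrix (2 variables)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    n_total = 0
    for rows in groups.values():
        m = mean_vec(rows)
        for r in rows:
            d = (r[0] - m[0], r[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        n_total += len(rows)
    dof = n_total - len(groups)  # N minus number of groups
    c = [[v / dof for v in row] for row in s]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    return [[c[1][1] / det, -c[0][1] / det],
            [-c[1][0] / det, c[0][0] / det]]

def classify(x, groups):
    """Assign x to the group with the nearest centroid (Mahalanobis distance)."""
    inv = pooled_inverse(groups)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        m = mean_vec(rows)
        d0, d1 = x[0] - m[0], x[1] - m[1]
        dist = (d0 * (inv[0][0] * d0 + inv[0][1] * d1)
                + d1 * (inv[1][0] * d0 + inv[1][1] * d1))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def group_cases(cases):
    """Turn [(label, [x, y]), ...] into {label: [[x, y], ...]}."""
    groups = {}
    for label, x in cases:
        groups.setdefault(label, []).append(x)
    return groups

def resubstitution_rate(cases):
    """Proportion correct when cases are reclassified by functions fitted to all cases."""
    groups = group_cases(cases)
    hits = sum(classify(x, groups) == label for label, x in cases)
    return hits / len(cases)

def holdout_rate(cases):
    """Proportion correct when each case is excluded before fitting (holdout)."""
    hits = 0
    for i, (label, x) in enumerate(cases):
        training = cases[:i] + cases[i + 1:]  # leave case i out
        hits += classify(x, group_cases(training)) == label
    return hits / len(cases)

# Hypothetical measurements for two well-separated groups.
cases = [
    ("A", [1.0, 2.0]), ("A", [2.0, 1.0]), ("A", [1.5, 1.8]), ("A", [2.2, 2.5]),
    ("B", [10.0, 11.0]), ("B", [11.0, 10.0]), ("B", [10.5, 11.5]), ("B", [11.2, 10.1]),
]

print(resubstitution_rate(cases))
print(holdout_rate(cases))
print(classify([1.8, 1.9], group_cases(cases)))  # a case of unknown membership
```

With real data the holdout proportion is usually the lower of the two, because a case being classified never contributes to the functions that classify it.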

Results begin with an optional display of descriptive statistics for each group/variable combination and the covariance matrix showing the relationships between the measured variables. Next, eigenvalues are given (indicating how much of the variance in the dataset each discriminant function accounts for), along with Wilks' lambda, chi-square, degrees of freedom, and P value for each discriminant function. Unstandardised and standardised discriminant functions (i.e. the coefficient scores) are then tabulated, along with group centroids. Individual case scores are provided, and optional scatterplots of casewise discriminant scores can be created for each pairwise combination of selected discriminant functions; the scatterplots can include graphical representations of the contribution of each variable to the discriminant functions. For classification analysis, a classification table is given for the data set used to derive the classification functions, indicating the proportion of correct classifications for each group. Optionally, a classification table derived from the holdout procedure can also be presented. The classification group scores for an alternate data set (if entered) are then given.
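As a rough guide to how the eigenvalues relate to the other test statistics, the Python sketch below derives Wilks' lambda, an approximate chi-square, and degrees of freedom for each discriminant function from a list of eigenvalues. It assumes the widely used Bartlett approximation and sequential testing of the functions; whether statistiXL computes its values in exactly this way is not stated in the source, and the function name and example figures are hypothetical. The P value is omitted because it needs a chi-square distribution function, which the Python standard library lacks (SciPy's `scipy.stats.chi2.sf` is one option).

```python
import math

def discriminant_function_tests(eigenvalues, n_cases, n_vars, n_groups):
    """Wilks' lambda, Bartlett chi-square and df for each discriminant function.

    eigenvalues must be sorted from largest to smallest. Row k tests
    functions k+1 onwards, i.e. those remaining after the first k
    functions have been removed.
    """
    multiplier = n_cases - 1 - (n_vars + n_groups) / 2
    total = sum(eigenvalues)
    results = []
    for k, eig in enumerate(eigenvalues):
        # Wilks' lambda is the product of 1 / (1 + eigenvalue) over the
        # functions still under test.
        wilks = 1.0
        for later in eigenvalues[k:]:
            wilks /= 1.0 + later
        results.append({
            "function": k + 1,
            "eigenvalue": eig,
            "proportion": eig / total,  # share of the summed eigenvalues
            "wilks_lambda": wilks,
            "chi_square": -multiplier * math.log(wilks),
            "df": (n_vars - k) * (n_groups - k - 1),
        })
    return results

# Hypothetical example: 50 cases, 4 variables, 3 groups, two functions.
rows = discriminant_function_tests([3.0, 1.0], n_cases=50, n_vars=4, n_groups=3)
for row in rows:
    print(row)
```

A small Wilks' lambda (close to 0) indicates strong separation of the groups on the functions under test, while a value close to 1 indicates little separation.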

The help file included with statistiXL provides an overview of discriminant analysis and gives two examples of grouping discriminant analysis (2 groups and 3 groups) and two examples of classification discriminant analysis (2 groups and 3 groups).