ANOVA/SNK AND UNBALANCED DESIGN

Hello,

I am currently doing some research involving 6 different species of fish. I am trying to determine if there exist morphological differences for some morphometric characters between species.

In the first stage of the experiment, a series of morphometric characters was defined for all species, and every specimen collected (within a specific size range) was measured. While for some species we were able to collect a fairly decent sample size, for others (which are rather rare) only two specimens were (and will be) collected. Consequently, the number of specimens sampled ("N") for each species was:

A 7
B 3
C 2
D 10
E 9
F 3

Next, each character was analysed independently with Single Factor ANOVA, and SNK post hoc significance tests were carried out for those characters whose F test gave Prob. < 0.05. The data was arranged in the same manner as in the one-way Analysis of Variance "Strontium concentration in water samples example". The column with "species name" was entered in the Y variable range box (analogous to the name of each body of water), and the value obtained for each specimen and each character was entered as the "factor range".

The F values obtained for 14 out of 25 characters had a very low Prob. (0 or very close to 0), and the SNK post hoc analysis showed significance values (1-Prob) of e.g. .9999. I attach the descriptive statistics and ANOVA tables for one character:

Species  Mean    Std Dev.  Std Err  N
A        46.7%   0.6%      0.002    7
B        44.5%   1.0%      0.006    3
C        49.1%   1.5%      0.011    2
D        44.3%   1.5%      0.005    10
E        45.2%   1.1%      0.004    9
F        41.6%   1.5%      0.009    3

Source  Type III SS  Df  Mean Sq.  F       Prob.
Model   0.009        5   0.002     12.626  0.000
Error   0.004        28  0.000
Total   0.014        33

From the above it was concluded that (a) there exist significant differences between species for that specific character, and (b) after the post hoc test, that e.g. species E is significantly larger than species F.
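As a rough cross-check of the ANOVA table, the F ratio can be rebuilt from the descriptive statistics alone. The sketch below is only an illustration in Python (not statistiXL output); the function name `anova_from_summary` is made up for this example, and the inputs are the rounded means and standard deviations from the table, so the result only approximates the reported value.

```python
from math import fsum

# Rounded descriptive statistics copied from the table in the post:
# species -> (mean in %, standard deviation in %, number of specimens).
groups = {
    "A": (46.7, 0.6, 7),
    "B": (44.5, 1.0, 3),
    "C": (49.1, 1.5, 2),
    "D": (44.3, 1.5, 10),
    "E": (45.2, 1.1, 9),
    "F": (41.6, 1.5, 3),
}


def anova_from_summary(stats):
    """One-way ANOVA F ratio from per-group (mean, sd, n) triples.

    Hypothetical helper for illustration; works for unequal group sizes.
    """
    n_total = sum(n for _, _, n in stats.values())
    # Grand mean is the n-weighted mean of the group means.
    grand_mean = fsum(m * n for m, _, n in stats.values()) / n_total
    # Between-groups sum of squares: each group weighted by its own n.
    ss_between = fsum(n * (m - grand_mean) ** 2 for m, _, n in stats.values())
    # Within-groups sum of squares, recovered from the sample SDs.
    ss_within = fsum((n - 1) * sd ** 2 for _, sd, n in stats.values())
    df_between = len(stats) - 1
    df_within = n_total - len(stats)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_ratio": f_ratio,
    }


result = anova_from_summary(groups)
print(result["df_between"], result["df_within"], round(result["f_ratio"], 2))
```

With these rounded inputs the degrees of freedom come out as 5 and 28, matching the table, and the F ratio lands close to the reported 12.626; the small gap is what one would expect from working with rounded means and standard deviations.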

My questions regarding this issue are:

1. How does the fact that the sample sizes (and consequently the design) are unbalanced affect the results obtained?

2. Was my interpretation of "statistiXL does not require a balanced design (for most models) but it does require that there are no empty cells (except for models with no interaction terms and no nested factors). By this I mean that you can have a different number of measurements for each group, but every group must have at least 1 entry" correct?

3. If the answer to the above is yes, what is the proper way to cite statistiXL in the methodology section and references of the paper to be published?

Yours

Mauricio

• Hello Mauricio

Your description of the data you have and the ANOVA analyses sounds fine.

There is no problem per se with small sample sizes in doing the ANOVA analyses, but clearly the smaller the sample size, the lower your power to detect differences. So, if you don't get differences, then you worry about small sample sizes; if you do get differences, then they would (presumably) only be more clearly significant if you had larger sample sizes.
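To make the effect of unequal group sizes on a single pairwise comparison concrete, here is a minimal Python sketch of a studentized range statistic using the Kramer adjustment for unequal n. That adjustment is an assumption on my part for illustration; the thread does not say how statistiXL handles unequal n in its SNK test. The function name `q_statistic` is hypothetical, and the error mean square is rebuilt from the rounded standard deviations in the descriptive statistics table.

```python
from math import sqrt


def q_statistic(mean_i, n_i, mean_j, n_j, mse):
    """Studentized range statistic for two group means.

    Uses the Kramer adjustment for unequal group sizes (an assumption,
    not necessarily what statistiXL does):
        SE = sqrt(MSE / 2 * (1/n_i + 1/n_j))
    """
    se = sqrt(mse / 2.0 * (1.0 / n_i + 1.0 / n_j))
    return abs(mean_i - mean_j) / se


# Error mean square in percent^2: sum of (n - 1) * sd^2 over the six
# species (using the rounded SDs from the post), divided by 28 error df.
ss_within = 6 * 0.6**2 + 2 * 1.0**2 + 1 * 1.5**2 + 9 * 1.5**2 + 8 * 1.1**2 + 2 * 1.5**2
mse = ss_within / 28

# Species E (mean 45.2%, n = 9) versus species F (mean 41.6%, n = 3).
q_actual = q_statistic(45.2, 9, 41.6, 3, mse)

# Same means, but pretending species F also had 9 specimens.
q_if_balanced = q_statistic(45.2, 9, 41.6, 9, mse)

print(round(q_actual, 2), round(q_if_balanced, 2))
```

The same mean difference gives a smaller statistic when one group has only 3 specimens than it would with 9, which is the loss of power described above: small groups make real differences harder to detect, rather than creating spurious ones.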

Your interpretation of balanced versus missing cells is right - you can't have empty cell combinations of your factors, but you can have as few as n=1 in some cells (with the caveat from above).
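The difference between "unbalanced" and "empty cells" can be checked mechanically before running an analysis. The sketch below is plain Python, not part of statistiXL; `empty_cells` is a hypothetical helper, and the two-factor species-by-sex layout is invented purely for illustration (the post only has the single factor "species").

```python
from collections import Counter
from itertools import product


def empty_cells(records):
    """Return the factor-level combinations that have no observations.

    records: one tuple per observation, giving the level of each factor.
    Hypothetical helper for illustration only.
    """
    records = list(records)
    if not records:
        return []
    counts = Counter(records)
    n_factors = len(records[0])
    # All levels seen for each factor, in sorted order.
    levels = [sorted({r[k] for r in records}) for k in range(n_factors)]
    return [cell for cell in product(*levels) if counts[cell] == 0]


# One-factor case from the post: unbalanced, but every species has n >= 1.
sample_sizes = {"A": 7, "B": 3, "C": 2, "D": 10, "E": 9, "F": 3}
print(all(n >= 1 for n in sample_sizes.values()))  # True

# Invented two-factor example (species x sex): species B has no males.
observations = [("A", "F"), ("A", "M"), ("B", "F"), ("C", "F"), ("C", "M")]
print(empty_cells(observations))  # [('B', 'M')]
```

The first check passes for the design in the post, so it is unbalanced but has no empty cells; the second layout would be a problem for a model with an interaction term.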

You could cite your use of the stats program as "... using statistiXL v?.? ..." and include your version number, and you might want to give a web address since statistiXL does not have a printed manual, e.g. "... using statistiXL v?.? (www.statistiXL.com) ...".

You might also try some multivariate analyses with your data, to analyse more than one character at a time, e.g. Multivariate ANOVA, Principal Components Analysis, Discriminant Analysis. These multivariate approaches often have more power in looking at patterns between multiple measures for multiple species, but their use and interpretation get more complex. Try looking at the Help files and some examples!