ANOVA/SNK AND UNBALANCED DESIGN
I am currently doing some research involving 6 different species of fish. I am trying to determine whether the species differ in certain morphometric characters.
In the first stage of the experiment, a series of morphometric characters was defined for all species, and every specimen collected (within a specific size range) was measured. While for some species we were able to collect a fairly decent sample size, for others (which are rather rare) only two specimens were (and will be) collected. Consequently, the number of specimens sampled (N) differs between species; see the N column of the descriptive statistics table below.
Next, each character was analysed independently with Single Factor ANOVA, and SNK post hoc significance tests were carried out for those characters whose F test gave Prob. < 0.05. The data were arranged in the same manner as in the one-way Analysis of Variance "Strontium concentration in water samples example". The column with "species name" was entered in the Y variable range box (in an analogous manner to the name of each body of water), and the value obtained for each specimen and each character was entered as the "factor range".
The F values obtained for 14 out of 25 characters had a very low Prob. (0 or very close to 0), and the SNK post hoc analysis showed significance values (1 - Prob.) of, e.g., 0.9999. The descriptive statistics and ANOVA tables for one character are:
Species   Mean    Std Dev.   Std Err   N
A         46.7%   0.6%       0.002      7
B         44.5%   1.0%       0.006      3
C         49.1%   1.5%       0.011      2
D         44.3%   1.5%       0.005     10
E         45.2%   1.1%       0.004      9
F         41.6%   1.5%       0.009      3
Source   Type III SS   Df   Mean Sq.   F        Prob.
Model    0.009          5   0.002      12.626   0.000
Error    0.004         28   0.000
Total    0.014         33
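As a cross-check, the one-way ANOVA table can be rebuilt approximately from the descriptive statistics alone. The following is only a minimal sketch in Python, not the statistiXL procedure: the function name `anova_from_summary` is made up for illustration, the means and standard deviations are the rounded values from the table (entered as proportions), and so the F value it gives will be close to, but not identical with, the reported 12.626.

```python
from math import fsum

# Descriptive statistics copied from the table above, as proportions.
# species: (mean, std_dev, n). Values are rounded, so results are approximate.
groups = {
    "A": (0.467, 0.006, 7),
    "B": (0.445, 0.010, 3),
    "C": (0.491, 0.015, 2),
    "D": (0.443, 0.015, 10),
    "E": (0.452, 0.011, 9),
    "F": (0.416, 0.015, 3),
}

def anova_from_summary(groups):
    """Rebuild a single-factor ANOVA table from per-group mean, SD and n.

    Hypothetical helper for illustration; unequal group sizes are handled
    naturally because every term is weighted by its own n.
    """
    n_total = sum(n for _, _, n in groups.values())
    # Grand mean is the n-weighted mean of the group means.
    grand_mean = fsum(m * n for m, _, n in groups.values()) / n_total
    # Between-group (model) and within-group (error) sums of squares.
    ss_model = fsum(n * (m - grand_mean) ** 2 for m, _, n in groups.values())
    ss_error = fsum((n - 1) * sd ** 2 for _, sd, n in groups.values())
    df_model = len(groups) - 1
    df_error = n_total - len(groups)
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {
        "ss_model": ss_model,
        "ss_error": ss_error,
        "df_model": df_model,
        "df_error": df_error,
        "f_value": f_value,
    }

result = anova_from_summary(groups)
for key, value in result.items():
    print(f"{key}: {value:.6g}")
```

The degrees of freedom (5 and 28) and the sums of squares (about 0.009 and 0.004) agree with the table; the small difference in F comes from the rounding of the means and standard deviations.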
From the above it was concluded that there exist significant differences between species for that specific character and, after the post hoc test, that, e.g., species E is significantly larger than species F.
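To illustrate where unequal sample sizes enter a pairwise comparison such as E versus F, here is a minimal Python sketch of a studentized range statistic with a Kramer-type standard error, SE = sqrt((MSE / 2) * (1/n_i + 1/n_j)). This is an assumption made for illustration only: nothing here confirms how statistiXL itself handles unequal N in the SNK test, the helper names `pair_se` and `q_statistic` are made up, and the inputs are the rounded values from the descriptive statistics table. Judging significance would still require comparing q with a studentized range critical value, which is not computed here.

```python
from math import fsum, sqrt

# species: (mean, std_dev, n), as proportions, from the descriptive table.
groups = {
    "A": (0.467, 0.006, 7),
    "B": (0.445, 0.010, 3),
    "C": (0.491, 0.015, 2),
    "D": (0.443, 0.015, 10),
    "E": (0.452, 0.011, 9),
    "F": (0.416, 0.015, 3),
}

# Pooled error mean square (MSE) from the within-group variances.
df_error = sum(n for _, _, n in groups.values()) - len(groups)
mse = fsum((n - 1) * sd ** 2 for _, sd, n in groups.values()) / df_error

def pair_se(a, b):
    """Kramer-type standard error for two groups with unequal n (assumed form)."""
    n_a, n_b = groups[a][2], groups[b][2]
    return sqrt((mse / 2.0) * (1.0 / n_a + 1.0 / n_b))

def q_statistic(a, b):
    """Studentized range statistic for the difference between two group means."""
    return abs(groups[a][0] - groups[b][0]) / pair_se(a, b)

q_ef = q_statistic("E", "F")
se_cb = pair_se("C", "B")   # two small samples (n = 2 and n = 3)
se_de = pair_se("D", "E")   # two larger samples (n = 10 and n = 9)

print(f"q for E vs F: {q_ef:.2f}")
print(f"SE for C vs B: {se_cb:.4f}")
print(f"SE for D vs E: {se_de:.4f}")
```

Under this assumed form, comparisons involving the species with only two or three specimens carry a larger standard error than comparisons between the well-sampled species, so the same difference in means is harder to detect for the rare species.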
My questions regarding this issue are:
1. How does the fact that the sample sizes (and consequently the design) are unbalanced affect the results obtained?
2. Is my interpretation of the following statement correct? "statistiXL does not require a balanced design (for most models) but it does require that there are no empty cells (except for models with no interaction terms and no nested factors). By this I mean that you can have a different number of measurements for each group, but every group must have at least 1 entry"
3. If the answer to the above is yes, what is the proper way to cite statistiXL in the methodology section and references of the paper to be published?
Looking forward to your response.