# Regression least squares

I want to calculate a regression of a response on one independent variable with repeated measurements. An instrument measures a substance level at discrete steps (a calibration curve), and this is repeated several times. In Excel and statistiXL only two one-dimensional vectors can be correlated. Is there a procedure for repeated measures at distinct levels?

Klaus

## Comments

I have forwarded your query on to Philip and he should have a reply for you tomorrow.

Best Regards

Alan

Regression with repeated X measures is not an option in statistiXL (or other packages that I know of), but the calculations are exactly the same as a regular regression on the data. The difference is that repeating the X measures allows the "pure error" sum of squares to be estimated, so the residual sum of squares can be separated into the "pure error" sum of squares and the "lack of fit" sum of squares. Note that the repeated measure must be a true, independent repeat, e.g. you can't just repeat a measure for the same individual, or remeasure the value for a sample - you have to measure a different individual with the same X value, or measure a different standard of the same X value.

So, you can analyse your data with X repeats as a standard regression, and get the same result - ASSUMING that the "lack of fit" sum of squares is NOT significant (if it were significant, then you should abandon the particular regression model and seek a better one e.g. try a quadratic model).
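The split described above can be written out as follows. This is only a sketch in my own notation, which is not from the post: j indexes the distinct X levels, n_j is the number of repeats at level j, and Y_ju is the u-th Y value measured at that level.

```latex
SS_{\text{residual}} = SS_{\text{pure error}} + SS_{\text{lack of fit}}, \qquad
SS_{\text{pure error}} = \sum_{j} \sum_{u=1}^{n_j} \left( Y_{ju} - \bar{Y}_{j} \right)^2

df_{\text{pure error}} = \sum_{j} (n_j - 1), \qquad
F = \frac{SS_{\text{lack of fit}} / df_{\text{lack of fit}}}
         {SS_{\text{pure error}} / df_{\text{pure error}}}
```

For a single X level, the inner sum of squared deviations from the level mean equals the sum of the squared Y values minus (sum of Y)^2 / n_j, which is the form that is easier to work out by hand.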

It isn't too hard to calculate the pure error SS - if you are interested then read my description below.

Phil Withers

********************************************************************************************************************

To calculate the pure error SS in repeated X regression.

An excellent description of this is given by Draper & Smith (1998), Applied Regression Analysis, whose example I use.

********************************************************************************************************************

1. For each set of repeated X values, square each Y value and add the squares together

e.g. X,Y = (4.0, 2.8), (4.0, 2.8) and (4.0, 2.2); sum of Y^2 = 2.8^2 + 2.8^2 + 2.2^2 = 20.52

2. Sum the Y values for each of these repeated X values, square this sum, and divide by how many repeats there are; subtract the result from the step 1 value to get the sum of squares for that set of repeats

e.g. ((2.8 + 2.8 + 2.2)^2)/3 = 20.28, so SS = 20.52 - 20.28 = 0.24

3. The number of degrees of freedom for this sum of squares is the number of X repeats minus 1

e.g. df = 3-1 = 2

4. Sum the sums of squares and the degrees of freedom over all sets of repeated X values - these are the PURE ERROR SS and DF

e.g. total repeat SS = 7.055, total df = 10

5. Get the residual SS and df from the normal regression analysis

e.g. residual SS = 15.278, df = 21

6. The Lack of Fit SS and df are obtained by subtraction from the residual SS and DF

e.g. LofF SS = 15.278 - 7.055 = 8.223, df = 21 - 10 = 11

7. Calculate the mean squares in the normal fashion (SS / df)

e.g. LofF MS = 8.223/11 = 0.748; Pure Error MS = 7.055/10 = 0.7055

8. Calculate F for LofF MS in the usual way (LofF MS/Pure Error MS)

e.g. F = 0.748/0.7055 = 1.06

9. Check the significance of this F value:

IF F is non-significant, proceed with the regression in the conventional way and calculate regression F, etc (i.e. just use repeat X values as X values)

IF F is significant, then stop and rethink your model - maybe a quadratic model is more appropriate (look at your residuals)
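For anyone who prefers to script steps 1 to 8 rather than work them by hand, here is a minimal Python sketch. The function names `pure_error` and `lack_of_fit` are my own, not from statistiXL or Draper & Smith, and it assumes that repeated X values are entered as exactly equal numbers so they can be grouped. It stops at the F ratio; the significance check in step 9 still needs an F table or a statistics package.

```python
from collections import defaultdict


def pure_error(xs, ys):
    """Return (pure error SS, pure error df) for paired X, Y observations."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)  # assumes repeated X values compare exactly equal

    ss, df = 0.0, 0
    for values in groups.values():
        n = len(values)
        # Steps 1-2: sum of Y^2 minus (sum of Y)^2 / n for this X level
        ss += sum(v * v for v in values) - sum(values) ** 2 / n
        # Step 3: n - 1 degrees of freedom (zero for an unrepeated X)
        df += n - 1
    return ss, df  # Step 4: totals over all X levels


def lack_of_fit(residual_ss, residual_df, pe_ss, pe_df):
    """Steps 6-8: return (lack of fit SS, lack of fit df, F ratio)."""
    lof_ss = residual_ss - pe_ss
    lof_df = residual_df - pe_df
    f_ratio = (lof_ss / lof_df) / (pe_ss / pe_df)
    return lof_ss, lof_df, f_ratio


# The single X = 4.0 group from the worked example
ss, df = pure_error([4.0, 4.0, 4.0], [2.8, 2.8, 2.2])
print(round(ss, 2), df)  # 0.24 2

# Totals quoted in steps 4 and 5 of the worked example
lof_ss, lof_df, f_ratio = lack_of_fit(15.278, 21, 7.055, 10)
print(round(lof_ss, 3), lof_df, round(f_ratio, 2))  # 8.223 11 1.06
```

The residual SS and df passed to `lack_of_fit` come from the ordinary regression output (step 5), so the sketch does not refit the regression itself.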

In reply to Philip Withers (28 Apr 2005, 23:46):

A very simple and thorough description. It solves my problem.

Klaus