# Regression least squares

Hello

I want to fit a regression for one independent variable with repeated measurements. An instrument measures a substance level at discrete steps (a calibration curve), and the whole series is repeated several times. In Excel and statistiXL, only two one-dimensional vectors can be correlated. Is there a procedure for repeated measures at distinct levels?

Klaus

• Hi Klaus

I have forwarded your query on to Philip and he should have a reply for you tomorrow.

Best Regards

Alan
• Hello Klaus

Regression with repeated X measures is not an option in statistiXL (or other packages that I know of), but the calculations are exactly the same as for a regular regression on the data. The difference is that repeated X measures allow the residual sum of squares to be separated into the "pure error" sum of squares and the "lack of fit" sum of squares, because the repeats at each X value make it possible to estimate the pure error directly. Note that each repeat must be a true, independent repeat, e.g. you can't just repeat a measurement on the same individual, or remeasure the value for the same sample; you have to measure a different individual with the same X value, or a different standard with the same X value.
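Written out, the split looks roughly like this. The notation is my own assumption, not statistiXL's: there are m distinct X levels, n_i repeats at level i, y_ij is the j-th Y at level i, \bar{y}_i is the mean Y at that level, and \hat{y}_i is the value the fitted regression predicts there.

$$
\begin{aligned}
SS_{\text{residual}} &= SS_{\text{pure error}} + SS_{\text{lack of fit}} \\
SS_{\text{pure error}} &= \sum_{i=1}^{m} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2 \\
SS_{\text{lack of fit}} &= \sum_{i=1}^{m} n_i \left(\bar{y}_i - \hat{y}_i\right)^2
\end{aligned}
$$

The pure error df is the sum of (n_i - 1) over all levels, and the lack of fit df is the residual df minus the pure error df.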

So, you can analyse your data with X repeats as a standard regression, and get the same result - ASSUMING that the "lack of fit" sum of squares is NOT significant (if it were significant, then you should abandon the particular regression model and seek a better one e.g. try a quadratic model).
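For anyone doing this outside a spreadsheet, here is a minimal sketch of "just use the repeats as ordinary X values": the repeated calibration runs are stacked into two plain vectors and fitted by ordinary least squares. The function names `stack_repeats` and `fit_line` and the numbers are made up for illustration, and the sketch assumes every run is a true, independent repeat at the same X levels.

```python
def stack_repeats(levels, runs):
    """Flatten repeated runs into paired x and y lists (one pair per reading)."""
    xs, ys = [], []
    for run in runs:
        for level, reading in zip(levels, run):
            xs.append(level)
            ys.append(reading)
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns slope, intercept, residual SS and df."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    residual_df = n - 2  # two fitted parameters: slope and intercept
    return slope, intercept, residual_ss, residual_df


# Hypothetical calibration: four X levels, each measured in three independent runs.
levels = [1.0, 2.0, 3.0, 4.0]
runs = [
    [2.1, 3.9, 6.2, 7.8],
    [1.9, 4.1, 5.8, 8.2],
    [2.0, 4.0, 6.0, 8.0],
]

xs, ys = stack_repeats(levels, runs)
slope, intercept, residual_ss, residual_df = fit_line(xs, ys)
print(slope, intercept, residual_ss, residual_df)
```

The stacked `xs` and `ys` are exactly the two one-dimensional vectors that Excel or statistiXL expect, so the same layout works there too.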

It isn't too hard to calculate the pure error SS - if you are interested then read my description below.

Phil Withers

********************************************************************************************************************
To calculate the pure error SS in repeated X regression.
An excellent description of this is given by Draper & Smith (1998) Applied Regression Analysis,
whose example I use
********************************************************************************************************************

1. For each set of repeated X values, square each Y value and add the squares together
e.g. X,Y = (4.0, 2.8), (4.0, 2.8) and (4.0, 2.2), sum of Y^2 = 2.8^2 + 2.8^2 + 2.2^2 = 20.52

2. Sum the Y values for each of these repeated X values, square this, and divide by how many repeats there are. Subtract the result from the step 1 value to get the sum of squares for this set of repeats
e.g. ((2.8 + 2.8 + 2.2)^2)/3 = 20.28, so SS = 20.52 - 20.28 = 0.24

3. The number of degrees of freedom for this sum of squares is the number of the X repeats - 1
e.g. df = 3-1 = 2

4. Sum the sums of squares and the degrees of freedom over all sets of repeated X values - these are the PURE ERROR SS and DF
e.g. total repeat SS = 7.055, total df = 10

5. Get the residual SS and df from the normal regression analysis
e.g. residualSS = 15.278, df = 21

6. The Lack of Fit SS and df are obtained by subtraction from the residual SS and DF
e.g. LofF SS = 15.278 - 7.055 = 8.223, df = 21 - 10 = 11

7. Calculate the mean squares in the normal fashion (SS / df)
e.g. LofF MS = 8.223/11 = 0.748, Pure Error MS = 7.055/10 = 0.7055

8. Calculate F for LofF MS in the usual way (LofF MS/Pure Error MS)
e.g. F = 0.748/0.7055 ≈ 1.06

9. Check significance of this F value:
IF F is non-significant, proceed with the regression in the conventional way and calculate
regression F, etc (i.e. just use repeat X values as X values)
IF F is significant, then stop and rethink your model - maybe a quadratic model is more appropriate (look at your residuals)
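The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pure_error` and `lack_of_fit` are hypothetical, repeats are recognised by exactly equal X values, and the residual SS and df are taken from whatever ordinary regression you have already run.

```python
from collections import defaultdict


def pure_error(xs, ys):
    """Steps 1-4: pure error SS and df pooled over all sets of repeated X values."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)  # assumes repeats share exactly the same X value

    total_ss = 0.0
    total_df = 0
    for values in groups.values():
        n = len(values)
        if n < 2:
            continue  # an X value measured once contributes no pure error
        sum_of_squares = sum(v * v for v in values)  # step 1
        correction = sum(values) ** 2 / n            # step 2
        total_ss += sum_of_squares - correction      # SS for this set of repeats
        total_df += n - 1                            # step 3
    return total_ss, total_df                        # step 4


def lack_of_fit(residual_ss, residual_df, pe_ss, pe_df):
    """Steps 5-8: lack of fit SS, df and the F ratio against pure error."""
    lof_ss = residual_ss - pe_ss  # step 6
    lof_df = residual_df - pe_df
    lof_ms = lof_ss / lof_df      # step 7
    pe_ms = pe_ss / pe_df
    f_ratio = lof_ms / pe_ms      # step 8
    return lof_ss, lof_df, f_ratio


# The single set of repeats quoted in step 1 (X = 4.0).
group_ss, group_df = pure_error([4.0, 4.0, 4.0], [2.8, 2.8, 2.2])
print(group_ss, group_df)  # SS is approximately 0.24 with 2 df

# The totals quoted in steps 4 and 5.
lof_ss, lof_df, f_ratio = lack_of_fit(15.278, 21, 7.055, 10)
print(lof_ss, lof_df, f_ratio)  # F comes out at roughly 1.06 on 11 and 10 df
```

For step 9, compare the F ratio with an F table at (lack of fit df, pure error df); if SciPy happens to be installed, `scipy.stats.f.sf(f_ratio, lof_df, pe_df)` should give the corresponding p-value.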