Does StatistiXL determine factors of success?

I am fairly competent with Excel, but very much a novice with statistics.

To simplify my question, I present this hypothetical:

Suppose I want to know what factor or factors allow me to run a mile under six minutes.
I have a spreadsheet with each row representing one day, because I run the mile test once each day. For each row in column A I have input a "1" if I ran under six minutes and a "0" if I did not.
I have 210 additional columns that reflect data such as:

In Column B I record how many hours of sleep I got that night.
In Column C I record what fruit I ate for breakfast that day.
In Column D I record whether I played ball with my kids.
In Column E I record the outside temperature.
In Column F I record how much water I drank for the day.

At the end of the month I determine that I have run under 6 minutes on 18 of the 30 days. What I want is for Excel or StatistiXL to tell me what factors most led to success. For example, the program might reveal that on the 10 days that I ate bananas I ran well on 8 of those days. And further, on 6 of the 7 days that I both ate bananas AND got eight hours sleep, I ran well. Thus the highest percentage or correlation for successful running was eating bananas AND getting eight hours sleep. The next highest correlation was when I both drank a gallon of water AND it was between 70 and 75 degrees.
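The tally described above (how often a condition, or a pair of conditions, coincides with a good run) can be sketched in a few lines of Python. This is only an illustration: the column names `banana` and `slept_8h`, the helper `success_rate`, and the six rows of data are all made up, and each factor is assumed to be already coded as 0 or 1.

```python
from itertools import combinations

# Hypothetical toy log, one dict per day. Every value here is invented.
days = [
    {"success": 1, "banana": 1, "slept_8h": 1},
    {"success": 1, "banana": 1, "slept_8h": 0},
    {"success": 0, "banana": 0, "slept_8h": 1},
    {"success": 0, "banana": 0, "slept_8h": 0},
    {"success": 1, "banana": 1, "slept_8h": 1},
    {"success": 0, "banana": 1, "slept_8h": 0},
]

def success_rate(rows, conditions):
    """Return (successes, matching_days, rate) for days where every condition is 1."""
    matching = [r for r in rows if all(r[c] == 1 for c in conditions)]
    wins = sum(r["success"] for r in matching)
    rate = wins / len(matching) if matching else None
    return wins, len(matching), rate

factors = ["banana", "slept_8h"]
results = {}
# Look at each factor alone, then at each pair of factors together.
for size in (1, 2):
    for combo in combinations(factors, size):
        results[combo] = success_rate(days, combo)

for combo, (wins, n, rate) in results.items():
    print(" AND ".join(combo), f"-> ran well on {wins} of {n} days")
```

With 210 columns the number of pairs alone is 210 × 209 / 2 = 21,945, so a tally like this will always turn up some impressive-looking combinations on only 30 days of data.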

Can I get this type of information? Keep in mind, for my worksheet I have 210 columns per record, so there's not much space left to write formulas.

I hope that I have been sufficiently detailed.
Thank you in advance for the help!



  • Hello Dew

    I think that what you are looking for is a discriminant analysis, to discriminate between running faster and slower than 6 minutes. Use column A as your discriminant factor. You could use all of your variables (columns B to .....) as the discriminant variables. The analysis will look for the best linear combination of your variables that discriminates between the two groups (> 6 min and < 6 min).

    You will have to keep your discriminant variables numeric. Some already are (e.g. how many hours of sleep); for those that aren't, use 0 or 1 (e.g. for whether you played ball with the kids: 0 = no, 1 = yes). Some of your variables will be a problem, e.g. what fruit you ate - you can't have apples and oranges as a numeric variable. You could have apples = 0 and oranges = 1, but what about pears? Pears = 2 implies that oranges are halfway between apples and pears! So, if you can solve this problem, a discriminant analysis might work.
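    One common workaround for the fruit problem, offered here as a sketch rather than as part of the suggestion above, is to give each fruit its own 0/1 column (often called dummy or one-hot coding), so that no ordering between fruits is implied. The helper name `one_hot` and the sample values are hypothetical.

```python
def one_hot(values):
    """Turn a list of category labels into one 0/1 column per category."""
    categories = sorted(set(values))
    coded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, coded

# Hypothetical fruit column for four days.
fruit = ["apple", "orange", "pear", "apple"]
categories, coded = one_hot(fruit)

print(categories)
for day, row in zip(fruit, coded):
    print(day, row)
```

    When the coded columns feed a regression or discriminant analysis, one of them is usually left out, because the full set always sums to 1 on every row and is therefore perfectly collinear.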

    I would worry that you have so many (210) variables though - some might appear to be significant just by chance.
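    The "significant just by chance" worry can be illustrated with a small simulation. The sketch below assumes 30 days with 18 good runs, as in the question, and fills 210 yes/no columns with coin flips, so by construction none of them has any real effect. The 80% threshold and all names are arbitrary choices for illustration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

N_DAYS, N_FACTORS = 30, 210

# Outcome column: 18 good days out of 30, in random order.
outcome = [1] * 18 + [0] * 12
random.shuffle(outcome)

# 210 yes/no factors made of coin flips; none of them truly affects the outcome.
factors = [[random.randint(0, 1) for _ in range(N_DAYS)] for _ in range(N_FACTORS)]

def rate_when_present(factor, outcome):
    """Share of good runs among the days on which the factor was 1."""
    hits = [o for f, o in zip(factor, outcome) if f == 1]
    return sum(hits) / len(hits) if hits else 0.0

rates = [rate_when_present(f, outcome) for f in factors]
base_rate = sum(outcome) / N_DAYS
best_rate = max(rates)
n_impressive = sum(r >= 0.8 for r in rates)

print(f"overall success rate: {base_rate:.2f}")
print(f"best rate among random factors: {best_rate:.2f}")
print(f"random factors with a rate of 0.80 or more: {n_impressive}")
```

    Any random factor that beats the overall rate by a wide margin here does so purely by luck, which is the risk with 210 real columns and only 30 rows.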

    Another strategy might be to use your actual running time as a Y variable, and do a stepwise linear regression of Y on your 210 numeric X variables.

    I hope this is of some assistance. Good luck, and keep fit!

    Phil Withers