# Zeros, Empty Cells and Missing Data

Hi,

I just purchased statistiXL and am getting ready to enter data for a survey that is under way. I don't know what is best to do with missing data, or with cells that might be left blank even though the data is not necessarily missing. I will be running mostly descriptive-type stats, some relationships and comparisons. Can you direct me as to what is best to do, or where I might look to find this answer specific to how Excel operates?

Thanks for your help and for this handy program!

• Hi

There are 2 main ways to handle missing values: 1) exclude any cases with missing values from the analysis, and 2) replace missing values with the mean for the variable. Both of these have obvious drawbacks. Route 1 reduces your sample size. Route 2 biases results by decreasing variances and making apparent differences more likely. Neither is ideal, though in the real world one or the other is often an unavoidable necessity.
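To make the trade-off concrete, here is a minimal Python sketch of both routes. The scores are made up for illustration and `None` stands in for a missing response; it is not tied to how statistiXL or Excel stores data.

```python
import statistics

# Hypothetical survey scores; None marks a missing response.
scores = [4.0, 7.0, None, 5.0, None, 8.0]

# Route 1: exclude cases with missing values (the sample gets smaller).
observed = [x for x in scores if x is not None]

# Route 2: replace each missing value with the mean of the observed values.
mean_observed = statistics.mean(observed)
mean_filled = [mean_observed if x is None else x for x in scores]

n_deleted = len(observed)
n_filled = len(mean_filled)

# Sample variance under each route; the filled-in values sit exactly on the
# mean, so they add cases without adding any spread.
var_deleted = statistics.variance(observed)
var_filled = statistics.variance(mean_filled)

print(f"Route 1: n = {n_deleted}, variance = {var_deleted:.3f}")
print(f"Route 2: n = {n_filled}, variance = {var_filled:.3f}")
```

With these numbers, route 1 keeps 4 of the 6 cases, while route 2 keeps all 6 but reports a smaller variance than the observed values alone.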

A nice overview of this can be found in this PDF.

Alan
• It is a good document on missing values, I agree. But there are additional methods that are sometimes used. Simply using the mean, for example, can reduce the variance of a data set. Eliminating cases (casewise deletion etc.) can reduce the sample size quite a lot too. So one method (I think called "imputation") is to treat the variables with missing values as Y values in a regression analysis and try to predict on that basis what the missing value should be. I think David Howell's web site (for his textbook) has a chapter on missing values that is rather nice. See

http://www.uvm.edu/~dhowell/StatPages/More...ta/Missing.html
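As a rough illustration of the regression idea, the Python sketch below fits a straight line to the complete cases and uses it to predict the missing Y. The data and variable names are hypothetical, and this is only single regression imputation with one predictor; like mean substitution, it places the filled-in value exactly on the fitted line and so understates the real variability.

```python
# Hypothetical paired data: x is fully observed, y has one missing value (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, None, 8.0, 10.0]

# Fit y = intercept + slope * x by least squares, using complete cases only.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
n = len(pairs)
mean_x = sum(xi for xi, _ in pairs) / n
mean_y = sum(yi for _, yi in pairs) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Replace each missing y with the value predicted from its x.
y_imputed = [
    intercept + slope * xi if yi is None else yi
    for xi, yi in zip(x, y)
]

print(y_imputed)
```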

Lance
•  QUOTE (LanceGary @ 3 Oct 2007, 20:51) It is a good document on missing values, I agree. But there are additional methods that are sometimes used. Simply using the mean, for example, can reduce the variance of a data set. Eliminating values (cases wise etc) can reduce the sample size quite a lot too. So one method (I think called "imputation") is to treat the variables with missing values as Y values in a regression analysis and try to predict on that basis what the missing value should be. I think David Howell's web site (for his textbook) has a chapter on missing values that is rather nice. See http://www.uvm.edu/~dhowell/StatPages/More...ta/Missing.html Lance
Sorry about the rather incoherent post! I wrote it late at night. But I still think David Howell's chapter is worth reading.

Lance
• Thanks for your help and the articles. Some of the data are just simple counts, so I can enter zeros where appropriate. I have also learned that SPSS and SAS are set up to handle missing values, although I don't know exactly what they do. Thanks again for the help!!
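Why the zero-versus-blank decision matters can be seen in a small Python sketch. The counts are hypothetical and `None` stands for a blank cell. For comparison, Excel's AVERAGE function ignores empty cells but includes cells that contain zero, so the same choice shows up there.

```python
import statistics

# Hypothetical counts; None stands for a blank cell with no recorded value.
counts = [3, 0, None, 5, None, 2]

# Treating blanks as missing: only recorded values enter the mean.
recorded = [c for c in counts if c is not None]
mean_blank_as_missing = statistics.mean(recorded)

# Treating blanks as true zeros: every cell enters the mean.
as_zero = [0 if c is None else c for c in counts]
mean_blank_as_zero = statistics.mean(as_zero)

print(f"Blanks as missing: n = {len(recorded)}, mean = {mean_blank_as_missing:.3f}")
print(f"Blanks as zero:    n = {len(as_zero)}, mean = {mean_blank_as_zero:.3f}")
```

Entering a zero is only appropriate when the count really was zero; a blank that means "not recorded" should stay blank.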

Halley'sC.
•  QUOTE I have also learned that SPSS and SAS are set up to handle missing values, although I don't know exactly what they do.
That's the main reason we haven't implemented missing values in statistiXL yet, i.e. we don't want it to be doing things without the user being aware of what is going on and the implications of whatever takes place. Our thoughts have been to include some sort of independent menu option that allows a user to consciously perform missing value manipulation of a dataset prior to further analysis. If anyone has any good suggestions on how they would like to see this implemented then please let us know!

Alan
• Here are some additional resources for dealing with missing data:

Acock, A. C. (2005). Working with missing values. Journal of Marriage
and Family, 67, 1012-1028.

Donders, A. Rogier T., van der Heijden, Geert J.M.G., Stijnen, T., &
Moons, K. G. M. (2006). Review: A gentle introduction to imputation of
missing values. Journal of Clinical Epidemiology, 59, 1087-1091.

Multiple Imputation Online. http://www.multiple-imputation.com/

Schafer, J. L. (1999). Multiple imputation: A primer. Statistical
Methods in Medical Research, 8, 3-15.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the
state of the art. Psychological Methods, 7(2), 147-177.

von Hippel, Paul T. (2004). Biases in SPSS 12.0 missing value
analysis. The American Statistician, 58(2), 160-164.

http://division.aomonline.org/rm/1999_RMD_...issing_Data.htm

•  QUOTE (Alan Roberts @ 12 Oct 2007, 19:10) That's the main reason we haven't implemented missing values in statistiXL yet, i.e. we don't want it to be doing things without the user being aware of what is going on and the implications of whatever takes place. Our thoughts have been to include some sort of independent menu option that allows a user to consciously perform missing value manipulation of a dataset prior to further analysis. If anyone has any good suggestions on how they would like to see this implemented then please let us know! Alan
I think a separate procedure sounds like a good idea. But for some routines where values have not been imputed, having an option that specifies how the missing data is to be treated will also be valuable.

Lance