What to do with missing/incomplete data?
The problem of missing data is very complex. For unsupervised learning, conventional statistical methods for missing data are often appropriate (Little and Rubin, 1987; Schafer, 1997). There is a concise introduction to these methods in the University of Texas statistics FAQ at http://www.utexas.edu/cc/faqs/stat/general/gen25.html.

For supervised learning, the considerations are somewhat different, as discussed by Sarle (1998). The statistical literature on missing data deals almost exclusively with training rather than prediction (e.g., Little, 1992). The distinction matters: if only a small proportion of cases have missing data, you can simply throw those cases out for purposes of training, but if you want to make predictions for cases with missing inputs, you don't have the option of throwing those cases out! In theory, Bayesian methods take care of everything, but a full Bayesian analysis is practical only with special models (such as multivariate normal distributions) or small sample sizes.
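To make the training-versus-prediction asymmetry concrete, here is a minimal sketch, assuming cases are plain Python lists with `None` marking a missing value. At training time it uses listwise deletion (dropping incomplete cases). At prediction time a case cannot be dropped, so something must be filled in; here each missing input is replaced by the training-set column mean. Mean substitution is a deliberately crude stand-in chosen only for illustration. It is not a method this FAQ recommends, and it ignores the uncertainty that the statistical and Bayesian methods cited above are designed to handle. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional

# Assumption: a case is a list of inputs, with None marking a missing value.
Row = List[Optional[float]]


def listwise_delete(rows: List[Row]) -> List[List[float]]:
    """Training-time option: keep only the complete cases."""
    return [[float(v) for v in r] for r in rows if all(v is not None for v in r)]


def column_means(rows: List[List[float]]) -> List[float]:
    """Per-input means computed from the complete training cases."""
    return [mean(col) for col in zip(*rows)]


def fill_for_prediction(row: Row, means: List[float]) -> List[float]:
    """Prediction-time: the case cannot be thrown out, so each missing input
    is replaced by its training mean (a crude, illustrative stand-in)."""
    return [m if v is None else float(v) for v, m in zip(row, means)]


train = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]
complete = listwise_delete(train)       # the case [3.0, None] is dropped
means = column_means(complete)          # [3.0, 4.0]
print(fill_for_prediction([None, 10.0], means))  # [3.0, 10.0]
```

Dropping the second training case costs one-third of the data here; with many inputs and scattered missing values, listwise deletion can discard most cases, which is why it is only reasonable when the proportion of incomplete cases is small.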