What is multiple imputation?
Imputation, the practice of ‘filling in’ missing data with plausible values, is an attractive approach to analyzing incomplete data. It apparently solves the missing-data problem at the beginning of the analysis. However, a naive or unprincipled imputation method may create more problems than it solves, distorting estimates, standard errors and hypothesis tests, as documented by Little and Rubin (1987) and others. The question of how to obtain valid inferences from imputed data was addressed by Rubin’s (1987) book on multiple imputation (MI). MI is a Monte Carlo technique in which the missing values are replaced by m > 1 simulated versions, where m is typically small (e.g. 3 to 10). In Rubin’s method for ‘repeated imputation’ inference, each of the simulated complete datasets is analyzed by standard methods, and the results are combined to produce estimates and confidence intervals that incorporate missing-data uncertainty. Rubin (1987) addresses potential uses of MI primarily for large pub