Multiple Imputation of Missing Data in Multilevel Models

Nidhi Menon

Missing data are common in public health research. Multiple imputation (MI) has long been recognised as an attractive approach to handling missing values, and statisticians now advocate it as a gold standard for the missing data problem. Despite its early conception and its numerous advantages over traditional ad hoc methods, MI still has limited application in public health research.

The theory of multiple imputation requires that imputations be made conditional on the sampling design. Failing to account for complex sample design features, such as stratification and clustering, during imputation can yield biased estimates from a design-based perspective. Most datasets in public health research show some form of natural clustering (individuals within households, households within the same district, patients within wards or health care providers, and so on), and cluster effects are often of interest in health research. A real-data example is the three-level National Family Health Survey conducted in India, in which individuals are nested within households, which are in turn nested within primary sampling units (PSUs). Missing values can occur at any level of multilevel data, but how best to carry out multiple imputation in data with more than two levels remains an open research question.

My study implements the Gelman and Hill approach to imputing missing data at higher levels, in which aggregate forms of individual-level measurements are included in the model used to impute missing higher-level values. The performance of two popular imputation methods, MICE and JoMo, is compared with that of ad hoc procedures such as available case analysis. Performance measures include bias in estimates, mean squared error and coverage probability of confidence intervals. My study highlights the strengths and limitations of imputation for variables in datasets with more than two levels.
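The idea of using aggregates of individual-level measurements to impute a higher-level variable can be illustrated with a minimal sketch. It is not the study's implementation, and it is not MICE or JoMo. The function names, the toy data and the choice of a simple linear regression on the cluster mean are all assumptions made for illustration. For brevity the sketch adds residual noise only and ignores parameter uncertainty, which a proper multiple imputation procedure would also draw.

```python
import random
from statistics import mean


def cluster_means(rows):
    """Average the individual-level value x within each cluster."""
    groups = {}
    for cluster_id, x in rows:
        groups.setdefault(cluster_id, []).append(x)
    return {cid: mean(xs) for cid, xs in groups.items()}


def impute_cluster_variable(rows, z, m=5, seed=0):
    """Return m completed copies of the cluster-level variable z.

    rows: list of (cluster_id, x) individual-level records
    z:    dict cluster_id -> observed value, or None when missing
    (Hypothetical helper; a simplified stochastic regression imputation.)
    """
    rng = random.Random(seed)
    xbar = cluster_means(rows)  # the aggregate used as a predictor
    observed = [cid for cid, v in z.items() if v is not None]

    # Least-squares fit of z on the cluster mean, using observed clusters only.
    mx = mean(xbar[c] for c in observed)
    mz = mean(z[c] for c in observed)
    sxx = sum((xbar[c] - mx) ** 2 for c in observed)
    slope = sum((xbar[c] - mx) * (z[c] - mz) for c in observed) / sxx
    intercept = mz - slope * mx

    # Residual standard deviation (needs at least three observed clusters).
    resid = [z[c] - (intercept + slope * xbar[c]) for c in observed]
    sigma = (sum(r * r for r in resid) / (len(observed) - 2)) ** 0.5

    completed = []
    for _ in range(m):
        filled = {}
        for cid, v in z.items():
            if v is None:
                # Prediction plus residual noise; parameter uncertainty is ignored.
                filled[cid] = intercept + slope * xbar[cid] + rng.gauss(0.0, sigma)
            else:
                filled[cid] = v
        completed.append(filled)
    return completed


# Toy data (made up): four clusters, cluster 4 has a missing cluster-level value.
rows = [(1, 2.0), (1, 4.0), (2, 5.0), (2, 7.0),
        (3, 8.0), (3, 10.0), (4, 3.0), (4, 5.0)]
z = {1: 1.0, 2: 2.1, 3: 2.9, 4: None}
imputations = impute_cluster_variable(rows, z, m=3)
```

Each element of `imputations` is one completed version of the cluster-level variable. In a full analysis the model of interest would be fitted to each completed dataset and the results pooled.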

About Nidhi

Nidhi is a PhD candidate at the Research School of Population Health. She has a background in Statistics and a Master's in Biostatistics. Nidhi has also worked as a Statistical Programmer and Biostatistician with TATA Consultancy Services in India. Her research centres on multiple imputation of missing data and its application in multilevel hierarchical models. Nidhi is currently working as a Biostatistician at the Biological Data Science Institute, ANU.