Journal of Statistical and Econometric Methods

Modeling High Dimensional Multilevel Data using the Lasso Estimator: A Simulation Study

  • Abstract

    In some situations, researchers are faced with high dimensional data, where the number of variables in the dataset is large and the sample size is relatively small. In such cases, standard statistical methods do not perform well, making model parameter estimation potentially problematic. To deal with such high dimensional data, statisticians have developed estimators, such as the lasso, that are specially designed to provide model parameter estimates for such scenarios. Recently, this work has been extended to the context of high dimensional multilevel, or mixed effects, data in which individuals at level-1 are nested within clusters at level-2. Such data structures are extremely common in the social sciences, particularly education and sociology. The goal of this simulation study was to assess a multilevel extension of the lasso estimator in the high dimensional multilevel data case, and to compare this approach with the standard restricted maximum likelihood (REML) estimator typically used to fit multilevel models. Results of the study demonstrated that the multilevel lasso yielded better control of the Type I error rate and better parameter coverage than did REML when level-1 and level-2 sample sizes were small and there were many predictor variables. Implications of these results are discussed.

    Mathematics Subject Classification: 62P99
    Keywords: high dimensional data; multilevel and mixed effects models; lasso estimator