Variable Selection in Joint Modelling of Mean and Variance for Multilevel Data

Student thesis: PhD


We propose to extend the use of penalized likelihood-based variable selection methods to hierarchical generalized linear models (HGLMs) for jointly modelling both the mean and variance structures. We are interested in applying these new methods to multilevel structured data, hence we assume a two-level hierarchical structure, with subjects nested within groups. We consider a generalized linear mixed model (GLMM) for the mean, with a structured dispersion in the form of a generalized linear model (GLM). In the first scenario, we model the variance of the random effects which are present in the mean model, or in other words the variation between groups (between-level variation). In the second scenario, we model the dispersion parameter associated with the conditional variance of the response, which could also be thought of as the variation between subjects (within-level variation). To do variable selection, we use the smoothly clipped absolute deviation (SCAD) penalty, a penalized likelihood variable selection method, which shrinks the coefficients of redundant variables to 0 and at the same time estimates the coefficients of the remaining important covariates. Our methods are likelihood-based, so in order to estimate the fixed effects in our models we apply iterative procedures such as the Newton-Raphson method, in the form of the local quadratic approximation (LQA) algorithm proposed by Fan and Li (2001). We carry out simulation studies for both the joint models for the mean and variance of the random effects, and the joint models for the mean and dispersion of the response, to assess the performance of our new procedures against a similar process which excludes variable selection. The results show that our method increases both the accuracy and efficiency of the resulting penalized MLEs and has a 100% success rate in identifying the zero and non-zero components over 100 simulations. For the main real data analysis, we use the Health Survey for England (HSE) 2004 dataset.
We investigate how obesity is linked to several factors such as smoking, drinking, exercise and long-standing illness, to name a few. We also examine whether there is variation in obesity between individuals and between households of individuals, and test whether that variation depends on some of the factors affecting obesity itself.
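For readers unfamiliar with the SCAD penalty mentioned in the abstract, the following is a minimal sketch of the penalty function and its first derivative as defined by Fan and Li (2001), together with the weight that a local quadratic approximation attaches to each coefficient. It is an illustration only and not the thesis implementation: the function names, the default a = 3.7 (the value suggested by Fan and Li) and the small guard `eps` against division by zero are assumptions made here.

```python
def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty p_lam(|beta|) for a single coefficient (requires a > 2)."""
    theta = abs(beta)
    if theta <= lam:
        # Small coefficients: same linear penalty as the lasso
        return lam * theta
    if theta <= a * lam:
        # Intermediate coefficients: quadratic transition
        return -(theta ** 2 - 2 * a * lam * theta + lam ** 2) / (2 * (a - 1))
    # Large coefficients: constant penalty, so no further shrinkage
    return (a + 1) * lam ** 2 / 2


def scad_derivative(beta, lam, a=3.7):
    """First derivative of the SCAD penalty with respect to |beta|."""
    theta = abs(beta)
    if theta <= lam:
        return lam
    return max(a * lam - theta, 0.0) / (a - 1)


def lqa_weight(beta0, lam, a=3.7, eps=1e-8):
    """Weight p'_lam(|beta0|) / |beta0| used in a local quadratic approximation.

    eps is a hypothetical guard against division by zero, not part of the
    original definition.
    """
    return scad_derivative(beta0, lam, a) / max(abs(beta0), eps)
```

The derivative is zero once |beta| exceeds a * lam, so large coefficients are left essentially unpenalized, while small coefficients receive a lasso-like penalty that can shrink them to 0.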
Date of Award: 31 Dec 2011
Original language: English
Awarding Institution
  • The University of Manchester
Supervisor: Jianxin Pan


  • Variable Selection
  • SCAD
  • H-likelihood
  • Generalized Linear Mixed Models
  • Hierarchical Data
  • Joint Mean and Variance Modelling
