College of Liberal Arts & Sciences
Congratulations to our PhD student, Zhen Wang, for successfully presenting his dissertation via Zoom!
Bayesian Finite Mixture Regression Models with Cluster-specific Variable Selection
It is common to encounter data from heterogeneous populations. In regression problems, subpopulations may exist and differ in which sets of covariates (e.g., treatment) best predict the response, in their effect sizes, and in the distribution of these covariates. When the subpopulations are unidentified or potentially depend on unobserved variables, cluster analysis is needed jointly with the regression analysis, and Bayesian mixture models provide a natural framework for doing both. So far, due to computing considerations, only certain infinite mixture models, including Dirichlet process mixture models, have been used in this context, even though they produce inconsistent estimates of the number of clusters. Our work is the first to use Bayesian finite mixture models to analyze heterogeneous regression data. Specifically, we adopt a class of priors based on the Normalized Independent Finite Point Process (NIFPP), which includes the finite Dirichlet distribution prior as a special case. In addition, by using NIFPP priors with spike-and-slab base measures, we construct mixture models capable of performing cluster-specific variable selection. We prove posterior consistency for the proposed models and develop and implement new MCMC algorithms for them. We demonstrate several forms of posterior inference, including clustering, individual profiling, and prediction. Our empirical studies show that the proposed method outperforms existing Bayesian and non-Bayesian methods in clustering, variable selection, and prediction. Finally, we apply our model to real-world datasets to showcase its practical applications.
Committee Chair: Aixin Tan
Committee Members: Patrick Breheney (Biostatistics), Joyee Ghosh, Sanvesh Srivastava, Luke Tierney