Sanjib Basu - Colloquium Speaker
We consider criterion-based variable selection, in which selection is performed by optimizing a criterion over the model space. These methods differ from many regularized methods, which perform variable selection as part of model fitting. The Deviance Information Criterion (DIC) is a popular Bayesian selection criterion based on the penalized goodness of fit of a model. Another selection criterion is the Log Pseudo Marginal Likelihood (LPML), which belongs to the broad class of cross-validation-based selection methods. Both criteria can be readily estimated from Markov chain Monte Carlo samples, whereas the traditional Bayesian model selection criteria of marginal likelihood and Bayes factor are often difficult to estimate. We investigate the theoretical performance of these criteria in selecting the data-generating model and obtain rather surprising results. These theoretical results are illustrated in extensive simulation studies. Criterion-based model selection can be difficult to implement in high-dimensional model spaces because of the computational limitations of an enumerative search. We propose a model space search methodology and compare its performance with other recent methods. We further examine the implications of these results for variable selection in complex biomedical models.
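As background for the talk, the point that DIC and LPML can be computed directly from posterior samples can be illustrated with a small sketch. This is not the speaker's methodology, only the standard textbook estimators: DIC = D̄ + p_D with p_D = D̄ − D(θ̄), and LPML as the sum of log conditional predictive ordinates (CPO) via the harmonic-mean estimator. The normal-mean model and the analytic "posterior draws" standing in for MCMC output are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: normal data with known variance 1 and a flat prior
# on the mean, so posterior draws can be generated analytically in place of
# actual MCMC output.
y = rng.normal(loc=2.0, scale=1.0, size=50)
n = len(y)
theta_draws = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=5000)

def log_lik(theta):
    """Pointwise log-likelihood matrix, shape (draws, observations)."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - theta[:, None]) ** 2

ll = log_lik(theta_draws)                 # (S, n) pointwise log-likelihoods
deviance = -2.0 * ll.sum(axis=1)          # D(theta_s) for each posterior draw
d_bar = deviance.mean()                   # posterior mean deviance
d_hat = -2.0 * log_lik(np.array([theta_draws.mean()])).sum()  # D(theta_bar)
p_d = d_bar - d_hat                       # effective number of parameters
dic = d_bar + p_d                         # DIC: smaller is better

# LPML from conditional predictive ordinates (harmonic-mean CPO estimator):
# CPO_i^{-1} = (1/S) * sum_s 1 / p(y_i | theta_s)
cpo = 1.0 / np.mean(np.exp(-ll), axis=0)
lpml = np.log(cpo).sum()                  # LPML: larger is better
```

For this one-parameter model, p_D comes out close to 1, matching its interpretation as an effective parameter count; both quantities require only the pointwise log-likelihoods evaluated at the posterior draws, which is why no separate marginal-likelihood computation is needed.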