College of Liberal Arts & Sciences

# Michael Levine - Colloquium Speaker

Abstract:

We consider a semiparametric mixture of two univariate density functions in which one component is known while the mixing weight and the other component are unknown. Such mixtures have a history of application to the problem of detecting differentially expressed genes under two or more conditions in microarray data. Until now, some additional knowledge about the unknown component (e.g., the fact that it belongs to a location family) has been assumed. In contrast, we do not assume any additional structure on the unknown density function. For this mixture model, we derive a new sufficient identifiability condition and pinpoint a specific class of distributions describing the unknown component for which this condition is mostly satisfied. We also suggest a novel approach to estimating this model, based on applying maximum smoothed likelihood to what would otherwise be an ill-posed problem. We introduce an iterative MM (Majorization-Minimization) algorithm that estimates all of the model parameters. We establish that the algorithm possesses a descent property with respect to a log-likelihood objective functional and prove that the algorithm converges. A partial solution to the problem of large-sample convergence will also be discussed. Finally, we illustrate the performance of our algorithm in a simulation study and on a real dataset.
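
For concreteness, the two-component model described in the abstract can be written out as below. This is only a sketch of the setup: the symbols $g$, $f_0$, $f$, and $p$ are notation assumed here for illustration and do not appear in the abstract, and the convention of attaching the weight $p$ to the unknown component is likewise an assumption.

```latex
% Notation assumed for illustration; not taken from the abstract.
%   g   : density of the observed data
%   f_0 : known component density
%   f   : unknown component density (no structure assumed)
%   p   : unknown mixing weight
g(x) = (1 - p)\, f_0(x) + p\, f(x), \qquad 0 < p < 1 .
```

Under this notation, the earlier location-family restriction mentioned in the abstract would correspond to assuming $f(x) = h(x - \mu)$ for some density $h$ and shift $\mu$, whereas the talk leaves $f$ unrestricted.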