2024 Hogg and Craig Lecturer is David Dunson

Arts and Sciences Distinguished Professor of Statistical Science, Trinity College of Arts & Sciences, Duke University
Thursday, April 18, 2024 - 3:30pm to Friday, April 19, 2024 - 3:30pm

Dr. David Dunson from Duke University will be our 51st Hogg and Craig Lecturer.  

Early in the 1969-70 academic year, Professor Allen T. Craig announced his retirement. He gave a retirement talk in January 1970. Under the leadership of Craig’s student and co-author, Professor Robert V. Hogg, the department decided to establish a lecture series to honor Professor Craig. His January 1970 talk was the first in this series. When Professor Hogg passed away at the age of 90 in 2014, the department decided to incorporate his name into the lecture series.


David Dunson

David Dunson's research focuses on developing statistical and machine learning methodology for the analysis and interpretation of complex and high-dimensional data, with a particular emphasis on scientific applications, Bayesian statistics and probability modeling. His methods development and theory are directly motivated by challenging applications in neuroscience, genomics, environmental health, and ecology, among other fields. His work has had a substantial impact, with an H-index of 94. He has received numerous awards, including a gold medal from the U.S. Environmental Protection Agency, the Committee of Presidents of Statistical Societies (COPSS) Presidents' Award given to one outstanding statistician each year, the Mortimer Spiegelman Award given to one outstanding public health statistician each year, a highly cited researcher award from Web of Science, an IMS Medallion lecture, and most recently the COPSS G.W. Snedecor Award. Dr. Dunson's interests center on Bayesian modeling, computational statistics, and machine learning, and on new methodology for the challenges posed by complex and high-dimensional data in fields including epidemiology, neuroscience, and ecology. In epidemiology, he applies machine learning to large-scale health datasets to better understand disease transmission and risk factors. In neuroscience, he uses machine learning approaches to extract insights from brain imaging data on brain function and neurological disorders. In ecology, he uses machine learning to study ecological patterns and dynamics, supporting conservation efforts and ecological management. Through this interdisciplinary approach, Dr. Dunson continues to advance statistical science and its impact across many scientific disciplines.



Thursday, April 18

1:30 p.m. Refreshments and Awards in 302 Schaeffer Hall (SH)

3:30 p.m. Lecture #1 in 107 English-Philosophy Building (EPB)

Improving understanding of life on earth through novel data and statistics

Biodiversity data tend to be extremely biased towards large and charismatic organisms that are relatively easy to observe and accessible to human observers. We seek to address this gap and fundamentally improve understanding of life on earth through (relatively) unbiased automated monitoring of insects, fungi, birds and mammals at sites across the globe. Each site contains audio monitors (to identify bird vocalizations), camera traps (to detect mammals and large birds), Malaise traps (to capture insects) and cyclone samplers (to capture fungal spores). Taxonomic classification of the insect and fungal species is based on DNA barcoding applied to the collected samples. There is interest in applying joint species distribution models (JSDMs) to infer the impact of covariates (habitat, environmental disruption, climate, etc.) on the biological communities being monitored, while also inferring interaction networks among the species. In addition, there is interest in discovering new species and studying factors related to biodiversity. Our data contain large numbers of insect and fungal species that were previously unknown to science, and a fundamental aspect of the data is that most of the species being sampled are extremely rare. This talk will introduce our ERC-funded Lifeplan study and describe some of the exciting data being collected, while highlighting the important role of novel AI, machine learning and statistical methods in analyzing and interpreting the data.


Friday, April 19

2:30 p.m. Reception in 241 Schaeffer Hall (SH)

3:30 p.m. Lecture #2 in 107 English-Philosophy Building (EPB)

Novel models and algorithms for massive-dimensional and sparse multivariate data

Motivated by biodiversity data collected in our ongoing Lifeplan study, we seek to address fundamental challenges that arise in ecological joint species distribution models (JSDMs). Current JSDMs take the form of multivariate binary hierarchical regression models, with the binary outcome vector indicating occurrences of different species in a sample and covariates including factors such as habitat, environmental disruption, climate, etc. The state-of-the-art Hierarchical Modeling of Species Communities (HMSC) framework, which is broadly used in the ecology community, uses multivariate probit hierarchical factor regression models implemented in a Bayesian framework with Gibbs sampling. We are motivated by several challenges that arise in applying HMSC to fungal and insect data in Lifeplan and related studies: (1) current algorithms are too slow to handle the tens of thousands of species observed in the data, and hence analyses have focused on common species; (2) statistical models are not structured to handle extremely rare species that are observed only a few times in the dataset; (3) in practice, we cannot pre-specify the species identities before collecting the data, and indeed we discover species unknown to science as we sample. We propose several novel methods to address these problems, which are of broad independent interest for efficiently fitting factor and latent feature models to massive-dimensional data. We illustrate these methods through applications to several biodiversity datasets.
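For readers less familiar with this model class, the Python sketch below simulates presence/absence data from a generic multivariate probit factor regression of the kind described in the abstract: each species has its own covariate effects, and a small number of shared latent factors induce residual dependence among species. This is an illustrative assumption-laden sketch, not HMSC's implementation or the speaker's proposed methods, and it includes no Gibbs sampler. It assumes NumPy; the function names, dimensions, and parameter values (for example, shifting every intercept down by 2 so that most species are rare) are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_probit_factor_jsdm(n_samples, n_species, n_covariates, n_factors, seed=0):
    """Hypothetical helper: simulate data from a generic multivariate probit factor model.

    z[i, j] = x[i] @ beta[:, j] + eta[i] @ lam[:, j] + eps[i, j],  eps ~ N(0, 1)
    y[i, j] = 1 if z[i, j] > 0 else 0   (species j detected in sample i)
    """
    rng = np.random.default_rng(seed)
    # Covariates (e.g. habitat or climate); the first column is an intercept.
    x = np.column_stack([np.ones(n_samples),
                         rng.normal(size=(n_samples, n_covariates - 1))])
    # Species-specific regression coefficients.
    beta = rng.normal(scale=0.5, size=(n_covariates, n_species))
    # Assumption: lower every intercept so that most species are rare.
    beta[0] -= 2.0
    # Shared latent factors (eta) and species loadings (lam) induce residual dependence.
    eta = rng.normal(size=(n_samples, n_factors))
    lam = rng.normal(scale=0.7, size=(n_factors, n_species))
    # Latent Gaussian utilities, thresholded at zero (probit link).
    z = x @ beta + eta @ lam + rng.normal(size=(n_samples, n_species))
    y = (z > 0).astype(np.int8)
    return x, y, lam

def residual_correlation(lam):
    """Residual correlation implied by the factors: Omega = lam.T @ lam + I, rescaled to unit diagonal."""
    omega = lam.T @ lam + np.eye(lam.shape[1])
    d = np.sqrt(np.diag(omega))
    return omega / np.outer(d, d)

# Usage: 500 samples, 200 species, intercept plus 2 covariates, 4 latent factors.
x, y, lam = simulate_probit_factor_jsdm(n_samples=500, n_species=200, n_covariates=3, n_factors=4)
print(y.shape)  # (500, 200)
prevalence = y.mean(axis=0)  # fraction of samples in which each species is detected
R = residual_correlation(lam)  # 200 x 200 species-by-species correlation
```

The factor structure is what makes this model class workable for many species: an unstructured residual correlation among p species has p(p-1)/2 free off-diagonal entries, whereas the factor model describes it with only n_factors × p loadings. Species detected in only a handful of samples still carry very little information about their own coefficients and loadings, which illustrates why extremely rare species are difficult for models of this form.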