Steven L. Scott, PhD - Colloquium Speaker & Researcher at Google

Steven L. Scott is a statistician primarily interested in the development and application of Bayesian methods. He received his PhD from the Harvard statistics department in 1998, and served on the faculty of USC's Marshall School of Business until 2007.
Date: Thursday, April 17, 2014 - 3:30pm to 4:30pm
Colloquium Title: Bayes and Big Data: The Consensus Monte Carlo Algorithm
Location: Reception at 3 p.m. in 241 B Schaeffer Hall / Talk at 3:30 in 61 Schaeffer

 

Abstract: A useful definition of "big data" is data that is too big to comfortably process on a single
machine because of processor, memory, or disk bottlenecks. Graphics processing units
can alleviate the processor bottleneck, but memory or disk bottlenecks can only be eliminated
by splitting data across multiple machines. Communication between large numbers of machines
is expensive (regardless of the amount of data being communicated), so there is a need for
algorithms that perform distributed approximate Bayesian analyses with minimal communication.
Consensus Monte Carlo operates by running a separate Monte Carlo algorithm on each
machine, and then averaging individual Monte Carlo draws across machines. Depending on the
model, the resulting draws can be nearly indistinguishable from the draws that would have been
obtained by running a single-machine algorithm for a very long time. Examples of consensus
Monte Carlo are shown for simple models where single-machine solutions are available, for large
single-layer hierarchical models, and for Bayesian additive regression trees (BART).
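
The averaging step described in the abstract can be illustrated with a minimal sketch, which is not the speaker's implementation. It assumes a toy model (Gaussian data with known standard deviation and a Gaussian prior on the mean), exact draws from each shard's posterior standing in for a per-machine Monte Carlo run, a prior split evenly across shards, and draws combined with weights equal to the inverse of each shard's sample variance. The function names shard_draws and consensus, and all numeric settings, are hypothetical choices made for this illustration.

```python
import random
import statistics


def shard_draws(shard, n_shards, n_draws, rng, sigma=1.0, m0=0.0, v0=100.0):
    """Draw from one shard's posterior for the mean mu (sigma known).

    Assumption: the N(m0, v0) prior is split evenly across shards, which
    for a Gaussian prior means inflating its variance by n_shards.
    """
    prior_var = n_shards * v0
    precision = 1.0 / prior_var + len(shard) / sigma**2
    mean = (m0 / prior_var + sum(shard) / sigma**2) / precision
    sd = precision ** -0.5
    # Exact posterior draws stand in for a per-machine Monte Carlo run.
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def consensus(draws_by_shard):
    """Combine the g-th draw from every shard by a weighted average.

    Assumption: each shard's weight is the inverse of the sample
    variance of its own draws.
    """
    weights = [1.0 / statistics.variance(d) for d in draws_by_shard]
    total = sum(weights)
    n_draws = len(draws_by_shard[0])
    return [
        sum(w * d[g] for w, d in zip(weights, draws_by_shard)) / total
        for g in range(n_draws)
    ]


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]

# Split the data across 10 hypothetical "machines".
n_shards = 10
shards = [data[s::n_shards] for s in range(n_shards)]

# Each shard samples independently; only the draws are communicated.
draws = [shard_draws(shard, n_shards, 2_000, rng) for shard in shards]
combined = consensus(draws)

# Exact single-machine posterior for comparison (sigma=1, m0=0, v0=100).
full_precision = 1.0 / 100.0 + len(data) / 1.0
full_mean = sum(data) / full_precision
full_sd = full_precision ** -0.5

print("consensus mean:", statistics.mean(combined))
print("full-data mean:", full_mean)
print("consensus sd:  ", statistics.stdev(combined))
print("full-data sd:  ", full_sd)
```

In this Gaussian toy model the precision-weighted average of shard draws has the same distribution as the full-data posterior, up to noise in the estimated weights, so the two printed means and standard deviations should agree closely. For non-Gaussian posteriors the combination is only approximate, which matches the abstract's "depending on the model" caveat.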