Sanvesh Srivastava - Colloquium Speaker

Assistant Professor, Department of Statistics and Actuarial Science, University of Iowa
Thursday, October 28, 2021 - 3:30 pm
Colloquium Title: Asynchronous and Distributed Data Augmentation for Massive Data Settings
Meet and Greet at 3:00 pm in 241 SH / Talk at 3:30 pm in 61 SH


Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in every iteration, which seriously restricts their use despite their advantages. To address this problem, we develop a framework that extends any DA algorithm by exploiting asynchronous and distributed computing. The extended DA algorithm is indexed by a parameter 0 < r < 1 and is called Asynchronous and Distributed (AD) DA, with the original DA as its parent. Any ADDA starts by dividing the full data into k smaller disjoint subsets and storing them on k processes, which could be machines or processors. Every iteration of ADDA augments only an r-fraction of the k data subsets with some positive probability and leaves the remaining (1-r)-fraction of the augmented data unchanged. The parameter draws are obtained using the r-fraction of new and the (1-r)-fraction of old augmented data. For many choices of k and r, the fractional updates of ADDA lead to a significant speed-up over the parent DA in massive data settings, and ADDA reduces to the distributed version of its parent DA when r=1. We show that the ADDA Markov chain is Harris ergodic with the desired stationary distribution under mild conditions on the parent DA algorithm. We demonstrate the numerical advantages of ADDA in three representative examples corresponding to different kinds of massive data settings encountered in applications. In all these examples, our DA generalization is significantly faster than its parent DA algorithm for all the choices of k and r. We also establish geometric ergodicity of the ADDA Markov chain for all three examples, which in turn yields asymptotically valid standard errors for estimates of desired posterior quantities. Finally, we provide an application of our methodology to the MovieLens recommender database.
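
The fractional update described in the abstract can be illustrated with a minimal sketch. It is not the speaker's implementation. It assumes a toy parent DA chosen only for illustration: data from a Student-t location model with known degrees of freedom nu, unit scale, and a flat prior on the location mu, augmented with one latent precision weight per observation. The function name adda_t_location and its arguments are hypothetical. The k "processes" are simulated in a single process, and picking a random r-fraction of the subsets in each iteration is an assumed stand-in for whichever workers would finish first in a truly asynchronous run.

```python
import math
import random


def adda_t_location(y, k=4, r=0.5, nu=4.0, n_iter=2000, seed=0):
    """Toy ADDA-style sampler for the location mu of t_nu(mu, 1) data (flat prior)."""
    rng = random.Random(seed)
    # Divide the full data into k disjoint subsets, one per simulated worker.
    subsets = [y[j::k] for j in range(k)]
    # Latent precision weights w_i, initialised at 1 for every observation.
    weights = [[1.0] * len(s) for s in subsets]
    n_refresh = max(1, math.ceil(r * k))  # subsets re-augmented per iteration
    mu, draws = 0.0, []
    for _ in range(n_iter):
        # Augmentation step on an r-fraction of the subsets only (assumption:
        # chosen uniformly at random); the rest keep their old augmented data.
        for j in rng.sample(range(k), n_refresh):
            weights[j] = [
                # w_i | mu, y_i ~ Gamma(shape=(nu+1)/2, rate=(nu+(y_i-mu)^2)/2);
                # gammavariate takes a scale, so pass 1/rate.
                rng.gammavariate((nu + 1) / 2, 2.0 / (nu + (yi - mu) ** 2))
                for yi in subsets[j]
            ]
        # Parameter draw uses new and old augmented data from all k subsets:
        # mu | w, y ~ Normal(sum(w*y)/sum(w), variance 1/sum(w)).
        sw = sum(sum(w) for w in weights)
        swy = sum(wi * yi for w, s in zip(weights, subsets) for wi, yi in zip(w, s))
        mu = rng.gauss(swy / sw, 1.0 / math.sqrt(sw))
        draws.append(mu)
    return draws
```

Calling adda_t_location(y, k=4, r=1.0) refreshes every subset in each iteration, which corresponds to the distributed version of the parent DA in this toy setting; smaller r touches fewer subsets per iteration at the cost of reusing older augmented data.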


Topic: Colloquia: Department of Statistics and Actuarial Science, The University of Iowa

Time: October 28, 2021 03:30 PM Central Time (US and Canada)

Join Zoom Meeting

Meeting ID: 989 2869 3758

One tap mobile

+13126266799,,98928693758# US (Chicago)

+16468769923,,98928693758# US (New York)

Dial by your location

        +1 312 626 6799 US (Chicago)

        +1 646 876 9923 US (New York)

        +1 301 715 8592 US (Washington DC)

        +1 346 248 7799 US (Houston)

        +1 669 900 6833 US (San Jose)

        +1 253 215 8782 US (Tacoma)
