College of Liberal Arts & Sciences
2011 Fall Colloquia
|Thursday, September 15|
Title: The Joys of Geocoding (from a Spatial Statistician's Perspective)
Abstract: In many applications of spatial statistics to public health and the social sciences, the spatial locations of study subjects are taken to be the geographic (lat-long) coordinates of their places of residence. The process by which these coordinates are determined is known as geocoding. Typically, geocodes are determined using geographic information system software that attempts to match each address to an address-ranged street segment within a road database and then interpolates the position of the address within that segment. Unfortunately, this process results in many errors (incorrect geocoding), and for various reasons many addresses simply fail to geocode (incomplete geocoding). In this talk, I describe common features of these errors and of the missingness, and the effects they can have on statistical inference. I also describe how several existing spatial statistical methods can be modified to properly account for the errors and for the selection bias they induce. Geocoded data from a rural health study in Carroll County, Iowa, will be used to illustrate the issues and methods.
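The interpolation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the algorithm of any particular GIS package: it assumes a straight street segment whose two ends carry the endpoints of its address range, interpolates linearly in latitude and longitude, and ignores details such as odd/even street sides and setback offsets. The function name and coordinates are hypothetical.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Linearly interpolate a house number's (lat, lon) along a street segment.

    Hypothetical helper: start_point and end_point are (lat, lon) tuples for the
    segment ends that carry range_start and range_end, respectively.
    """
    if range_end == range_start:
        frac = 0.5  # degenerate one-number range: place the address at the midpoint
    else:
        frac = (house_number - range_start) / (range_end - range_start)
    if not 0.0 <= frac <= 1.0:
        raise ValueError("house number lies outside the segment's address range")
    lat = start_point[0] + frac * (end_point[0] - start_point[0])
    lon = start_point[1] + frac * (end_point[1] - start_point[1])
    return lat, lon


# Usage with made-up coordinates: number 150 on a 100-200 block lands halfway along.
print(interpolate_address(150, 100, 200, (0.0, 0.0), (1.0, 2.0)))  # (0.5, 1.0)
```

Because the sketch assumes addresses are evenly spaced along the segment, uneven lot sizes translate directly into positional error, which is one source of the incorrect geocoding the talk discusses.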
|Thursday, September 22|
Title: Hyperparameter and Model Selection for Nonparametric Bayes Problems via Radon-Nikodym Derivatives
Abstract: We consider families of semiparametric Bayesian models based on Dirichlet process mixtures, indexed by a multidimensional hyperparameter that includes the precision parameter. We wish to select the hyperparameter by considering Bayes factors. Our approach involves distinguishing some arbitrary value of the hyperparameter and estimating, as the hyperparameter varies, the Bayes factor for the model indexed by the hyperparameter versus the model indexed by the distinguished point. The approach requires us to select a finite number of hyperparameter values and, for each, to obtain Markov chain Monte Carlo samples from the posterior distribution corresponding to the model indexed by that hyperparameter value. Implementation of the approach relies on a likelihood ratio formula for Dirichlet process models. Because parametric models may be viewed as limiting cases in which the precision hyperparameter is infinite, the method also enables us to decide whether to use a semiparametric or an entirely parametric model. We illustrate the methodology through two detailed examples involving meta-analysis.
|Thursday, September 29|
Title: Option Pricing Without Tears
Abstract: Variable annuities currently make up a major segment of the U.S. life insurance business; they are investment products with (exotic) options and insurance guarantees. Many of these options and guarantees should be priced, hedged, and reserved using modern option-pricing theory, which involves sophisticated mathematical tools such as martingales, Brownian motion, and stochastic differential equations. This talk will show that if the guarantees are exercisable only at the moment of the policyholder's death, the mathematics simplifies to an elementary calculus exercise.
|Thursday, October 6|
This colloquium is jointly sponsored with the University of Iowa Tippie College of Business Department of Management Sciences.
Title: Dynamic Trees for Response Surface Learning and Optimization
Abstract: This talk presents a new response surface methodology, dynamic trees, for learning, optimization, and sequential design -- applications where Gaussian processes (GPs) have reigned supreme. Dynamic trees are thrifty in space and time: there is no need to store or invert big matrices. They are flexible, natively accommodating nonstationarity or heteroskedasticity. And they are inherently sequential, which means that sequential design decisions (as in optimization/exploration) can be turned around quickly. Other benefits include the ability to handle categorical predictors and responses, the ability to decompose partial dependencies into main, first-order, and total effects, and a fully online implementation. The talk will focus on the dynamic tree methodology, inference by sequential Monte Carlo, design/optimization heuristics based on active learning (with illustrations), and applications in massive and streaming data contexts. It will also highlight an R package implementing the methods, called dynaTree, which is available on CRAN. For more details, see: http://arxiv.org/abs/0912.1586
|Thursday, October 13|
Title: Easy User Interfaces Via rJava
Abstract: Some time ago, I started a website that offers various applets for calculating sample size. The site has received a fair amount of use, but it is limited to standard situations in which it can use my Java code for the noncentral t, noncentral F, and similar distributions. It would be nice for these applets to access the power of R so that we can consider more sophisticated methods, run simulations, and so forth. The applets are based on a Java class that makes it easy to present an effective user interface with minimal code. It turns out that with the rJava package, all that is needed is a simple extension of the applets' original base class that stores the values of variables in R rather than in Java. We can now do about the same amount of coding in R that was previously done in Java, and thereby considerably broaden what is possible. The talk will demonstrate how this works with several live examples.
|Thursday, October 27|
Title: Estimation and Selection Properties of the LAD Fused Lasso Signal Approximator
Abstract: The fused lasso is an important method for signal processing when the hidden signals are sparse and blocky. It is often used in combination with the squared loss function. However, the squared loss is not suitable for heavy-tailed error distributions, nor is it robust against outliers, which often arise in practice. The least absolute deviations (LAD) loss provides a robust alternative to the squared loss. In this paper, we study the asymptotic properties of the fused lasso estimator with the LAD loss for signal approximation. We refer to this estimator as the LAD fused lasso signal approximator, or LAD-FLSA. We investigate the estimation consistency properties of the LAD-FLSA and provide sufficient conditions under which the LAD-FLSA is sign consistent. We also construct an unbiased estimator for the generalized degrees of freedom (GDF) of the LAD-FLSA for any given tuning parameters. Both simulation studies and real data analysis are conducted to illustrate the performance of the LAD-FLSA and the effect of the unbiased estimator of GDF.
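For orientation, the LAD-FLSA objective can plausibly be written in the usual fused lasso signal approximator form with the squared loss replaced by absolute deviations. The notation here (observations $y_1, \dots, y_n$, signal $\theta$, tuning parameters $\lambda_1$ and $\lambda_2$) is our assumption and may differ from the speaker's:

```latex
\hat{\theta}
  = \arg\min_{\theta \in \mathbb{R}^n}
    \sum_{i=1}^{n} \lvert y_i - \theta_i \rvert
    + \lambda_1 \sum_{i=1}^{n} \lvert \theta_i \rvert
    + \lambda_2 \sum_{i=2}^{n} \lvert \theta_i - \theta_{i-1} \rvert
```

The $\lambda_1$ term shrinks individual $\theta_i$ toward zero (sparsity), the $\lambda_2$ term shrinks successive differences toward zero (blockiness), and the absolute-deviation loss replaces $\tfrac{1}{2}\sum_i (y_i - \theta_i)^2$ to gain robustness against outliers.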
|Thursday, November 3|
Title: Improving Estimates of Regional Biogeochemical Interannual Variability
Abstract: Interannual variability in terrestrial vegetation states and processes has significant implications for social and natural systems. Year-to-year changes in the timing and magnitude of vegetation structure and productivity affect biodiversity, hydrological partitioning, food security, greenhouse gas (GHG) emissions, and other ecosystem services and processes. Wide-swath remote sensing data have been widely used to monitor vegetation phenology and to improve estimates of biogeochemical cycles such as carbon dynamics. This talk will examine the application of wide-swath optical remote sensing data to the estimation and modeling of regional variability in vegetation photosynthesis and ecohydrological dynamics, with a specific focus on pre-processing, biophysical products, and the specific limitations of wide-swath data.
|Thursday, November 10|
Title: Some Developments for the R Engine
Abstract: The R language for statistical computing and graphics has become a major framework for both statistical practice and research. This talk will describe some current efforts on improvements to the core computational engine, including work on compilation of R code and efforts to take advantage of multiple processor cores.
|Thursday, December 1|
Title: Analysis of Quantitative Driving Data in Healthy and Impaired Populations
Abstract: In our driving simulators and instrumented vehicles, we often collect electronic measures at high capture rates, e.g., between 10 and 100 frames per second. A methodological challenge we face is how to reduce such data to a manageable number of parameters to use for testing scientific hypotheses. One model that we have developed for lateral vehicular control is a third-order autoregressive model that has been modified to accommodate the reflective nature of the driving lane boundaries. This model takes into account the driver's efforts to re-center the vehicle within the lane, and it also permits the average lane position to vary among drivers. A two-stage method of fitting this model can be accomplished using standard statistical procedures, such as those in SAS and R. We illustrate how the parameter estimates of this model vary between healthy and impaired drivers, and how one of the model parameters is associated with clinical predictors and with on-road error outcomes. Simulation studies are also shown that demonstrate the statistical properties of our two-stage method for fitting the data, as well as the ability of such models to recreate subject data. This research has been supported by NIH grants AG017177, AG015071, and NS044930. Currently, Nissan Motor Corporation is supporting PhD candidate Amy M. Johnson, who is a co-author on this work.
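The abstract does not spell out the reflective-boundary modification or the two-stage fitting procedure, so the following sketch covers only the unmodified building block: simulating a third-order autoregressive lateral-position series and recovering its coefficients by ordinary least squares. The function names, parameter values, and use of NumPy are illustrative assumptions, and no lane-boundary reflection is modeled.

```python
import numpy as np


def simulate_ar3(phi, c, sigma, n, rng, burn_in=200):
    """Simulate x_t = c + phi1*x_{t-1} + phi2*x_{t-2} + phi3*x_{t-3} + e_t, e_t ~ N(0, sigma^2).

    The first burn_in values are discarded so the returned series is close to stationarity.
    No lane-boundary (reflection) mechanism is modeled here.
    """
    total = n + burn_in
    x = np.zeros(total)
    e = rng.normal(0.0, sigma, size=total)
    for t in range(3, total):
        x[t] = c + phi[0] * x[t - 1] + phi[1] * x[t - 2] + phi[2] * x[t - 3] + e[t]
    return x[burn_in:]


def fit_ar3(x):
    """Fit an AR(3) model with intercept by ordinary least squares; return (c_hat, phi_hat)."""
    x = np.asarray(x, dtype=float)
    y = x[3:]                        # responses x_t for t = 3, ..., n-1
    design = np.column_stack([
        np.ones(len(y)),             # intercept c
        x[2:-1],                     # lag 1: x_{t-1}
        x[1:-2],                     # lag 2: x_{t-2}
        x[:-3],                      # lag 3: x_{t-3}
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


def implied_mean(c, phi):
    """Long-run mean of a stationary AR process: c / (1 - sum(phi))."""
    return c / (1.0 - np.sum(phi))


# Usage with made-up parameters (true implied mean = 0.2 / (1 - 0.8) = 1.0).
rng = np.random.default_rng(0)
series = simulate_ar3(phi=(0.5, 0.2, 0.1), c=0.2, sigma=0.1, n=20_000, rng=rng)
c_hat, phi_hat = fit_ar3(series)
print("phi_hat:", phi_hat, "implied mean lane position:", implied_mean(c_hat, phi_hat))
```

In this parameterization the intercept enters through the implied long-run mean c / (1 - φ1 - φ2 - φ3); letting that mean differ by driver is one simple way to represent an average lane position that varies among drivers, though the speakers' actual parameterization may differ.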