Dataset for: Bayesian Finite Population Modeling for Spatial Process Settings
Dataset posted on 23.11.2019 by Alec M. Chan-Golston, Sudipto Banerjee, Mark S. Handcock
We develop a Bayesian model-based approach to finite population estimation accounting for spatial dependence. Our innovation here is a framework that achieves inference for finite population quantities in spatial process settings. A key distinction from the small area estimation setting is that we analyze finite populations referenced by their geographic coordinates (point-referenced data). Specifically, we consider a two-stage sampling design in which the primary units are geographic regions, the secondary units are point-referenced locations, and the measured values are assumed to be a partial realization of a spatial process. Traditional geostatistical models do not account for variation attributable to finite population sampling designs, which can impair inferential performance. On the other hand, design-based estimates ignore the spatial dependence in the finite population. This motivates the introduction of geostatistical processes that enable inference at arbitrary locations in our domain of interest. We demonstrate using simulation experiments that process-based finite population sampling models considerably improve model fit and inference over models that fail to account for spatial correlation. Furthermore, the process-based models offer richer inference with spatially interpolated maps over the entire region. We reinforce these improvements and demonstrate scalable inference for groundwater Nitrate levels in the population of California Central Valley wells by offering estimates of mean Nitrate levels and their spatially interpolated maps.
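The two-stage design described above can be illustrated with a small simulation. The sketch below is not the authors' model or code. It assumes a synthetic finite population on the unit square, an exponential covariance function, grid cells standing in for the geographic regions, and simple random sampling at both stages. The function names `simulate_population` and `two_stage_mean` and all parameter values are hypothetical. The estimator shown is the plain design-based one, which ignores spatial dependence and is therefore the kind of baseline the process-based models are compared against.

```python
import numpy as np


def simulate_population(n_points=400, grid=4, range_par=0.3, seed=0):
    """Simulate a point-referenced finite population (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Secondary units: point locations in the unit square.
    coords = rng.uniform(0.0, 1.0, size=(n_points, 2))
    # Primary units: cells of a grid x grid partition of the square.
    cell = np.minimum((coords * grid).astype(int), grid - 1)
    region = cell[:, 0] * grid + cell[:, 1]
    # Values: one realization of a Gaussian process with an
    # exponential covariance (an assumed choice) and mean 5.
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = np.exp(-dists / range_par) + 1e-8 * np.eye(n_points)
    y = 5.0 + np.linalg.cholesky(cov) @ rng.standard_normal(n_points)
    return coords, region, y


def two_stage_mean(region, y, n_regions=6, n_per_region=10, seed=1):
    """Design-based estimate of the population mean under two-stage SRS."""
    rng = np.random.default_rng(seed)
    labels = np.unique(region)  # non-empty regions only
    # Stage 1: simple random sample of regions, without replacement.
    chosen = rng.choice(labels, size=min(n_regions, labels.size), replace=False)
    total_hat = 0.0
    for r in chosen:
        idx = np.flatnonzero(region == r)
        # Stage 2: simple random sample of locations within the region.
        m = min(n_per_region, idx.size)
        sampled = rng.choice(idx, size=m, replace=False)
        # Expand the within-region sample mean to a region total.
        total_hat += idx.size * y[sampled].mean()
    # Expand the sampled-region totals to the whole population.
    total_hat *= labels.size / chosen.size
    return total_hat / y.size


coords, region, y = simulate_population()
print("population mean:   ", y.mean())
print("two-stage estimate:", two_stage_mean(region, y))
```

When every region is selected and every location within it is sampled, the estimator reduces to the population mean, which is a convenient sanity check. A process-based model would additionally use the coordinates to borrow strength from nearby locations and to interpolate at unsampled ones.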