Dataset for: Modelling imperfect presence data obtained by citizen science


posted on 2017-05-22, 02:17 authored by Kerrie Mengersen, Erin Peterson, Samuel Clifford, Nan Ye, June Kim, Tomasz Bednarz, Ross Brown, Allan James, Julie Vercelloni, Alan R Pearse, Jacqueline Davis, Vanessa Hunter

There is growing awareness of the potential benefit of harnessing citizen science for research, particularly in the biological and environmental sciences. Data quality, and in particular imperfect observation, is a major constraint on the use of citizen-science data. In this paper we fit species distribution models (SDMs) to presence-only data (presences and counts, with no absences observed) by exploiting the uncertainty in reported presences, instead of generating pseudo-absences as is common in previous presence-only studies. This approach allowed us to extend the suite of models to include those commonly fitted to presence/absence and abundance data. We fit several models to a case-study dataset of jaguar encounters reported by citizens in the Peruvian Amazon. Because the true species distribution for the case-study data is unknown, we also undertake an extensive simulation study to evaluate model performance. We analyse the sources of error by studying the bias and variance of the models, and we also discuss the predictive performance of each model and its ability to recover the true species distribution. The simulation study shows that although several approaches are capable of recovering the species distribution, choosing a modelling approach is complex and depends on factors such as inferential aim, model complexity, sample size and computational resources. This study also addresses some issues in dealing with compound-imperfect observations arising from citizen-science data, and we discuss the further steps needed in this research area.
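To show what a simulation study measures when it reports bias and variance, here is a minimal toy sketch. It is not one of the SDMs from the paper. It assumes independent grid cells, a known true occupancy probability `psi`, a known citizen detection probability `detect_p`, and no false-positive reports. All function names and parameter values are hypothetical. Two simple estimators of occupancy are applied to many simulated datasets, and each one's mean squared error is split into squared bias plus variance.

```python
import numpy as np


def simulate_reports(n_cells, psi, detect_p, rng):
    """Hypothetical generator: each cell is truly occupied with probability psi;
    an occupied cell is reported by citizens with probability detect_p.
    No false-positive reports are assumed."""
    occupied = rng.random(n_cells) < psi
    reported = occupied & (rng.random(n_cells) < detect_p)
    return reported


def naive_estimate(reported):
    # Naive occupancy estimate: fraction of cells with at least one report.
    return reported.mean()


def corrected_estimate(reported, detect_p):
    # Detection-corrected estimate, assuming detect_p is known exactly.
    # Not clipped to [0, 1], so individual estimates may exceed 1.
    return reported.mean() / detect_p


def bias_variance(estimates, truth):
    """Return (bias, variance, mse) over replicate estimates.
    With ddof=0 variance, mse == bias**2 + variance up to rounding."""
    estimates = np.asarray(estimates)
    bias = estimates.mean() - truth
    variance = estimates.var()
    mse = np.mean((estimates - truth) ** 2)
    return bias, variance, mse


def run(n_reps=2000, n_cells=200, psi=0.4, detect_p=0.6, seed=1):
    # Repeat the simulate-then-estimate cycle to approximate each
    # estimator's sampling distribution against the known truth psi.
    rng = np.random.default_rng(seed)
    naive, corrected = [], []
    for _ in range(n_reps):
        reported = simulate_reports(n_cells, psi, detect_p, rng)
        naive.append(naive_estimate(reported))
        corrected.append(corrected_estimate(reported, detect_p))
    return bias_variance(naive, psi), bias_variance(corrected, psi)


if __name__ == "__main__":
    for name, (b, v, m) in zip(("naive", "corrected"), run()):
        print(f"{name:9s} bias={b:+.4f} variance={v:.5f} mse={m:.5f}")
```

With these illustrative settings, the naive estimator is biased towards `psi * detect_p` (about 0.24 rather than 0.4). The corrected estimator is roughly unbiased but has higher variance. This is the kind of trade-off that a bias/variance analysis exposes when candidate models are compared against a known simulated truth.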