Dataset for: An adaptive kriging method for solving nonlinear inverse statistical problems

In various environmental contexts, the distribution of unobserved random vectors X_i must be estimated from noisy indirect observations H(X_i)+U_i. When the relation between X_i and the quantity H(X_i), measured with error U_i, is implemented by a CPU-intensive computer model H, a major practical difficulty is performing the statistical inference with a relatively small number of runs of H. Following Fu et al. (2014), a Bayesian statistical framework is adopted to make use of possible prior knowledge on the parameters of the distribution of the X_i, which is assumed Gaussian. A Markov chain Monte Carlo (MCMC) algorithm is then run to estimate the posterior distribution of these parameters, with H replaced by a kriging metamodel built from a limited number of simulated experiments. Two heuristics, each optimizing a different criterion, are proposed to design these computer experiments sequentially within a given computational budget. The first criterion is a Weighted Integrated Mean Square Error (WIMSE). The second, called Expected Conditional Divergence (ECD) and developed in the spirit of Stepwise Uncertainty Reduction (SUR) (Vazquez and Bect 2009), is based on the discrepancy between two consecutive approximations of the target posterior distribution. Numerical comparisons conducted on a toy example and then on a motivating real hydraulic case study show that such adaptive designs can significantly outperform the classical choice of a maximin Latin Hypercube Design (LHD). Because the case study addresses a major concern in hydraulic engineering, particular emphasis is placed on its prior elicitation, which highlights the overall feasibility of the methodology. Faster convergence and ease of use lead the authors to recommend the ECD criterion in practical applications.
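The idea of enriching a kriging design sequentially, where the current inference needs accuracy most, can be illustrated with a small sketch. The code below is not the WIMSE or ECD criterion of the paper: it uses a simpler greedy rule that adds the candidate point maximizing the kriging variance multiplied by a weight. Everything in it is an assumption made for illustration: the one-dimensional toy model `H`, the squared-exponential covariance with fixed length scale, the zero-mean simple kriging predictor, and the Gaussian-shaped `weight` function standing in for the region favoured by the current posterior approximation.

```python
import math

def kernel(a, b, length=0.3):
    """Squared-exponential covariance, unit variance (assumed choice)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krige(xs, ys, x, nugget=1e-8):
    """Zero-mean simple-kriging mean and variance at x, given runs (xs, ys)."""
    K = [[kernel(a, b) + (nugget if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [kernel(a, x) for a in xs]
    w = solve(K, k)  # kriging weights
    mean = sum(wi * yi for wi, yi in zip(w, ys))
    var = max(kernel(x, x) - sum(wi * ki for wi, ki in zip(w, k)), 0.0)
    return mean, var

def H(x):
    """Hypothetical cheap stand-in for the expensive computer model."""
    return math.sin(3.0 * x) + x

def weight(x, centre=0.6, spread=0.2):
    """Hypothetical weight: where the current posterior of X puts its mass."""
    return math.exp(-0.5 * ((x - centre) / spread) ** 2)

def next_point(xs, ys, grid):
    """Greedy rule: candidate with the largest weighted kriging variance."""
    return max(grid, key=lambda x: weight(x) * krige(xs, ys, x)[1])

grid = [i / 50 for i in range(51)]   # candidate inputs on [0, 1]
xs = [0.0, 0.5, 1.0]                 # small initial design
ys = [H(x) for x in xs]
for _ in range(4):                   # computational budget: 4 extra runs of H
    x_new = next_point(xs, ys, grid)
    xs.append(x_new)
    ys.append(H(x_new))
```

In this sketch the weight is fixed; in an adaptive scheme of the kind described above, it would be updated from the MCMC output after each new run, so that the design follows the evolving posterior approximation.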