A COMPARISON BETWEEN BAYESIAN AND ORDINARY KRIGING BASED ON VALIDATION CRITERIA: APPLICATION TO RADIOLOGICAL CHARACTERISATION

3. Numerical tests and results

α-CI plot and Mean Squared Error α (MSEα)

The Gaussian process model makes it possible to build prediction intervals of any level $\alpha \in ]0,1[$:

$$CI_{\alpha}(z(x_i)) = \left[\hat{z}_{-i} - \hat{s}_{-i}\, q^{\mathcal{N}(0,1)}_{\frac{1+\alpha}{2}} \;;\; \hat{z}_{-i} + \hat{s}_{-i}\, q^{\mathcal{N}(0,1)}_{\frac{1+\alpha}{2}}\right],$$

where $q^{\mathcal{N}(0,1)}_{\frac{1+\alpha}{2}}$ is the quantile of order $\frac{1+\alpha}{2}$ of the standard normal distribution, so that the interval has nominal coverage $\alpha$. This expression is only valid if all parameters are known. For example, if the scale parameter is incorrectly estimated, the width of the predicted confidence intervals will not reflect what we might observe. But how can we validate a confidence interval without prior knowledge of the model parameters? The idea behind this criterion is to evaluate empirically the proportion of observations falling into their predicted confidence intervals and to compare it with the expected theoretical proportion:

$$\Delta_{\alpha} = \frac{1}{n}\sum_{i=1}^{n}\delta_i, \quad \text{where } \delta_i = \begin{cases} 1 & \text{if } z(x_i) \in CI_{\alpha}(z(x_i)), \\ 0 & \text{otherwise.} \end{cases}$$

This value can be computed for varying $\alpha$ and then plotted against the theoretical values, yielding what Demay et al. (2022) call the α-CI plot.
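The computation of the empirical coverage can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the leave-one-out means and standard deviations are assumed to be already available, and the interval is taken as the central Gaussian interval with nominal coverage $\alpha$ (standard normal quantile of order $(1+\alpha)/2$), so that the empirical coverage is directly comparable with $\alpha$.

```python
from statistics import NormalDist


def gaussian_interval(z_hat, s_hat, alpha):
    """Central Gaussian prediction interval with nominal coverage alpha (assumed form)."""
    q = NormalDist().inv_cdf((1.0 + alpha) / 2.0)
    return z_hat - s_hat * q, z_hat + s_hat * q


def empirical_coverage(z_obs, z_loo, s_loo, alpha):
    """Delta_alpha: share of observations lying inside their leave-one-out interval."""
    hits = 0
    for z, z_hat, s_hat in zip(z_obs, z_loo, s_loo):
        lower, upper = gaussian_interval(z_hat, s_hat, alpha)
        hits += lower <= z <= upper
    return hits / len(z_obs)


def alpha_ci_curve(z_obs, z_loo, s_loo, n_alpha=9):
    """Pairs (alpha_j, Delta_alpha_j) on a regular grid of levels inside ]0, 1[."""
    alphas = [(j + 1) / (n_alpha + 1) for j in range(n_alpha)]
    return [(a, empirical_coverage(z_obs, z_loo, s_loo, a)) for a in alphas]
```

Plotting the second element of each pair against the first gives the α-CI plot; points on the diagonal indicate well-calibrated intervals.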

Similarly to the PIA, the α-CI plot must be adapted to Bayesian kriging, since the posterior distribution is not Gaussian. We therefore introduce a slightly different criterion based on the quantiles of the predictive distribution. More precisely, the α-CI plot now relies on credible intervals defined as:

$$\widetilde{CI}_{\alpha}(z(x_i)) = \left[\hat{q}_{\frac{1-\alpha}{2},-i} \;;\; \hat{q}_{\frac{1+\alpha}{2},-i}\right],$$

where $\hat{q}_{\frac{1-\alpha}{2},-i}$ (respectively $\hat{q}_{\frac{1+\alpha}{2},-i}$) is the estimate of the quantile of order $\frac{1-\alpha}{2}$ (respectively $\frac{1+\alpha}{2}$) of the predictive distribution (at location $x_i$) of the model built without the $i$-th observation.

Once again, we obtain a criterion that is identical for both methods when the predictive distribution is Gaussian.
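As an illustration of the quantile-based version, the sketch below estimates the credible interval from Monte Carlo draws of each leave-one-out predictive distribution. The availability of such draws (for example from a posterior sampler), the linear-interpolation quantile estimator and all function names are assumptions made for this example; the text does not specify how the quantiles are estimated.

```python
def empirical_quantile(samples, p):
    """Linear-interpolation estimate of the quantile of order p from draws."""
    xs = sorted(samples)
    pos = p * (len(xs) - 1)
    k = int(pos)
    if k + 1 >= len(xs):
        return xs[-1]
    return xs[k] + (pos - k) * (xs[k + 1] - xs[k])


def credible_interval(samples, alpha):
    """Interval between the quantiles of order (1 - alpha)/2 and (1 + alpha)/2."""
    return (empirical_quantile(samples, (1.0 - alpha) / 2.0),
            empirical_quantile(samples, (1.0 + alpha) / 2.0))


def credible_coverage(z_obs, loo_samples, alpha):
    """Share of observations inside the credible interval of their own
    leave-one-out predictive distribution (one list of draws per observation)."""
    hits = 0
    for z, samples in zip(z_obs, loo_samples):
        lower, upper = credible_interval(samples, alpha)
        hits += lower <= z <= upper
    return hits / len(z_obs)
```

When the draws come from a Gaussian distribution with mean $\hat{z}_{-i}$ and standard deviation $\hat{s}_{-i}$, the estimated bounds approach $\hat{z}_{-i} \pm \hat{s}_{-i}\, q^{\mathcal{N}(0,1)}_{(1+\alpha)/2}$ as the number of draws grows, which is the sense in which the two criteria coincide in the Gaussian case.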

Illustrations of the α-CI plot can be found in Demay et al. (2022). To summarise the α-CI plot, we also introduce a quantitative criterion called the Mean Squared Error α, defined as follows:

$$MSE\alpha = \sum_{j=1}^{n_{\alpha}} \left(\Delta_{\alpha_j} - \alpha_j\right)^2,$$

where $n_{\alpha}$ is the number of levels considered for the prediction intervals and $\alpha_j$ is the level of the $j$-th interval $\widetilde{CI}_{\alpha_j}$ (in practice, a regular discretisation of $\alpha$ on $]0,1[$ is used to compute MSEα). The closer this criterion is to 0, the better the confidence/credible intervals are on average.
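A minimal sketch of this summary criterion follows. The number of levels and the grid $\alpha_j = j/(n_\alpha + 1)$ are assumptions (the text only states that a regular discretisation is used), `coverage_fn` is a hypothetical callable returning the empirical coverage at a given level, and the sum is left unnormalised, as in the definition of MSEα.

```python
def mse_alpha(coverage_fn, n_alpha=19):
    """Sum of squared gaps between empirical coverage and nominal level.

    coverage_fn(alpha) must return the empirical coverage Delta_alpha;
    the levels form a regular grid strictly inside ]0, 1[.
    """
    alphas = [(j + 1) / (n_alpha + 1) for j in range(n_alpha)]
    return sum((coverage_fn(a) - a) ** 2 for a in alphas)
```

A perfectly calibrated model, for which the coverage equals the level at every grid point, gives a value of 0; intervals that are systematically too narrow or too wide move the value away from 0.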

The different aforementioned criteria provide complementary information to evaluate the prediction quality of the kriging model, either in terms of mean, variance or confidence/credible intervals. They will be used in the following to compare the performance of ordinary and Bayesian kriging.

We simulate data sets of different sizes, varying from 16 to 81 observations, sampled on a square grid in the input space. For each size, the process is repeated 100 times with independent random Gaussian process simulations.
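The simulation step could look like the following sketch. Several choices are assumptions that the text does not specify: the input space is taken to be the unit square, the grids are $m \times m$ for $m = 4, \dots, 9$ (giving 16 to 81 points), and the covariance is a Matérn 5/2 kernel with arbitrary variance and length scale.

```python
import numpy as np


def square_grid(m):
    """Regular m x m grid on the unit square, as an (m*m, 2) array."""
    axis = np.linspace(0.0, 1.0, m)
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])


def matern52_cov(x, variance=1.0, length_scale=0.3):
    """Matern 5/2 covariance matrix (assumed kernel and parameter values)."""
    diff = x[:, None, :] - x[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1)) / length_scale
    return variance * (1.0 + np.sqrt(5.0) * r + 5.0 * r ** 2 / 3.0) * np.exp(-np.sqrt(5.0) * r)


def simulate_gp(x, rng, n_draws, jitter=1e-8):
    """n_draws independent zero-mean Gaussian process draws at the locations x."""
    cov = matern52_cov(x) + jitter * np.eye(len(x))  # jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    return (chol @ rng.standard_normal((len(x), n_draws))).T


rng = np.random.default_rng(0)
data_sets = {}
for m in range(4, 10):  # 16, 25, 36, 49, 64 and 81 observations
    x = square_grid(m)
    data_sets[m * m] = (x, simulate_gp(x, rng, n_draws=100))
```

Each entry of `data_sets` holds the grid and a `(100, n)` array whose rows are the independent simulated data sets of that size.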

For each data set, Bayesian and ordinary kriging models are estimated and the different validation criteria are computed by cross-validation. Results are given in Figure 1 with boxplots w.r.t. the data set sizes.
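To show what the cross-validation step produces, the sketch below computes leave-one-out predictions for a deliberately simplified model: a zero-mean Gaussian process with a fully known covariance matrix (simple kriging). This is neither the ordinary nor the Bayesian kriging model compared here, which both involve parameter estimation; it only illustrates how $\hat{z}_{-i}$ and $\hat{s}_{-i}$ are obtained from the model built without the $i$-th observation. The function name and the toy data are hypothetical.

```python
import numpy as np


def loo_simple_kriging(cov, z):
    """Leave-one-out mean and standard deviation for a zero-mean GP
    with known covariance matrix cov and observations z."""
    n = len(z)
    z_loo = np.empty(n)
    s_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        k_tt = cov[np.ix_(keep, keep)]   # covariance among the retained points
        k_it = cov[i, keep]              # covariance between point i and the rest
        w = np.linalg.solve(k_tt, k_it)  # simple-kriging weights
        z_loo[i] = w @ z[keep]
        s_loo[i] = np.sqrt(max(cov[i, i] - w @ k_it, 0.0))
    return z_loo, s_loo


# Toy example: exponential covariance on six points of [0, 1].
pts = np.linspace(0.0, 1.0, 6)
cov = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 0.5)
z = np.linalg.cholesky(cov) @ np.random.default_rng(1).standard_normal(6)
z_loo, s_loo = loo_simple_kriging(cov, z)
```

The explicit loop mirrors the definition; for this simplified model the same quantities can also be read directly from the inverse covariance matrix, which avoids refitting $n$ times.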

The results indicate that Bayesian kriging performs better in terms of both predictive mean and predictive variance for small sample sizes. More precisely, Bayesian kriging outperforms ordinary kriging on all four criteria for data sets with fewer than 40 observations. The gap is especially visible for the PVA and PIA, which shows that the main difference between the two kriging methods lies in the estimation of the predictive variance. This is mainly because Bayesian kriging accounts for more of the uncertainty in the estimated Gaussian process parameters than ordinary kriging does. Bayesian kriging therefore yields wider and more accurate prediction intervals and, as a result, better PVA, PIA, and MSEα criteria.

Figure 1 – Distribution of validation criteria (Q², PVA, PIA, and MSEα) w.r.t. the size of data sets, for simulated data.

It can also be noted that, for larger data sets, Bayesian and ordinary kriging yield similar results. This was to be expected, since Bayesian and frequentist inference tend to coincide for larger data sets. It can therefore be argued that Bayesian kriging becomes less advantageous for data sets with more than 40 observations, since its computational cost is higher than that of ordinary kriging. Note that Q² values are also extremely low for 49 observations or fewer, but again this is to be expected for very small data sets.

3.2. Real application case: G3's data set

We apply a similar protocol to the real data set of the G3 reactor at CEA Marcoule. This data set consists of 70 radioactivity measurements sampled in the input domain [0,10] × [0,7]. To generate multiple data sets, we draw subsamples without replacement of sizes 20, 30, 40, 50, 60 and 70 observations, the last being the size of the original data set. Once again, the process is repeated 100 times for each sample size (except for 70 observations), and cross-validation is applied to each sample to estimate the validation criteria.
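The resampling protocol can be sketched as follows. The function name, the seed and the representation of a subsample as a sorted list of row indices are assumptions made for this example; the sizes and the number of repetitions are those given in the text.

```python
import random


def draw_subsamples(n_total=70, sizes=(20, 30, 40, 50, 60, 70), n_rep=100, seed=0):
    """Index sets drawn without replacement from the n_total observations.

    Returns a dict mapping each size to a list of sorted index lists.
    The full data set (size == n_total) is unique, so it is kept only once.
    """
    rng = random.Random(seed)
    subsamples = {}
    for size in sizes:
        reps = 1 if size == n_total else n_rep
        subsamples[size] = [sorted(rng.sample(range(n_total), size)) for _ in range(reps)]
    return subsamples


subsamples = draw_subsamples()
```

Each index list would then be used to select the corresponding observations before fitting both kriging models and computing the validation criteria by cross-validation.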

The obtained results are given in Figure 2. They are similar to the ones obtained for the simulated data sets.

We note that the variance of each validation criterion decreases as the data set size grows. This is explained both by the larger data sets themselves and by our protocol: observations are drawn at random without replacement from the original 70 observations, so that the samples differ less and less as the data set size increases. Ordinary kriging also seems to be slightly better than Bayesian kriging for larger data sets, which reinforces our earlier argument that Bayesian kriging should be reserved for smaller data sets, for which the uncertainty in parameter estimation is high.
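The effect of the resampling protocol on the variability of the criteria can be quantified with a short calculation, added here as an illustration. Assume two subsamples $A$ and $B$ of size $k$ are drawn independently and uniformly without replacement from the $N = 70$ observations. Each observation belongs to a given subsample with probability $k/N$, so that:

```latex
\mathbb{E}\,|A \cap B| \;=\; \sum_{o=1}^{N} P(o \in A)\,P(o \in B) \;=\; N\left(\frac{k}{N}\right)^{2} \;=\; \frac{k^{2}}{N},
\qquad
\frac{\mathbb{E}\,|A \cap B|}{k} \;=\; \frac{k}{N}.
```

For $k = 20$, two subsamples share on average $20/70 \approx 29\%$ of their observations, whereas for $k = 60$ they share $60/70 \approx 86\%$, so the repetitions become increasingly similar as $k$ approaches 70.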

Figure 2 – Distribution of validation criteria (Q², PVA, PIA, and MSEα) w.r.t. the size of data sets, for the G3 data set.

4. Discussion and Conclusions

In conclusion, Bayesian kriging shows promising results for the spatial interpolation of data sets in support of decommissioning and dismantling projects. This is particularly true for small data sets, for which it outperforms ordinary kriging in terms of the accuracy of the predictive mean, variance and prediction intervals.

This advantage diminishes as the sample size increases: ordinary kriging, being less computationally expensive, is then preferable for large data sets. Bayesian kriging also has the drawback of requiring a prior specification, which is often difficult to choose and can strongly influence the predictions. Therefore, the use of Bayesian kriging should be restricted to smaller data sets or to cases in which reliable prior information on the parameters is available. Our future work will focus on better modelling of measurement uncertainty in Bayesian kriging, particularly through the use of heteroscedastic models (Ng and Yin, 2012).

References

Al-Mudhafar, W.J. (2019). Bayesian kriging for reproducing reservoir heterogeneity in a tidal depositional environment of a sandstone formation. Journal of Applied Geophysics 160, 84–102. https://doi.org/10.1016/j.jappgeo.2018.11.007.

Bachoc, F. (2013a). Cross Validation and Maximum Likelihood estimations of hyper-parameters of Gaussian processes with model misspecification. Computational Statistics & Data Analysis 66, 55–69. https://doi.org/10.1016/j.csda.2013.03.016.

Bachoc, F. (2013b). Estimation paramétrique de la fonction de covariance dans le modèle de krigeage par processus gaussiens: application à la quantification des incertitudes en simulation numérique, PhD Thesis of University Paris VII.

CEA DEN (2017). L'assainissement-démantèlement des installations nucléaires, Le Moniteur. ed, Monographie CEA.

Chilès, J.-P., Delfiner, P. (2012). Geostatistics: Modeling Spatial Uncertainty, Second Edition. ed, Wiley Series In Probability and Statistics. Wiley.

Cressie, N. (1993). Statistics for spatial data. John Wiley & Sons.

Demay, C., Iooss, B., Le Gratiet, L., Marrel, A. (2022). Model selection based on validation criteria for Gaussian process regression: An application with highlights on the predictive variance. Quality and Reliability Engineering International, 38:1482-1500. https://doi.org/10.1002/qre.2973.

Desnoyers, Y. (2010). Approche méthodologique pour la caractérisation géostatistique des contaminations radiologiques dans les installations nucléaires, PhD Thesis of Ecole des Mines de Paris.

Diggle, P.J., Ribeiro, P.J. (2007). Model-based Geostatistics, Springer Series in Statistics. Springer.

Diggle, P.J., Ribeiro, P.J. (2002). Bayesian Inference in Gaussian Model-based Geostatistics. Geographical and Environmental Modelling 6, 129–146. https://doi.org/10.1080/1361593022000029467.

Gupta, A., Kamble, T., Machiwal, D. (2017). Comparison of ordinary and Bayesian kriging techniques in depicting rainfall variability in arid and semi-arid regions of north-west India. Environ Earth Sci 76, 512. https://doi.org/10.1007/s12665-017-6814-3.

Kitanidis, P.K. (1986). Parameter Uncertainty in Estimation of Spatial Functions: Bayesian Analysis. Water Resources Research 22, 499–507. https://doi.org/10.1029/WR022i004p00499.

Krivoruchko, K., Gribov, A. (2019). Evaluation of empirical Bayesian kriging. Spatial Statistics 32, 100368. https://doi.org/10.1016/j.spasta.2019.100368.

Ng, S.H., Yin, J. (2012). Bayesian Kriging Analysis and Design for Stochastic Simulations. ACM Trans. Model. Comput. Simul. 22, 17:1-17:26. https://doi.org/10.1145/2331140.2331145.

Rasmussen, C.E., Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. MIT Press.

Webster, R., Oliver, M.A. (2007). Geostatistics for environmental scientists. John Wiley & Sons.
