
Bayesian approach for sample size determination, illustrated with Soil Health Card data of Andhra Pradesh (India)

Authors :
Bas Kempen
Balwinder-Singh
Dick J. Brus
David G. Rossiter
Andrew M. McDonald
Source :
Geoderma 405 (2022)
Publication Year :
2022

Abstract

Highlights
• Sample size determination (SSD) is a crucial step in sampling design.
• Bayesian, mixed Bayesian-likelihood (MBL) and frequentist SSD approaches compared.
• Bayesian and MBL SSD approaches account for uncertainty about design parameters.
• Various SSD criteria derived from probability distribution of credible intervals.
• Legacy data on Zn concentration in soil used for postulating prior distributions.

A crucial decision in designing a spatial sample for soil survey is the number of sampling locations required to answer, with sufficient accuracy and precision, the questions posed by decision makers at different levels of geographic aggregation. In the Indian Soil Health Card (SHC) scheme, many thousands of locations are sampled per district. In this paper the SHC data are used to estimate the mean of a soil property within a defined study area, e.g., a district, or the areal fraction of the study area where some condition is satisfied, e.g., exceedance of a critical level. The central question is whether this large sample size is needed for this aim. The sample size required for a given maximum length of a confidence interval can be computed with formulas from classical sampling theory, using a prior estimate of the variance of the property of interest within the study area. Similarly, for the areal fraction a prior estimate of this fraction is required. In practice we are uncertain about these prior estimates, and our uncertainty is not accounted for in classical sample size determination (SSD). This deficiency can be overcome with a Bayesian approach, in which the prior estimate of the variance or areal fraction is replaced by a prior distribution. Once new data from the sample are available, this prior distribution is updated to a posterior distribution using Bayes’ rule. The apparent problem with a Bayesian approach prior to a sampling campaign is that the data are not yet available.
This dilemma can be solved by computing, for a given sample size, the predictive distribution of the data, given a prior distribution on the population and design parameter. Thus we do not have a single vector with data values, but a finite or infinite set of possible data vectors. As a consequence, we have as many posterior distribution functions as we have data vectors. This leads to a probability distribution of lengths or coverages of Bayesian credible intervals, from which various criteria for SSD can be derived. Besides the fully Bayesian approach, a mixed Bayesian-likelihood approach for SSD is available. This is of interest when, after the data have been collected, we prefer to estimate the mean from these data only, using the frequentist approach, ignoring the prior distribution. The fully Bayesian and mixed Bayesian-likelihood approaches are illustrated for estimating the mean of log-transformed Zn and the areal fraction with Zn-deficiency, defined as Zn concentration
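
The contrast between the classical and the Bayesian route described in the abstract can be sketched for the areal fraction. The sketch below is not the authors' implementation. It assumes simple random sampling, a normal-approximation confidence interval for the classical formula, a Beta prior on the fraction, equal-tailed (not highest-density) credible intervals, and the average interval length as the SSD criterion. All function names and the prior parameters `a` and `b` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def classical_n_fraction(p_prior, max_length, level=0.95):
    """Classical sample size for an areal fraction (assumes simple random
    sampling and a normal-approximation interval of total length max_length)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((2 * z / max_length) ** 2 * p_prior * (1 - p_prior))


def average_credible_length(n, a, b, level=0.95, n_sim=200, n_post=400, rng=None):
    """Monte Carlo estimate of the credible-interval length for sample size n,
    averaged over data vectors drawn from the prior predictive distribution.
    Assumption: Beta(a, b) prior and equal-tailed intervals."""
    rng = rng or random.Random(0)
    lo_idx = int((1 - level) / 2 * n_post)   # index of the lower empirical quantile
    hi_idx = n_post - 1 - lo_idx             # index of the upper empirical quantile
    total = 0.0
    for _ in range(n_sim):
        p = rng.betavariate(a, b)                    # fraction drawn from the prior
        y = sum(rng.random() < p for _ in range(n))  # simulated count given p
        # Bayes' rule for a Beta prior and binomial data: Beta(a + y, b + n - y).
        post = sorted(rng.betavariate(a + y, b + n - y) for _ in range(n_post))
        total += post[hi_idx] - post[lo_idx]
    return total / n_sim


def bayesian_n_alc(a, b, max_length, candidates, **kw):
    """Smallest candidate n whose average credible-interval length is at most
    max_length; returns None if no candidate qualifies."""
    for n in candidates:
        if average_credible_length(n, a, b, **kw) <= max_length:
            return n
    return None


if __name__ == "__main__":
    print(classical_n_fraction(p_prior=0.5, max_length=0.2))
    print(bayesian_n_alc(a=2, b=2, max_length=0.2, candidates=range(20, 201, 20)))
```

The classical function needs a single prior guess of the fraction, whereas the Bayesian function averages over every fraction the prior considers plausible. Other criteria mentioned in the abstract, such as those based on coverage, would replace the averaging step with a different summary of the same simulated distribution.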

Details

Language :
English
ISSN :
00167061
Database :
OpenAIRE
Journal :
Geoderma 405 (2022)
Accession number :
edsair.doi.dedup.....8c49d65eba518f03e9d39c24351e591b