Effects of ignoring survey design information for data reuse
- Source :
- Ecological Applications (1051-0761) (Wiley), 2021-09, Vol. 31, N. 6, P. e02360 (8p.)
- Publication Year :
- 2020
Abstract
- Data are being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask: "Are aggregated databases currently providing the right information to enable effective and unbiased reuse?" We investigate this question, focusing on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). Such designs are common; examples include designs with uneven inclusion probabilities and stratified designs. We perform a simulation experiment, creating data sets with progressively more uneven inclusion probabilities and examining the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, the estimation bias can be mitigated by using an appropriate estimator or an appropriate model that incorporates the design information. These remedies are available, however, only when essential information about the survey design is available: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in their specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
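The contrast the abstract draws between a naive estimate and a design-based one can be illustrated with a small simulation. The following is a minimal sketch and not the authors' experiment: it assumes Poisson sampling (each cell is included independently), a single hypothetical covariate `x` that drives both abundance and inclusion probability, and arbitrary parameter values. The function names `poisson` and `simulate` and the `unevenness` parameter are introduced here purely for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(n_cells=2000, expected_n=200, unevenness=2.0, n_reps=200, seed=1):
    """Compare a naive mean with a Horvitz-Thompson estimate of density."""
    rng = random.Random(seed)
    # Hypothetical landscape: one covariate x per unit-area cell,
    # with expected counts that increase with x (assumed relationship).
    x = [rng.random() for _ in range(n_cells)]
    y = [poisson(rng, math.exp(1.0 + 1.5 * xi)) for xi in x]
    true_density = sum(y) / n_cells

    # Inclusion probabilities proportional to exp(unevenness * x), scaled so
    # the expected sample size is expected_n (unevenness = 0 gives equal odds).
    weights = [math.exp(unevenness * xi) for xi in x]
    scale = expected_n / sum(weights)
    pi = [min(1.0, scale * w) for w in weights]

    naive_sum = ht_sum = 0.0
    for _ in range(n_reps):
        # Poisson sampling: each cell enters the sample independently.
        sample = [i for i in range(n_cells) if rng.random() < pi[i]]
        # Naive estimator: plain sample mean, ignoring the design.
        naive_sum += sum(y[i] for i in sample) / len(sample)
        # Horvitz-Thompson: weight each count by 1 / inclusion probability.
        ht_sum += sum(y[i] / pi[i] for i in sample) / n_cells

    return {
        "true_density": true_density,
        "naive_rel_bias": naive_sum / n_reps / true_density - 1.0,
        "ht_rel_bias": ht_sum / n_reps / true_density - 1.0,
    }


if __name__ == "__main__":
    # Progressively more uneven inclusion probabilities.
    for k in (0.0, 1.0, 2.0):
        print(k, simulate(unevenness=k))
```

Under these assumptions, the naive relative bias should grow as `unevenness` increases, while the Horvitz-Thompson relative bias stays near zero. Raising `expected_n` should reduce the spread of both estimates without removing the naive bias. Note that the weighted estimate requires the inclusion probabilities `pi`, which is exactly the design information the abstract argues must be stored with the data.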
- Subjects :
- survey design
0106 biological sciences
bias
Computer science
Inference
Sample (statistics)
inclusion probability
Reuse
accessible
010603 evolutionary biology
01 natural sciences
population density estimate
Horvitz–Thompson estimator
interoperable
111 Mathematics
Computer Simulation
112 Statistics and probability
database
Selection (genetic algorithm)
Probability
model
Ecology
010604 marine biology & hydrobiology
findable
reusable data
Estimator
Sampling (statistics)
Density estimation
data
Research Design
1181 Ecology, evolutionary biology
Data mining
Details
- ISSN :
- 1051-0761
- Volume :
- 31
- Issue :
- 6
- Database :
- OpenAIRE
- Journal :
- Ecological applications : a publication of the Ecological Society of America
- Accession number :
- edsair.doi.dedup.....7aaca2db2048aadd54c0918a16b4e0ca