Spatial data quality: from description to application

Authors :: van Oort, P.A.J.
Wageningen University
Arnold Bregt
Sytze de Bruin
Publication Year :: 2006
Publisher :: Wageningen Universiteit, 2006.
Abstract: The growing availability of spatial data, along with the growing ease of using them (thanks to wide-scale adoption of GIS), has made it possible to use spatial data in applications for which the quality of the data is inadequate. As a result, concerns about spatial data quality have increased. To deal with these concerns, it is necessary (1) to formalise and standardise descriptions of spatial data quality and (2) to apply these descriptions in assessing the suitability (fitness for use) of spatial data before using the data. The aim of this thesis was twofold: (1) to enhance the description of spatial data quality and (2) to improve our understanding of the implications of spatial data quality.

Chapter 1 sets the scene with a discussion on uncertainty and an explanation of why concerns about spatial data quality exist. Knowledge gaps are identified and the chapter concludes with six research questions.

Chapter 2 presents an overview of definitions of spatial data quality. Overall, I found strong agreement on which elements together define spatial data quality. Definitions appear to differ in two aspects: (1) the location within the meta-data report: some elements occur not in the spatial data quality section but in another section of the meta-data report; and (2) the explicitness with which elements are recognised as individual elements. For example, the European pre-standard explicitly recognises the element 'homogeneity'. Other standards recognise the importance of documenting the variation in quality, without naming it explicitly as an individual element.

In chapter 3 we quantified the spatial variability in classification accuracy for the agricultural crops in the Dutch national land cover database (LGN).
Classification accuracy was significantly correlated with: (1) the crop present according to LGN, (2) the homogeneity of the 8-cell neighbourhood around each cell, (3) the size of the patch in which a cell is located, and (4) the heterogeneity of the landscape in which a cell is located.

In chapter 4 I present methods that use error matrices and change detection error matrices as input to make more accurate land cover change estimates. It was shown that temporal correlation in classification errors has a significant impact and must be taken into account. Producers of time series land cover data are advised to report not only error matrices but also change detection error matrices.

Chapter 5 focuses on positional accuracy and area estimates. From the positional accuracy of the vertices delineating polygons, the variance and covariance in area can be derived. Earlier studies derived equations for the variance; this chapter presents a covariance equation. The variance and covariance equations were implemented in a model and applied in a case-study. The case-study consisted of 97 polygons with a small subsidy value (in euros per hectare) assigned to each polygon. With the model we could calculate the uncertainty in the total subsidy value (in euros) of the complete set of polygons as a consequence of uncertainty in the position of vertices.

Chapter 6 explores the relationship between completeness of spatial data and risk in digging activities around underground cables and pipelines. A model is presented for calculating the economic implications of over- and incompleteness. An important element of this model is the relationship between detection time and costs. The model can be used to calculate the optimal detection time, i.e. the time at which expected costs are at their minimum.

Chapter 7 addresses the question of why risk analysis (RA) is so rarely applied to assess the suitability of spatial data prior to using the data.
In theory, the use of RA is beneficial because it allows the user to judge whether the use of certain spatial data produces unacceptable risks. Frequently proposed hypotheses explaining the scarce adoption of RA are all technical and educational. In chapter 7 we propose a new group of hypotheses, based on decision theory. We found that the willingness to spend resources on RA depends (1) on the presence of feedback mechanisms in the decision-making process, (2) on how much is at stake and (3) to a minor extent on how well the decision-making process can be modelled.

Chapter 8 presents conclusions on the six research questions (chapters 2-7) and lists recommendations for users, producers and researchers of spatial data. With regard to the description, four recommendations are given. Firstly, spend more effort on documenting the lineage of reference data. Secondly, quantify and report correlation of quality between related data sets. Thirdly, investigate the integration of different forms of uncertainty (error, vagueness, ambiguity). Fourthly, study the implementation and use of spatial data quality standards. With regard to the application of spatial data quality descriptions, I have two main recommendations. Firstly, to continue the line of research followed in this thesis: quantification of implications of spatial data quality, through development of theory along with tangible illustrations in case-studies. Secondly, there is a need for more empirical research into how users cope with spatial data quality.
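
Illustrative note on the chapter 5 idea: the abstract does not reproduce the thesis's variance or covariance equations, so the sketch below is not the author's model. It only shows, under stated assumptions, how uncertainty in vertex positions can propagate to polygon area. It assumes independent errors with the same standard deviation `sigma` in every x and y coordinate and uses standard first-order error propagation on the shoelace area formula. The function names and the example polygon are hypothetical.

```python
import math


def shoelace_area(vertices):
    """Area of a simple polygon given as a list of (x, y) tuples."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_variance_first_order(vertices, sigma):
    """First-order variance of the polygon area.

    Assumption (not from the thesis): every coordinate has an independent
    error with standard deviation sigma. The partial derivatives of the
    shoelace area are dA/dx_i = (y_next - y_prev) / 2 and
    dA/dy_i = (x_prev - x_next) / 2, so each vertex contributes
    sigma^2 / 4 times the squared distance between its two neighbours.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x_prev, y_prev = vertices[i - 1]
        x_next, y_next = vertices[(i + 1) % n]
        total += (y_next - y_prev) ** 2 + (x_next - x_prev) ** 2
    return sigma ** 2 / 4.0 * total


# Hypothetical parcel: a 100 m x 100 m square, 1 m positional error per coordinate.
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
area = shoelace_area(square)
area_sd = math.sqrt(area_variance_first_order(square, sigma=1.0))
print(f"area = {area:.0f} m2, standard deviation = {area_sd:.1f} m2")
```

Adjacent polygons that share vertices have correlated area errors, which is why a covariance term matters when areas (or subsidy values per hectare) are summed over a set of polygons; this sketch covers a single polygon only.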