1. A Coq formalization of data provenance
- Author
-
Véronique Benzaken, Évelyne Contejean, Rébecca Zucchini, Chantal Keller, Sarah Cohen-Boulakia, Université Paris-Saclay, Centre National de la Recherche Scientifique (CNRS), Laboratoire Méthodes Formelles (LMF), Institut National de Recherche en Informatique et en Automatique (Inria)-CentraleSupélec-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS)-Ecole Normale Supérieure Paris-Saclay (ENS Paris Saclay), École normale supérieure - Cachan (ENS Cachan), ACM, Cătălin Hriţcu and Andrei Popescu, and ANR-15-CE39-0009,DataCert,Spécification intensive en Coq d'intégration de données orientée sécurité(2015)
- Subjects
CCS Concepts: • Software and its engineering → Formal software verification ,Correctness ,Theoretical computer science ,• Theory of computation → Logic and verification ,Computer science ,02 engineering and technology ,Relational algebra ,Software verification ,0502 economics and business ,0202 electrical engineering, electronic engineering, information engineering ,Coq ,[INFO]Computer Science [cs] ,[INFO.INFO-MS]Computer Science [cs]/Mathematical Software [cs.MS] ,Data provenance extended relational algebra ,Interpretation (logic) ,[INFO.INFO-PL]Computer Science [cs]/Programming Languages [cs.PL] ,Type theory ,Data manipulation language ,05 social sciences ,Proof assistant ,[INFO.INFO-LO]Computer Science [cs]/Logic in Computer Science [cs.LO] ,Null (SQL) ,Program verification ,A priori and a posteriori ,050211 marketing ,020201 artificial intelligence & image processing ,Raw data ,Program semantics - Abstract
International audience; In multiple domains, large amounts of data are daily generated and combined to be analyzed. The interpretation of these analyses requires to track back the provenance of combined data with respect to initial, raw data. The correctness of the provenance is crucial in many critical domains, such as medicine to prescribe treatments. In this article, we propose the first provenance-aware extended relational algebra formalized in a proof assistant (Coq), for a non trivial subset of database queries: queries containing aggregates, null values, and correlated sub-queries. The formalization is validated by an adequacy proof with respect to standard evaluation of queries. This development is a first step towards a posteriori certification of provenance for data manipulation, with strong guaranties.
- Published
- 2021
- Full Text
- View/download PDF