1. Model-Driven Development of Web APIs to Access Integrated Tabular Open Data
- Author
Irene Garrigós, César González-Mora, Jose-Norberto Mazón, David Tomás, Jose Zubcoff (Universidad de Alicante: Departamento de Lenguajes y Sistemas Informáticos; Departamento de Ciencias del Mar y Biología Aplicada; Web and Knowledge (WaKe) and Procesamiento del Lenguaje y Sistemas de Información (GPLSI) research groups)
- Subjects
General Computer Science, Computer science, Web APIs, Open data, Union, 02 engineering and technology, Set (abstract data type), Software, Data access, Statistics and Operations Research, 020204 information systems, 0202 electrical engineering, electronic engineering, information engineering, General Materials Science, Information retrieval, Join, General Engineering, Publishing, Word embeddings, Computer Languages and Systems, Table (database), 020201 artificial intelligence & image processing, Data integration, lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971, Word (computer architecture)
- Abstract
More and more governments around the world are publishing tabular open data, mainly in formats such as CSV or XLS(X). These datasets are mostly published individually, i.e., each publisher exposes its data on the Web without considering potential relationships with other datasets, whether its own or those of other publishers. As a result, reusing several open datasets together is not trivial, and data consumers (such as software developers or data scientists) need mechanisms to integrate and access tabular open data published on the Web. In this paper, we propose a model-driven approach to automatically generate Web APIs that provide homogeneous access to multiple integrated tabular open datasets. This work focuses on data that can be integrated by means of join and union operations. As a first step, our approach detects unionable and joinable tabular open datasets by using a table similarity measure based on word embeddings. Then, an APIfication process generates APIs that access the previously integrated datasets through a single endpoint. A running example is used throughout the article, and a set of performance experiments shows the feasibility of our approach.

This work was supported by the National Foundation for Research, Technology and Development of the Spanish Ministry of Economy, Industry and Competitiveness under Project TIN2016-78103-C2-2-R and Project RTI2018-094653-B-C22. The work of César González-Mora was supported by a contract for predoctoral training with the Generalitat Valenciana and the European Social Fund under Grant ACIF/2019/044.
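The detection step described in the abstract (finding unionable and joinable tables with a word-embedding similarity measure) can be illustrated with a minimal sketch. It is not the authors' measure. It assumes a hypothetical toy embedding table (`EMBEDDINGS`, made-up 3-dimensional vectors standing in for pretrained word vectors), scores two tables by matching each column header to its most similar counterpart by cosine similarity, treats same-arity tables whose columns all align closely as unionable, and lists highly similar column pairs as candidate join keys. The function names and the 0.9 threshold are illustrative assumptions; a realistic joinability check would also compare column values, not only headers.

```python
import math

# Hypothetical toy embeddings (made-up 3-d vectors). A real system would load
# pretrained word vectors instead; these exist only for illustration.
EMBEDDINGS = {
    "city": [0.9, 0.1, 0.0],
    "town": [0.85, 0.15, 0.05],
    "population": [0.1, 0.9, 0.1],
    "inhabitants": [0.15, 0.85, 0.1],
    "year": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def column_similarity(a, b):
    """Header similarity; words missing from EMBEDDINGS fall back to exact match."""
    va, vb = EMBEDDINGS.get(a.lower()), EMBEDDINGS.get(b.lower())
    if va is None or vb is None:
        return 1.0 if a.lower() == b.lower() else 0.0
    return cosine(va, vb)


def table_similarity(cols_a, cols_b):
    """Average, over the columns of A, of the best-matching column score in B."""
    if not cols_a or not cols_b:
        return 0.0
    best = [max(column_similarity(a, b) for b in cols_b) for a in cols_a]
    return sum(best) / len(best)


def is_unionable(cols_a, cols_b, threshold=0.9):
    """Same arity and every column aligns closely, checked in both directions."""
    return len(cols_a) == len(cols_b) and min(
        table_similarity(cols_a, cols_b), table_similarity(cols_b, cols_a)
    ) >= threshold


def joinable_pairs(cols_a, cols_b, threshold=0.9):
    """Column pairs whose headers are similar enough to be candidate join keys."""
    return [
        (a, b)
        for a in cols_a
        for b in cols_b
        if column_similarity(a, b) >= threshold
    ]
```

With these toy vectors, `is_unionable(["City", "Population"], ["Town", "Inhabitants"])` returns `True`, while `joinable_pairs(["City", "Population"], ["City", "Year"])` returns `[("City", "City")]`. Checking unionability in both directions keeps the test symmetric, so a small table cannot look unionable merely because its few columns all appear in a larger one.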
- Published
- 2020