Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens

Authors :
Helena Galhardas
Angelos-Christos G. Anadiotis
Théo Bouganim
Stephane Horel
Oana Balalau
Ioana Manolescu
Youssr Youssef
Francesco Chimienti
Mhd Yamen Haddad
Rich Data Analytics at Cloud Scale (CEDAR)
Laboratoire d'informatique de l'École polytechnique [Palaiseau] (LIX)
Centre National de la Recherche Scientifique (CNRS)-École polytechnique (X)-Inria Saclay - Ile de France
Institut National de Recherche en Informatique et en Automatique (Inria)
Instituto Superior Técnico, Universidade Técnica de Lisboa (IST)
Le Monde
ANR-20-CHIA-0015,SourcesSay,Analyse et Interconnexion Intelligente des Contenus Héterogènes dans des Arènes Numériques(2020)
Source :
ACM International Conference on Information and Knowledge Management (CIKM 2021), Nov 2021, Online, Australia. ⟨10.1145/3459637.3481982⟩
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

Investigative Journalism (IJ, in short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture, which severely constrained its query scalability. We therefore fully re-engineered the system, moving it to a warehouse architecture and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than the previous platform could. In a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations that may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations and lobbies. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component and the different parameters governing graph creation and querying.

Details

Language :
English
Database :
OpenAIRE
Journal :
ACM International Conference on Information and Knowledge Management (CIKM 2021), Nov 2021, Online, Australia. ⟨10.1145/3459637.3481982⟩
Accession number :
edsair.doi.dedup.....9e484677b79f62625ec4c43187bd12e9