Discovering Conflicts of Interest across Heterogeneous Data Sources with ConnectionLens
- Source :
- ACM International Conference on Information and Knowledge Management (CIKM 2021), Nov 2021, Online, Australia. ⟨10.1145/3459637.3481982⟩, CIKM
- Publication Year :
- 2021
- Publisher :
- HAL CCSD, 2021.
Abstract
- Investigative Journalism (IJ, in short) requires combining highly heterogeneous digital datasets coming from a wide variety of sources. We have developed ConnectionLens, a system that integrates such sources into a single heterogeneous graph and enables users to query the graph using keywords. The first iteration of the system [7] followed a mediator architecture, which severely constrained its query scalability. Thus, we fully re-engineered the system, moving it to a warehouse architecture and replacing its core components (information extraction, data querying, and interactive interfaces), which allowed us to handle use cases orders of magnitude larger than the previous platform could. In a consortium of computer scientists and investigative journalists, we propose to demonstrate ConnectionLens' capability to integrate arbitrary heterogeneous datasets and query them flexibly by means of keywords. Among several scenarios, our main focus will be on a real-world journalistic use case about situations which may lead to Conflicts of Interest between biomedical experts and various organizations, such as corporations and lobbies. The demonstration will showcase the end-to-end data analysis pipeline and illustrate each system component and the different parameters governing graph creation and querying.
- Subjects :
- heterogeneous datasets
Graph database
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]
Computer science
02 engineering and technology
investigative journalism
Data science
Pipeline (software)
keyword search
Variety (cybernetics)
Information extraction
020204 information systems
Component (UML)
Scalability
0202 electrical engineering, electronic engineering, information engineering
Graph (abstract data type)
020201 artificial intelligence & image processing
Data integration
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- ACM International Conference on Information and Knowledge Management (CIKM 2021), Nov 2021, Online, Australia. ⟨10.1145/3459637.3481982⟩, CIKM
- Accession number :
- edsair.doi.dedup.....9e484677b79f62625ec4c43187bd12e9