Sec-Lib: Protecting Scholarly Digital Libraries From Infected Papers Using Active Machine Learning Framework
- Source :
- IEEE Access, Vol 7, Pp 110050-110073 (2019)
- Publication Year :
- 2019
- Publisher :
- IEEE, 2019.
Abstract
- Researchers from academia and the corporate sector rely on scholarly digital libraries to access articles. Attackers take advantage of innocent users who consider the articles' files safe and thus open PDF files with little concern. In addition, researchers consider scholarly libraries a reliable, trusted, and untainted corpus of papers. For these reasons, scholarly digital libraries are an attractive target and inadvertently support the proliferation of cyber-attacks launched via malicious PDF files. In this study, we present related vulnerabilities and malware distribution approaches that exploit the vulnerabilities of scholarly digital libraries. We evaluated over two million scholarly papers in the CiteSeerX library and found the library to be contaminated with a surprisingly large number (0.3-2%) of malicious PDF documents (over 55% were crawled from the IPs of US universities). We developed Sec-Lib, a two-layered detection framework aimed at enhancing the detection of malicious PDF documents, which offers a security solution for large digital libraries. Sec-Lib includes a deterministic layer for detecting known malware and a machine-learning-based layer for detecting unknown malware. Our evaluation showed that scholarly digital libraries can detect 96.9% of malware with Sec-Lib while minimizing the number of PDF files requiring labeling, thus reducing the manual inspection efforts of security experts by 98%.
- Subjects :
- 021110 strategic, defence & security studies
General Computer Science
Exploit
Scholarly
Computer science
digital
malware
paper
0211 other engineering and technologies
General Engineering
library
02 engineering and technology
Digital library
computer.software_genre
World Wide Web
PDF documents
0202 electrical engineering, electronic engineering, information engineering
Malware
020201 artificial intelligence & image processing
General Materials Science
lcsh:Electrical engineering. Electronics. Nuclear engineering
computer
lcsh:TK1-9971
Details
- Language :
- English
- ISSN :
- 2169-3536
- Volume :
- 7
- Database :
- OpenAIRE
- Journal :
- IEEE Access
- Accession number :
- edsair.doi.dedup.....9b88ba740ae2d7ca79ba7bf4fe49b715