1. Exploitation and Sanitization of Hidden Data in PDF Files: Do Security Agencies Sanitize Their PDF Files?
- Author
-
Supriya Adhatarao, Cédric Lauradoux, Privacy Models, Architectures and Tools for the Information Society (PRIVATICS), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-CITI Centre of Innovation in Telecommunications and Integration of services (CITI), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Inria Lyon, Institut National de Recherche en Informatique et en Automatique (Inria), and Université Grenoble Alpes (UGA)
- Subjects
050101 languages & linguistics ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,Computer security ,computer.software_genre ,Systems security ,Security and privacy ,[INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR] ,0202 electrical engineering, electronic engineering, information engineering ,Information system ,0501 psychology and cognitive sciences ,Quality (business) ,Publication ,media_common ,business.industry ,05 social sciences ,Metadata ,Information sensitivity ,Hidden data ,Publishing ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,020201 artificial intelligence & image processing ,business ,File system security ,computer - Abstract
International audience; Organizations publish and share more and more electronic documents like PDF files. Unfortunately, most organizations are unaware that these documents can compromise sensitive information like authors names, details on the information system and architecture. All these information can be exploited easily by attackers to footprint and later attack an organization. In this paper, we analyze hidden data found in the PDF files published by an organization. We gathered a corpus of 39664 PDF files published by 75 security agencies from 47 countries. We have been able to measure the quality and quantity of information exposed in these PDF files. It can be effectively used to find weak links in an organization: the employees who are running outdated software. We have also measured the adoption of PDF files sanitization by security agencies. We identified only 7 security agencies which sanitize few of their PDF files before publishing. Unfortunately, we were still able to find sensitive information within 65% of these sanitized PDF files. Some agencies are using weak sanitization techniques: it requires to remove all the hidden sensitive information from the file and not just to remove the data at the surface. Security agencies need to change their sanitization methods.
- Published
- 2021
- Full Text
- View/download PDF