1. Compiler and optimization level recognition using graph neural networks
- Author
Authors: Bardin, Sébastien; Benoit, Tristan; Marion, Jean-Yves
Affiliations: CEA-Saclay (CEA), Commissariat à l'énergie atomique et aux énergies alternatives (CEA); Carbone (CARBONE), Department of Formal Methods (LORIA - FM), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Funding: This work is supported by a public grant overseen by the French National Research Agency (ANR) as part of the 'Investissements d’Avenir' French PIA project 'Lorraine Université d’Excellence', reference ANR-15-IDEX-04-LUE. Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
Projects: GRID5000; IMPACT-DIGITRUST; ANR-15-IDEX-0004, LUE, Isite LUE (2015)
- Subjects
reverse engineering; machine learning; toolchain provenance; [INFO.INFO-CR] Computer Science [cs]/Cryptography and Security [cs.CR]; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; ACM: D.: Software/D.2: SOFTWARE ENGINEERING/D.2.7: Distribution, Maintenance, and Enhancement/D.2.7.5: Restructuring, reverse engineering, and reengineering; ACM: D.: Software/D.2: SOFTWARE ENGINEERING/D.2.5: Testing and Debugging/D.2.5.2: Diagnostics; ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.6: Learning
- Abstract
The main objective of this workshop is to bring together researchers in the machine learning and program analysis communities and to serve as a platform for identifying cross-disciplinary problems of mutual interest.
International audience.
We consider the problem of recovering the compilation toolchain used to generate a given bare binary. We present a first attempt at a Graph Neural Network framework for this problem, one that takes into account the shallow semantics provided by the binary's structured control flow graph (CFG). We introduce a Graph Neural Network, called Site Neural Network (SNN), dedicated to this problem. Feature extraction is simplified by discarding almost everything in a CFG except control transfer instructions. Although the work is at an early stage, our experiments show that the method already recovers the compiler and the optimization level with very high accuracy. We believe these promising results may offer new, more robust leads for compilation toolchain identification.
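The feature-extraction idea in the abstract, keeping only control transfer instructions from each CFG node, can be illustrated with a minimal sketch. This is not the authors' SNN or their extraction code: the `BasicBlock` type, the `reduce_cfg` helper, the textual instruction format, and the `CONTROL_TRANSFER` mnemonic set are all hypothetical and chosen only for illustration, assuming x86-style mnemonics.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumption: a small, illustrative set of x86 control-transfer mnemonics.
# The record does not state which instructions the paper actually keeps.
CONTROL_TRANSFER = {"jmp", "je", "jne", "jz", "jnz", "jg", "jl", "call", "ret"}


@dataclass
class BasicBlock:
    """Hypothetical CFG node: instruction strings plus successor block ids."""
    instructions: List[str]
    successors: List[int] = field(default_factory=list)


def reduce_cfg(cfg: Dict[int, BasicBlock]) -> Dict[int, BasicBlock]:
    """Return a copy of the CFG in which each block keeps only the
    mnemonics of its control-transfer instructions; edges are unchanged."""
    reduced = {}
    for block_id, block in cfg.items():
        kept = []
        for ins in block.instructions:
            parts = ins.split()
            # Skip blank lines; compare the mnemonic case-insensitively.
            if parts and parts[0].lower() in CONTROL_TRANSFER:
                kept.append(parts[0].lower())
        reduced[block_id] = BasicBlock(kept, list(block.successors))
    return reduced


# Toy CFG with three blocks (made-up instructions and addresses).
cfg = {
    0: BasicBlock(["mov eax, 1", "cmp eax, ebx", "jne 0x20"], [1, 2]),
    1: BasicBlock(["add eax, 2", "call 0x40", "jmp 0x30"], [2]),
    2: BasicBlock(["pop ebp", "ret"], []),
}
reduced = reduce_cfg(cfg)
for block_id, block in reduced.items():
    print(block_id, block.instructions, block.successors)
```

In this sketch the graph structure (the successor edges) is preserved while operands and non-branching instructions are dropped, which is one plausible reading of "forgetting almost everything in a CFG except control transfer instructions"; a graph neural network would then consume the reduced nodes and edges.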
- Published
- 2021