1. An efficient and scalable platform for Java source code analysis using overlaid graph representations
- Author
Oscar Rodriguez-Prieto, Francisco Ortin [0000-0003-1199-8649], and Alan Mycroft (Apollo - University of Cambridge Repository)
- Subjects
Source code, General Computer Science, Computer science, Computation, Graph database, Static program analysis, Database languages, Cypher, Tools, Databases, coding guidelines, General Materials Science, Programming language, Ciphers, General Engineering, Neo4j, Software metric, Semantics, Java compiler, Scalability, program representation, declarative query language, Syntactics, Code analysis, Java, lcsh:Electrical engineering. Electronics. Nuclear engineering, lcsh:TK1-9971
- Abstract
© 2013 IEEE. Although source code programs are commonly written as textual information, they enclose syntactic and semantic information that is usually represented as graphs. This information is used for many different purposes, such as static program analysis, advanced code search, coding guideline checking, software metrics computation, and extraction of semantic and syntactic information to create predictive models. Most of the existing systems that provide these kinds of services are designed ad hoc for the particular purpose they are aimed at. For this reason, we created ProgQuery, a platform that allows users to write their own Java program analyses in a declarative fashion, using graph representations. We modify the Java compiler to compute seven syntactic and semantic representations, and store them in a Neo4j graph database. Such representations are overlaid, meaning that syntactic and semantic nodes of the different graphs are interconnected so that queries and analyses can combine different kinds of information. We evaluate ProgQuery and compare it to related systems. Our platform outperforms the other systems in analysis time, and scales better with program size and analysis complexity. Moreover, the queries we coded show that ProgQuery is more expressive than the other approaches. The additional information stored by ProgQuery increases the database size and associated insertion time, but these increases are significantly lower than the query/analysis performance gains obtained.
- Published
- 2020