Vukmirović, P.; Bentkamp, A.; Blanchette, J.; Cruanes, S.; Nummelin, V.; Tourret, S.; Platzer, André; Sutcliffe, Geoff

Affiliations: Vrije Universiteit Amsterdam (VU): Theoretical Computer Science, Network Institute, Computer Science, Computer Systems Section; Modeling and Verification of Distributed Algorithms and Systems (VERIDIS); Max-Planck-Institut für Informatik (MPII), Max-Planck-Gesellschaft; Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria); Department of Formal Methods (LORIA - FM), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), Centre National de la Recherche Scientifique (CNRS); Proof-oriented development of computer-based systems (MOSEL); Aesthetic Integration.
This archive contains the raw evaluation results and scripts associated with the experiments described in the article "Making Higher-Order Superposition Work" by Vukmirović, Bentkamp, Blanchette, Cruanes, Nummelin, and Tourret, available at: https://matryoshka-project.github.io/pubs/mhosw_article.pdf

The problems used for the evaluation are in the problems/ subdirectory, divided into the categories sh and tptp, as in the article.

The results/ directory contains two subdirectories, sh and tptp, one for each benchmark category. Both subdirectories contain files named fD.csv, where D is a number from 1 to 10. Each file holds the experiment results for the figure labelled with the number D, for the benchmark category corresponding to the parent directory. In a result file, the columns give the results of the experiment run for a given prover configuration (e.g., CPU time, reported status, memory usage). Each row corresponds to one problem file, whose name is given in the "prob_name" column. The "i_solver" column identifies a prover and the "i_configuration" column a configuration, where i is a natural number identifying a combination of prover and configuration.

Files ending with "summary.csv" contain concise summaries of the evaluation runs for the corresponding result file. Their columns are of the form "{solver}_{configuration}", and their rows contain the statistics described in the "summary" column. The names of the solvers and configurations used in the result files are self-explanatory (for the DCS, DCI, and IC configurations, the presence of "av" in the configuration name denotes that Avatar is enabled). In some cases, for uniformity, configurations are used in multiple figures.

We used the following versions of the higher-order provers: CVC4 1.9 (git cascJ10 db26022c), Leo-III 1.5.6, Satallax 3.5, Vampire 4.5.1 (commit 68d3e7314), and E 2.7 (commit 603992156b943171002cacca72aa3f2d0a211ee3).
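The per-figure result files can be processed with standard CSV tooling. The sketch below shows one way to count the problems a given prover/configuration pair solved, following the column layout described above (prob_name, i_solver, i_configuration, plus per-run result columns). The embedded sample data, the "1_status" column name, and the "Theorem" status value are illustrative assumptions, not taken from the actual files.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt mimicking the fD.csv layout described above.
# The exact result-column names and status strings in the real files
# may differ; adjust the keys below accordingly.
sample = """prob_name,1_solver,1_configuration,1_status,1_cpu_time
SEU140^1.p,zipperposition,prio_peho,Theorem,0.42
SEU141^1.p,zipperposition,prio_peho,GaveUp,30.00
SEU142^1.p,zipperposition,prio_peho,Theorem,1.10
"""

solved = Counter()
for row in csv.DictReader(io.StringIO(sample)):
    # Combine solver and configuration into one key, mirroring the
    # "{solver}_{configuration}" column naming of the summary files.
    key = f"{row['1_solver']}_{row['1_configuration']}"
    if row["1_status"] == "Theorem":
        solved[key] += 1

print(dict(solved))  # {'zipperposition_prio_peho': 2}
```

To process a real fD.csv, replace the io.StringIO wrapper with open() on the file path.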
The zipperposition/ directory contains scripts that run the provided Zipperposition binary (compiled for Linux) on a given problem with a given configuration. To run Zipperposition on a single problem with some configuration, use the script named "run_*", where * stands for the configuration used in the evaluation; the configuration names match those used in the result files. Except for "run_coop" and "run_uncoop", the scripts accept three arguments: 1) the path to the problem, 2) the CPU timeout, and 3) the path to a temporary directory for the files used to communicate with Ehoh. "run_coop" and "run_uncoop" expect only the first two arguments. For example, to run the problem at ~/problem.p using the configuration that enables the PEHO priority function, with a timeout of 15 seconds, execute

  ./run_prio_peho ~/problem.p 15 /tmp

A working installation of Python 3 and bash is required.

The source code of Zipperposition can be obtained from the wip-merge-hlbe branch of the Zipperposition git repository: git@github.com:sneeuwballen/zipperposition.git. The binary stored in the scripts/ directory corresponds to the compiled sources tagged with commit hash 8d205ecf8869c76dcfdf0c1faa98b002226780a1. Compilation instructions are given in the README.md file contained in the repository.

As noted in footnote 2, three tables were evaluated using a faulty version of the Ehoh prover. Reevaluation is being performed, and the results are expected to stay mostly the same, up to noise in the evaluation environment. The "eprover-ho" binary provided in this archive is the corrected version of the one used to evaluate Tables 7, 8, and 9.
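When scripting many runs, it helps to build the command line once and reuse it. The sketch below assembles the argument list following the calling convention described above (three arguments for most scripts, two for "run_coop" and "run_uncoop"). The function name and all paths are placeholders for illustration; the resulting list can be passed to subprocess.run() once the archive is unpacked.

```python
from pathlib import Path


def run_config_cmd(script_dir, config, problem, timeout_s, tmp_dir=None):
    """Build the argument list for one evaluation script.

    Follows the convention described above: most run_* scripts take
    1) the problem path, 2) the CPU timeout, and 3) a temporary
    directory for communication with Ehoh.
    """
    cmd = [str(Path(script_dir) / f"run_{config}"), str(problem), str(timeout_s)]
    # "run_coop" and "run_uncoop" take no temporary-directory argument.
    if config not in ("coop", "uncoop"):
        cmd.append(str(tmp_dir))
    return cmd


# Mirrors the worked example above; here we only print the command
# instead of executing it, since the scripts live in the archive.
print(" ".join(run_config_cmd("zipperposition", "prio_peho", "~/problem.p", 15, "/tmp")))
```

For an actual evaluation loop, iterate over the files in problems/ and invoke subprocess.run(cmd) on each constructed command.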