1. A novel pattern-based edit distance for automatic log parsing
- Author
Maxime Raynal, Marc-Olivier Buob, Georges Quenot
- Affiliations
Nokia Bell Labs [Nozay]; Modélisation et Recherche d’Information Multimédia [Grenoble] (MRIM); Laboratoire d'Informatique de Grenoble (LIG); Centre National de la Recherche Scientifique (CNRS); Université Grenoble Alpes (UGA); Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP); Laboratory of Information, Network and Communication Sciences (LINCS); Université Pierre et Marie Curie - Paris 6 (UPMC); Institut National de Recherche en Informatique et en Automatique (Inria); Institut Mines-Télécom [Paris] (IMT); Université Pierre Mendès France - Grenoble 2 (UPMF); Université Joseph Fourier - Grenoble 1 (UJF); Institut National Polytechnique de Grenoble (INPG)
- Subjects
[INFO.INFO-DS] Computer Science [cs] / Data Structures and Algorithms [cs.DS]
- Abstract
This work aims to infer a set of regular expressions for parsing a text file, such as a system log. To this end, we propose a novel edit distance that builds on pattern matching. Edit distances are commonly used in fuzzy search and in bioinformatics, and compare two strings at the character level. As a result, they do not consider the nature of the data conveyed by the strings. To address this problem, we make the following contributions. First, we model strings at the pattern level using a dedicated data structure, called a pattern automaton. Second, we design a novel edit distance that operates at the pattern level. Third, we derive a clustering algorithm optimized for this distance. Finally, we evaluate our proposal experimentally.
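To illustrate the contrast the abstract draws between character-level and pattern-level comparison, here is a minimal sketch. It is not the paper's pattern automaton or its distance: it simply applies the classic Levenshtein distance first to raw characters, then to a sequence of pattern labels. The pattern classes in `PATTERNS`, the whitespace tokenization, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical pattern classes, chosen for illustration only;
# the paper's own pattern collection is not reproduced here.
PATTERNS = [
    ("ipv4", re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")),
    ("int", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z]+")),
]


def levenshtein(a, b):
    """Classic edit distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def to_patterns(line):
    """Map each whitespace-separated token to the first pattern class
    that matches it entirely; unmatched tokens are kept as-is."""
    labels = []
    for token in line.split():
        for name, regex in PATTERNS:
            if regex.fullmatch(token):
                labels.append(name)
                break
        else:
            labels.append(token)
    return labels


a = "Connection from 10.0.0.1 port 22"
b = "Connection from 192.168.12.7 port 51234"

# Character level: the differing address and port digits all count as edits.
char_distance = levenshtein(a, b)

# Pattern level: both lines reduce to the same label sequence.
pattern_distance = levenshtein(to_patterns(a), to_patterns(b))
```

In this toy setting the two log lines differ at the character level but share the same label sequence, so their pattern-level distance is zero, which is the kind of grouping a log-parsing clustering step would want.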
- Published
2022