1. Development of the mechanism for graphical analysis formation for pattern recognition of technical documentation and converting graphic information into machine-readable form.
- Author
- Petrushevskaya, Anastasia and Rabin, Alexey
- Subjects
- *NATURAL languages, *SEARCH algorithms, *DATA warehousing, *DOCUMENTATION, *PROBLEM solving
- Abstract
- The main difficulty in automating the retrieval of objects from an enterprise's heterogeneous distributed information bases is unifying disparate content that is presented from different points of view and under different paradigms for organizing data storage. The article formulates the problem of developing graphematic analysis for recognizing images of technical documentation and converting graphic information into machine-readable form. It describes in detail the mechanisms needed to solve the problem (stop-word removal, stemming, and lemmatization) and develops an algorithm for searching text structures using templates. The article proposes implementing the graphematic analysis algorithm as the first module in the automatic processing of natural-language texts, which makes it possible to extract semantically significant constructions from semi-structured resources using special graphematic descriptors. The proposed implementation can extract complex natural-language structures, such as direct speech, and can detect and replace abbreviations. [ABSTRACT FROM AUTHOR]
- Published
- 2022
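
The preprocessing steps named in the abstract (abbreviation replacement, stop-word removal, stemming, and template-based search for structures such as direct speech) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, abbreviation table, suffix list, the quote-based template, and all function names are hypothetical and chosen only for illustration. Lemmatization is left out because it needs a dictionary or morphological model that the abstract does not specify.

```python
import re
from typing import List

# Hypothetical resources: these lists are illustrative assumptions,
# not data taken from the article.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is"}
ABBREVIATIONS = {"fig.": "figure", "approx.": "approximately"}
SUFFIXES = ("ing", "ed", "es", "s")

# Assumed template for direct speech: any text inside double quotes.
DIRECT_SPEECH = re.compile(r'"([^"]+)"')


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its full form (case-insensitive)."""
    for short, full in ABBREVIATIONS.items():
        pattern = r"\b" + re.escape(short)
        text = re.sub(pattern, full, text, flags=re.IGNORECASE)
    return text


def naive_stem(token: str) -> str:
    """Strip the first matching suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Expand abbreviations, tokenize, drop stop words, then stem."""
    text = expand_abbreviations(text)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def find_direct_speech(text: str) -> List[str]:
    """Return every span matched by the direct-speech template."""
    return DIRECT_SPEECH.findall(text)


if __name__ == "__main__":
    print(preprocess("See Fig. 3 of the drawings"))
    print(find_direct_speech('The engineer said "check the valve" twice.'))
```

Expanding abbreviations before tokenization matters here, because the tokenizer discards the trailing period that distinguishes an abbreviation from an ordinary word.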