Automatic Subject Indexing and Classification Using Text Recognition and Computer-Based Analysis of Tables of Contents
- Author
Jan Pokorny, ENKI, o.p.s., Leslie Chan, and Pierre Mounier
- Subjects
Structure (mathematical logic), Information retrieval, Computer science, [SHS.INFO] Humanities and Social Sciences/Library and information sciences, Subject indexing, Computer based, Subject (documents), Text recognition, text mining, machine learning system, Index (publishing), library automatization, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, Quality (business), Relevance (information retrieval), computer-generated keywords, computer-generated subject headings
- Abstract
This paper describes a method for the machine-based creation of high-quality subject indexing and classification for both electronic and print documents, using their tables of contents (ToCs). The technology is aimed primarily at electronic and print documents whose full text cannot be indexed for technical or licensing reasons. It would also be useful for full-text documents, however, because analyzing the structure of ToCs could significantly improve the accuracy and relevance of their subject description.
- Published
2018