1. Mining multiple informational text structure from text data
- Author
- Syaamantak Das, Anupam Basu, and Shyamal Kumar Das Mandal
- Subjects
Computer science; business.industry; Feature vector; 020206 networking & telecommunications; 02 engineering and technology; computer.software_genre; Task (project management); Set (abstract data type); Naive Bayes classifier; Categorization; 0202 electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; Domain knowledge; 020201 artificial intelligence & image processing; Artificial intelligence; F1 score; business; Classifier (UML); computer; Natural language processing; General Environmental Science
- Abstract
This study aimed to distinguish the types of informational text structure present in text data. Classifying the informational text structure of a given text is an essential area of research for discovering the knowledge in its content. Several previous studies defined a set of categories of informational text structure that can be identified by their respective signal words. The paper proposed a methodology for automatically extracting those informational text structures from school textbook data. The task was to classify a text into one or more of the predefined categories. Human annotators with sufficient domain knowledge of the books' subjects performed the categorization. For automatic classification, the occurrence frequency of the signal words was used as the feature vector. A Naive Bayes classifier was trained on 120 manually annotated texts, and 40 texts were used to test it. The classifier achieved a precision of 92% and an F1 score of 95.6%.
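The approach summarized in the abstract (signal-word occurrence counts as a feature vector, fed to a Naive Bayes classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the category names, the `SIGNAL_WORDS` lexicon, the toy training sentences, and the choice of a multinomial model with add-one smoothing are all assumptions made here, since the abstract does not specify them. For simplicity the sketch assigns a single category per text, whereas the paper allows one or more.

```python
import math
import re
from collections import Counter

# Hypothetical signal-word lexicon; the paper's actual word lists and
# category names are not given in the abstract.
SIGNAL_WORDS = {
    "cause_effect": ["because", "therefore", "consequently", "result"],
    "compare_contrast": ["however", "whereas", "similarly", "unlike"],
    "sequence": ["first", "next", "then", "finally"],
}
VOCAB = sorted({w for words in SIGNAL_WORDS.values() for w in words})


def featurize(text):
    """Return the occurrence count of each signal word, in VOCAB order."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[w] for w in VOCAB]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        docs_per_class = Counter(y)
        # Sum the signal-word counts over all training texts of each class.
        totals = {c: [0] * n_features for c in self.classes}
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                totals[label][j] += value
        self.log_prior = {
            c: math.log(docs_per_class[c] / len(y)) for c in self.classes
        }
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c]) + n_features
            self.log_lik[c] = [math.log((t + 1) / denom) for t in totals[c]]
        return self

    def predict(self, row):
        # Pick the class with the highest log posterior (up to a constant).
        scores = {
            c: self.log_prior[c]
            + sum(v * ll for v, ll in zip(row, self.log_lik[c]))
            for c in self.classes
        }
        return max(scores, key=scores.get)


# Toy training texts, invented for illustration only.
train = [
    ("The river flooded because of heavy rain; therefore the crops failed.", "cause_effect"),
    ("As a result, prices rose and consequently demand fell.", "cause_effect"),
    ("Frogs live near water, whereas toads prefer dry land.", "compare_contrast"),
    ("Unlike mammals, reptiles are cold-blooded; however, both have lungs.", "compare_contrast"),
    ("First heat the water, then add salt, and finally stir.", "sequence"),
    ("Next, the seed sprouts; then leaves appear.", "sequence"),
]
model = MultinomialNB().fit(
    [featurize(text) for text, _ in train],
    [label for _, label in train],
)
print(model.predict(featurize("Plants wilt because they lack water.")))
```

Counting only signal words keeps the feature vector small and tied to the category definitions, which suits a training set of the size the abstract reports. A multi-label variant could, for example, train one binary classifier per category.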
- Published
- 2020