A Semi-automatic Structure Learning Method for Language Modeling
- Author
Vitor Pera
- Subjects
Factorial, Computer science, business.industry, Markov model, computer.software_genre, Information theory, Sketch, Robustness (computer science), Language model, Semi automatic, Artificial intelligence, business, computer, Structure learning, Natural language processing
- Abstract
This paper presents a semi-automatic method for statistical language modeling. The method addresses the structure learning problem of the linguistic classes prediction model (LCPM) in class-dependent N-grams that support multiple linguistic classes per word. The structure of the LCPM is designed within the Factorial Language Model framework by combining a knowledge-based approach with a data-driven technique. First, simple linguistic knowledge is used to define a set of linguistic features appropriate to the application and to sketch the main structure of the LCPM. Next, an automatic algorithm grounded in solid Information Theory concepts selects the relevant factors associated with the chosen features and establishes the definitive LCPM structure. This approach is based on the so-called Buried Markov Models [1]. Although only preliminary results were obtained, they give great confidence in the method's ability to learn, from the data, LCPM structures that accurately represent the application's real dependencies and that also favor training robustness.
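The automatic factor-selection step can be illustrated with a generic greedy search. This is a minimal sketch under stated assumptions, not the author's algorithm. It ranks candidate conditioning factors for the class of the current word by their empirical conditional mutual information with that class, given the factors already selected. This is an information-theoretic criterion in the spirit of Buried Markov Models. The search stops when the best gain drops below a threshold. The factor names (`prev_cls`, `noise`), the toy data, and the parameters `max_factors` and `min_gain` are all hypothetical.

```python
from collections import Counter
from math import log2


def conditional_mi(samples, target, factor, given):
    """Empirical I(target; factor | given) in bits, from a list of dict samples."""
    n = len(samples)
    joint, t_s, f_s, s_only = Counter(), Counter(), Counter(), Counter()
    for row in samples:
        s = tuple(row[g] for g in given)  # joint value of the conditioning factors
        t, f = row[target], row[factor]
        joint[(t, f, s)] += 1
        t_s[(t, s)] += 1
        f_s[(f, s)] += 1
        s_only[s] += 1
    mi = 0.0
    for (t, f, s), k in joint.items():
        # p(t,f,s) * log2[p(t,f,s) p(s) / (p(t,s) p(f,s))]; the 1/n factors cancel inside the log
        mi += (k / n) * log2(k * s_only[s] / (t_s[(t, s)] * f_s[(f, s)]))
    return mi


def select_factors(samples, target, candidates, max_factors=2, min_gain=0.01):
    """Greedy forward selection of conditioning factors (hypothetical criterion)."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < max_factors:
        # Gain of each remaining factor, given the factors already in the structure
        gains = {f: conditional_mi(samples, target, f, selected) for f in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break  # no remaining factor adds enough information about the target
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: the current class "cls" is fully determined by the
# previous class, while "noise" is independent of it.
toy = [
    {"cls": "N", "prev_cls": "D", "noise": "a"},
    {"cls": "N", "prev_cls": "D", "noise": "b"},
    {"cls": "V", "prev_cls": "N", "noise": "a"},
    {"cls": "V", "prev_cls": "N", "noise": "b"},
]

if __name__ == "__main__":
    print(select_factors(toy, "cls", ["noise", "prev_cls"]))
```

On the toy data, `prev_cls` carries 1 bit about `cls`, while `noise` carries none, either alone or given `prev_cls`. The search therefore keeps only `prev_cls`. Conditioning on already-selected factors is what prevents redundant factors from being added. A real system would also need smoothing or significance testing, because empirical mutual information is biased upward on sparse counts.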
- Published
2019