Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts
- Source :
- Communications in Computer and Information Science ISBN: 9783030712136, AIST (Supplement)
- Publication Year :
- 2021
- Publisher :
- Springer International Publishing, 2021.
- Abstract :
- Recent papers on Russian readability propose several formulas for evaluating text reading difficulty for learners of different ages. However, little is known about subject-specific formulas for school subjects and how they perform compared with existing universal readability formulas. Our goal is to study the impact of the subject on both model quality and the importance of individual features. We trained four linear regression models: an individual formula for each of three school subjects (Biology, Literature, and Social Studies) and a universal formula for all three subjects. The dataset was built from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences each, and split into train and test sets in a 75:25 ratio. Because previous papers on Russian readability do not provide proper feature selection, we proposed a set of 32 features that are possibly relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the one-subject formulas outperform both the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. This is because feature importances vary significantly among the subjects. The suggested readability models might benefit school education by helping to evaluate text relevance for learners and to adjust texts to target difficulty levels.
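The data preparation described in the abstract (random sampling of sentences into fixed-size pseudo-texts, followed by a 75:25 train/test split) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the abstract does not say whether sentences are drawn with or without replacement, how many pseudo-texts are produced, or how the sampling is grouped by grade, so the helper names `make_pseudo_texts` and `train_test_split`, the placeholder corpus, the number of samples, and the choice of sampling without replacement are all assumptions.

```python
import random


def make_pseudo_texts(sentences, sample_size, n_samples, rng):
    """Build n_samples pseudo-texts, each a random draw of sample_size sentences.

    Assumption: sentences are drawn without replacement within one pseudo-text.
    """
    if sample_size > len(sentences):
        raise ValueError("sample_size exceeds the number of available sentences")
    return [rng.sample(sentences, sample_size) for _ in range(n_samples)]


def train_test_split(items, train_fraction, rng):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for the sentences of one schoolbook
sentences = [f"sentence {i}" for i in range(2000)]

# 500 sentences per pseudo-text, as in the abstract; 40 samples is an assumption
pseudo_texts = make_pseudo_texts(sentences, sample_size=500, n_samples=40, rng=rng)

# 75:25 split, as in the abstract
train, test = train_test_split(pseudo_texts, train_fraction=0.75, rng=rng)
```

Changing `sample_size` to 200 or 900 would correspond to the other sample sizes mentioned in the abstract, provided the corpus contains at least that many sentences.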
- Subjects :
- business.industry
media_common.quotation_subject
Sample (statistics)
Feature selection
computer.software_genre
Readability
Sample size determination
Reading (process)
ComputingMethodologies_DOCUMENTANDTEXTPROCESSING
Feature (machine learning)
Relevance (information retrieval)
Artificial intelligence
Set (psychology)
business
computer
Natural language processing
media_common
Details
- ISBN :
- 978-3-030-71213-6
- ISBNs :
- 9783030712136
- Database :
- OpenAIRE
- Journal :
- Communications in Computer and Information Science ISBN: 9783030712136, AIST (Supplement)
- Accession number :
- edsair.doi...........c1a29667f7d677a903c92c2a07b6dc26