Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts

Authors :
Alexey Sorokin
Ulyana Isaeva
Source :
Communications in Computer and Information Science ISBN: 9783030712136, AIST (Supplement)
Publication Year :
2021
Publisher :
Springer International Publishing, 2021.

Abstract

Recent papers on Russian readability propose several formulas for evaluating text reading difficulty for learners of different ages. However, little is known about individual formulas for school subjects and how they perform compared with existing universal readability formulas. Our goal is to study the impact of the subject on both model quality and the importance of individual features. We trained four linear regression models: an individual formula for each of three school subjects (Biology, Literature, and Social Studies) and a universal formula for all three subjects. The dataset was built from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences each, and split into train and test sets in a 75:25 ratio. Previous papers on Russian readability do not provide proper feature selection, so we proposed a set of 32 features that are possibly relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the one-subject formulas outperform both the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. The one-subject formulas perform better because feature importances vary significantly among the subjects. The suggested readability models might benefit school education by helping to evaluate text relevance for learners and to adjust texts to target difficulty levels.
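
The data preparation described in the abstract (random sampling of sentences into fixed-size pseudo-texts, followed by a 75:25 train/test split) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' pipeline: the toy corpus, the use of grade level as the difficulty label, the number of pseudo-texts per grade, sampling without replacement, and the helper names `make_pseudo_texts` and `train_test_split` are all hypothetical.

```python
import random


def make_pseudo_texts(sentences, size, n_samples, rng):
    """Draw n_samples pseudo-texts, each a random sample of `size` sentences."""
    if size > len(sentences):
        raise ValueError("sample size exceeds the number of available sentences")
    # Assumption: sentences are drawn without replacement within one pseudo-text.
    return [rng.sample(sentences, size) for _ in range(n_samples)]


def train_test_split(items, train_ratio, rng):
    """Shuffle a copy of items and cut it into train and test parts."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


rng = random.Random(0)

# Hypothetical toy corpus: grade level -> sentences from schoolbooks of that grade.
corpus = {
    grade: [f"grade{grade}_sentence{i}" for i in range(2000)]
    for grade in (5, 6, 7)
}

# Build labelled pseudo-texts of 500 sentences each (8 per grade is arbitrary).
samples = []
for grade, sentences in corpus.items():
    for text in make_pseudo_texts(sentences, size=500, n_samples=8, rng=rng):
        samples.append((text, grade))

# 75:25 split, as stated in the abstract.
train, test = train_test_split(samples, 0.75, rng)
print(len(train), len(test))  # prints "18 6"
```

Changing `size` to 200 or 900 would mirror the additional sample sizes mentioned in the abstract; feature extraction and regression fitting would follow on `train` and are not shown here.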

Details

ISBN :
978-3-030-71213-6
ISBNs :
9783030712136
Database :
OpenAIRE
Accession number :
edsair.doi...........c1a29667f7d677a903c92c2a07b6dc26