
More Data and Better Keywords Imply Better Educational Transcript Classification?

Authors :
Danciulescu, Theodora Ioana
Mihaescu, Marian Cristian
Heras, Stella
Palanca, Javier
Julian, Vicente
Source :
International Educational Data Mining Society. 2020.
Publication Year :
2020

Abstract

Building and especially improving a classification kernel is a challenging task. The work presented in this paper continues a previously developed semi-supervised classification approach that aimed at labelling transcripts from educational videos. We questioned whether the size of the ground-truth data-set (Wikipedia articles) or the quality of the keywords used in the semi-supervised labelling has a significant impact on the accuracy metrics of the final data model. The experiments considered three Wikipedia data-sets of "Small," "Medium" and "Large" sizes. For each data-set, three sets of keywords were used: those offered by the video authors, those determined by "rake-nltk" on the available transcripts, and those determined by "rake-nltk" on the Wikipedia articles that serve as training and testing data for the LDA [latent Dirichlet allocation] model that determines keywords on the transcripts. Experiments show that the size of the data-set has little importance, while the quality of the keywords has a more significant impact. Therefore, an improved version of the previously developed classifier has been obtained by improving the quality of the keywords involved in semi-supervised training. This result paves the way towards further improvements that may finally be deployed within a recommender system of educational videos at the Universitat Politècnica de València. [For the full proceedings, see ED607784.]

Details

Language :
English
Database :
ERIC
Journal :
International Educational Data Mining Society
Publication Type :
Conference
Accession number :
ED608010
Document Type :
Speeches/Meeting Papers; Reports - Research