More Data and Better Keywords Imply Better Educational Transcript Classification?
- Source :
- International Educational Data Mining Society. 2020.
- Publication Year :
- 2020
Abstract
- Building and especially improving a classification kernel is a challenging task. The work presented in this paper continues a previously developed semi-supervised classification approach aimed at labelling transcripts from educational videos. We questioned whether the size of the ground-truth data-set (Wikipedia articles) or the quality of the keywords used in the semi-supervised labelling has a significant impact on the accuracy metrics of the final data model. The experiments considered three Wikipedia data-sets of "Small," "Medium" and "Large" sizes. For each data-set, three sets of keywords were used: those offered by video authors, those determined by "rake-nltk" on the available transcripts, and those determined by "rake-nltk" on the Wikipedia articles that serve as training and testing data for the LDA [latent Dirichlet allocation] model that determines keywords on the transcripts. Experiments show that the size of the data-set has little importance, while the quality of the keywords has a more significant impact. Therefore, an improved version of the previously developed classifier has been obtained by improving the quality of the keywords involved in semi-supervised training. This result paves the way towards further improvements that may eventually be deployed within a recommender system of educational videos at the Universitat Politècnica de València. [For the full proceedings, see ED607784.]
Details
- Language :
- English
- Database :
- ERIC
- Journal :
- International Educational Data Mining Society
- Publication Type :
- Conference
- Accession number :
- ED608010
- Document Type :
- Speeches/Meeting Papers; Reports - Research