Optimization of AES using BERT and BiLSTM for Grading the Online Exams.

Authors :: Azhari, Azhari
Santoso, Agus
Ratna, Anak Agung Putri
Prestiliano, Jasson
Source :: International Journal of Intelligent Engineering & Systems; 2024, Vol. 17 Issue 5, p395-411, 17p
Publication Year :: 2024
Abstract: Essays are among the most widely used exam formats for assessing students. Universitas Terbuka Indonesia (Open University) conducts three online essay exams within a week for each first-year course, accounting for 30% of the mid-test score. The university has over 500,000 students and hundreds of courses. However, the limited number of graders makes checking and scoring each student's essay response time-consuming and ineffective. Score results can even be subjective or unfair, lack detailed feedback for students, miss contextual and creative aspects, and be less reliable for complex or non-standard writing. This research proposes a hybrid approach of deep learning models and natural semantic grammar to improve and optimize an automated essay scoring (AES) system, with the following steps. First, the datasets are collected from hundreds of students' answers, each dataset representing a single question. The datasets are pre-processed and augmented to increase the quantity and variety of the original data and scores. Second, BERT is used to transform each text dataset into vector feature spaces using pre-trained model weights. Finally, a score prediction model is generated using the BiLSTM method. The experimental results show that the model had an average Cohen's Kappa score of 0.749 and a highest Cohen's Kappa score of 0.91. This BERT-BiLSTM optimization model also has a better average Cohen's Kappa score (0.820) than the ATT-CNN-LSTM, BERT-XLNET, R2BERT, and CNNBiLSTM models. In a test with 46 lecturers, the average time taken to examine one course for each student decreased to 1 minute and 2 seconds. Additionally, 92.75% of the lecturers found the process of checking responses to be fairer and more objective. In a trial with 200 students, the mean percentage across students for question indications related to UI/UX, fairness, responsiveness, rubric, feedback, and transparency was 93.72%. [ABSTRACT FROM AUTHOR]
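The abstract reports model quality as Cohen's Kappa, which measures agreement between two sets of scores after discounting the agreement expected by chance. The following is a minimal sketch of the unweighted form, not the authors' evaluation code: the function name `cohen_kappa` and the example scores are hypothetical, and the abstract does not say whether the paper uses an unweighted or a weighted variant.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa between two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of essays where both raters gave the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement, computed from each rater's marginal score distribution.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)

    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical scores for eight essays (illustrative only, not data from the paper).
human_scores = [3, 2, 4, 4, 1, 3, 2, 4]
model_scores = [3, 2, 4, 3, 1, 3, 2, 4]
kappa = cohen_kappa(human_scores, model_scores)
```

A value of 1 means the two raters agree on every essay, and a value near 0 means they agree no more often than chance would predict.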