Datasets classification using deep learning and machine learning classification algorithms.

Authors :
Abdulameer, Maysaa H.
Abdullah, Mahmood Z.
Source :
AIP Conference Proceedings. 2023, Vol. 2591 Issue 1, p1-11. 11p.
Publication Year :
2023

Abstract

New datasets were built in response to the urgent need for datasets specialized in educational lectures. Such datasets require an accurate classification algorithm, and the benefit of classifying them automatically is to reduce the workload of classifying each file manually and individually. In the present paper, the authors conduct an empirical deep learning study, specifically with a convolutional neural network, on three new datasets of educational lectures (PDF, Word, and PowerPoint datasets). The three datasets use real educational lecture resources collected from various document projects at different universities and institutions. The architecture is applied to text classification in the document domain, with document datasets obtained from a variety of projects on actual document cases. The first aim of the study is to test the performance of each dataset (PDF, Word, and PowerPoint) using four machine learning classification algorithms (Bayes Net, Random Forest, Random Committee, and OneR). The second aim is to test the efficiency of the deep learning approach in classification tasks and then compare it with the efficiency of traditional machine learning classification methods.
Two main classification techniques are used to maximize the benefits of the classification process. The first is the deep learning algorithm, which achieves a file classification accuracy between 95% and 96% on the three new datasets. The second is a set of standard machine learning algorithms (OneR, Random Forest, Bayes Net, and Random Committee): accuracy is 91% for the PDF dataset using Random Forest and Random Committee, 46% for the Word dataset using Random Committee, and 77% for the PowerPoint dataset using Random Forest. The deep learning algorithm is therefore chosen because it gives higher accuracy than the machine learning algorithms. [ABSTRACT FROM AUTHOR]
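
For readers unfamiliar with OneR, the simplest of the four baseline classifiers named in the abstract, the idea can be sketched in a few lines of Python: for each feature, map every feature value to its most frequent class, then keep the single feature whose rule makes the fewest training errors. This is a minimal illustration only, not the authors' code or setup. The function names, the categorical features, and the toy lecture data are all hypothetical, and the sketch makes no attempt to reproduce the accuracies reported above.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Return (feature_index, rule, default_label) for the best one-feature rule."""
    # Fallback class for feature values never seen during training.
    default_label = Counter(labels).most_common(1)[0][0]
    best = None
    for j in range(len(rows[0])):
        # Count how often each class occurs for each value of feature j.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[j]][label] += 1
        # The rule predicts the majority class for each feature value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[j]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, j, rule)
    _, feature, rule = best
    return feature, rule, default_label


def predict_one_r(model, row):
    """Classify one row using the single feature chosen in training."""
    feature, rule, default_label = model
    return rule.get(row[feature], default_label)


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class matches the given label."""
    hits = sum(predict_one_r(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical toy data: (document length, dominant content) -> lecture subject.
rows = [
    ("long", "equations"),
    ("long", "diagrams"),
    ("short", "equations"),
    ("short", "diagrams"),
    ("long", "equations"),
    ("short", "diagrams"),
]
labels = ["math", "biology", "math", "biology", "math", "biology"]

model = train_one_r(rows, labels)
train_accuracy = accuracy(model, rows, labels)
```

On this toy data the second feature separates the two classes perfectly, so OneR selects it. Real document features are rarely this clean, which is one reason a single-rule baseline is usually compared against stronger models.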

Details

Language :
English
ISSN :
0094243X
Volume :
2591
Issue :
1
Database :
Academic Search Index
Journal :
AIP Conference Proceedings
Publication Type :
Conference
Accession number :
162753223
Full Text :
https://doi.org/10.1063/5.0120454