Application of Machine Learning Methods to Compare Disciplines Content Using Text Data
- Source :
- FRUCT, Proceedings of the XXth Conference of Open Innovations Association FRUCT, Vol 30, Iss 1, Pp 115-120 (2021)
- Publication Year :
- 2021
- Publisher :
- IEEE, 2021.
Abstract
- The paper investigates a machine-learning approach to finding and identifying similar disciplines. The study used two of the most popular methods for processing text data, BERT and Doc2Vec. The models were trained on datasets covering various disciplines, with a total of 2.5 million entries. To assess the quality of the resulting models, 30 experts from different scientific fields evaluated the level of similarity between the disciplines identified by the trained models. Both methods, when trained on identical datasets, generated similar outputs. A separate Doc2Vec model, trained on a relatively small sample of 15,000 entries from the target discipline database, which included discipline descriptions and curricula, showed better results, which justifies developing specific solutions for particular tasks. Further development of machine learning methods and model design for specific tasks in the educational field will promote the digitalization of education in the area of university operations management.
- Subjects :
- education
Small data
educational data mining
business.industry
Process (engineering)
Computer science
media_common.quotation_subject
Sample (statistics)
text mining
TK5101-6720
Machine learning
computer.software_genre
Field (computer science)
Similarity (psychology)
Telecommunication
Quality (business)
Artificial intelligence
business
Design methods
text's similarity
computer
Curriculum
media_common
Details
- Database :
- OpenAIRE
- Journal :
- 2021 30th Conference of Open Innovations Association FRUCT
- Accession number :
- edsair.doi.dedup.....06d5f30ece73bb2d7c0e61b11dd9e081