Machine learning for literature classification during systematic literature review – establishing the minimum threshold for labelling papers

Authors :: Venugopal, Vivek
Ates, Aylin
McKiernan, Peter
Source :: 36th Annual Conference of the British Academy of Management
Publication Year :: 2022
Abstract: Taking inspiration from the use of machine learning for literature classification in medicine, this paper explores the use of machine learning to aid document classification during systematic literature reviews in business and management studies. The performance of two machine learning models, a support vector machine (SVM) and logistic regression, is compared on a labelled dataset of weak signal literature. The data is iteratively split into training and testing sets with the aim of minimising the training set. The models were evaluated on sensitivity (recall), precision, specificity, accuracy, and F1 score to find the optimal training split. The optimal training split was found to be between 40% and 50%, meaning that only 40% to 50% of the dataset needed to be labelled for the machine learning model to predict the labels for the rest of the dataset. Although machine learning will not eliminate the labour involved in systematic literature reviews, it will reduce both the labour and the time required.
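The five evaluation measures named in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of that calculation, assuming labels are encoded as 1 (relevant paper) and 0 (irrelevant paper). The names `Metrics` and `evaluate` and the toy labels are hypothetical illustrations, not taken from the paper, and the paper's own implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float  # recall: share of relevant papers that were found
    precision: float    # share of predicted-relevant papers that are relevant
    specificity: float  # share of irrelevant papers correctly rejected
    accuracy: float     # share of all papers classified correctly
    f1: float           # harmonic mean of precision and sensitivity


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred) -> Metrics:
    """Compute the five measures for binary labels (1 = relevant, 0 = irrelevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, len(pairs))
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return Metrics(sensitivity, precision, specificity, accuracy, f1)


# Toy example (made-up labels, not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

In a split-minimisation experiment of the kind the abstract describes, a function like this would be called once per training fraction on the held-out portion, so that the measures can be compared across fractions.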
