URL filtering using machine learning algorithms.

Authors :: Aljahdalic, Asia Othman
Banafee, Shoroq
Aljohani, Thana
Source :: Information Security Journal: A Global Perspective. 2024, Vol. 33 Issue 3, p193-203. 11p.
Publication Year :: 2024
Abstract: Cyber-attacks that propagate malicious uniform resource locators (URLs) are common and serious. Statistics indicate a need to research and apply techniques for identifying and blocking malicious URLs. The main objective of this research is to train machine learning models on a selected dataset to predict phishing websites from URL-related features. The accuracy of each model is measured and compared, and the best-performing model is then used to develop a web application that gives internet users an easy way to check suspicious URLs. We used five machine learning models to classify URLs as legitimate or phishing: eXtreme Gradient Boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), Decision Tree, and Random Forest (RF). We also used a Voting Classifier to combine RF with two other models, Gaussian Naive Bayes and Logistic Regression, to test whether this raises the accuracy of RF as suggested in the literature; we found that the accuracy of RF alone was higher than that of the combined models. This project can be implemented as a browser extension or mobile application that uses the saved model to classify suspicious URLs as legitimate or phishing. [ABSTRACT FROM AUTHOR]
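
Illustrative sketch (not from the article): the abstract does not list the URL-related features or give the voting configuration, so the following Python is only a rough picture of the two ideas it mentions, deriving features from a URL string and combining three classifiers by majority vote. The feature names, the function names `extract_url_features` and `majority_vote`, and the label encoding (1 = phishing, 0 = legitimate) are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse
import re


def extract_url_features(url):
    """Compute a few hypothetical lexical features from a URL string.

    These features are illustrative assumptions, not the set used in the article.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        # Host written as a dotted-quad IPv4 address instead of a domain name.
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
    }


def majority_vote(predictions):
    """Hard voting: return the label predicted by most of the models.

    Assumes an odd number of binary predictions, so no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/login@secure"))
    # Assumed outputs of three separately trained models for one URL.
    print(majority_vote([1, 0, 1]))
```

In practice the feature dictionaries would be turned into numeric vectors and passed to trained models; a library such as scikit-learn provides a ready-made voting ensemble, but whether the authors used it is not stated in the abstract.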