MalFamAware: Automatic Family Identification and Malware Classification Through Online Clustering
- Publication Year :
- 2020
Abstract
- The skyrocketing growth rate of new malware poses novel challenges for protecting computers and networks. Discerning truly novel malware from variants of known samples is one way to keep pace with this trend. This can be done by grouping known malware into families by similarity and classifying new samples into those families. As malware and their families evolve over time, approaches based on classifiers trained on a fixed ground truth are not suitable. Other techniques use clustering to identify families, but they need to periodically re-cluster the whole set of samples, which does not scale well. A promising approach is based on incremental clustering, where only the as-yet-unknown samples are periodically clustered to identify new families, and classifiers are retrained accordingly. However, these solutions usually cannot react immediately to identify new malware families. In this paper, we propose MalFamAware, a novel approach to malware family identification based on an online clustering algorithm, namely BIRCH, which efficiently updates clusters as new samples are fed in, without re-scanning the entire dataset. MalFamAware is able both to classify new malware into existing families and to identify new families at runtime. We present experimental evaluations in which MalFamAware outperforms both total re-clustering and incremental clustering solutions in terms of accuracy and time. We also compare our solution with classifiers retrained over time, obtaining better accuracy, in particular when samples belong to as-yet-unknown families.
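The online-clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and it is not full BIRCH: it keeps BIRCH-style clustering-feature summaries (sample count, linear sum, squared sum) but stores them in a flat list instead of a CF tree. The class name `OnlineFamilyClusterer`, the `threshold` value, and the two-dimensional feature vectors are hypothetical and chosen only for illustration.

```python
import math


class OnlineFamilyClusterer:
    """Flat, simplified stand-in for BIRCH (assumption: no CF tree, one summary per family)."""

    def __init__(self, threshold):
        self.threshold = threshold  # largest radius a family may have after absorbing a sample
        self.clusters = []          # each entry: [count, linear_sum, squared_sum]

    @staticmethod
    def _radius(n, ls, ss):
        # Radius derived from the summary alone: sqrt(SS/n - ||LS/n||^2)
        centroid_sq = sum((x / n) ** 2 for x in ls)
        return math.sqrt(max(ss / n - centroid_sq, 0.0))

    def add(self, sample):
        """Absorb one sample; return (family_id, is_new_family)."""
        sq = sum(x * x for x in sample)

        # Find the family whose centroid is closest to the sample.
        best, best_dist = None, math.inf
        for idx, (n, ls, _) in enumerate(self.clusters):
            dist = math.sqrt(sum((a - b / n) ** 2 for a, b in zip(sample, ls)))
            if dist < best_dist:
                best, best_dist = idx, dist

        # Merge only if the family stays compact; earlier samples are never re-scanned.
        if best is not None:
            n, ls, ss = self.clusters[best]
            merged = [n + 1, [a + b for a, b in zip(ls, sample)], ss + sq]
            if self._radius(*merged) <= self.threshold:
                self.clusters[best] = merged
                return best, False

        # Otherwise the sample starts a new family immediately.
        self.clusters.append([1, list(sample), sq])
        return len(self.clusters) - 1, True


clusterer = OnlineFamilyClusterer(threshold=1.0)
stream = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2)]
results = [clusterer.add(sample) for sample in stream]
print(results)
```

In this toy stream, the third sample is far from the first family and so opens a second one as soon as it arrives, while the later samples are assigned to whichever existing family they fit. Each update touches only the per-family summaries, which is the property the abstract attributes to online clustering.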
- Subjects :
- Computer Networks and Communications
Computer science
0211 other engineering and technologies
02 engineering and technology
computer.software_genre
Machine learning
Set (abstract data type)
Similarity (network science)
Safety, Risk, Reliability and Quality
Cluster analysis
Pace
021110 strategic, defence & security studies
Ground truth
malware analysis
malware family identification
incremental clustering
business.industry
Identification (information)
ComputingMethodologies_PATTERNRECOGNITION
Malware
Artificial intelligence
business
computer
Software
Information Systems
Details
- Language :
- English
- Database :
- OpenAIRE
- Accession number :
- edsair.doi.dedup.....fd472b8d6e268df25a88ee7f9b08a793