A Machine Learning-based Triage methodology for automated categorization of digital media.

Authors :
Marturana, Fabio
Tacconi, Simone
Source :
Digital Investigation; Sep2013, Vol. 10 Issue 2, p193-204, 12p
Publication Year :
2013

Abstract

The global diffusion of smartphones and tablets, whose market share now exceeds that of traditional desktops and laptops, presents investigative opportunities and poses serious challenges to law enforcement agencies and forensic professionals. Traditional Digital Forensics techniques may no longer be appropriate for the timely analysis of digital devices found at the crime scene. Yet for specific crimes such as murder, child abduction, missing persons and death threats, timely analysis may be crucial to speeding up investigations. Motivated by this, the paper explores the field of Triage, a relatively new branch of Digital Forensics intended to provide investigators with actionable intelligence through digital media inspection, and describes a new interdisciplinary approach that merges Digital Forensics techniques and Machine Learning principles. The proposed Triage methodology aims to automate the categorization of digital media on the basis of plausible connections between the traces retrieved (i.e. digital evidence) and the crimes under investigation. As an application of the proposed method, two case studies, on copyright infringement and child pornography exchange, are then presented to show that the idea is viable. In line with Machine Learning terminology, the term “feature” is used in the paper to mean a quantitative measure of a “plausible digital evidence”. In this regard, we (a) define a list of crime-related features, (b) identify and extract them from available devices and forensic copies, (c) populate an input matrix and (d) process it with different Machine Learning mining schemes to arrive at a device classification. We perform a benchmark study of the most popular mining algorithms (i.e. Bayes Networks, Decision Trees, Locally Weighted Learning and Support Vector Machines) to find the ones that best fit the case in question.
The results obtained are encouraging: triaging a dataset of 13 digital media with 45 copyright infringement-related features, more than 93% of the digital media are correctly classified using Bayes Networks or Support Vector Machines, while for child pornography exchange, with a dataset of 23 cell phones and 23 crime-related features, 100% of the phones are classified correctly. In this regard, methods to reduce the number of linearly independent features are explored and classification results presented. [Copyright Elsevier]
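
Steps (a) to (d) in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the feature names, the toy counts and the labels are invented and do not come from the paper's datasets, and a 1-nearest-neighbour classifier scored by leave-one-out validation stands in for the mining schemes the paper benchmarks (Bayes Networks, Decision Trees, Locally Weighted Learning, Support Vector Machines).

```python
# (a) Hypothetical crime-related features; names are illustrative only.
FEATURES = ["p2p_clients_installed", "video_files", "audio_files"]

# (b)+(c) Input matrix: one row per device, one column per feature.
# Toy values invented for this sketch, not taken from the paper.
X = [
    [2, 340, 1200],
    [2, 250, 1500],
    [3, 500, 2500],
    [0, 12, 40],
    [0, 5, 80],
    [0, 20, 15],
]
# Hypothetical class labels assigned to each device.
y = ["relevant", "relevant", "relevant",
     "not_relevant", "not_relevant", "not_relevant"]


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so no single count dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]  # avoid division by zero
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]
            for row in rows]


def nearest_label(train_rows, train_labels, query):
    """Return the label of the training row closest to query (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query))
             for row in train_rows]
    return train_labels[dists.index(min(dists))]


def leave_one_out_accuracy(rows, labels):
    """(d) Classify each device using all the others; return the hit rate.

    For brevity, scaling is done once on the full matrix rather than
    inside each fold.
    """
    scaled = min_max_scale(rows)
    hits = 0
    for i, query in enumerate(scaled):
        train_rows = scaled[:i] + scaled[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if nearest_label(train_rows, train_labels, query) == labels[i]:
            hits += 1
    return hits / len(scaled)


if __name__ == "__main__":
    print(f"Leave-one-out accuracy: {leave_one_out_accuracy(X, y):.0%}")
```

Leave-one-out validation is used in this sketch because it suits very small samples such as the 13 and 23 devices mentioned above; the abstract itself does not state which validation scheme the authors applied.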

Details

Language :
English
ISSN :
1742-2876
Volume :
10
Issue :
2
Database :
Supplemental Index
Journal :
Digital Investigation
Publication Type :
Academic Journal
Accession number :
90093990
Full Text :
https://doi.org/10.1016/j.diin.2013.01.001