A comparative analysis of tree-based models classifying imbalanced breath alcohol data

Authors :: Alcañiz, M.
Santolino, M.
Ramon, L.
Universitat de Barcelona
Source :: Scopus-Elsevier, Recercat. Dipòsit de la Recerca de Catalunya, Dipòsit Digital de la UB, Universidad de Barcelona
Publication Year :: 2017
Publisher :: Sociedad de Estadística e Investigación Operativa, 2017.
Abstract: When applied to binary data, most classification algorithms perform well provided the dataset is balanced. However, when a single class accounts for the majority of cases, good predictive performance for the minority class is hard to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is presented. The Random Forest method proved to be the model of choice when high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), the hour of the day and the drivers' age were the most relevant variables for classification.
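The abstract names two families of remedies for class imbalance, sampling and cost-sensitive learning, without spelling them out. As a minimal sketch, assuming binary labels held in a plain Python list (this is not the authors' code, and the helper names `downsample` and `inverse_frequency_weights` as well as the toy data are hypothetical), random down-sampling of the majority class and inverse-frequency class weights can be written as follows.

```python
import random
from collections import Counter


def downsample(rows, labels, seed=0):
    """Randomly drop majority-class cases until every class matches the
    minority-class count. Hypothetical helper, not the authors' procedure."""
    counts = Counter(labels)
    minority_label, n_min = min(counts.items(), key=lambda kv: kv[1])
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible

    # Group row indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    # Keep all minority cases; sample n_min cases without replacement from the rest.
    keep = []
    for y, idx in by_class.items():
        keep.extend(idx if y == minority_label else rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


def inverse_frequency_weights(labels):
    """Cost-sensitive alternative: weight class c by n / (k * n_c), so each
    class contributes the same total weight. Hypothetical helper."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {y: n / (k * c) for y, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: 95 negative tests (0) and 5 positive tests (1).
    labels = [0] * 95 + [1] * 5
    rows = list(range(100))  # stand-in for feature rows
    X_bal, y_bal = downsample(rows, labels)
    print(Counter(y_bal))  # Counter({0: 5, 1: 5})
    print(inverse_frequency_weights(labels))
```

Down-sampling shrinks the training set to a multiple of the minority count, which is why such strategies cut computing time, at the price of discarding majority-class information. Inverse-frequency weighting keeps every case but makes misclassifying the rare class costlier; the formula n / (k · n_c) is the one scikit-learn applies for `class_weight="balanced"`.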

Subjects :: Drinking of alcoholic beverages
Alcohol consumption
Algorithms
Sampling (Statistics)

Language :: English
Database :: OpenAIRE
Accession number :: edsair.dedup.wf.001..4813fef9a79d874daa1d281ef581469b
