A comparative analysis of tree-based models classifying imbalanced breath alcohol data

To access the full text documents, please follow this link: http://hdl.handle.net/2445/120281

Title:	A comparative analysis of tree-based models classifying imbalanced breath alcohol data
Author:	Alcañiz, Manuela; Santolino, Miguel; Ramon, Lluís
Other authors:	Universitat de Barcelona
Abstract:	When applied to binary data, most classification algorithms behave well provided the dataset is balanced. However, when a single class includes the majority of cases, good predictive performance for the minority class is not easy to achieve. We examine the strengths and weaknesses of three tree-based models when dealing with imbalanced data. We also explore sampling and cost-sensitive methods as strategies for improving machine learning algorithms. An application to a large dataset of breath alcohol content tests performed in Catalonia (Spain) to detect drunk drivers is shown. The Random Forest method proved to be the model of choice if high performance is required, while down-sampling strategies resulted in a significant reduction in computing time. When predicting alcohol impairment, the area of control (built-up or not), hour of day and driver's age were the most relevant variables for classification.
Subject(s):	Drinking of alcoholic beverages; Sampling (Statistics); Algorithms
Rights:	(c) Sociedad de Estadística e Investigación Operativa, 2017
Document type:	Article - Published version
Published by:	Sociedad de Estadística e Investigación Operativa