Work-efficient parallel non-maximum suppression for embedded GPU architectures

dc.contributor
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor
Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.contributor
Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.contributor
Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
dc.contributor.author
Oro Garcia, David
dc.contributor.author
Fernandez Tena, Carles
dc.contributor.author
Martorell Bofill, Xavier
dc.contributor.author
Hernando Pericás, Francisco Javier
dc.date.issued
2016
dc.identifier
Oro, D., Fernandez, C., Martorell, X., Hernando, J. Work-efficient parallel non-maximum suppression for embedded GPU architectures. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. "2016 IEEE International Conference on Acoustics, Speech, and Signal Processing: proceedings: March 20-25, 2016: Shanghai International Convention Center: Shanghai, China". Shanghai: Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1026-1030.
dc.identifier
978-1-4799-9988-0
dc.identifier
https://hdl.handle.net/2117/91351
dc.identifier
10.1109/ICASSP.2016.7471831
dc.description.abstract
With the emergence of GPU computing, deep neural networks have become a widely used technique for advancing research in the fields of image and speech processing. In the context of object and event detection, sliding-window classifiers must choose the best among all positively discriminated candidate windows. In this paper, we introduce the first GPU-based non-maximum suppression (NMS) algorithm for embedded GPU architectures. The results show that the proposed parallel algorithm reduces NMS latency by a wide margin compared to CPUs, even when the GPU is clocked at 50% of its maximum frequency on an NVIDIA Tegra K1. We show results for object detection in images; the proposed technique is also directly applicable to speech segmentation tasks such as speaker diarization.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (published version)
dc.format
5 p.
dc.format
application/pdf
dc.language
eng
dc.publisher
Institute of Electrical and Electronics Engineers (IEEE)
dc.relation
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7471831
dc.relation
info:eu-repo/grantAgreement/EC/H2020/644312/EU/Heterogeneous Secure Multi-level Remote Acceleration Service for Low-Power Integrated Systems and Devices/RAPID
dc.rights
Restricted access - publisher's policy
dc.subject
UPC subject areas::Computer science::Information systems
dc.subject
UPC subject areas::Computer science
dc.subject
Embedded computer systems
dc.subject
Information display systems
dc.subject
Embedded systems
dc.subject
Graphics processing units
dc.subject
Parallel algorithms
dc.subject
Work-efficient parallel non-maximum suppression
dc.subject
Embedded GPU architectures
dc.subject
Image processing
dc.subject
Speech processing
dc.subject
Deep neural networks
dc.subject
NMS latency
dc.subject
Positively discriminated candidate windows
dc.subject
Parallel algorithm
dc.subject
NVIDIA Tegra K1
dc.subject
Speech segmentation tasks
dc.subject
Speaker diarization
dc.subject
Embedded systems (Computer science)
dc.subject
Visualization (Computer science)
dc.title
Work-efficient parallel non-maximum suppression for embedded GPU architectures
dc.type
Conference report


