Title:
|
Performance evaluation of macroblock-level parallelization of H.264 decoding on a cc-NUMA multiprocessor architecture
|
Author:
|
Álvarez Mesa, Mauricio; Ramírez Bellido, Alejandro; Valero Cortés, Mateo; Azevedo, Arnaldo; Meenderinck, Cor; Juurlink, Ben
|
Other authors:
|
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors; Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions |
Abstract:
|
This paper presents a study of the performance scalability of a macroblock-level parallelization of the H.264 decoder for High Definition (HD) applications on a multiprocessor architecture. We have implemented this parallelization on a cache coherent Non-uniform Memory Access (cc-NUMA) shared memory multiprocessor (SMP) and compared the results with the theoretical expectations. Three different scheduling techniques were analyzed: static, dynamic and dynamic with tail-submit. A dynamic scheduling approach with a tail-submit optimization presents the best performance, obtaining a maximum speed-up of 9.5 using 24 processors. A detailed profiling analysis showed that thread synchronization is one of the limiting factors for achieving a better parallel scalability. The paper includes an evaluation of the impact of using blocking synchronization APIs like POSIX threads and POSIX real-time extensions. Results showed that macroblock-level parallelism, as a very fine-grain form of Thread-Level Parallelism (TLP), is highly affected by the thread synchronization overhead generated by these APIs. Other synchronization methods, possibly with hardware support, are required in order to make MB-level parallelization more scalable. |
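The "theoretical expectations" mentioned in the abstract can be illustrated with a small idealized model. The sketch below is not the authors' code. It assumes the dependency pattern commonly used for macroblock-level (wavefront) parallelism in H.264, where each macroblock waits for its left neighbour and its upper-right neighbour, and it assumes a 1920x1088 frame of 16x16 macroblocks (120x68), constant decode time per macroblock, and zero synchronization cost. The function name `wavefront_slots` and the frame size are illustrative choices, not taken from the paper.

```python
from collections import Counter


def wavefront_slots(mb_cols, mb_rows):
    """Earliest time slot in which each macroblock (x, y) can be decoded.

    Assumed dependencies: left neighbour (x-1, y) and upper-right
    neighbour (x+1, y-1). In the last column, where no upper-right
    neighbour exists, the upper neighbour is used instead.
    """
    slot = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            deps = []
            if x > 0:
                deps.append(slot[(x - 1, y)])
            if y > 0:
                deps.append(slot[(min(x + 1, mb_cols - 1), y - 1)])
            # A macroblock runs one slot after its latest dependency.
            slot[(x, y)] = 1 + max(deps) if deps else 0
    return slot


# Assumed HD frame: 1920x1088 pixels, 16x16 macroblocks.
slots = wavefront_slots(120, 68)
profile = Counter(slots.values())      # macroblocks ready per time slot

total_slots = max(slots.values()) + 1  # length of the critical path
max_parallel = max(profile.values())   # peak number of independent macroblocks
ideal_speedup = (120 * 68) / total_slots

# Efficiency of the result reported in the abstract: 9.5x on 24 processors.
reported_efficiency = 9.5 / 24

print(total_slots, max_parallel, round(ideal_speedup, 1),
      round(reported_efficiency, 2))
```

Under these assumptions the model gives 254 time slots and at most 60 macroblocks available at once, so the ideal speed-up is bounded well above the number of processors used, while the reported 9.5 on 24 processors corresponds to an efficiency of about 0.40. The gap is consistent with the abstract's observation that synchronization overhead, which this model ignores, limits scalability.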
Abstract:
|
Peer Reviewed |
Subject(s):
|
-UPC subject areas::Computer science::Computer architecture::Parallel architectures -cc-NUMA multiprocessor architecture -H.264 -Multiprocessors -- Evaluation |
Document type:
|
Article - Published version Conference Object |