Improving predication efficiency through compaction/restoration of SIMD instructions

Barredo Ferreira, Adrián; Cebrián González, Juan Manuel; Moretó Planas, Miquel; Casas, Marc; Valero Cortés, Mateo

Author

Barredo Ferreira, Adrián

Cebrián González, Juan Manuel

Moretó Planas, Miquel

Casas, Marc

Valero Cortés, Mateo

Other authors

Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

Barcelona Supercomputing Center

Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions

Publication date

2020

Abstract

Vector processors offer a wide range of unexplored opportunities to improve performance and energy efficiency. However, despite their potential, vector code generation and execution face significant challenges, the most relevant being control-flow divergence. Most modern processors with SIMD extensions (such as AVX) rely on predication to handle divergent control flow. In predicated code, performance and energy consumption are usually insensitive to the number of true values in the predicate mask: an instruction costs roughly the same whether one lane or all lanes are active. As a result, system efficiency becomes increasingly suboptimal as vector length grows. In this paper we focus on SIMD extensions and propose the Compaction/Restoration (CR) technique, a novel approach to improve the execution efficiency of predicated SIMD instructions. CR delays predicated SIMD instructions that have inactive elements and compacts them with instances of the same instruction from different loop iterations to form an equivalent dense vector instruction in which, in the best case, all elements are active. After such a dense instruction executes, its results are restored to the original instructions. Our evaluation shows that CR improves performance by up to 25% and reduces dynamic energy consumption by up to 43% on real, unmodified applications with predicated execution. Moreover, CR allows unmodified legacy code with short vector instructions (AVX-2) to run on newer architectures with wider vectors (AVX-512), achieving performance improvements of up to 56%.
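The compaction/restoration idea described in the abstract can be illustrated with a small software model. This is a hypothetical sketch, not the authors' hardware design: `PredicatedOp` and `execute_with_cr` are illustrative names, and the model assumes element-wise two-operand instructions, merge-masking (inactive lanes keep the old destination value), and no data dependencies between the delayed instances. It only counts how many dense vector executions replace the predicated ones; it does not model timing or energy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PredicatedOp:
    """One dynamic instance of a predicated, element-wise SIMD instruction (hypothetical model)."""
    a: List[int]       # first source operand, one value per lane
    b: List[int]       # second source operand, one value per lane
    old: List[int]     # destination contents before the instruction executes
    mask: List[bool]   # predicate mask: True marks an active lane
    result: List[int] = field(default_factory=list)


def execute_with_cr(ops: List[PredicatedOp], width: int,
                    fn: Callable[[int, int], int]) -> int:
    """Compact the active lanes of several predicated instances into dense
    vectors of `width` elements, execute them, and restore each result to the
    lane it came from. Returns the number of dense executions issued."""
    # Merge-masking assumption: inactive lanes keep the old destination value.
    for op in ops:
        op.result = list(op.old)

    issued = 0
    pending: List[Tuple[int, int]] = []  # (instance index, lane) awaiting a slot

    def flush(slots: List[Tuple[int, int]]) -> None:
        nonlocal issued
        # Compaction: gather active elements from different instances.
        xs = [ops[i].a[lane] for i, lane in slots]
        ys = [ops[i].b[lane] for i, lane in slots]
        # Stand-in for one dense vector instruction.
        out = [fn(x, y) for x, y in zip(xs, ys)]
        issued += 1
        # Restoration: scatter each result back to its original instance and lane.
        for (i, lane), value in zip(slots, out):
            ops[i].result[lane] = value

    for i, op in enumerate(ops):
        for lane, active in enumerate(op.mask):
            if active:
                pending.append((i, lane))
                if len(pending) == width:
                    flush(pending)
                    pending = []
    if pending:
        flush(pending)  # a final, partially filled vector
    return issued


if __name__ == "__main__":
    ops = [
        PredicatedOp(a=[1, 2, 3, 4], b=[10, 20, 30, 40], old=[0, 0, 0, 0],
                     mask=[True, False, True, False]),
        PredicatedOp(a=[5, 6, 7, 8], b=[50, 60, 70, 80], old=[-1, -1, -1, -1],
                     mask=[False, True, False, True]),
    ]
    n = execute_with_cr(ops, width=4, fn=lambda x, y: x + y)
    print(n, ops[0].result, ops[1].result)  # 1 [11, 0, 33, 0] [-1, 66, -1, 88]
```

Here two half-active 4-lane instances are served by one dense execution instead of two. In the same spirit, and only as an illustration of the AVX-2/AVX-512 claim, a 256-bit instruction on a 512-bit datapath can be modeled as a 16-lane instance with its upper 8 lanes masked off, so two such instances compact into a single dense issue with `width=16`.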

This work has been partially supported by the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence, the Spanish Government (contract TIN2015-65316-P) and the European Union’s Horizon 2020 research and innovation program under the Mont-Blanc 2020 project (grant agreement 779877). A. Barredo has been supported by the Spanish Government under Formación del Personal Investigador fellowship number BES-2017-080635. M. Moretó and M. Casas have been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship numbers RYC-2016-21104 and RYC-2017-23269.

Peer Reviewed

Postprint (author's final draft)

Document Type

Conference report

Language

English

Subjects and keywords

UPC subject areas::Computer science::Computer architecture::Parallel architectures; High performance computing -- Energy consumption; Parallel processing (Electronic computers); SIMD; Energy efficiency; Predication

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Related items

https://ieeexplore.ieee.org/document/9065430

info:eu-repo/grantAgreement/MINECO//TIN2015-65316-P/ES/COMPUTACION DE ALTAS PRESTACIONES VII/

info:eu-repo/grantAgreement/AEI/RYC-2016-21104

info:eu-repo/grantAgreement/EC/FP7/321253/EU/Riding on Moore's Law/ROMOL

info:eu-repo/grantAgreement/EC/H2020/779877/EU/Mont-Blanc 2020, European scalable, modular and power efficient HPC processor/Mont-Blanc 2020

Rights

Open Access
