dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor
Barcelona Supercomputing Center
dc.contributor
Universitat Politècnica de Catalunya. CAP - Grup de Computació d'Altes Prestacions
dc.contributor.author
De Haro Ruiz, Juan Miguel
dc.contributor.author
Cano, Rubén
dc.contributor.author
Álvarez Martínez, Carlos
dc.contributor.author
Jiménez González, Daniel
dc.contributor.author
Martorell Bofill, Xavier
dc.contributor.author
Ayguadé Parra, Eduard
dc.contributor.author
Labarta Mancho, Jesús José
dc.contributor.author
Abel, François
dc.contributor.author
Ringlein, Burkhard
dc.contributor.author
Weiss, Beat
dc.identifier
De Haro, J. [et al.]. OmpSs@cloudFPGA: An FPGA task-based programming model with message passing. In: IEEE International Parallel and Distributed Processing Symposium. "2022 IEEE 36th International Parallel and Distributed Processing Symposium: 30 May-3 June 2022, virtual event: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 828-838. ISBN 978-1-6654-8106-9. DOI 10.1109/IPDPS53621.2022.00085.
dc.identifier
978-1-6654-8106-9
dc.identifier
https://hdl.handle.net/2117/374059
dc.identifier
10.1109/IPDPS53621.2022.00085
dc.description.abstract
High-performance computing applications require a new parallel paradigm for energy-efficient heterogeneous hardware infrastructures to achieve better performance at a reasonable cost. Under this new paradigm, some application parts are offloaded to specialized accelerators that run faster or are more energy-efficient than CPUs.
Field-Programmable Gate Arrays (FPGAs) are one such type of accelerator, and they are becoming widely available in data centers.
This paper proposes OmpSs@cloudFPGA, which includes novel extensions to parallel task-based programming models that enable easy and efficient programming of heterogeneous clusters with FPGAs.
The programmer only needs to annotate, with OpenMP-like pragmas, the tasks of the application that should be accelerated in the cluster of FPGAs.
Next, the proposed programming model framework automatically extracts parts annotated with High-Level Synthesis (HLS) pragmas and synthesizes them into hardware accelerator cores for FPGAs.
Additionally, our extensions support two novel features: 1) direct FPGA-to-FPGA communication through an Application Programming Interface (API) similar to the Message Passing Interface (MPI), with one-to-one and collective communications, to alleviate the host communication channel bottleneck, and 2) creating and spawning work from inside the FPGAs to their own accelerator cores based on an MPI rank-like identification.
These features break the classical host-accelerator model, where the host (typically the CPU) generates all the work and distributes it to each accelerator.
We also present an evaluation of OmpSs@cloudFPGA for different parallel strategies of the N-body application on the IBM cloudFPGA research platform.
Results show that for cluster sizes up to 56 FPGAs, the performance scales linearly.
To the best of our knowledge, this is the best performance obtained for N-body on FPGA platforms, reaching 344 Gpairs/s with 56 FPGAs.
Finally, we compare the performance and power consumption of the proposed approach with those of a classical execution on the MareNostrum 4 supercomputer, demonstrating that our FPGA approach reduces power consumption by an order of magnitude.
dc.description.abstract
This work has been done in the context of the IBM/BSC Deep Learning Center initiative. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 754337 (EuroEXA), from the Spanish Government (PID2019-107255GB-C21/AEI/10.13039/501100011033), and from the Generalitat de Catalunya (2017-SGR-1414 and 2017-SGR-1328).
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (author's final draft)
dc.format
application/pdf
dc.publisher
Institute of Electrical and Electronics Engineers (IEEE)
dc.relation
https://ieeexplore.ieee.org/document/9820636
dc.relation
info:eu-repo/grantAgreement/EC/H2020/754337/EU/Co-designed Innovation and System for Resilient Exascale Computing in Europe: From Applications to Silicon/EuroEXA
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/
dc.subject
UPC subject areas::Computer science::Computer architecture::Parallel architectures
dc.subject
Supercomputers -- Energy consumption
dc.subject
Application program interfaces (Computer software)
dc.subject
Parallel processing (Electronic computers)
dc.subject
Programming models
dc.subject
Network-attached FPGA
dc.subject
Stand-alone FPGA
dc.subject
High-level synthesis
dc.subject
Heterogeneous programming
dc.subject
High-performance computing
dc.title
OmpSs@cloudFPGA: An FPGA task-based programming model with message passing
dc.type
Conference report