dc.contributor
Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors
dc.contributor
Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors
dc.contributor
Barcelona Supercomputing Center
dc.contributor.author
López Paradís, Guillem
dc.contributor.author
Li, Brian
dc.contributor.author
Armejach Sanosa, Adrià
dc.contributor.author
Wallentowitz, Stefan
dc.contributor.author
Moretó Planas, Miquel
dc.contributor.author
Balkind, Jonathan
dc.identifier
López, G. [et al.]. Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI. A: Design, Automation and Test in Europe Conference and Exhibition. "2023 Design, Automation & Test in Europe Conference & Exhibition (DATE): Antwerp, Belgium, 17-19 April 2023: proceedings". Institute of Electrical and Electronics Engineers (IEEE), 2023, ISBN 978-3-9819263-7-8. DOI 10.23919/DATE56975.2023.10137080.
dc.identifier
978-3-9819263-7-8
dc.identifier
https://hdl.handle.net/2117/390396
dc.identifier
10.23919/DATE56975.2023.10137080
dc.description.abstract
Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine and in the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10 billion transistor-scale chips. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverage High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces like networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput for the first time on a design with more than 10 billion transistors and 1,024 Linux-capable cores, opening new avenues for distributed RTL simulation of emerging system-on-chip designs. Compared to sequential and multithreaded RTL simulations of smaller designs, Metro-MPI achieves up to 135.98× and 9.29× speedups. Similarly, for a representative regression run, Metro-MPI reduces energy consumption by up to 2.53× and 2.91×.
dc.description.abstract
This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (contract PID2019-107255GB-C21), by the Generalitat de Catalunya (contract 2017-SGR-1328), by the European Union within the framework of the ERDF of Catalonia 2014-2020 under the DRAC project [001-P-001723], and by the Arm-BSC Center of Excellence. G. Lopez-Paradís has been supported by the Generalitat de Catalunya through a FI fellowship 2021FI-B00994 and GSoC 2021, and M. Moreto by a Ramon y Cajal fellowship no. RYC-2016-21104. A. Armejach is a Serra Hunter Fellow.
dc.description.abstract
Peer Reviewed
dc.description.abstract
Postprint (author's final draft)
dc.format
application/pdf
dc.publisher
Institute of Electrical and Electronics Engineers (IEEE)
dc.relation
https://ieeexplore.ieee.org/document/10137080
dc.relation
info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C21/ES/BSC - COMPUTACION DE ALTAS PRESTACIONES VIII/
dc.subject
Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors
dc.subject
High performance computing
dc.subject
Networks on a chip
dc.subject
Parallel processing (Electronic computers)
dc.subject
RTL Simulation
dc.subject
Network-on-Chip
dc.subject
Càlcul intensiu (Informàtica)
dc.subject
Processament en paral·lel (Ordinadors)
dc.title
Fast behavioural RTL simulation of 10B transistor SoC designs with Metro-MPI
dc.type
Conference report