Cache-aware optimization of matrix multiplication and matrix factorizations on multicore processors

Martínez Pérez, Héctor; Catalán Pallarés, Sandra; Igual Peña, Francisco D.; Herrero Zaragoza, José Ramón; Rodríguez Sánchez, Rafael; Quintana Ortí, Enrique Salvador

dc.contributor

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

dc.contributor

Universitat Politècnica de Catalunya. PM - Programming Models

dc.contributor.author

Martínez Pérez, Héctor

dc.contributor.author

Catalán Pallarés, Sandra

dc.contributor.author

Igual Peña, Francisco D.

dc.contributor.author

Herrero Zaragoza, José Ramón

dc.contributor.author

Rodríguez Sánchez, Rafael

dc.contributor.author

Quintana Ortí, Enrique Salvador

dc.date.issued

2025-09

dc.identifier

Martínez, H. [et al.]. Cache-aware optimization of matrix multiplication and matrix factorizations on multicore processors. «Cluster computing», September 2025, vol. 28, article 779.

dc.identifier

1386-7857

dc.identifier

https://hdl.handle.net/2117/445011

dc.identifier

10.1007/s10586-025-05426-6

dc.description.abstract

This paper advocates for a careful customization of the special general matrix multiplication (GEMM) kernels that are invoked from blocked routines for several relevant matrix factorizations in LAPACK, in order to improve their performance on modern multicore processors with hierarchical cache memories. To achieve this, we leverage a refined analytical model to dynamically tune the cache configuration parameters of GEMM for these kernels, taking into account the matrix operands’ dimensions, in order to improve cache occupancy. In addition, toward the same goal, we accommodate a flexible development of architecture-specific micro-kernels for GEMM that allows us to select the option that, depending on the operands’ dimensions, improves cache utilization. Our experiments for the LU and QR factorizations on two platforms, equipped with ARM (NVIDIA Carmel) and x86 (AMD EPYC) multicore processors, demonstrate the benefits of this approach in terms of better cache utilization and, in general, higher performance. Moreover, they also reveal the delicate balance between optimizing for multi-threaded parallelism versus cache usage as well as the positive effects of software prefetching.

dc.description.abstract

This work was supported by grants PID2020-113656RB-C22, PID2019-107255GB, PID2021-126576NB-I00 and PID2021-123627OB-C52 of MCIN/AEI/10.13039/501100011033, by “ERDF A way of making Europe”, and 2021-SGR-01007 of the Generalitat de Catalunya. Héctor Martínez is a postdoctoral fellow supported by the Consejería de Transformación Económica, Industria, Conocimiento y Universidades de la Junta de Andalucía. Sandra Catalán was supported by the grant RYC2021-033973-I, funded by MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

dc.description.abstract

Peer Reviewed

dc.description.abstract

Postprint (published version)

dc.format

application/pdf

dc.language

eng

dc.publisher

Kluwer Academic Publishers

dc.relation

https://link.springer.com/article/10.1007/s10586-025-05426-6

dc.relation

info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-107255GB-C22/ES/UPC-COMPUTACION DE ALTAS PRESTACIONES VIII/

dc.rights

http://creativecommons.org/licenses/by/4.0/

dc.rights

Open Access

dc.rights

Attribution 4.0 International

dc.subject

Àrees temàtiques de la UPC::Informàtica::Arquitectura de computadors

dc.subject

Dense linear algebra

dc.subject

Computer architecture

dc.subject

Multicore processors

dc.subject

Cache memory

dc.subject

Matrix factorization

dc.title

Cache-aware optimization of matrix multiplication and matrix factorizations on multicore processors

dc.type

Article

This item appears in the following Collection(s)

E-prints [72988]