Inherently workload-balanced clustered microarchitecture

Altres autors/es

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors

Data de publicació

2005

Resum

The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs.


Peer Reviewed


Postprint (published version)

Tipus de document

Conference report

Llengua

Anglès

Publicat per

Institute of Electrical and Electronics Engineers (IEEE)

Documents relacionats

http://ieeexplore.ieee.org/document/1419837/

Citació recomanada

Aquesta citació s'ha generat automàticament.

Drets

Open Access

Aquest element apareix en la col·lecció o col·leccions següent(s)

E-prints [72986]