Inherently workload-balanced clustered microarchitecture

Other authors

Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors

Universitat Politècnica de Catalunya. ARCO - Microarquitectura i Compiladors

Publication date

2005

Abstract

The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposing goals, since improving one usually degrades the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology so that the results of one cluster can be forwarded to the neighboring cluster with a very short latency. Communication penalties are therefore minimized when the producer of a value and its consumer are placed in adjacent clusters, a placement that also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, a 15% speedup is achieved on average for FP programs.
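The ring idea in the abstract can be illustrated with a small sketch. The abstract does not describe the paper's actual steering heuristic, so the policy below (always place a consumer in the cluster that follows its producer on the ring) is an assumption made for illustration only, and the names `next_cluster`, `ring_distance`, and `steer_chain` are hypothetical. It shows why placing consumers next to their producers tends to spread a dependence chain evenly over the clusters.

```python
from __future__ import annotations

from collections import Counter

NUM_CLUSTERS = 8  # the 8-cluster configuration mentioned in the abstract


def next_cluster(producer_cluster: int, num_clusters: int = NUM_CLUSTERS) -> int:
    """Ring neighbor that receives the producer's result (assumed one direction)."""
    return (producer_cluster + 1) % num_clusters


def ring_distance(src: int, dst: int, num_clusters: int = NUM_CLUSTERS) -> int:
    """Number of hops from src to dst when values move one way around the ring."""
    return (dst - src) % num_clusters


def steer_chain(chain_length: int, start_cluster: int = 0,
                num_clusters: int = NUM_CLUSTERS) -> list[int]:
    """Assumed policy: each instruction goes to the neighbor of its producer."""
    if chain_length <= 0:
        return []
    placement = [start_cluster]
    for _ in range(chain_length - 1):
        placement.append(next_cluster(placement[-1], num_clusters))
    return placement


if __name__ == "__main__":
    placement = steer_chain(16)
    # Every producer/consumer pair is exactly one hop apart ...
    hops = [ring_distance(a, b) for a, b in zip(placement, placement[1:])]
    print("hops between dependent instructions:", set(hops))
    # ... and the chain is spread evenly over the clusters.
    print("instructions per cluster:", dict(Counter(placement)))
```

Under this assumed policy a 16-instruction chain visits each of the 8 clusters twice, with every dependent pair one hop apart. A real steering scheme would also have to handle instructions with several producers and limited cluster resources, which this sketch ignores.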


Peer Reviewed


Postprint (published version)

Document type

Conference report

Language

English

Published by

Institute of Electrical and Electronics Engineers (IEEE)

Related documents

http://ieeexplore.ieee.org/document/1419837/

Recommended citation

This citation was generated automatically.

Rights

Open Access

This item appears in the following collection(s)

E-prints [73012]