Abstract:
|
The growing influence of wire delay in cache design means that access latencies to last-level cache banks are no longer constant. Non-Uniform Cache Architectures (NUCAs) have been proposed to address this problem. Furthermore, an efficient last-level cache is crucial in chip multiprocessor (CMP) architectures to reduce requests to the off-chip memory, because of the significant speed gap between
processor and memory and the limited memory bandwidth.
Therefore, a bank replacement policy that efficiently manages the NUCA cache is desirable. However, the decentralized
nature of NUCA has prevented previously proposed replacement policies from being effective in this kind of cache.
As banks operate independently of each other, their replacement decisions are restricted to a single NUCA bank. We
propose a novel mechanism based on the bank replacement policy for NUCA caches on CMPs, called The Auction. This mechanism enables the replacement decisions taken in a single
bank to be spread to the whole NUCA cache. Thus, global replacement policies that rely on the current state of the NUCA cache, such as evicting the least frequently accessed
data in the whole NUCA cache, are now feasible. Moreover, The Auction adapts to current program behaviour in order to relocate a line that is being evicted from a bank in the NUCA cache to the most suitable position in the whole cache. We propose, implement and evaluate three approaches
to The Auction mechanism. We also show that The Auction manages the cache efficiently and significantly reduces the requests to the off-chip memory by increasing the
hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and a 4% reduction in the energy consumed by the memory system. |
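
The idea of spreading a single bank's replacement decision to the whole cache can be illustrated with a toy model. The sketch below is not the mechanism evaluated in the paper: the `Bank` class, the `insert_with_auction` function, the use of per-line access counts as bids, and the rule that a free slot wins outright are all assumptions made for illustration. It shows only how a line evicted from one bank might be relocated to another bank, so that the line leaving the cache is the least frequently accessed one among the candidates offered by all banks.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """Hypothetical model of one NUCA bank: line address -> access count."""
    capacity: int
    lines: dict = field(default_factory=dict)

    def coldest(self):
        """Return (address, count) of this bank's least frequently accessed line."""
        addr = min(self.lines, key=self.lines.get)
        return addr, self.lines[addr]


def insert_with_auction(banks, home, addr):
    """Insert addr into banks[home]; return the address that leaves the cache, or None."""
    bank = banks[home]
    if len(bank.lines) < bank.capacity:
        bank.lines[addr] = 1
        return None

    # Local decision: the home bank picks its own victim, as it would without an auction.
    victim, victim_count = bank.coldest()
    del bank.lines[victim]
    bank.lines[addr] = 1

    # Auction (assumed rule): each other bank bids with a free slot or its coldest line.
    best = None  # (bank index, address to displace, access count)
    for i, other in enumerate(banks):
        if i == home:
            continue
        if len(other.lines) < other.capacity:
            other.lines[victim] = victim_count  # a free slot wins outright
            return None
        cand, count = other.coldest()
        if count < victim_count and (best is None or count < best[2]):
            best = (i, cand, count)

    if best is None:
        return victim  # no bank holds a colder line, so the local victim leaves the cache

    # The winning bank drops its colder line and takes in the relocated victim.
    i, cand, _ = best
    del banks[i].lines[cand]
    banks[i].lines[victim] = victim_count
    return cand
```

With two full banks holding `{"a": 5, "b": 3}` and `{"c": 1, "d": 9}`, inserting a new line into the first bank displaces `b` locally, but the line that actually leaves the cache is `c`, the coldest line overall, and `b` is relocated into the second bank.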