Abstract
This paper discusses the potential of an on-chip cache memory for modern vector supercomputers. Vector supercomputers can achieve high computational efficiency on compute-intensive scientific applications. The most important factor in their computational performance is a memory bandwidth high enough to supply data to their many arithmetic units in time; modern vector supercomputers such as the NEC SX-7 and SX-8 provide a ratio of memory bandwidth to floating-point performance of 4 bytes per flop (4B/FLOP). However, the performance gap between memory and processors has widened year by year in high-performance computing, so it is becoming harder to maintain 4B/FLOP memory bandwidth in the design of future vector supercomputers. As a promising solution to compensate for the insufficient memory bandwidth of the vector load/store units of future vector supercomputers, we design an on-chip vector cache for the NEC SX vector processor architecture. This paper evaluates the performance of the on-chip cache memory system on the SX-7 system with a memory bandwidth of 2B/FLOP or lower, using two kernel loops and five leading scientific applications. The results for the kernel loops demonstrate that a 2B/FLOP memory system with an on-chip cache whose hit ratio is 50% can achieve performance comparable to that of a 4B/FLOP system without the cache. The results for four of the applications indicate that the on-chip cache improves their sustained performance by 20% to 98%. The results for the fifth application show that loop unrolling conflicts with vector caching, resulting in a poor hit rate. However, when loop unrolling is disabled, its cache hit rate improves, and its sustained performance becomes comparable to that obtained with 4B/FLOP memory bandwidth without loop unrolling. In addition, selective caching, in which only data with high locality of reference are cached, is also effective for making efficient use of the limited cache capacity.
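The kernel-loop result is consistent with a simple back-of-the-envelope bandwidth model, sketched here. This is an illustrative assumption, not the evaluation method of the paper: it supposes that cache hits consume no off-chip bandwidth and that the cache itself is never the bottleneck, so a hit ratio h stretches an off-chip bandwidth B to an effective B / (1 - h). The function names `effective_bandwidth` and `required_hit_ratio` are hypothetical.

```python
def effective_bandwidth(mem_bw: float, hit_ratio: float) -> float:
    """Effective B/FLOP seen by the vector units.

    Assumption (not from the paper): only cache misses use off-chip
    bandwidth, and the on-chip cache is never the limiting factor.
    """
    if not 0.0 <= hit_ratio < 1.0:
        raise ValueError("hit_ratio must be in [0, 1)")
    return mem_bw / (1.0 - hit_ratio)


def required_hit_ratio(mem_bw: float, target_bw: float) -> float:
    """Hit ratio needed for mem_bw to behave like target_bw under the same model."""
    if mem_bw <= 0.0 or target_bw < mem_bw:
        raise ValueError("need 0 < mem_bw <= target_bw")
    return 1.0 - mem_bw / target_bw


if __name__ == "__main__":
    # 2B/FLOP memory with a 50% hit ratio behaves like 4B/FLOP in this model.
    print(effective_bandwidth(2.0, 0.5))   # 4.0
    # A 1B/FLOP memory would need a 75% hit ratio to match 4B/FLOP.
    print(required_hit_ratio(1.0, 4.0))    # 0.75
```

Under these assumptions, halving the off-chip bandwidth is offset exactly when half of the memory references hit in the cache. Real applications can deviate from this because of cache bandwidth limits, conflict misses, and access patterns.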
Original language | English |
---|---|
Pages (from-to) | 17-23 |
Number of pages | 7 |
Journal | Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT |
Publication status | Published - 2007 |
Event | 8th MEDEA Workshop on MEmory Performance: DEaling with Applications, Systems and Architecture, MEDEA '07, Held in Conjunction with the PACT 2007 Conference - Brasov, Romania. Duration: 2007 Sept 16 → 2007 Sept 16 |
Keywords
- memory system
- performance characterization
- scientific application
- vector cache
- vector processing