To realize more efficient and powerful computation on vector processors, the chip multi vector processor (CMVP) has been proposed as a next-generation vector processor. However, the usefulness of CMVP for scientific applications has remained unclear. The objective of this paper is to clarify the potential of CMVP. Although the computational performance of CMVP increases with the number of cores, the ratio of memory bandwidth to computational performance (B/F) decreases. To compensate for the insufficient B/F, CMVP has a shared vector cache. Therefore, to exploit the potential of CMVP, applications should be optimized not only with conventional tuning techniques that improve the efficiency of vector operations, but also with new techniques that use the vector cache effectively. To this end, this paper presents a performance tuning strategy for CMVP. The strategy analyzes the performance bottleneck of an application to find the best combination of tuning techniques. The performance and scalability improvements due to the tuning strategy are evaluated using real applications. The evaluation results show that performance tuning becomes more important as the number of cores increases.
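The B/F argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that all cores share a fixed memory bandwidth and that peak performance scales linearly with the core count. The function name `bytes_per_flop` and every numeric value are hypothetical and chosen only for illustration.

```python
def bytes_per_flop(mem_bandwidth_gbs: float, cores: int, gflops_per_core: float) -> float:
    """Return B/F: shared memory bandwidth divided by aggregate peak performance."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return mem_bandwidth_gbs / (cores * gflops_per_core)


# Hypothetical figures, chosen only for illustration (not from the paper).
MEM_BANDWIDTH_GBS = 256.0   # assumed shared memory bandwidth in GB/s
GFLOPS_PER_CORE = 64.0      # assumed peak performance per core in GFLOP/s

# B/F for several core counts under the fixed-bandwidth assumption.
bf_by_cores = {
    n: bytes_per_flop(MEM_BANDWIDTH_GBS, n, GFLOPS_PER_CORE) for n in (1, 2, 4, 8)
}

for n, bf in bf_by_cores.items():
    print(f"{n} core(s): B/F = {bf:.2f}")
```

Under these assumptions B/F halves each time the core count doubles, which is the pressure on memory bandwidth that a shared vector cache is meant to relieve.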
|Number of pages||18|
|Publication status||Published - 2012|
|Event||2011 14th Teraflop Workshop - Stuttgart, Germany (2011 Dec 5 → 2011 Dec 6)|
|Conference||2011 14th Teraflop Workshop|
|Period||2011/12/5 → 2011/12/6|