Performance and scalability analysis of a chip multi vector processor

Yoshiei Sato, Akihiro Musa, Ryusuke Egawa, Hiroyuki Takizawa, Koki Okabe, Hiroaki Kobayashi

Research output: Contribution to conference > Paper > peer-review


To enable more efficient and powerful computation on vector processors, the chip multi vector processor (CMVP) has been proposed as a next-generation vector processor. However, its usefulness for scientific applications has remained unclear, and this paper aims to clarify the potential of CMVP. Although the computational performance of CMVP increases with the number of cores, the ratio of memory bandwidth to computational performance (B/F) decreases. To compensate for the insufficient B/F, CMVP has a shared vector cache. Exploiting the potential of CMVP therefore requires optimizing applications not only with conventional tuning techniques that improve the efficiency of vector operations, but also with new techniques that use the vector cache effectively. To this end, this paper presents a performance tuning strategy for CMVP. The strategy analyzes the performance bottleneck of an application to find the best combination of tuning techniques. The performance and scalability improvements due to the tuning strategy are evaluated using real applications. The evaluation results show that performance tuning becomes more important as the number of cores increases.
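The B/F argument in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes that off-chip memory bandwidth stays fixed while peak compute scales linearly with the core count, and the bandwidth and per-core figures (`MEM_BW_GBS`, `CORE_GFLOPS`) are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def bytes_per_flop(mem_bandwidth_gbs: float, gflops_per_core: float, cores: int) -> float:
    """Return B/F: memory bandwidth (GB/s) divided by peak compute (GFLOP/s).

    Assumes peak compute scales linearly with the number of cores while
    the memory bandwidth is shared and does not grow.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return mem_bandwidth_gbs / (gflops_per_core * cores)


# Hypothetical figures for illustration only (not from the paper).
MEM_BW_GBS = 256.0   # shared off-chip memory bandwidth, GB/s
CORE_GFLOPS = 64.0   # peak performance of one vector core, GFLOP/s


def bf_by_core_count(core_counts=(1, 2, 4, 8)):
    """Return a list of (cores, B/F) pairs for the hypothetical chip."""
    return [(n, bytes_per_flop(MEM_BW_GBS, CORE_GFLOPS, n)) for n in core_counts]


if __name__ == "__main__":
    for cores, bf in bf_by_core_count():
        print(f"{cores} core(s): B/F = {bf:.2f}")
```

Under these assumptions, each doubling of the core count halves B/F, which is the gap the shared vector cache and cache-aware tuning are meant to cover.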

Original language: English
Number of pages: 18
Publication status: Published - 2012
Event: 2011 14th Teraflop Workshop - Stuttgart, Germany
Duration: 2011 Dec 5 → 2011 Dec 6


Conference: 2011 14th Teraflop Workshop

