TY - JOUR
T1 - Memory allocation exploiting temporal locality for reducing data-transfer bottlenecks in heterogeneous multicore processors
AU - Waidyasooriya, Hasitha Muthumala
AU - Ohbayashi, Yosuke
AU - Hariyama, Masanori
AU - Kameyama, Michitaka
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011/10
Y1 - 2011/10
AB - High-performance, low-power very large-scale integration (VLSI) is required to implement complex media processing applications on mobile devices. Heterogeneous multicore processors are a promising way to achieve this objective. They contain multiple accelerator cores and CPU cores to increase the processing speed. Since media processing applications access a huge amount of data, fast address generation is very important. To increase the address generation speed, accelerator cores contain address generation units (AGUs). To reduce power consumption, the AGUs have limited hardware resources such as adders and counters. Therefore, the AGUs generate only simple addressing patterns in which the address increases linearly in each clock cycle. Media processing applications frequently encounter addressing patterns in which the same data are accessed in different time slots. To implement such addressing patterns, the same data have to be allocated to multiple memory addresses in such a way that those addresses can be generated by the AGUs. Allocating the same data to multiple addresses is called data duplication. Data duplication significantly increases the data-transfer time and the total processing time. To remove such data-transfer bottlenecks, this paper proposes a memory allocation method that exploits the temporal and spatial locality of memory accesses in media processing applications. We evaluate the proposed method using media processing applications to validate its effectiveness. According to the results, the proposed method reduces the total processing time by 14% to more than 85% compared with previous work.
KW - Dynamic reconfiguration
KW - heterogeneous multicore
KW - memory allocation
KW - multicontext FPGA
UR - http://www.scopus.com/inward/record.url?scp=80053530192&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80053530192&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2011.2162277
DO - 10.1109/TCSVT.2011.2162277
M3 - Article
AN - SCOPUS:80053530192
SN - 1051-8215
VL - 21
SP - 1453
EP - 1466
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 10
M1 - 5955105
ER -