TY - JOUR
T1 - High-throughput low-energy self-timed CAM based on reordered overlapped search mechanism
AU - Onizawa, Naoya
AU - Matsunaga, Shoun
AU - Gaudet, Vincent C.
AU - Gross, Warren J.
AU - Hanyu, Takahiro
PY - 2014/3
Y1 - 2014/3
N2 - This paper introduces a reordered overlapped search mechanism for high-throughput low-energy content-addressable memories (CAMs). Most mismatches can be found by searching a few bits of a search word. To lower power dissipation, a word circuit is often divided into two sections that are sequentially searched or even pipelined. Because of this process, most of match lines in the second section are unused. Since searching the last few bits is very fast compared to searching the rest of the bits, we propose to increase throughput by asynchronously initiating second-stage searches on the unused match lines as soon as a first-stage search is complete. In our circuit implementation, each word circuit is independently controlled by a locally generated timing signal rather than a global signal. This allows the circuits to be in the required phase for their own local operation: evaluate or precharge, instead of having to synchronize their phase to the rest of the word circuits, which greatly reduces the cycle time. As a design example, a 128 × 64-bit CAM is implemented and evaluated by HSPICE simulation under a 90 nm CMOS technology. The proposed asynchronous CAM operates 5.98 times faster than a synchronous CAM with 14.2% smaller energy dissipation. The post-layout proposed CAM achieves 385-ps cycle delay time and 0.773 fJ/bit/search and is also evaluated under different corner conditions and PVT variations to guarantee it operates properly.
AB - This paper introduces a reordered overlapped search mechanism for high-throughput low-energy content-addressable memories (CAMs). Most mismatches can be found by searching a few bits of a search word. To lower power dissipation, a word circuit is often divided into two sections that are sequentially searched or even pipelined. Because of this process, most of match lines in the second section are unused. Since searching the last few bits is very fast compared to searching the rest of the bits, we propose to increase throughput by asynchronously initiating second-stage searches on the unused match lines as soon as a first-stage search is complete. In our circuit implementation, each word circuit is independently controlled by a locally generated timing signal rather than a global signal. This allows the circuits to be in the required phase for their own local operation: evaluate or precharge, instead of having to synchronize their phase to the rest of the word circuits, which greatly reduces the cycle time. As a design example, a 128 × 64-bit CAM is implemented and evaluated by HSPICE simulation under a 90 nm CMOS technology. The proposed asynchronous CAM operates 5.98 times faster than a synchronous CAM with 14.2% smaller energy dissipation. The post-layout proposed CAM achieves 385-ps cycle delay time and 0.773 fJ/bit/search and is also evaluated under different corner conditions and PVT variations to guarantee it operates properly.
KW - associative memory
KW - Asynchronous circuits
KW - NAND-type CAM
KW - pre-computation
UR - http://www.scopus.com/inward/record.url?scp=84895930768&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84895930768&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2013.2283997
DO - 10.1109/TCSI.2013.2283997
M3 - Article
AN - SCOPUS:84895930768
SN - 1549-8328
VL - 61
SP - 865
EP - 876
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 3
M1 - 6642145
ER -