A Weight Ternary Ensemble Vision Transformer Toward Memory Size Reduction

Ryota Kayanoma, Hiroki Nakahara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-reviewed

Abstract

Vision Transformer (ViT) is an image recognition model with advanced accuracy; however, its parameter size is large. A ternary representation that adds zero to the binary representation {-1, +1} can be applied to reduce the parameter size; we define this as the weight ternary ViT. Although the weight ternary ViT reduces the parameter size, its recognition accuracy decreases compared with the float32 precision ViT. Therefore, we introduce an ensemble method that computes multiple weight ternary ViTs in parallel; recognition accuracy is improved by majority voting over the ensemble. In this paper, we propose a method that adds a normalization layer to advance the training of a weight ternary ViT, together with a training algorithm. Next, we describe the training method of an ensemble weight ternary ViT. We evaluate the ensemble weight ternary ViT on the CIFAR10 image classification benchmark. As a result, we achieve recognition accuracy equivalent to the original float32 precision ViT using ten weight ternary ViTs with 80% zero representation, while reducing the parameter size by 87.5%.
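The two ideas in the abstract can be illustrated with a minimal sketch: magnitude-based ternarization that forces the smallest 80% of weights to zero, and majority voting over the class predictions of several ternary models. The function names and the quantile-threshold rule are illustrative assumptions; the paper's actual training algorithm and normalization layer are not reproduced here.

```python
import numpy as np

def ternarize(weights: np.ndarray, zero_ratio: float = 0.8) -> np.ndarray:
    """Map float weights to {-1, 0, +1}, zeroing the smallest-magnitude
    fraction given by zero_ratio (80% zero representation in the paper).
    The quantile threshold is a hypothetical stand-in for the trained rule."""
    thresh = np.quantile(np.abs(weights), zero_ratio)
    ternary = np.sign(weights).astype(np.int8)
    ternary[np.abs(weights) < thresh] = 0
    return ternary

def majority_vote(predictions: np.ndarray, num_classes: int = 10) -> np.ndarray:
    """Combine an ensemble's outputs by majority voting.
    predictions: (num_models, num_samples) array of predicted class labels."""
    n_samples = predictions.shape[1]
    out = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        # Count votes per class for this sample; ties go to the lower label
        out[i] = np.bincount(predictions[:, i], minlength=num_classes).argmax()
    return out
```

For example, ten independently trained ternary ViTs would each produce a label vector on CIFAR10, and `majority_vote` would merge the ten vectors into the ensemble's final prediction.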

Original language: English
Title of host publication: Proceedings - 2024 IEEE 54th International Symposium on Multiple-Valued Logic, ISMVL 2024
Publisher: IEEE Computer Society
Pages: 155-160
Number of pages: 6
ISBN (electronic): 9798350343083
DOI
Publication status: Published - 2024
Event: 54th IEEE International Symposium on Multiple-Valued Logic, ISMVL 2024 - Brno, Czech Republic
Duration: 28 May 2024 – 30 May 2024

Publication series

Name: Proceedings of The International Symposium on Multiple-Valued Logic
ISSN (print): 0195-623X

Conference

Conference: 54th IEEE International Symposium on Multiple-Valued Logic, ISMVL 2024
Country/Territory: Czech Republic
City: Brno
Period: 28/05/24 → 30/05/24
