TY - JOUR
T1 - High speed and high accuracy pre-classification method for OCR
T2 - Margin added hashing
AU - Katsuyama, Yutaka
AU - Hotta, Yoshinobu
AU - Omachi, Masako
AU - Omachi, Shinichiro
N1 - Copyright 2017 Elsevier B.V., All rights reserved.
PY - 2013/9
Y1 - 2013/9
AB - Reducing the time complexity of character matching is critical to the development of efficient Japanese Optical Character Recognition (OCR) systems. To shorten the processing time, recognition is usually split into separate pre-classification and precise recognition stages. For high overall recognition performance, the pre-classification stage must both achieve very high classification accuracy and return only a small number of putative character categories for further processing. Furthermore, for any practical system, the speed of the pre-classification stage is also critical. The associative matching (AM) method has often been used for fast pre-classification because it uses a hash table and relies only on logical bit operations to select categories, both of which make it highly efficient. However, the hash table contains a certain level of redundancy because it is constructed using only the minimum and maximum values of the data on each axis and therefore does not take the distribution of the data into account. We propose a novel method based on the AM method that satisfies the performance criteria described above in a fraction of the time by modifying the hash table to reduce the range of each category of training characters. Furthermore, we show that our approach outperforms pre-classification by VQ clustering, ANN, LSH and AM in terms of classification accuracy, reduction in the number of candidate categories, and total processing time on an evaluation test set comprising 116,528 Japanese character images.
KW - Associative matching method
KW - Clustering
KW - Hash table
KW - OCR
KW - Pre-classification
UR - http://www.scopus.com/inward/record.url?scp=84883503525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84883503525&partnerID=8YFLogxK
U2 - 10.1587/transinf.E96.D.2087
DO - 10.1587/transinf.E96.D.2087
M3 - Article
AN - SCOPUS:84883503525
SN - 0916-8532
VL - E96-D
SP - 2087
EP - 2095
JO - IEICE Transactions on Information and Systems
JF - IEICE Transactions on Information and Systems
IS - 9
ER -