TY - GEN
T1 - Text Detection by Faster R-CNN with Multiple Region Proposal Networks
AU - Nagaoka, Yoshito
AU - Miyazaki, Tomo
AU - Sugaya, Yoshihiro
AU - Omachi, Shinichiro
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2018/1/25
Y1 - 2018/1/25
N2 - We propose an end-to-end, consistently trainable text detection method based on Faster R-CNN. The original Faster R-CNN is an end-to-end CNN for fast and accurate object detection. Considering the characteristics of text, we propose a novel architecture that makes use of its object detection ability. Whereas the original Faster R-CNN generates regions of interest (RoIs) with a region proposal network (RPN) using the feature map of the last convolutional layer, the proposed method generates RoIs with multiple RPNs using the feature maps of multiple convolutional layers. The method thus uses multiresolution feature maps to detect text of various sizes simultaneously. To aggregate the RoIs, we introduce an RoI-merge layer, which effectively selects valid RoIs from the multiple RPNs. In addition, we propose a training strategy that realizes end-to-end training and specializes each RPN for a particular text region size. Experimental results on the ICDAR2013/2015 RRC test datasets show that the proposed Multi-RPN method improves detection scores while maintaining almost the same detection speed compared with the original Faster R-CNN and recent methods.
AB - We propose an end-to-end, consistently trainable text detection method based on Faster R-CNN. The original Faster R-CNN is an end-to-end CNN for fast and accurate object detection. Considering the characteristics of text, we propose a novel architecture that makes use of its object detection ability. Whereas the original Faster R-CNN generates regions of interest (RoIs) with a region proposal network (RPN) using the feature map of the last convolutional layer, the proposed method generates RoIs with multiple RPNs using the feature maps of multiple convolutional layers. The method thus uses multiresolution feature maps to detect text of various sizes simultaneously. To aggregate the RoIs, we introduce an RoI-merge layer, which effectively selects valid RoIs from the multiple RPNs. In addition, we propose a training strategy that realizes end-to-end training and specializes each RPN for a particular text region size. Experimental results on the ICDAR2013/2015 RRC test datasets show that the proposed Multi-RPN method improves detection scores while maintaining almost the same detection speed compared with the original Faster R-CNN and recent methods.
KW - Faster R-CNN
KW - Region Proposal Network
KW - Text detection
UR - http://www.scopus.com/inward/record.url?scp=85045233971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045233971&partnerID=8YFLogxK
U2 - 10.1109/ICDAR.2017.343
DO - 10.1109/ICDAR.2017.343
M3 - Conference contribution
AN - SCOPUS:85045233971
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 15
EP - 20
BT - Proceedings - 7th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2017
PB - IEEE Computer Society
T2 - 7th International Workshop on Camera-Based Document Analysis and Recognition, CBDAR 2017
Y2 - 11 November 2017
ER -