High Throughput Hardware/Software Heterogeneous System for RRPN-Based Scene Text Detection


Abstract:

Rotation Region Proposal Networks (RRPN) generate rotated proposals that carry text-angle information for arbitrary-oriented scene text detection (STD). However, the computational complexity of RRPN inference is relatively high compared with other methods, which makes large-scale deployment difficult. In this paper, the first full-stack FPGA-CPU heterogeneous system design of an RRPN-based STD algorithm is proposed. A hardware/software partition method is presented to analyze and split the tasks so as to enhance the computation efficiency of the hardware. The fast 2D Winograd algorithm and block floating point are utilized to reduce computational complexity while maintaining relatively high precision. The implementation results show that the peak performance of the MAC arrays in the proposed architecture reaches 655.4 GOPS and the energy efficiency reaches 64.9 GOPS/W. By fully exploiting the parallel and pipelined merits of the algorithms, the first hardware architectures for the skew non-maximum suppression (S-NMS) layer and the rotation region-of-interest (RRoI) pooling layer are proposed. The throughput of the proposed hardware/software heterogeneous system achieves 40 times and 1.4 times improvements over CPU and GPU, respectively. Moreover, the comprehensive operating expense ratio of pure CPU, GPU, and the proposed system is 80.7:2.5:1, which indicates that the proposed system is suitable for large-scale deployment.
Published in: IEEE Transactions on Computers ( Volume: 71, Issue: 7, 01 July 2022)
Page(s): 1507 - 1521
Date of Publication: 24 June 2021
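
The abstract names the fast 2D Winograd algorithm as one of the means of reducing computational complexity. The following is only a minimal illustrative sketch of that idea, not the architecture described in the paper: it assumes the common F(2x2, 3x3) tile size with the standard transform matrices (the abstract does not state which tile size is used), and the helper names `matmul`, `transpose`, `winograd_2x2_3x3`, and `direct_2x2_3x3` are hypothetical. It computes one 2x2 output tile from a 4x4 input tile with 16 element-wise multiplications instead of the 36 needed by the direct method.

```python
# Sketch of Winograd F(2x2, 3x3), assuming the standard transform matrices.
# One 2x2 output tile is computed as  Y = A^T [ (G g G^T) * (B^T d B) ] A,
# where * is element-wise, d is a 4x4 input tile and g is a 3x3 kernel.

B_T = [[1, 0, -1, 0],
       [0, 1, 1, 0],
       [0, -1, 1, 0],
       [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
A_T = [[1, 1, 1, 0],
       [0, 1, -1, -1]]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def winograd_2x2_3x3(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = matmul(matmul(G, g), transpose(G))          # transformed kernel (4x4)
    V = matmul(matmul(B_T, d), transpose(B_T))      # transformed input (4x4)
    M = [[U[i][j] * V[i][j] for j in range(4)]      # 16 element-wise multiplies
         for i in range(4)]
    return matmul(matmul(A_T, M), transpose(A_T))   # inverse transform (2x2)


def direct_2x2_3x3(d, g):
    """Reference result: direct sliding-window correlation (36 multiplies)."""
    return [[sum(d[i + r][j + c] * g[r][c]
                 for r in range(3) for c in range(3))
             for j in range(2)] for i in range(2)]
```

For a fixed kernel, the transformed kernel `U` can be computed once and reused across all input tiles, so in this sketch the per-tile cost is dominated by the 16 element-wise multiplications and the addition-only input and output transforms.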


1 Introduction

Scene text detection (STD) and recognition in natural scene images are important research topics in computer vision [1]. Current text detection and recognition techniques are widely applied in industries such as finance, insurance, medical care, transportation, and education. Scenarios involving pictures or videos include e-commerce text translation, review of user-generated content, and content/advertising recommendation and distribution. These business scenarios already need to process tens of billions of data items every day, and the number of requests for these algorithms is still increasing significantly.

References

1. Y. Zhu, C. Yao and X. Bai, "Scene text detection and recognition: Recent advances and future trends", Front. Comput. Sci., vol. 10, no. 1, pp. 19-36, 2016.
2. M. Liao, B. Shi, X. Bai, X. Wang and W. Liu, "TextBoxes: A fast text detector with a single deep neural network", Proc. 31st AAAI Conf. Artif. Intell., pp. 4161-4167, 2017.
3. X. Zhou et al., "EAST: An efficient and accurate scene text detector", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 2642-2651, 2017.
4. Q. Ye and D. Doermann, "Text detection and recognition in imagery: A survey", IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 7, pp. 1480-1500, Jul. 2015.
5. Z. Cheng, Y. Xu, F. Bai, Y. Niu, S. Pu and S. Zhou, "AON: Towards arbitrarily-oriented text recognition", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 5571-5579, 2018.
6. M. Jaderberg, K. Simonyan, A. Vedaldi and A. Zisserman, "Reading text in the wild with convolutional neural networks", Int. J. Comput. Vis., vol. 116, no. 1, pp. 1-20, 2016.
7. P. Lyu, C. Yao, W. Wu, S. Yan and X. Bai, "Multi-oriented scene text detection via corner localization and region segmentation", Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., pp. 7553-7563, 2018.
8. J. Ma et al., "Arbitrary-oriented scene text detection via rotation proposals", IEEE Trans. Multimedia, vol. 20, no. 11, pp. 3111-3122, Nov. 2018.
9. Y. Guan, N. Xu, C. Zhang, Z. Yuan and J. Cong, "Using data compression for optimizing FPGA-based convolutional neural network accelerators", Proc. Int. Workshop Adv. Parallel Process. Technol., pp. 14-26, 2017.
10. S. Han et al., "ESE: Efficient speech recognition engine with sparse LSTM on FPGA", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 75-84, 2017.
11. Y. Li, S. Zhang, X. Zhou and F. Ren, "Build a compact binary neural network through bit-level sensitivity and data pruning", Neurocomputing, vol. 398, pp. 45-54, 2020.
12. R. Zhao et al., "Accelerating binarized convolutional neural networks with software-programmable FPGAs", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 15-24, 2017.
13. S. Liang, S. Yin, L. Liu, W. Luk and S. Wei, "FP-BNN: Binarized neural network on FPGA", Neurocomputing, vol. 275, pp. 1072-1086, 2018.
14. E. Wang, J. J. Davis, P. Y. K. Cheung and G. Constantinides, "LUTNet: Learning FPGA configurations for highly efficient neural network inference", IEEE Trans. Comput., vol. 69, no. 12, pp. 1795-1808, Dec. 2020.
15. Q. Zhang, M. Zhang, T. Chen, Z. Sun, Y. Ma and B. Yu, "Recent advances in convolutional neural network acceleration", Neurocomputing, vol. 323, pp. 37-51, 2019.
16. R. Zhao, X. Niu and W. Luk, "Automatic optimising CNN with depthwise separable convolution on FPGA: (Abstract only)", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 285-285, 2018.
17. Y. Ma, Y. Cao, S. Vrudhula and J.-S. Seo, "Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 45-54, 2017.
18. K. Guo et al., "Angel-eye: A complete design flow for mapping CNN onto embedded FPGA", IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 37, no. 1, pp. 35-47, Jan. 2018.
19. Y. Yu, Y. Li, S. Che, N. K. Jha and W. Zhang, "Software-defined design space exploration for an efficient DNN accelerator architecture", IEEE Trans. Comput., vol. 70, no. 1, pp. 45-56, Jan. 2021.
20. E. Nurvitadhi et al., "Can FPGAs beat GPUs in accelerating next-generation deep neural networks?", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 5-14, 2017.
21. H. Zeng, R. Chen, C. Zhang and V. Prasanna, "A framework for generating high throughput CNN implementations on FPGAs", Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, pp. 117-126, 2018.
22. J. Wang, J. Lin and Z. Wang, "Efficient hardware architectures for deep convolutional neural network", IEEE Trans. Circuits Syst. I: Regular Papers, vol. 65, no. 6, pp. 1941-1953, Jun. 2018.
23. M. Ferianc, H. Fan, R. S. W. Chu, J. Stano and W. Luk, "Improving performance estimation for FPGA-based accelerators for convolutional neural networks", Proc. Int. Symp. Appl. Reconfigurable Comput., pp. 3-13, 2020.
24. S. Mittal, "A survey of FPGA-based accelerators for convolutional neural networks", Neural Comput. Appl., vol. 32, pp. 1109-1139, 2020.
25. X. Yu et al., "A data-center FPGA acceleration platform for convolutional neural networks", Proc. 29th Int. Conf. Field Programmable Logic Appl., pp. 151-158, 2019.
26. S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks", Proc. Adv. Neural Inf. Process. Syst., pp. 91-99, 2015.
27. K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", 2014.
28. M. Jaderberg, A. Vedaldi and A. Zisserman, "Deep features for text spotting", Proc. Eur. Conf. Comput. Vis., pp. 512-528, 2014.
29. B. Shi, X. Bai and S. Belongie, "Detecting oriented text in natural images by linking segments", Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 3482-3490, 2017.
30. X. Lian, Z. Liu, Z. Song, J. Dai, W. Zhou and X. Ji, "High-performance FPGA-based CNN accelerator with block-floating-point arithmetic", IEEE Trans. Very Large Scale Integration Syst., vol. 27, no. 8, pp. 1874-1885, Aug. 2019.