STRTrans: An accurate scene text recognition based on improved transformer network

Prabu Selvam, Saravanan Palani, Marimuthu M, Elakkiya Rajasekar

Abstract


Text recognition is a significant research area in computer vision. Scene text recognition (STR), the identification of text in real-world scenes, poses a distinctive set of challenges: text that is meant to capture attention immediately, text that may be distorted, and factors such as occlusion, noise, and obstruction during image capture. Together, these make recognizing text in scenes considerably harder. In this paper, we introduce STRTrans, a modified Transformer network designed to improve STR performance. It addresses the shortcomings of the existing model, namely lower accuracy and difficulty recognizing irregular text. The encoder is modified by using two consecutive layers of the self-attention (SA) mechanism and reducing the point-wise feed-forward layer, with the aim of helping the network interpret the semantic arrangement better. We validated the approach experimentally on three publicly available datasets and benchmarked it against other advanced methods. It performs robustly on all three benchmarks, achieving recognition accuracies of 90.60%, 86.20%, and 86.90% on the IC15, SVT-P, and CUTE datasets, respectively. The improved model also surpasses the existing approaches it was compared with.
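The encoder change described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: single-head attention, post-layer-norm residual connections, the sizes `d_model=64` and `d_ff=64`, and the reading of "reduction" as a narrower feed-forward hidden width are all assumptions made for illustration, and every function and parameter name is hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector (no learned scale/shift in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention (assumption: one head).
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def init_block(d_model, d_ff, rng):
    # Hypothetical parameter layout: two SA sublayers plus one feed-forward sublayer.
    def w(rows, cols):
        return rng.normal(0.0, 0.02, size=(rows, cols))
    return {
        "sa1": [w(d_model, d_model) for _ in range(3)],
        "sa2": [w(d_model, d_model) for _ in range(3)],
        "ff_in": w(d_model, d_ff),
        "ff_out": w(d_ff, d_model),
    }

def encoder_block(x, p):
    # Two consecutive self-attention sublayers, each with a residual connection.
    x = layer_norm(x + self_attention(x, *p["sa1"]))
    x = layer_norm(x + self_attention(x, *p["sa2"]))
    # Point-wise feed-forward layer with a reduced hidden width (assumed meaning).
    hidden = np.maximum(0.0, x @ p["ff_in"])
    return layer_norm(x + hidden @ p["ff_out"])

rng = np.random.default_rng(0)
params = init_block(d_model=64, d_ff=64, rng=rng)
features = rng.normal(size=(25, 64))  # 25 positions of 64-dim visual features
out = encoder_block(features, params)
print(out.shape)  # (25, 64)
```

The block keeps the input and output shapes identical, so several such blocks could be stacked; how many the paper stacks, and its actual layer sizes, are not stated in the abstract.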


Keywords


text recognition; deep learning; transformer; attention; image rectification

DOI: https://doi.org/10.32629/jai.v7i5.1334


Copyright (c) 2024 Prabu Selvam, Saravanan Palani, Marimuthu M, Elakkiya Rajasekar

License URL: https://creativecommons.org/licenses/by-nc/4.0/