
CETR: CenterNet-Vision Transformer model for wheat head detection

K. G. Suma, Gurram Sunitha, Ramesh Karnati, E. R. Aruna, Kachi Anvesh, Navnath Kale, P. Krishna Kishore

Abstract


Wheat head detection is a critical task in precision agriculture for estimating crop yield and optimizing agricultural practices. Conventional object detection architectures often struggle to detect densely packed and overlapping wheat heads in complex agricultural field images. To address this challenge, a novel CEnterNet-vision TRansformer model for Wheat Head Detection (CETR) is proposed. The CETR model combines the strengths of two cutting-edge technologies: CenterNet and the Vision Transformer (ViT). A dataset of agricultural farm images labeled with precise wheat head annotations is used to train and evaluate the CETR model. Comprehensive experiments were conducted to compare CETR's performance against convolutional neural network models commonly used in agricultural applications. CETR's higher mAP of 0.8318, compared against AlexNet, VGG19, ResNet152 and MobileNet, indicates that the CETR model is more effective at detecting wheat heads in agricultural images. It achieves higher precision in predicting bounding boxes that align well with the ground truth, resulting in more accurate and reliable wheat head detection. The superior performance of CETR can be attributed to its two-stage architecture, which combines CenterNet and ViT to take advantage of both methods. Moreover, the transformer-based architecture of CETR enables better generalization across different agricultural environments, making it a suitable solution for automated agricultural applications.
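The mAP figure reported above rests on intersection-over-union (IoU) matching between predicted and ground-truth bounding boxes. As a minimal sketch (the standard IoU definition, not code from the paper), a prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A predicted wheat-head box partially overlapping a ground-truth box:
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
# Identical boxes yield a perfect match:
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0
```

Averaging precision over recall levels (and, here, over the single wheat-head class) at a fixed IoU threshold yields the mAP used to compare CETR with the CNN baselines.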


Keywords


wheat head detection; CenterNet; vision transformer; object detection; smart agriculture

References


1. Khaki S, Safaei N, Pham H, et al. WheatNet: A lightweight convolutional neural network for high-throughput image-based wheat head detection and counting. Neurocomputing. 2022, 489: 78-89. doi: 10.1016/j.neucom.2022.03.017

2. Khan S, Naseer M, Hayat M, et al. Transformers in Vision: A Survey. ACM Computing Surveys. 2022, 54(10s): 1-41. doi: 10.1145/3505244

3. Duan K, Bai S, Xie L, et al. CenterNet: Keypoint triplets for object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, IEEE, pp. 6569-6578.

4. Han K, Wang Y, Chen H, et al. A Survey on Vision Transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2023, 45(1): 87-110. doi: 10.1109/tpami.2022.3152247

5. Dosovitskiy A, Beyer L, Kolesnikov A, et al. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.

6. Nennuri R, Kumar RH, Prathyusha G, et al. A Multi-Stage Deep Model for Crop Variety and Disease Prediction. 14th International Conference on Soft Computing and Pattern Recognition 2023, 48: 52-59. doi: 10.1007/978-3-031-27524-1_6

7. Charan NS, Narasimhulu T, Bhanu Kiran G, et al. Solid Waste Management using Deep Learning. 14th International Conference on Soft Computing and Pattern Recognition 2023, 648: 44-51. doi: 10.1007/978-3-031-27524-1_5

8. Shereesha M, Hemavathy C, Teja H, et al. Precision Mango Farming: Using Compact Convolutional Transformer for Disease Detection. 13th International Conference on Innovations in Bio-Inspired Computing and Applications 2023, 649: 458-465. doi: 10.1007/978-3-031-27499-2_43

9. Balakrishna N, Sunitha G, Karthik A, et al. Tomato Leaf Disease Detection Using Deep Learning: A CNN Approach. International Conference on Data Science, Agents & Artificial Intelligence 2022, IEEE.

10. Sudarsana Murthy D. An Investigative Study of Shallow, Deep and Dense Learning Models for Breast Cancer Detection based on Microcalcifications. 2022 International Conference on Data Science, Agents & Artificial Intelligence. pp. 1-6.

11. Thatikonda SS. Vision Transformer based ResNet Model for Pneumonia Prediction. 4th International Conference on Electronics and Sustainable Communication Systems 2023, IEEE.

12. Kumar LA, Renuka DK, Rose SL, et al. Deep learning based assistive technology on audio visual speech recognition for hearing impaired. International Journal of Cognitive Computing in Engineering. 2022, 3: 24-30. doi: 10.1016/j.ijcce.2022.01.003

13. Carion N, Massa F, Synnaeve G, et al. End-to-End Object Detection with Transformers. Lecture Notes in Computer Science. 2020, 213-229. doi: 10.1007/978-3-030-58452-8_13

14. Girshick R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision 2015, pp. 1440-1448. doi: 10.1109/iccv.2015.169

15. Henry EU, Emebob O, Omonhinmin CA. Vision transformers in medical imaging: A review. arXiv 2022, arXiv:2211.10043.

16. Luo J, Li B, Leung C. A Survey of Computer Vision Technologies in Urban and Controlled-environment Agriculture. arXiv 2022, arXiv:2210.11318.

17. Wu S, Sun Y, Huang H. Multi-granularity Feature Extraction Based on Vision Transformer for Tomato Leaf Disease Recognition. 3rd International Academic Exchange Conference on Science and Technology Innovation 2021. IEEE, pp. 387-390.

18. Li X, Fan W, Wang Y, et al. Detecting Plant Leaves Based on Vision Transformer Enhanced YOLOv5. 3rd International Conference on Pattern Recognition and Machine Learning 2022, Springer, pp. 32-37.

19. David E, Madec S, Sadeghi-Tehran P, et al. Global Wheat Head Detection (GWHD) dataset: a large and diverse dataset of high-resolution RGB-labelled images to develop and benchmark wheat head detection methods. Plant Phenomics. 2020. doi: 10.34133/2020/3521852




DOI: https://doi.org/10.32629/jai.v7i3.1189


Copyright (c) 2024 K. G. Suma, Gurram Sunitha, Ramesh Karnati, E. R. Aruna, Kachi Anvesh, Navnath Kale, P. Krishna Kishore

License URL: https://creativecommons.org/licenses/by-nc/4.0/