Dynamic convolution layer based optimization techniques for object classification and semantic segmentation

Jaswinder Singh, B. K. Sharma

Abstract


Assigning a meaningful class to every pixel in an image is a primary goal of computer vision, and object classification and semantic segmentation remain among the field’s greatest challenges. To improve object classification, this study presents a novel method that combines semantic segmentation with dynamic convolution layer-based optimization techniques. The proposed method uses a Refined Convolution Neural Network (R-CNN) that applies non-extensive entropy to dynamically increase the size of its convolutional layers. The model is evaluated on the Common Objects in Context (COCO) dataset. It achieves Average Precision (AP), AP50, and AP75 values of 40.1, 61.9, and 45.4, respectively, where AP50 and AP75 denote AP at Intersection over Union (IoU) thresholds of 0.50 and 0.75. These results indicate that the model discriminates effectively between different image contents. The model also predicts the outcome for an image in 0.901 s on average. Across various performance evaluation parameters the model is shown to be superior, with an average mean precision of 91.78%. The study demonstrates that combining dynamic convolution layers with semantic segmentation improves object classification accuracy, a key component in the development of computer vision applications.
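
The abstract names two quantities without defining them: non-extensive entropy and the IoU cutoffs behind AP50 and AP75. The sketch below illustrates both. It assumes that "non-extensive entropy" means the Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1), and that boxes are axis-aligned (x1, y1, x2, y2) tuples. The function names and the choice q = 2 are hypothetical, and the code does not reproduce how the paper uses the entropy to resize convolutional layers.

```python
import math


def tsallis_entropy(probs, q=2.0):
    """Tsallis (non-extensive) entropy of a discrete distribution.

    Assumption: this is the 'non-extensive entropy' the abstract refers to.
    S_q = (1 - sum_i p_i^q) / (q - 1); q = 2 is an arbitrary illustrative value.
    """
    if q == 1.0:
        # In the limit q -> 1 the Tsallis entropy reduces to Shannon entropy.
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, true_box, threshold):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return box_iou(pred_box, true_box) >= threshold


if __name__ == "__main__":
    truth = (0, 0, 10, 10)
    pred = (0, 0, 10, 8)  # hypothetical prediction covering 80% of the truth box
    for t in (0.50, 0.75):  # the cutoffs behind AP50 and AP75
        print(f"IoU >= {t:.2f}: {is_match(pred, truth, t)}")
    print("Tsallis entropy, uniform over 4 bins:", tsallis_entropy([0.25] * 4))
```

A prediction that passes the 0.50 cutoff can still fail the 0.75 cutoff, which is why AP75 is typically lower than AP50. The headline COCO AP averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05.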


Keywords


deep learning; object classification; semantic segmentation; Refined Convolution Neural Network (R-CNN)

DOI: https://doi.org/10.32629/jai.v7i3.944

Copyright (c) 2024 Jaswinder Singh, B. K. Sharma

License URL: https://creativecommons.org/licenses/by-nc/4.0/