MSDE: Multi-scale disparity estimation model from stereo images
Abstract
Most modern stereo matching algorithms predict accurate disparity maps but impose high memory and processing demands and require a huge number of floating-point operations. Consequently, their applicability is constrained to high-powered devices with substantial capacities, which makes them difficult to deploy on low-power devices. To address this problem, we propose MSDE, an efficient end-to-end neural network model designed to strike a balance between estimation accuracy and resource utilization. MSDE is based on hierarchical disparity estimation combined with the computation of low-dimensional residual and error cost volumes. To reduce the number of operations, the 3D convolutional layers are factorized into 2D and 1D convolutional layers, improving the efficiency of filtering and aggregating cost volume features. As a result, the entire MSDE model has 48 K parameters, requires 2.5 G floating-point operations (FLOPs), and runs with a comparatively small memory footprint of 730 MB at an execution time of 29.5 ms per frame on an RTX 2080Ti GPU. Compared with state-of-the-art methods, our model is more efficient, offers a favorable trade-off between accuracy and efficiency, and requires few hardware resources.
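The factorization the abstract describes can be illustrated with a minimal PyTorch sketch. This is an assumption-laden example, not the authors' exact layer: it replaces a k×k×k 3D convolution over a cost volume with a 1×k×k spatial convolution followed by a k×1×1 convolution along the disparity axis, which cuts the weight count from roughly C_in·C_out·k³ to C_in·C_out·k² + C_out²·k.

```python
import torch
import torch.nn as nn

class Factorized3DConv(nn.Module):
    """Sketch of a factorized 3D convolution (hypothetical layer, for
    illustration only): a 2D spatial filter over (H, W) followed by a
    1D aggregation filter over the disparity dimension D."""

    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        p = k // 2
        # 2D filtering over the spatial dimensions (H, W)
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, p, p), bias=False)
        # 1D aggregation along the disparity dimension (D)
        self.disparity = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                   padding=(p, 0, 0), bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is a cost volume of shape (N, C, D, H, W)
        return self.disparity(self.spatial(x))

# Compare against a full 3x3x3 3D convolution with the same channels.
full = nn.Conv3d(16, 16, kernel_size=3, padding=1, bias=False)
fact = Factorized3DConv(16, 16, k=3)

x = torch.randn(1, 16, 8, 32, 32)
out = fact(x)  # same (N, C, D, H, W) shape as the full 3D convolution

n_full = sum(p.numel() for p in full.parameters())  # 16*16*27 = 6912
n_fact = sum(p.numel() for p in fact.parameters())  # 16*16*9 + 16*16*3 = 3072
print(out.shape, n_full, n_fact)
```

With 16 input and output channels, the factorized block needs fewer than half the weights of the dense 3D kernel while preserving the output shape, which is the kind of saving that makes the reported 48 K-parameter budget plausible.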
DOI: https://doi.org/10.32629/jai.v7i5.813
Copyright (c) 2024 Ahmed Alghoul, Mhd Rashed Al Koutayni, Ramy Battrawy, Didier Stricker, Wesam Ashour
License URL: https://creativecommons.org/licenses/by-nc/4.0/