MSDE: Multi-scale disparity estimation model from stereo images

Ahmed Alghoul, Mhd Rashed Al Koutayni, Ramy Battrawy, Didier Stricker, Wesam Ashour

Abstract


Most modern stereo matching algorithms predict accurate disparity maps but demand substantial memory and processing power, along with a very large number of floating-point operations. Their use is therefore limited to high-powered devices, which makes deployment on low-power devices difficult. To address this problem, we propose MSDE, an efficient end-to-end neural network model designed to balance estimation accuracy against resource utilization. MSDE is based on hierarchical disparity estimation combined with the computation of low-dimensional residual and error cost volumes. To reduce the number of operations, 3D convolutional layers are factorized into 2D and 1D convolutional layers, which makes filtering and aggregation of the cost volume features more efficient. As a result, the entire MSDE model has 48 K parameters, requires 2.5 G floating-point operations (FLOPs), and runs with a comparatively small memory footprint of 730 M and an execution time of 29.5 ms per frame on an RTX 2080Ti GPU. Compared to state-of-the-art methods, our model is more efficient, offers a trade-off between accuracy and efficiency, and needs few hardware resources.
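
To illustrate why factorizing a 3D convolution can reduce cost, the following minimal sketch counts the weights of a full k x k x k convolution and of a 1 x k x k (2D, spatial) convolution followed by a k x 1 x 1 (1D, along disparity) convolution. The function names, the 32-channel width, and the kernel size of 3 are assumptions chosen for illustration only; they are not taken from the MSDE architecture, and the exact factorization used in the paper may differ.

```python
def conv3d_params(c_in, c_out, k):
    """Weight count of a full k x k x k 3D convolution (bias ignored)."""
    return c_in * c_out * k ** 3


def factorized_params(c_in, c_mid, c_out, k):
    """Weight count of a 1 x k x k conv followed by a k x 1 x 1 conv (bias ignored)."""
    spatial_2d = c_in * c_mid * k * k    # 2D filtering over height and width
    disparity_1d = c_mid * c_out * k     # 1D filtering along the disparity axis
    return spatial_2d + disparity_1d


# Hypothetical layer: 32 input, intermediate, and output channels, kernel size 3.
full = conv3d_params(32, 32, 3)
split = factorized_params(32, 32, 32, 3)
print(full, split, full / split)  # 27648 12288 2.25
```

With equal channel widths the saving is k^3 / (k^2 + k), which is 2.25 for k = 3; the per-output-position multiply count shrinks by the same factor.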


Keywords


disparity; factorization; computer vision; disparity estimation; stereo matching; 3D convolutions



DOI: https://doi.org/10.32629/jai.v7i5.813


Copyright (c) 2024 Ahmed Alghoul, Mhd Rashed Al Koutayni, Ramy Battrawy, Didier Stricker, Wesam Ashour

License URL: https://creativecommons.org/licenses/by-nc/4.0/