HardMix: Considering Difficult Examples in Mixed Sample Data Augmentation

A. F. M. Shahab Uddin, Md Delowar Hosen, Md. Nasim Adnan, Syed Md Galib, Md. Alam Hossain, Sung-Ho Bae

Abstract


Mixed sample data augmentation (MSDA) techniques enhance the generalization ability of deep learning models by mixing training samples and their labels to generate new samples. These mixed (augmented) samples increase data diversity and, combined with mixed labels, improve the localization and generalization ability of the model. The performance of MSDA depends heavily on the selection of the source patch to be mixed. Consequently, several methods, ranging from random selection to careful selection of the source patch using prior knowledge, have been studied to devise better augmentation strategies. We argue that, besides the careful selection of the source patch, the selection of the source sample from which the source patch is cut also plays an important role. Based on this observation, we propose HardMix, which selects the source patch from hard samples (those frequently misclassified by the model) so that the model better learns the features of hard samples. We conduct comprehensive experiments on the image classification task on several benchmark datasets using various state-of-the-art architectures to verify the effectiveness of the proposed method. HardMix achieves best known top-1 errors of 3.62% and 3.54% with the ResNet-18 and ResNet-50 architectures on the CIFAR-10 classification dataset, respectively. It also achieves best known top-1 errors of 19.33%, 18.31%, and 16.21% with the ResNet-18, ResNet-50, and WideResNet architectures on the CIFAR-100 classification dataset, respectively. Moreover, the proposed HardMix data augmentation strategy outperforms state-of-the-art methods with best known top-1 errors of 21.20% and 20.01% on the ImageNet validation dataset when applied with the ResNet-50 and ResNet-101 architectures, respectively.
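The hard-sample-guided mixing described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of CutMix-style patch mixing where source patches are drawn preferentially from frequently misclassified samples; the `miss_counts` statistic, the sampling rule, and all function names are our own illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def rand_bbox(h, w, lam, rng):
    """Random bounding box whose area is roughly proportional to (1 - lam),
    as in CutMix-style mixing."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(h * cut_ratio), int(w * cut_ratio)
    cy, cx = rng.integers(h), rng.integers(w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    return y1, y2, x1, x2

def hardmix_batch(images, labels, miss_counts, lam, rng):
    """Paste a patch cut from hard samples (those with the highest
    misclassification counts) into every image in the batch.

    Returns mixed images, target labels, source labels, and the
    area-adjusted mixing weight for the target labels.
    """
    n, h, w, _ = images.shape
    # Rank samples by how often the model has misclassified them and
    # draw source samples from the harder half of the batch
    # (a hypothetical proxy for the paper's hard-sample selection).
    hard_order = np.argsort(-miss_counts)
    src = hard_order[rng.integers(len(hard_order) // 2 + 1, size=n)]
    y1, y2, x1, x2 = rand_bbox(h, w, lam, rng)
    mixed = images.copy()
    mixed[:, y1:y2, x1:x2, :] = images[src, y1:y2, x1:x2, :]
    # Label weight = fraction of pixels kept from the target image.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return mixed, labels, labels[src], lam_adj
```

The mixed label would then be used as in other MSDA methods, i.e. the loss is `lam_adj * loss(target_labels) + (1 - lam_adj) * loss(source_labels)`.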


Keywords


HardMix; data augmentation; hard sample based data augmentation; generalization; mixed sample data augmentation; MSDA


References


1. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998; 86(11): 2278–2324. doi: 10.1109/5.726791

2. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Neural Information Processing Systems (NeurIPS). 2012. pp. 1097–1105. doi: 10.1145/3065386

3. Lu D, Weng Q. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing. 2007; 28(5): 823-870. doi: 10.1080/01431160600746456

4. Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In: Neural Information Processing Systems (NeurIPS). 2015.

5. Zou Z, Chen K, Shi Z, et al. Object Detection in 20 Years: A Survey. Proceedings of the IEEE. 2023; 111(3): 257-276. doi: 10.1109/jproc.2023.3238524

6. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015. pp. 3431–3440.

7. Yu C, Wang J, Peng C, et al. Bisenet: Bilateral segmentation network for real-time semantic segmentation. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018. pp. 325–341.

8. Guo Y, Liu Y, Georgiou T, et al. A review of semantic segmentation using deep neural networks. International Journal of Multimedia Information Retrieval. 2018; 7(2): 87-93. doi: 10.1007/s13735-017-0141-z

9. Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research. 2014; 15: 1929–1958.

10. Tompson J, Goroshin R, Jain A, et al. Efficient object localization using convolutional networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015; pp. 648–656.

11. Devries T, Taylor GW. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552. 2017.

12. Zhang H, Cissé M, Dauphin YN, Lopez-Paz D. Mixup: Beyond empirical risk minimization. arXiv preprint. 2017.

13. Yun S, Han D, Chun S, et al. Cutmix: Regularization strategy to train strong classifiers with localizable features. In: International Conference on Computer Vision (ICCV). 2019.

14. Uddin AFMS, Monira MS, Shin W, et al. SaliencyMix: A saliency guided data augmentation strategy for better regularization. arXiv preprint arXiv:2006.01791. 2020.

15. Kim JH, Choo W, Song HO. Puzzle Mix: Exploiting Saliency and Local Statistics for Optimal Mixup. 2020.

16. Muhammad A, Zhou F, Xie C, et al. Mixacm: Mixup-based robustness transfer via distillation of activated channel maps. Advances in Neural Information Processing Systems. 2021; 34: 4555–4569.

17. Qin J, Fang J, Zhang Q, et al. Resizemix: Mixing data with preserved object information and true labels. arXiv preprint arXiv:2012.11101. 2020.

18. Wang D, Zhang Y, Zhang K, et al. Focalmix: Semi-supervised learning for 3d medical image detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. pp. 3951–3960.

19. Qiao P, Li H, Song G, et al. Semi-Supervised CT Lesion Segmentation Using Uncertainty-Based Data Pairing and SwapMix. IEEE Transactions on Medical Imaging. 2023; 42(5): 1546-1562. doi: 10.1109/tmi.2022.3232572

20. Shrivastava A, Gupta A, Girshick R. Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 761–769.

21. Tang W, Huang S, Zhang X, et al. Multiple instance learning framework with masked hard instance mining for whole slide image classification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023. pp. 4078–4087.

22. Wang Y, Peng T, Duan J, et al. Pathological Image Classification Based on Hard Example Guided CNN. IEEE Access. 2020; 8: 114249-114258. doi: 10.1109/access.2020.3003070

23. Wu T, Ding X, Zhang H, et al. DiscrimLoss: A universal loss for hard samples and incorrect samples discrimination. IEEE Transactions on Multimedia. 2023.

24. Yang C, Hou B, Chanussot J, et al. N-Cluster Loss and Hard Sample Generative Deep Metric Learning for PolSAR Image Classification. IEEE Transactions on Geoscience and Remote Sensing. 2022; 60: 1-16. doi: 10.1109/tgrs.2021.3099840

25. Zhu C, Chen W, Peng T, et al. Hard Sample Aware Noise Robust Learning for Histopathology Image Classification. IEEE Transactions on Medical Imaging. 2022; 41(4): 881-894. doi: 10.1109/tmi.2021.3125459

26. Cubuk ED, Zoph B, Mané D, et al. Autoaugment: Learning augmentation strategies from data. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019. pp. 113–123. doi: 10.1109/CVPR.2019.00020

27. Lim S, Kim I, Kim T, et al. Fast autoaugment. Advances in Neural Information Processing Systems. 2019; 32.

28. Liu Z, Li S, Wu D, et al. Automix: Unveiling the power of mixup for stronger classifiers. In: European Conference on Computer Vision. 2022. pp. 441–458.

29. Dabouei A, Soleymani S, Taherkhani F, et al. Supermix: Supervising the mixing data augmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. pp. 13794–13803.

30. Hu C, Zhou R. Synthetic voice spoofing detection based on online hard example mining. arXiv preprint arXiv:2209.11585. 2022.

31. Feng ZH, Kittler J, Wu XJ. Mining Hard Augmented Samples for Robust Facial Landmark Localization with CNNs. IEEE Signal Processing Letters. 2019; 26(3): 450-454. doi: 10.1109/lsp.2019.2895291

32. Wang Y, Lu H, Qin X, et al. Residual Gabor convolutional network and FV-Mix exponential level data augmentation strategy for finger vein recognition. Expert Systems with Applications. 2023; 223: 119874. doi: 10.1016/j.eswa.2023.119874

33. Wang M, Zhu Y, Li G, et al. Image anomaly detection with semantic-enhanced augmentation and distributional kernel. In: 2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). 2022. pp. 163–170.

34. Krogh A, Hertz J. A simple weight decay can improve generalization. Advances in Neural Information Processing Systems. 1991; 4.

35. Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning. 2015. pp. 448–456.

36. Huang G, Sun Y, Liu Z, et al. Deep networks with stochastic depth. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016. pp. 646–661.

37. Yamada Y, Iwamura M, Akiba T, et al. Shakedrop Regularization for Deep Residual Learning. IEEE Access. 2019; 7: 186126-186136. doi: 10.1109/access.2019.2960566

38. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. pp. 7132–7141.

39. Hu J, Shen L, Albanie S, et al. Gather-excite: Exploiting feature context in convolutional neural networks. Advances in Neural Information Processing Systems. 2018; 31.

40. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. pp. 2818–2826.

41. Katharopoulos A, Fleuret F. Not all samples are created equal: Deep learning with importance sampling. In: International Conference on Machine Learning. 2018. pp. 2525–2534.

42. Chang HS, Learned-Miller E, McCallum A. Active bias: Training more accurate neural networks by emphasizing high variance samples. Advances in Neural Information Processing Systems. 2017; 30.

43. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. pp. 770–778. doi: 10.1109/CVPR.2016.90

44. Zagoruyko S, Komodakis N. Wide residual networks. Proceedings of the British Machine Vision Conference. 2016.

45. Krizhevsky A. Learning multiple layers of features from tiny images. Technical report, University of Toronto. 2009.

46. Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision. 2015; 115(3): 211–252.

47. Imambi S, Prakash KB, Kanagachidambaresan G. PyTorch. In: Programming with TensorFlow: Solution for Edge Computing Applications. 2021. pp. 87–104.




DOI: https://doi.org/10.32629/jai.v7i5.1518



Copyright (c) 2024 A. F. M. Shahab Uddin, Md Delowar Hosen, Md. Nasim Adnan, Syed Md Galib, Md. Alam Hossain, Sung-Ho Bae

License URL: https://creativecommons.org/licenses/by-nc/4.0/