
Stream learning under concept and feature drift: A literature survey

Abubaker Jumaah Rabash, Mohd Zakree Ahmad Nazri, Azrulhizam Shapii, Abdulmajeed Al-Jumaily

Abstract


Stream data learning is an emerging machine learning topic with many challenges. One of these challenges is the dynamic behavior of, or changes in, the environment, which leads to drift. Two types of drift occur, namely concept drift and feature drift. This article surveys stream data learning with a focus on feature drift and the methods developed for handling it. After presenting the fundamental concepts and definitions in this field, it gives an overview of the models and methods developed for detecting feature drift and for maintaining the validity of machine learning models when drift occurs. Furthermore, the article describes the generators used for creating datasets with feature drift, which serve as benchmarks for approaches that detect or handle feature drift. The article also provides a taxonomy of feature selection methods in both static and dynamic environments. It concludes that reinforcement learning-based models are promising for this task, and it lists various open challenges and future directions in this area.


Keywords


stream learning; concept drift; data stream; feature drift detection






DOI: https://doi.org/10.32629/jai.v6i3.880



Copyright (c) 2023 Abubaker Jumaah Rabash, Mohd Zakree Ahmad Nazri, Azrulhizam Shapii, Abdulmajeed Al-Jumaily

License URL: https://creativecommons.org/licenses/by-nc/4.0/