An extensive analysis of several methods for classifying unbalanced datasets
Abstract
Handling unbalanced data is a major challenge in large-scale data applications. Unbalanced data classification systems were developed to handle such unevenly distributed data as quickly as possible, and numerous neural methods have been proposed to classify unbalanced data accurately. However, the complexity of the data makes classification more difficult, because it increases resource utilization, computing cost, and algorithm complexity. This study therefore reports the performance of many classification models on various unbalanced datasets. A performance study evaluates each model's classification performance, using precision, specificity, accuracy, and sensitivity to measure robustness. The advantages and disadvantages of each model are also discussed in detail. Finally, based on the drawbacks identified, future directions for improving the classification of unbalanced data are suggested.
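As an illustration of the four measures named in the abstract, the following is a minimal sketch that assumes a binary task with the minority class treated as the positive class. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper. Ratios with a zero denominator are reported as 0.0 here, which is a convention chosen for this sketch only.

```python
def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, sensitivity and specificity for a binary task.

    `positive` marks the minority (positive) class; this is an assumption
    of the sketch, not something specified by the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Convention for this sketch: an undefined ratio is reported as 0.0.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # recall on the minority class
        "specificity": ratio(tn, tn + fp),  # recall on the majority class
    }


# Toy data: 2 minority samples out of 10 (hypothetical, for illustration only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
majority_only = [0] * 10                     # always predicts the majority class
mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]       # one hit, one miss, one false alarm

majority_scores = imbalance_metrics(y_true, majority_only)
mixed_scores = imbalance_metrics(y_true, mixed)
```

On this toy data both predictors reach the same accuracy, yet the majority-only predictor finds none of the minority samples. This is why accuracy is usually read together with sensitivity, specificity, and precision on unbalanced datasets.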
DOI: https://doi.org/10.32629/jai.v7i3.966
Copyright (c) 2024 Sharaf Alzoubi, Khaled Aldiabat, Mofleh Al-diabat, Laith Abualigah
License URL: https://creativecommons.org/licenses/by-nc/4.0/