Improvement of support vector machine for predicting diabetes mellitus with machine learning approach
Abstract
The prevalence of diabetes is increasing worldwide, including in Indonesia, driven by rising stress levels and a lack of physical activity, which lead to obesity and related complications such as hypertension. However, only about 25% of diabetes patients are aware of their condition. This study therefore aims to identify an algorithm that predicts diabetes mellitus more accurately, using a diabetes mellitus dataset obtained from Kaggle. To assess the accuracy of diabetes diagnosis, the data are processed with two methods: support vector machine (SVM) and naive Bayes. To obtain the most accurate results, we tune the variants and parameters of each algorithm. The best result in this study came from the SVM with a radial basis function (RBF) kernel, which achieved an accuracy of 98.25%, compared with a highest accuracy of only 77.25% for naive Bayes. We also applied the proposed method to a second dataset, the diabetes mellitus dataset from LAB01 DAT263x, also taken from Kaggle. The experimental results indicate that the proposed model outperforms the other methods, tending toward high accuracy in every experiment on all datasets.
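As a rough illustration of two quantities named in the abstract, the sketch below computes the standard RBF kernel, K(x, z) = exp(-gamma * ||x - z||^2), and classification accuracy as the fraction of correct predictions. It is not the authors' implementation: the function names, the default gamma value, and the toy inputs are assumptions made for illustration, and the abstract does not state which parameter values were selected.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance).

    gamma=0.5 is an illustrative default, not a value from the study.
    """
    if len(x) != len(z):
        raise ValueError("x and z must have the same number of features")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy feature vectors (hypothetical, not taken from the Kaggle data).
    print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))   # identical points give 1.0
    print(rbf_kernel([0.0, 0.0], [1.0, 1.0]))   # similarity decays with distance
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct gives 0.75
```

A larger gamma makes the kernel value fall off faster with distance, which is why gamma is typically one of the parameters tuned when an RBF-kernel SVM is optimized.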
DOI: https://doi.org/10.32629/jai.v7i2.888
Copyright (c) 2023 Christine Dewi, Jernius Zendrato, Henoch Juli Christanto
License URL: https://creativecommons.org/licenses/by-nc/4.0/