
Improvement of support vector machine for predicting diabetes mellitus with machine learning approach

Christine Dewi, Jernius Zendrato, Henoch Juli Christanto

Abstract


The prevalence of diabetes is increasing worldwide, including in Indonesia, driven by rising stress levels and a lack of physical activity, which lead to obesity and related complications such as hypertension. However, only about 25% of diabetes patients are aware of their condition. This study therefore aims to identify an algorithm that predicts diabetes more accurately, using a diabetes mellitus dataset obtained from Kaggle. To assess diagnostic accuracy, the data are processed with two methods: support vector machine (SVM) and naive Bayes. To obtain the most accurate results, we optimize the variants and parameters of each algorithm. The best result was produced by the SVM with a radial basis function (RBF) kernel, which achieved an accuracy of 98.25%, whereas the best naive Bayes variant reached only 77.25%. We also applied the proposed method to the diabetes mellitus dataset from LAB01 DAT263x, likewise taken from Kaggle. The experimental results indicate that the proposed model outperforms the other methods, with consistently high accuracy across all experiments and datasets.
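The comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes scikit-learn is available, uses a synthetic dataset from make_classification as a stand-in for the Kaggle data, and the parameter grids, the 80/20 split, and the standardization step are hypothetical choices made only for illustration. The accuracies it prints have no relation to the 98.25% and 77.25% reported in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the Kaggle diabetes data
# (8 numeric features, binary outcome). Replace with the real dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical search grids; the paper's actual settings are not given here.
candidates = {
    "svm_rbf": (
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]},
    ),
    "naive_bayes": (
        GaussianNB(),
        {"var_smoothing": [1e-9, 1e-7, 1e-5]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(model, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # Score the refitted best model on the held-out test split.
    results[name] = accuracy_score(y_test, search.predict(X_test))
    print(name, search.best_params_, round(results[name], 4))
```

Tuning each model on the training split and scoring once on a held-out split keeps the reported accuracy from being inflated by the parameter search itself.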


Keywords


support vector machine; naive bayes; diabetes mellitus



DOI: https://doi.org/10.32629/jai.v7i2.888



Copyright (c) 2023 Christine Dewi, Jernius Zendrato, Henoch Juli Christanto

License URL: https://creativecommons.org/licenses/by-nc/4.0/