Hybrid model of unsupervised and supervised learning for multiclass sentiment analysis based on users’ reviews on healthcare web forums
Abstract
Twitter has become a popular platform for sharing health information, including diabetes-related content. Recent studies have shown that Twitter data can be used for purposes such as monitoring illnesses, promoting health, analyzing sentiment, and potentially supporting medical guidance. However, detecting health-related tweets in the vast amount of data on Twitter can be difficult. This pilot study therefore aimed to classify patient-written drug text and disease-associated tweets into meaningful health-related segments. The unlabeled dataset was first divided into groups using an unsupervised learning technique, K-Means clustering, and the resulting clusters were used to label the text. A combination of neural networks and machine learning classifiers was then used to classify 32,046 diabetes-related tweets and 161,290 lines of drug text into five groups. Approximately 66.38% of the drug text lines were classified as health-related: 55.14% as “treatment and medication”, 7.10% as “prevention”, and 4.14% as “symptoms and causes”. Over 33% were categorized as “Other and News”. For the tweet dataset, the health-related tweets comprised 44.30% “treatment and medication”, 7% “prevention”, and 5.3% “symptoms and causes”, while over 56.10% were categorized as “Other and News”. After this multiclass classification, three machine learning models and two deep learning models were evaluated on accuracy, precision, recall, and F1 score. On the drug review dataset, the SVM and LR models achieved an accuracy of 98%; on the tweet dataset, the LR model achieved an accuracy of 97%. This research shows the importance of social media data for decision-making systems in the healthcare domain.
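The two-stage idea in the abstract (cluster first, then train classifiers on the cluster-derived labels) can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, and a tiny made-up corpus; the function names `pseudo_label` and `evaluate`, the toy texts, and all parameter values are hypothetical and are not taken from the paper, whose actual preprocessing, features, and models may differ.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (not from the paper). Each text is repeated so that
# every cluster has enough rows for a stratified train/test split.
BASE_TEXTS = [
    "insulin dose helped control my blood sugar",
    "metformin dose lowered my blood sugar",
    "new insulin medication works for treatment",
    "doctor changed my medication dose for treatment",
    "healthy diet and exercise help prevent diabetes",
    "daily exercise and diet can prevent diabetes",
    "prevent diabetes with exercise and weight loss",
    "weight loss and healthy diet for prevention",
    "symptoms include thirst and frequent urination",
    "fatigue and thirst are common symptoms",
    "blurred vision and fatigue are symptoms",
    "frequent urination is an early symptom",
]
TOY_TEXTS = BASE_TEXTS * 5


def pseudo_label(texts, n_clusters, seed=0):
    """Stage 1 (unsupervised): cluster TF-IDF vectors; cluster ids become labels."""
    vectorizer = TfidfVectorizer(stop_words="english")
    features = vectorizer.fit_transform(texts)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features)
    return features, labels


def evaluate(model, features, labels, seed=0):
    """Stage 2 (supervised): fit a classifier on the pseudo-labels and score it."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    # Macro averaging weights every class equally, whatever its size.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, predicted, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    features, labels = pseudo_label(TOY_TEXTS, n_clusters=3)
    for name, model in [
        ("LR", LogisticRegression(max_iter=1000)),
        ("SVM", LinearSVC()),
    ]:
        print(name, evaluate(model, features, labels))
```

In this sketch the scores measure how well each classifier reproduces the K-Means cluster assignments on held-out rows; they say nothing about agreement with human annotation, which the toy data does not include.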
DOI: https://doi.org/10.32629/jai.v7i4.971
Copyright (c) 2024 Anuj Kumar, Shashi Shekhar
License URL: https://creativecommons.org/licenses/by-nc/4.0/