
CROA-based feature selection with BERT model for detecting the offensive speech in Twitter data

R. J. Anandhi, V. S. Anusuya Devi, B. S. Kiruthika Devi, Balasubramanian Prabhu kavin, Gan Hong Seng

Abstract


Online hate speech has flourished on social networking sites with the widespread availability of mobile computing and other web technologies. Extensive research has shown that online exposure to hate speech has real-world effects on marginalized communities, and methods for automatically identifying hate speech have garnered significant attention. Hate speech can affect any demographic, although some populations are more vulnerable than others. Relying solely on progressive learning is insufficient for automatic hate speech identification, because it needs access to large amounts of labelled data to train a model. Inaccurate statistics on hate speech and preconceived notions have long been the biggest obstacles in hate speech research. This research proposes a novel strategy for meeting these needs by combining a transfer-learning approach based on BERT (Bidirectional Encoder Representations from Transformers) with a coral reef optimization-based approach (CROA) for feature selection. The coral reefs optimization method mimics the behaviour of corals as they settle and develop on a reef: each candidate solution to the problem is treated as a coral trying to establish itself in the reef. The results are refined at each stage by applying specialized operators from the coral reefs optimization algorithm, and the optimal solution is chosen at the end. We also use a fine-tuning method based on transfer learning to assess BERT’s ability to recognize hostile contexts in social media communications. The paper evaluates the proposed approach using Twitter datasets tagged for racist, sexist, homophobic, or otherwise offensive content. The results show that our strategy achieves 5%–10% higher precision and recall compared to other approaches.
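As an illustration of the coral-reef idea described in the abstract, the following is a minimal sketch of a simplified coral reefs optimization loop for binary feature selection. It is not the authors' implementation: the operator choices (uniform crossover, single-bit mutation), all parameter names and default values, and the `toy_fitness` function are assumptions made for this example. In a real pipeline the fitness would presumably be a classifier score computed on the selected features.

```python
import random

def cro_feature_selection(fitness, n_features, reef_size=20, generations=30,
                          occupied_frac=0.6, broadcast_frac=0.8,
                          depredation_frac=0.1, attempts=3, seed=0):
    """Simplified coral reef search; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]

    # The reef is a fixed list of slots; None marks an empty slot.
    reef = [random_mask() if rng.random() < occupied_frac else None
            for _ in range(reef_size)]
    if all(c is None for c in reef):
        reef[0] = random_mask()

    def settle(larva):
        # A larva tries a few random slots; it settles if the slot is empty
        # or if it is fitter than the coral already living there.
        for _ in range(attempts):
            i = rng.randrange(reef_size)
            if reef[i] is None or fitness(larva) > fitness(reef[i]):
                reef[i] = larva
                return

    best = max((c for c in reef if c is not None), key=fitness)
    for _ in range(generations):
        corals = [c for c in reef if c is not None]
        rng.shuffle(corals)
        n_broadcast = int(len(corals) * broadcast_frac)
        n_broadcast -= n_broadcast % 2  # need an even number to form pairs
        larvae = []
        # Broadcast spawning: each pair produces one larva by uniform crossover.
        for a, b in zip(corals[0:n_broadcast:2], corals[1:n_broadcast:2]):
            larvae.append([x if rng.random() < 0.5 else y
                           for x, y in zip(a, b)])
        # Brooding: each remaining coral produces a larva by flipping one bit.
        for c in corals[n_broadcast:]:
            child = c[:]
            j = rng.randrange(n_features)
            child[j] = 1 - child[j]
            larvae.append(child)
        for larva in larvae:
            settle(larva)
        # Remember the best coral seen so far.
        current = max((c for c in reef if c is not None), key=fitness)
        if fitness(current) > fitness(best):
            best = current
        # Depredation: free the slots of a fraction of the weakest corals.
        ranked = sorted((i for i in range(reef_size) if reef[i] is not None),
                        key=lambda i: fitness(reef[i]))
        for i in ranked[:int(len(ranked) * depredation_frac)]:
            reef[i] = None
    return best, fitness(best)

# Hypothetical fitness: features 0, 3 and 5 are "informative";
# every selected feature costs a small penalty.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

if __name__ == "__main__":
    mask, score = cro_feature_selection(toy_fitness, n_features=8)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", score)
```

Each coral is a 0/1 mask over the features, so the reef holds competing feature subsets; larvae compete for slots, and the weakest corals are periodically removed to free space for new candidates.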


Keywords


natural language processing; bidirectional encoder representations from transformers; coral reefs optimization; hate speech detection; Twitter



DOI: https://doi.org/10.32629/jai.v7i3.1122


Copyright (c) 2024 R. J. Anandhi, V. S. Anusuya Devi, B. S. Kiruthika Devi, Balasubramanian Prabhu kavin, Gan Hong Seng

License URL: https://creativecommons.org/licenses/by-nc/4.0/