
Designing new student performance prediction model using ensemble machine learning

Rajan Saluja, Munishwar Rai, Rashmi Saluja

Abstract


Academic success for students is the primary concern of every stakeholder in an educational institute: students, teachers, parents, administrators and management, industry, and the environment. Regular feedback from all stakeholders helps higher education institutions (HEIs) rise professionally and academically, but they must also adopt emerging technologies that help them grow at a faster pace. Early prediction of students’ success using artificial intelligence technologies such as machine learning, early identification of at-risk students, and prediction of a suitable branch or course can help both management and students improve academic outcomes. In this work, we propose a new student performance prediction model that uses ensemble machine learning, stacking four multi-class classifiers: decision tree, k-nearest neighbor, Naïve Bayes, and One vs. Rest support vector machine. The proposed model predicts a student’s final grade at the earliest possible time and the suitable stream for a new student. A dataset of over a thousand students from five different branches of an engineering institute was used to test the model. The proposed model compares the four machine learning (ML) techniques used and predicts the final grade with an accuracy of 93%.
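The stacking arrangement described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic five-class dataset as a stand-in for the student records. The abstract does not state the meta-learner, features, or hyperparameters, so the logistic regression final estimator, the feature scaling, and all parameter values below are assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the student dataset (assumption: 1000 rows,
# 12 numeric features, 5 grade classes). This is NOT the authors' data.
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=8, n_redundant=2,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The four base classifiers named in the abstract. Scaling is added for
# the distance- and margin-based models (an assumption of this sketch).
base_learners = [
    ("decision_tree", DecisionTreeClassifier(random_state=0)),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("naive_bayes", GaussianNB()),
    ("ovr_svm", make_pipeline(StandardScaler(), OneVsRestClassifier(SVC()))),
]

# Hypothetical meta-learner: logistic regression trained on the
# cross-validated outputs of the base classifiers.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

y_pred = stack.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Stacked accuracy on synthetic data: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and has no bearing on the 93% reported for the real student dataset. Using `cv=5` means the meta-learner is fitted on out-of-fold predictions, which keeps it from simply memorising base classifiers that overfit the training set.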

Keywords


Ensemble Machine Learning; Decision Tree; K-Nearest Neighbor; Naïve Bayes; One vs. Rest Support Vector Machine




DOI: https://doi.org/10.32629/jai.v6i1.583


Copyright (c) 2023 Rajan Saluja, Munishwar Rai, Rashmi Saluja

License URL: https://creativecommons.org/licenses/by-nc/4.0