Marathi text summarization through NLP and deep learning mechanism

Sunil D. Kale, Parikshit N. Mahalle, Renu Kachhoria, Santosh Kumar, Prasad Chaudhari, Vivek D. Patil

Abstract


Every day, an ever-increasing number of people gain access to the internet. This growth has proven effective in enabling cost-effective deployment of internet platforms and applications, and it has increased the quantity of information available online in the form of news, media, and other forms of communication. Evaluating and comprehending such a large amount of textual information is therefore a very challenging task, and generating summaries of Marathi texts requires an effective and trustworthy approach. To this end, a machine learning strategy for extracting summaries from Marathi text has been developed. The proposed method combines feature extraction with deep belief networks and decision trees. The experiments evaluated the performance of Term Frequency-Inverse Document Frequency (TF-IDF) in the stopword elimination procedure as well as the summarization outcome: stopword removal through the TF-IDF technique achieves a Mean Absolute Error (MAE) of 2.8, and the summarization achieves a precision of 95.49% with an accuracy of 92.76%.
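
The abstract does not spell out how TF-IDF is applied during stopword elimination, so the following is only a minimal sketch of one common way to do it, not the authors' procedure. It assumes that a term whose best TF-IDF score across the corpus does not exceed a threshold is a stopword candidate. The function names (`tfidf_max_scores`, `stopword_candidates`, `remove_stopwords`), the threshold value, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def tfidf_max_scores(docs):
    """Return, for each term, its highest TF-IDF score over all documents.

    docs is a list of token lists. TF is the relative frequency of the term
    in a document; IDF is log(N / document frequency), unsmoothed.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = {}
    for tokens in docs:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] = max(scores.get(term, 0.0), tf * idf)
    return scores


def stopword_candidates(docs, threshold=0.0):
    """Terms whose best TF-IDF score is at or below the (assumed) threshold."""
    return {t for t, s in tfidf_max_scores(docs).items() if s <= threshold}


def remove_stopwords(tokens, stopwords):
    """Drop stopword tokens while preserving the original order."""
    return [t for t in tokens if t not in stopwords]


# Toy, pre-tokenized Marathi sentences (illustrative data only).
docs = [
    ["पुणे", "हे", "शहर", "आहे"],
    ["मुंबई", "हे", "बंदर", "आहे"],
    ["नागपूर", "हे", "केंद्र", "आहे"],
]
stopwords = stopword_candidates(docs)
filtered = remove_stopwords(docs[0], stopwords)
```

In this toy corpus the words that occur in every sentence receive an IDF of zero and are therefore flagged, while the content words survive. A real corpus would likely need a positive, tuned threshold and a proper Marathi tokenizer.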


Keywords


natural language processing; TF-IDF; deep belief network; decision tree

DOI: https://doi.org/10.32629/jai.v6i3.1009


Copyright (c) 2023 Sunil D. Kale, Parikshit N. Mahalle, Renu Kachhoria, Santosh Kumar, Prasad Chaudhari, Vivek D. Patil

License URL: https://creativecommons.org/licenses/by-nc/4.0