
Research on Chinese-Urdu Machine Translation Based on Deep Learning

Zeshan Ali Ali

Abstract


Urdu is the national language of Pakistan. However, Chinese-language expertise is scarce in Pakistan and other Asian nations, and little research has been undertaken on Chinese-to-Urdu machine translation. To address these problems, we designed a Chinese-Urdu electronic dictionary and studied sentence-level machine translation based on deep learning. For the dictionary component of the Chinese-Urdu machine translation system, we collected and constructed an electronic dictionary containing 24,000 Chinese-to-Urdu entries. For sentence-level translation, we used English as an intermediate language and, building on existing Chinese-English and English-Urdu parallel corpora, constructed a Chinese-Urdu bilingual parallel corpus of 66,000 sentence pairs. The corpus was used to train two NMT models (an LSTM model and a Transformer model), and the outputs of the two translation models were compared against reference translations using the bilingual evaluation understudy (BLEU) score. The LSTM model improved from 0.067 to 0.41 in BLEU score, while the Transformer model improved from 0.077 to 0.52, outperforming the LSTM model. Furthermore, we compared the proposed models with Google and Microsoft translation.
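As an illustration of the evaluation step described above, the following minimal Python sketch scores candidate Urdu translations against reference translations with corpus-level BLEU. It is not the paper's actual evaluation code: the NLTK library, the toy sentences, and the whitespace tokenization are assumptions made here purely for illustration.

    # Minimal sketch (assumed tooling, not from the paper): comparing LSTM and
    # Transformer outputs against references with corpus-level BLEU.
    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    # Hypothetical toy data standing in for model outputs and reference sentences.
    references = [["یہ ایک کتاب ہے".split()]]            # one list of references per source sentence
    lstm_hypotheses = ["یہ کتاب ہے".split()]              # LSTM model output, tokenized
    transformer_hypotheses = ["یہ ایک کتاب ہے".split()]   # Transformer model output, tokenized

    smooth = SmoothingFunction().method1  # avoid zero n-gram counts on short sentences
    print("LSTM BLEU:", corpus_bleu(references, lstm_hypotheses, smoothing_function=smooth))
    print("Transformer BLEU:", corpus_bleu(references, transformer_hypotheses, smoothing_function=smooth))

In practice, each hypothesis list would hold one model's translations of the entire held-out test set, so that a single corpus-level BLEU score is reported per model, as in the comparison summarized in the abstract.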

Keywords


Chinese, Urdu, Neural Machine Translation, Deep Learning, Bilingual Electronic Dictionary.






DOI: https://doi.org/10.32629/jai.v3i2.279



Copyright (c) 2021 Zeshan Ali Ali

License URL: https://creativecommons.org/licenses/by-nc/4.0