Adapting Google translate using dictionary and word embedding for Arabic-Indonesian cross-lingual information retrieval

Maryamah, Maryamah, Arifin, Agus Zainal, Sarno, Riyanarto and Hasan, Ahmad Makki (2021) Adapting Google translate using dictionary and word embedding for Arabic-Indonesian cross-lingual information retrieval. Presented at 2020 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS), 27-28 Jan. 2021, Bali, Indonesia.

Abstract

Translation plays an essential role in Cross-lingual Information Retrieval. Dictionary-based translation is reliable, although its vocabulary is limited. Google Translate, in some cases, produces words that differ from the words used in the target documents. Such mismatches make the translated query less accurate for retrieving relevant documents. In this paper, we propose a new translation approach that adapts Google Translate using a dictionary and word embedding for Arabic-Indonesian Cross-lingual Information Retrieval. The dictionary is the primary translation resource, improved by Levenshtein distance and FastText to find the correct word translation. Google Translate completes the translation when a word does not exist in the dictionary resource. The proposed method achieves a BLEU score of 0.47, which is higher than the scores of the other resources used for comparison. Based on this implementation, the proposed method successfully improves the translated query so that more relevant documents are retrieved in cross-lingual information retrieval.
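
The dictionary-first translation with a machine translation fallback described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how Levenshtein distance and FastText are combined, so the sketch assumes that Levenshtein distance matches a query word to the nearest dictionary entry and that an embedding-based score picks among candidate translations. The names translate_query, score, fallback and max_distance are hypothetical, the dictionary uses transliterated toy entries, and the scoring and fallback functions are stand-ins for FastText similarity and Google Translate.

```python
from typing import Callable, Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def translate_word(word: str,
                   dictionary: Dict[str, List[str]],
                   score: Callable[[str], float],
                   fallback: Callable[[str], str],
                   max_distance: int = 1) -> str:
    """Translate one word: dictionary first, fallback translator otherwise."""
    # Assumption: the closest dictionary headword within max_distance edits
    # counts as a match (tolerates small spelling variants).
    best = min(dictionary, key=lambda entry: levenshtein(word, entry),
               default=None)
    if best is None or levenshtein(word, best) > max_distance:
        return fallback(word)  # word is not covered by the dictionary
    # Assumption: `score` stands in for an embedding-based similarity
    # (e.g. FastText) that ranks the candidate translations.
    return max(dictionary[best], key=score)


def translate_query(words: List[str],
                    dictionary: Dict[str, List[str]],
                    score: Callable[[str], float],
                    fallback: Callable[[str], str],
                    max_distance: int = 1) -> List[str]:
    """Translate a tokenized query word by word."""
    return [translate_word(w, dictionary, score, fallback, max_distance)
            for w in words]


# Toy data (hypothetical, transliterated): source word -> candidate translations.
toy_dictionary = {"kitab": ["buku", "kitab"], "madrasah": ["sekolah"]}
toy_scores = {"buku": 0.9, "kitab": 0.4, "sekolah": 0.8}


def toy_score(candidate: str) -> float:
    """Stand-in for an embedding similarity score."""
    return toy_scores.get(candidate, 0.0)


def toy_fallback(word: str) -> str:
    """Stand-in for a call to a machine translation service."""
    return f"<mt:{word}>"


result = translate_query(["kitab", "madrasa", "jamiah"],
                         toy_dictionary, toy_score, toy_fallback)
print(result)
```

In this toy run, "kitab" matches the dictionary exactly and the higher-scored candidate is chosen, "madrasa" is one edit away from "madrasah" and so still resolves through the dictionary, and "jamiah" has no close entry and is passed to the fallback translator.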

Item Type: Conference (Paper)
Keywords: Cross-lingual information retrieval, Dictionary, Google Translate, Levenshtein distance, FastText
Subjects: 08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing > 080101 Adaptive Agents and Intelligent Robotics
08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing > 080107 Natural Language Processing
08 INFORMATION AND COMPUTING SCIENCES > 0801 Artificial Intelligence and Image Processing
Divisions: Faculty of Tarbiyah and Teaching Training > Department of Arabic Language Education
Depositing User: M.Pd Ahmad Makki Hasan
Date Deposited: 15 Jun 2021 23:17
