Statistical Machine Translation from Indonesian to Regional Languages in Indonesia

Soyusiawaty, Dewi and Miranda, Bella Okta Sari (2023) Statistical Machine Translation from Indonesian to Regional Languages in Indonesia. [Teaching Resource]

[thumbnail of doc_0903230129_84.pdf] Text
doc_0903230129_84.pdf

Download (1MB)

Abstract

Indonesia currently has 617 regional languages. Fifteen regional
languages have been declared extinct and 139 others are
endangered. Computer-based tools, including digital dictionaries
and machine translation systems, can help preserve regional
languages digitally in line with current technological
developments. A digital dictionary can translate regional
languages into Indonesian word for word, although this is not
effective when done manually. An alternative solution is to
build a machine translation application. Machine translation can
be dictionary-based or based on parallel corpus data.
Statistical Machine Translation (SMT) is a machine translation
approach that generates translations from a statistical model
whose parameters are estimated from the analysis of a parallel
corpus. The quality of SMT output is influenced by several
factors. The most fundamental are the amount of parallel corpus
data available and the quality of the corpus used to build the
translation model and the language model. This study aims to
determine the role of the parallel corpus in improving SMT
accuracy, especially for regional languages in Indonesia. The
research data is a parallel corpus of 3000 sentence pairs. The
results show that optimizing the parallel corpus increases
translation accuracy.
In addition, testing with simple sentences yields higher
accuracy than testing with compound sentences. Testing with 3000
optimized parallel sentence pairs increased accuracy by 11.4%
compared with testing with 3000 random parallel sentence pairs.
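
To illustrate how the parameters of a statistical translation
model can be taken from a parallel corpus, the following is a
minimal sketch of IBM Model 1 word-alignment training with
expectation-maximization. It is not the system used in the
study, which the abstract does not specify. The toy corpus, its
placeholder tokens, and the function names train_ibm_model1 and
best_translation are assumptions made for illustration only, and
the sketch omits the NULL source word and the language model
that a full SMT system would include.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (source tokens, target tokens).
# The tokens are placeholders, not real Indonesian or regional-language data.
TOY_CORPUS = [
    ("src_a src_b".split(), "tgt_x tgt_y".split()),
    ("src_a".split(), "tgt_x".split()),
    ("src_b".split(), "tgt_y".split()),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM (IBM Model 1, no NULL word)."""
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (target, source) links
        total = defaultdict(float)  # expected count of each source word
        # E-step: distribute each target word over the source words of its sentence.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float)
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(target | source_word), or None."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get) if candidates else None


if __name__ == "__main__":
    table = train_ibm_model1(TOY_CORPUS)
    for word in ("src_a", "src_b"):
        print(word, "->", best_translation(table, word))
```

In this sketch the one-word sentence pairs disambiguate the
two-word pair, which shows on a very small scale why the amount
and quality of parallel data affect the translation model.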

Item Type: Teaching Resource
Subjects: T Technology > T Technology (General) > T201 Patents. Trademarks
Divisi / Prodi: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Informatics Engineering (S1-Teknik Informatika)
Depositing User: Mrs Dewi Soyusiawaty
Date Deposited: 29 Mar 2023 02:04
Last Modified: 29 Mar 2023 02:04
URI: http://eprints.uad.ac.id/id/eprint/41695
