Wijaya, Kevin Usmayadhy and Setiawan, Erwin Budi (2023) Hate Speech Detection Using Convolutional Neural Network and Gated Recurrent Unit with FastText Feature Expansion on Twitter. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9 (3). pp. 619-631.
Abstract
Twitter is a popular social media platform for sending short text messages, but each tweet is limited to 280 characters. Users therefore write tweets in various shortened forms, such as slang, abbreviations, or words with dropped letters. This can cause vocabulary mismatch, where the system treats words with the same meaning as different words. Feature expansion, which builds a similarity corpus, can mitigate this problem. The similarity corpus was built from two datasets: a Twitter dataset of 63,984 entries and an IndoNews dataset of 119,488 entries. The research contribution is combining deep learning with feature expansion to achieve good performance. This study uses FastText for feature expansion because it models the internal structure of words. It also compares four deep learning classifiers: Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU), and the two hybrid combinations CNN-GRU and GRU-CNN, with a Boolean representation used for feature extraction. Five scenarios were run to find the best configuration: data split ratio, n-grams, maximum number of features, feature expansion, and dropout rate. In the final models, CNN performs best with an accuracy of 88.79%, an increase of 0.97% over its baseline model. GRU reaches 88.17% (an increase of 0.93%), CNN-GRU 87.47% (an increase of 1.86%), and GRU-CNN 87.55% (an increase of 1.32%). Across the scenarios, feature expansion with FastText successfully addressed vocabulary mismatch, as shown by its producing the largest accuracy increase of any scenario. A limitation of this study is that the dataset is in Indonesian only.
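The abstract does not spell out how the similarity corpus is applied to the Boolean features. The following minimal Python sketch shows one common variant of similarity-based feature expansion. It assumes that a vocabulary feature absent from a tweet is switched on when any of its top-n most similar words appears in that tweet. The functions `boolean_vector` and `expand_features`, the `top_n` parameter, and the small hand-written `similar` table are all hypothetical illustrations, not the authors' code. In practice the neighbour lists would come from a FastText model trained on the Twitter and IndoNews data (for example via gensim's `model.wv.most_similar(word, topn=n)`).

```python
from typing import Dict, List, Sequence


def boolean_vector(tokens: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Boolean bag-of-words: 1 if the vocabulary word occurs in the tweet, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocab]


def expand_features(
    tokens: Sequence[str],
    vocab: Sequence[str],
    similar: Dict[str, List[str]],
    top_n: int = 3,
) -> List[int]:
    """Hypothetical feature expansion (an assumed variant, not the paper's exact method).

    For every vocabulary feature that is 0, look up its top_n most similar
    words in the similarity corpus; if any of them occurs in the tweet,
    set the feature to 1.
    """
    present = set(tokens)
    vec = boolean_vector(tokens, vocab)
    for i, word in enumerate(vocab):
        if vec[i] == 0:
            neighbours = similar.get(word, [])[:top_n]
            if any(n in present for n in neighbours):
                vec[i] = 1
    return vec


if __name__ == "__main__":
    # Toy vocabulary and similarity table (made up for illustration).
    vocab = ["tidak", "bodoh", "kamu"]
    similar = {
        "tidak": ["tdk", "gak", "nggak"],
        "bodoh": ["bdh", "bego"],
        "kamu": ["km", "lu"],
    }
    tweet = ["kamu", "bdh", "banget"]
    print(boolean_vector(tweet, vocab))            # [0, 0, 1]
    print(expand_features(tweet, vocab, similar))  # [0, 1, 1]
```

In this toy example the abbreviated spelling `bdh` switches on the `bodoh` feature that plain Boolean extraction misses, which is the vocabulary-mismatch effect described above.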
Item Type: General Article (Artikel Umum)
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Electrical Engineering (S1-Teknik Elektro)
Depositing User: M.Eng. Alfian Ma'arif
Date Deposited: 08 Aug 2023 07:55
Last Modified: 08 Aug 2023 07:55
URI: http://eprints.uad.ac.id/id/eprint/43863