Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification

Pratama, Aqsal Harris and Hayaty, Mardhiya (2023) Performance of Lexical Resource and Manual Labeling on Long Short-Term Memory Model for Text Classification. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9 (1). pp. 74-84.

Abstract

Data labeling is a critical step in sentiment analysis: each text must be assigned a label that reflects the sentiment it expresses. Traditionally, labels are assigned manually by human annotators, which is time-consuming and costly for large volumes of text. The process can be automated with lexicon resources, which are pre-labeled dictionaries or databases of words and phrases annotated with sentiment information. The contribution of this study is an evaluation of how well lexicon resources perform at document labeling. The evaluation aims to provide insight into the accuracy of lexicon-based labeling and to inform future research. A publicly available dataset labeled as negative, neutral, and positive was used. New labels were generated with lexicon resources such as VADER, AFINN, SentiWordNet, and Liu & Hu. An LSTM model was then trained on the newly generated labels, and the trained model was evaluated by testing it on manually labeled data. Manual labeling led to the highest accuracy: 0.79, 0.80, and 0.80 for training, validation, and testing, respectively. This is likely because the test labels were also created manually, which enables the model to learn and capture balanced patterns. The models trained on VADER and AFINN labels had lower accuracy, 0.54 and 0.56. SentiWordNet had the lowest accuracy among all models at 0.49, and the Liu & Hu model had the lowest testing score at 0.26. These results indicate that lexicon resources alone are not sufficient for sentiment data labeling: they depend on predefined dictionaries and may not fully capture the context of words within a sentence. Manual labeling is therefore needed to complement lexicon-based methods and achieve better results.
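
The lexicon-based labeling idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny `TOY_LEXICON`, the zero threshold, the function names, and the three example sentences are hypothetical and are not taken from VADER, AFINN, SentiWordNet, Liu & Hu, or the paper's dataset. The sketch sums word scores to produce a three-class label and then measures how often the automatic labels agree with manual ones.

```python
import re

# Hypothetical toy lexicon for illustration only (word -> polarity score).
# It is NOT one of the lexicon resources evaluated in the paper.
TOY_LEXICON = {
    "good": 2, "great": 3, "love": 3,
    "bad": -2, "terrible": -3, "hate": -3,
}


def lexicon_label(text, lexicon=TOY_LEXICON, threshold=0):
    """Sum the scores of known words and map the total to a class label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def agreement(predicted, reference):
    """Fraction of items whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Hypothetical example texts with hand-assigned (manual) labels.
texts = [
    "I love this phone, great battery",
    "The service was not good",
    "It arrived on Tuesday",
]
manual = ["positive", "negative", "neutral"]

auto = [lexicon_label(t) for t in texts]
print(auto)
print(agreement(auto, manual))
```

In this toy run the second sentence is labeled positive because the word-level lookup ignores the negation "not", which is one simple instance of a lexicon missing the context of words within a sentence.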

Item Type: General Article
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Electrical Engineering (S1-Teknik Elektro)
Depositing User: M.Eng. Alfian Ma'arif
Date Deposited: 14 Mar 2023 08:34
Last Modified: 14 Mar 2023 08:34
URI: http://eprints.uad.ac.id/id/eprint/41364
