Reducing Overfitting in Neural Networks for Text Classification Using Kaggle's IMDB Movie Reviews Dataset

Poningsih, Poningsih and Windarto, Agus Perdana and Alkhairi, Putrama (2024) Reducing Overfitting in Neural Networks for Text Classification Using Kaggle's IMDB Movie Reviews Dataset. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 10 (3). pp. 534-543.

Text: 3-Reducing Overfitting in Neural Networks for Text Classification Using Kaggle's IMDB Movie Reviews Dataset.pdf

Download (803kB)

Abstract

Overfitting presents a significant challenge in developing text classification models with neural networks: the model learns too much from the training data, including noise and dataset-specific details, and consequently performs poorly on new, unseen data. This study addresses the issue by exploring overfitting reduction techniques to improve the generalization of neural networks on text classification tasks, using the IMDB movie review dataset from Kaggle. The research aims to provide insights into effective methods for reducing overfitting, thereby improving the performance and reliability of text classification models in practical applications. The methodology involves developing two LSTM neural network models: a standard model without overfitting reduction techniques and an enhanced model incorporating dropout and early stopping. The IMDB dataset is preprocessed to convert reviews into sequences suitable for input to the LSTM models. Both models are trained, and their performance is compared using several metrics. The model without overfitting reduction techniques shows a test loss of 0.4724 and a test accuracy of 86.81%. Its precision, recall, and F1-score are 0.91, 0.82, and 0.86 for negative reviews, and 0.84, 0.92, and 0.87 for positive reviews. The enhanced model, incorporating dropout and early stopping, demonstrates improved performance with a lower test loss of 0.2807 and a higher test accuracy of 88.61%. Its precision, recall, and F1-score are 0.92, 0.84, and 0.88 for negative reviews, and 0.86, 0.93, and 0.89 for positive reviews. Overall, the enhanced model achieves better metrics, with an accuracy of 89% and macro and weighted averages for precision, recall, and F1-score all at 0.89. Applying overfitting reduction techniques thus significantly enhances the model's performance.
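
The following is a minimal sketch of the enhanced model described in the abstract: an LSTM classifier for the Keras-bundled IMDB reviews with dropout and early stopping as the overfitting reduction techniques. The hyperparameters (vocabulary size, sequence length, embedding and LSTM sizes, dropout rate, early-stopping patience) are illustrative assumptions, not the settings reported in the paper.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 10000   # assumed vocabulary cap
MAX_LEN = 200        # assumed review length after padding/truncation

# Load IMDB reviews already encoded as integer word indices and pad them
# into fixed-length sequences suitable for LSTM input.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=VOCAB_SIZE)
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

# Enhanced model: embedding -> LSTM -> dropout -> sigmoid output.
model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),
    layers.Dropout(0.5),                      # dropout to reduce overfitting
    layers.Dense(1, activation="sigmoid"),    # binary sentiment output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping halts training when validation loss stops improving and
# restores the best weights, the second overfitting reduction technique.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)

model.fit(x_train, y_train, validation_split=0.2, epochs=20,
          batch_size=64, callbacks=[early_stop])
model.evaluate(x_test, y_test)

The baseline model in the study's comparison would omit the Dropout layer and the EarlyStopping callback; training both variants on the same split and comparing test loss and accuracy reproduces the kind of comparison reported in the abstract.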

Item Type: General Article (Artikel Umum)
Subjects: T Technology > TK Electrical engineering. Electronics Nuclear engineering
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Electrical Engineering (S1-Teknik Elektro)
Depositing User: M.Eng. Alfian Ma'arif
Date Deposited: 06 Jan 2025 04:59
Last Modified: 06 Jan 2025 04:59
URI: http://eprints.uad.ac.id/id/eprint/78125
