Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets

Misdram, Muhammad and Noersasongko, Edi and Purwanto, Purwanto and Muljono, Muljono and Pamuji, Fandi Yulian (2023) Gaussian Based-SMOTE Method for Handling Imbalanced Small Datasets. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9 (4). pp. 973-982.

Abstract

Dataset imbalance needs special handling because it often degrades classification performance. Much research has been published on handling imbalanced datasets, but the results remain unsatisfactory: the average increase in accuracy is still not significant. Common methods for dealing with imbalance include oversampling, undersampling, the Synthetic Minority Oversampling Technique (SMOTE), Borderline-SMOTE, ADASYN, and Cluster-SMOTE. In testing, the average classification accuracy obtained with these methods is still relatively low. In this research, the selected dataset is a medical dataset that qualifies as a small dataset, with fewer than 200 records. The proposed method is Gaussian Based-SMOTE, which is expected to work under a normal distribution and to determine the additional samples needed for the minority class. The Gaussian Based-SMOTE method is the contribution of this research and produces better accuracy than previous research. The method starts by determining the random locations of the synthesis candidates and then determining the Gaussian distribution. The results of these two steps are substituted to produce the synthetic values. The generated synthetic values are combined with the SMOTE sampling of the majority data from the training data to produce balanced data. In classification trials on the balanced data, Gaussian Based-SMOTE yields a significant increase in accuracy of 3% on average.
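The abstract does not give the formulas, so the following is only a minimal Python sketch of one way the three steps could fit together, not the authors' implementation. The function name `gaussian_smote_sketch`, the neighbour count `k`, the spread `sigma`, and the clipping to the segment between two minority samples are all assumptions made for illustration.

```python
import numpy as np


def gaussian_smote_sketch(X_min, n_new, k=5, sigma=0.05, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only).

    X_min : array of shape (n_samples, n_features), minority-class records.
    k, sigma : hypothetical parameters, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k nearest neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    diff = X_min[nb] - X_min[base]

    # Step 1: random location of the candidate on the segment base -> neighbour.
    gap = rng.uniform(0.0, 1.0, size=(n_new, 1))
    # Step 2: Gaussian draw centred on that location (assumed form),
    # clipped to [0, 1] so the point stays on the segment.
    loc = np.clip(rng.normal(loc=gap, scale=sigma), 0.0, 1.0)
    # Step 3: substitute the Gaussian location into the interpolation.
    return X_min[base] + loc * diff
```

To balance a training set under these assumptions, `n_new` would be the majority count minus the minority count, and the returned rows would be appended to the minority class before training the classifier.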

Item Type: General Article
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Electrical Engineering (S1-Teknik Elektro)
Depositing User: M.Eng. Alfian Ma'arif
Date Deposited: 18 Oct 2023 06:13
Last Modified: 18 Oct 2023 06:13
URI: http://eprints.uad.ac.id/id/eprint/50862
