Classification of Imbalanced Data Represented as Binary Features

Mahmudah, Kunti Robiatul and Indriani, Fatma and Takemori-Sakai, Yukiko and Iwata, Yasunori and Wada, Takashi and Satou, Kenji (2021) Classification of Imbalanced Data Represented as Binary Features. [Lecturer Article]

Text: Mahmudah Indriani Takemori-Sakai Iwata Wada Satou-binary features classification mutation feature extraction oversampling.pdf (1MB)
Text (check result): HASIL CEK - Mahmudah Indriani Takemori-Sakai Iwata Wada Satou-binary features classification mutation feature extraction oversampling.pdf (3MB)
Text (Peer Review): peer review-Mahmudah Indriani Takemori-Sakai Iwata Wada Satou-binary features classification mutation feature extraction oversampling.pdf (978kB)

Abstract

Classification is typically performed on datasets that consist of numerical features and target classes. For instance, a grayscale image is usually represented as a matrix of integers ranging from 0 to 255, which allows various classification algorithms to be applied to image classification tasks. However, many standard machine learning algorithms do not perform optimally on datasets represented as binary features, and such datasets are not rare. Separately, when a classification dataset is imbalanced, oversampling algorithms such as the synthetic minority oversampling technique (SMOTE) and its variants are often used. Because SMOTE and its variants synthesize new minority samples from the original samples, the diversity of samples synthesized from binary features is highly limited by the poor representation of the original features. To solve this problem, a preprocessing approach is studied: binary features are converted into numerical ones using feature extraction methods, so that subsequent oversampling methods can realize their full potential in improving classifier performance. Comprehensive experiments on benchmark datasets and real medical datasets showed that a converted dataset consisting of numerical features is better suited to oversampling methods (the maximum improvements in accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, the experiments confirmed that feature extraction and oversampling contribute synergistically to the improvement of classification performance.
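
The limitation that the abstract attributes to oversampling binary features can be illustrated with a minimal sketch. It assumes the basic SMOTE interpolation rule (a synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors) and simplifies it by always using the single nearest neighbor under Hamming distance. The toy data and the helper names `hamming`, `nearest_neighbor`, and `smote_interpolate` are hypothetical and are not taken from the paper.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def nearest_neighbor(x, others):
    """Closest vector to x under Hamming distance (first one wins ties)."""
    return min(others, key=lambda o: hamming(x, o))


def smote_interpolate(x, neighbor, rng):
    """One synthetic sample on the segment between x and neighbor.

    Simplification: real SMOTE picks the neighbor at random among the
    k nearest minority samples; here the caller passes a single neighbor.
    """
    lam = rng.random()  # interpolation weight in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]


# Hypothetical minority-class samples with binary features.
minority = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
x = minority[0]
neighbor = nearest_neighbor(x, minority[1:])

# Positions where the two parent samples disagree.
differing = [i for i, (a, b) in enumerate(zip(x, neighbor)) if a != b]

synthetic = [smote_interpolate(x, neighbor, rng) for _ in range(5)]
for s in synthetic:
    print([round(v, 2) for v in s])
```

In this toy case the two parent samples differ in a single position, so every synthetic sample is identical to the original except for one coordinate. Converting the binary features into dense numerical features before oversampling, as the abstract proposes, is intended to avoid this kind of degenerate interpolation.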

Item Type: Lecturer Article
Additional Information: binary feature classification; mutation; feature extraction; oversampling
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Division / Study Program: Master (Magister) > Master of Mathematics Education
Depositing User: S.Pd., M.S Kunti Robiatul Mahmudah
Date Deposited: 22 Nov 2022 06:37
Last Modified: 22 Nov 2022 06:37
URI: http://eprints.uad.ac.id/id/eprint/37567
