Mahmudah, Kunti Robiatul and Indriani, Fatma and Takemori-Sakai, Yukiko and Iwata, Yasunori and Wada, Takashi and Satou, Kenji (2021) Classification of Imbalanced Data Represented as Binary Features. [Lecturer Article]
Abstract
Classification is typically performed on datasets that consist of numerical features and target classes. For instance, a grayscale image is usually represented as a matrix of integers ranging from 0 to 255, so various classification algorithms can be applied directly to image classification tasks. However, many standard machine learning algorithms do not perform optimally on datasets represented as binary features, and such datasets are not rare. Separately, when a classification dataset is imbalanced, oversampling algorithms such as the synthetic minority oversampling technique (SMOTE) and its variants are often used. Because SMOTE and its variants synthesize new minority samples from the original samples, the diversity of samples synthesized from binary features is highly limited by the poor representation of the original features. To address this problem, a preprocessing approach is studied: binary features are first converted into numerical ones using feature extraction methods, so that subsequent oversampling methods can realize their full potential in improving classifier performance. In comprehensive experiments on benchmark datasets and real medical datasets, a converted dataset consisting of numerical features was observed to be better suited to oversampling methods (maximum improvements in accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, it was confirmed that feature extraction and oversampling contribute synergistically to the improvement of classification performance.
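The pipeline described in the abstract (convert binary features to numerical ones, then oversample the minority class) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the feature extraction methods used, so PCA stands in for them here; `smote_like` is a simplified interpolation in the spirit of SMOTE, not the reference algorithm; and the toy data, the function names `pca_transform` and `smote_like`, and all parameter values are hypothetical.

```python
import numpy as np


def pca_transform(X, n_components):
    """Project rows of X onto the top principal components (via SVD).

    PCA is only a stand-in here; the paper's actual feature
    extraction methods are not named in the abstract.
    """
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def smote_like(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative, not the reference algorithm).

    Each synthetic sample is a random point on the segment between a
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Hypothetical toy data: 60 samples, 20 binary features, 50 vs 10 class split.
rng = np.random.default_rng(42)
X = (rng.random((60, 20)) < 0.3).astype(float)
y = np.array([0] * 50 + [1] * 10)

Z = pca_transform(X, n_components=5)        # binary -> numerical features
Z_new = smote_like(Z[y == 1], n_new=40)     # synthesize 40 minority samples
Z_bal = np.vstack([Z, Z_new])
y_bal = np.concatenate([y, np.ones(len(Z_new), dtype=int)])
```

After this step `Z_bal` and `y_bal` hold a balanced dataset of numerical features that could be passed to any classifier. The order of operations is the point of the sketch: interpolation happens in the extracted feature space rather than directly on the 0/1 vectors.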
Item Type: | Lecturer Article (Artikel Dosen) |
---|---|
Additional Information: | binary feature classification; mutation; feature extraction; oversampling |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Division / Study Program: | Master (Magister) > Master of Mathematics Education (Magister Pendidikan Matematika) |
Depositing User: | S.Pd., M.S Kunti Robiatul Mahmudah |
Date Deposited: | 22 Nov 2022 06:37 |
Last Modified: | 22 Nov 2022 06:37 |
URI: | http://eprints.uad.ac.id/id/eprint/37567 |