K-means clustering based filter feature selection on high dimensional data

ISMI, DEWI PRAMUDI and Panchoo, Shireen and Murinto, Murinto (2016) K-means clustering based filter feature selection on high dimensional data. [Lecturer Article]

HASIL CEK_Murinto_K-means clustering based filter feature selection on high dimensional data.pdf

Abstract

With hundreds or thousands of features, high dimensional data imposes a
challenging computational workload. In classification, features that do
not contribute significantly to predicting the class add to this
workload. The aim of this paper is therefore to use feature selection to
reduce the computational load by reducing the size of high dimensional
data, selecting a subset of features that represents the full feature
set. The process is two-fold: discarding irrelevant features and choosing
one feature to represent each group of redundant features. Many studies
have addressed feature selection, for example backward feature selection
and forward feature selection. In this study, a k-means clustering based
feature selection method is proposed. It is assumed that redundant
features are located in the same cluster, whereas irrelevant features do
not belong to any cluster. Two high dimensional datasets are used:
1) the Human Activity Recognition Using Smartphones (HAR) Dataset,
containing 7352 data points with 561 features each, and 2) the National
Classification of Economic Activities Dataset, containing 1080 data
points with 857 features each. Both datasets provide a class label for
each data point. Our experiments show that k-means clustering based
feature selection can produce a subset of features that yields a
classification accuracy of more than 80%.
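The abstract does not spell out the exact procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions: each feature (column) is z-scored and treated as a point, the features are grouped with plain k-means (Lloyd's algorithm with a deterministic farthest-point initialization), and the feature closest to each centroid is kept as that cluster's representative. All names (`select_features`, `min_cluster_size`, etc.) are hypothetical. Dropping clusters smaller than `min_cluster_size` is one possible reading of "irrelevant features do not belong to any cluster", not necessarily the authors' rule.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def zscore_columns(X: Sequence[Sequence[float]]) -> List[Vector]:
    """Return each feature (column of X) as a z-scored vector over the data points."""
    n = len(X)
    features = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        # Population std; a constant column falls back to 1.0 to avoid division by zero.
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        features.append([(v - mean) / std for v in col])
    return features


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add the
    point whose distance to its nearest chosen centroid is largest."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_point_init(points, k)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels, centroids


def select_features(X: Sequence[Sequence[float]], k: int, min_cluster_size: int = 1) -> List[int]:
    """Cluster the features and keep, per cluster, the feature nearest its centroid.

    HYPOTHETICAL rule: clusters with fewer than `min_cluster_size` features are
    treated as irrelevant features that "belong to no cluster" and are dropped.
    """
    features = zscore_columns(X)
    labels, centroids = kmeans(features, k)
    selected = []
    for c in range(k):
        members = [j for j, lab in enumerate(labels) if lab == c]
        if len(members) < max(min_cluster_size, 1):
            continue
        selected.append(min(members, key=lambda j: sq_dist(features[j], centroids[c])))
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: features 0-2 are positive affine copies of one signal,
    # features 3-4 copies of another, feature 5 is unrelated to both.
    toy = [[t, 2 * t + 1, 0.5 * t - 3, u, 4 * u, w]
           for t, u, w in zip([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [1, 6, 2, 5, 3, 4])]
    print(select_features(toy, k=3))                      # -> [0, 3, 5]
    print(select_features(toy, k=3, min_cluster_size=2))  # -> [0, 3]
```

Z-scoring before clustering is a deliberate choice in this sketch: for two z-scored feature vectors over n data points, the squared Euclidean distance equals 2n(1 - r), where r is their Pearson correlation, so strongly positively correlated (redundant) features end up close together and tend to share a cluster. In practice the number of clusters k effectively sets the size of the selected subset.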

Item Type: Lecturer Article (Artikel Dosen)
Keyword: feature selection, dimensionality reduction, clustering, k-means clustering, classification, high dimensional data
Subjects: Q Science > Q Science (General)
Division / Study Program: Faculty of Industrial Technology (Fakultas Teknologi Industri) > S1-Informatics Engineering (S1-Teknik Informatika)
Depositing User: murinto murinto
Date Deposited: 18 Apr 2023 09:44
Last Modified: 23 Sep 2023 05:49
URI: http://eprints.uad.ac.id/id/eprint/42972
