Comparison of Cluster Validity Methods on Numerical and Categorical Data Types
Abstract
Clustering is an important method for grouping a set of objects by similarity. Based on shared characteristics, it forms classes and extracts patterns from a collection of unlabeled data. The quality of the resulting clusters cannot be established until they have been analyzed and tested with a cluster validity method. This study applies the k-means clustering algorithm to three types of data: numerical, categorical, and a combination of numerical and categorical. The clusters obtained for each data type are validated with three cluster validity methods: the Dunn index, Hubert's statistic, and the silhouette coefficient. The results of the three methods are then compared across the three data types. The study finds that the type of data used influences the cluster validation method. The tests show that Hubert's statistic can be used with all three types of data. Research on larger data sets is still needed to determine the most appropriate validation method for numerical data, categorical data, or a combination of both.
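The three validity measures named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy points, and the choice of Euclidean distance are assumptions made here for illustration, and the cluster labels are supplied by hand where the study would use k-means output. Each index also has several published variants; the versions below are common ones (minimum between-cluster distance over maximum cluster diameter for Dunn, and the normalized form of Hubert's statistic computed as a Pearson correlation between pairwise distances and a different-cluster indicator). Euclidean distance suits numerical data only, so categorical or mixed data would need a different dissimilarity.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two numerical points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter."""
    min_between = math.inf
    max_within = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        if labels[i] == labels[j]:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    return min_between / max_within


def hubert_gamma(points, labels):
    """Normalized Hubert statistic: correlation between pairwise distance
    and an indicator that is 1 when the pair lies in different clusters."""
    pairs = list(combinations(range(len(points)), 2))
    dist = [euclidean(points[i], points[j]) for i, j in pairs]
    diff = [0.0 if labels[i] == labels[j] else 1.0 for i, j in pairs]
    mean_d = sum(dist) / len(dist)
    mean_y = sum(diff) / len(diff)
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(dist, diff))
    sd = math.sqrt(sum((d - mean_d) ** 2 for d in dist))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in diff))
    return cov / (sd * sy)


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Singleton clusters are conventionally given a silhouette of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(
        name,
        round(dunn_index(points, labels), 3),
        round(hubert_gamma(points, labels), 3),
        round(silhouette_coefficient(points, labels), 3),
    )
```

For all three measures a larger value indicates more compact, better separated clusters, so the labeling that matches the two visible groups should score higher than the mixed one on each index.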