Modified Method of Selection Initial Centroid in K-Means Clustering
Sumertajaya, I Made
M Afendi, Farit
Cluster analysis is a multivariate technique that classifies objects into groups based on similarity, so that objects within a cluster are more homogeneous than objects in different clusters. Classical cluster analysis comprises two families of methods: hierarchical and non-hierarchical. In both, the number of clusters is determined subjectively or from the literature. K-means is one of the non-hierarchical methods. It has good computational performance but is very sensitive to the choice of initial centroids: because k-means selects the initial centroids at random from the data, its results are not unique. This research compares standard k-means with a modified method of selecting the initial centroids in k-means. According to Sona and Sujatha (2013), the centroid selection method focuses on improving the performance of the k-means clustering algorithm. The performance of the modified method is first compared with k-means on simulated data, and the modified method is then applied to secondary data.

The data in this research come from two sources: simulated data and secondary data. The simulated data were generated from a multivariate normal distribution N(μ, Σ) and are used to measure the performance of both the modified method and standard k-means. The secondary data are the 2011 village potential data for Bengkulu province from BPS. The simulated data are numeric and consist of three clusters, each described by three variables. They are divided into three conditions: (a) centroids close together, (b) centroids at a medium distance, and (c) centroids far apart. Each condition is applied at a small (n=300), medium (n=900), and large (n=1500) sample size.
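The simulation design described above (three clusters, three variables, three centroid-distance conditions, three sample sizes) could be generated along the following lines. This is only a sketch: the abstract does not report the actual values of μ and Σ, so the centroid spacings in `SPACING`, the identity covariance, the equal split of n across clusters, and all function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical centroid spacings; the source gives no numeric values
# for "near", "medium", and "far".
SPACING = {"near": 1.0, "medium": 3.0, "far": 6.0}
# Sample sizes taken from the abstract.
SIZES = {"small": 300, "medium": 900, "large": 1500}


def simulate_clusters(n_total, centroids, cov, seed=0):
    """Draw n_total points from multivariate normals, split evenly
    across the given centroids (the even split is an assumption)."""
    rng = np.random.default_rng(seed)
    k = len(centroids)
    per_cluster = n_total // k
    points = [
        rng.multivariate_normal(mean, cov, size=per_cluster)
        for mean in centroids
    ]
    labels = np.repeat(np.arange(k), per_cluster)
    return np.vstack(points), labels


def make_design(cov=None):
    """Build all 3 x 3 combinations of distance condition and sample size."""
    cov = np.eye(3) if cov is None else cov  # assumed covariance matrix
    design = {}
    for condition, d in SPACING.items():
        # Three centroids in three variables, spaced d apart per coordinate.
        centroids = np.array(
            [[0, 0, 0], [d, d, d], [2 * d, 2 * d, 2 * d]], dtype=float
        )
        for size_name, n in SIZES.items():
            design[(condition, size_name)] = simulate_clusters(n, centroids, cov)
    return design


design = make_design()
X, y = design[("far", "small")]
print(X.shape)  # (300, 3)
```

Each entry of `design` holds a data matrix and the true cluster labels, so any clustering method can be scored against the known structure.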
The modified method of selecting the initial centroids performs better than standard k-means clustering. This conclusion is based on the number of iterations needed until no cluster member changes cluster (convergence). The number of iterations required by the modified method increases as the variance of the data increases. Applied to the village potential data, the modified method groups the villages into three clusters, a number chosen to match the purpose of the clustering, which is to examine infrastructure and facilities in Bengkulu province. The three clusters are villages with adequate infrastructure, villages with less than adequate infrastructure, and villages with inadequate infrastructure.
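The convergence criterion used for the comparison (iterate until no member changes cluster) can be sketched as below. The abstract does not spell out the modified selection rule, so this sketch covers only baseline k-means with random initial centroids; any other initializer could be passed in as `init_centroids`. The function names, the test data, and the way iterations are counted (the pass that detects no change is included) are assumptions.

```python
import numpy as np


def kmeans_iterations(X, init_centroids, max_iter=100):
    """Run Lloyd's k-means from the given initial centroids and return
    (labels, centroids, iterations until memberships stop changing)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for iteration in range(1, max_iter + 1):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Converged: no point switched cluster since the previous pass.
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centroids, iteration
        labels = new_labels
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids, max_iter


def random_init(X, k, rng):
    """Baseline k-means initialization: k distinct data points at random."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]


# Illustrative data: three clusters in three variables (assumed parameters).
data_rng = np.random.default_rng(42)
means = np.array([[0, 0, 0], [4, 4, 4], [8, 8, 8]], dtype=float)
X = np.vstack(
    [data_rng.multivariate_normal(m, np.eye(3), size=100) for m in means]
)

# Different random starts may need different numbers of iterations.
iteration_counts = []
for seed in range(10):
    init = random_init(X, 3, np.random.default_rng(seed))
    labels, centroids, n_iter = kmeans_iterations(X, init)
    iteration_counts.append(n_iter)
print(iteration_counts)
```

Comparing the spread of `iteration_counts` across seeds with the count obtained from a deterministic initializer is one way to reproduce the kind of comparison the abstract reports.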