A Study of the Two-Step Clustering Method for Data Containing Outliers
Abstract
Cluster analysis is used in many fields of research. Classical clustering methods, such as hierarchical clustering and k-means, cannot handle categorical variables or a mixture of numerical and categorical variables. In addition, the choice of the optimal number of clusters still depends on the researcher's subjective judgment, and these methods cannot handle very large datasets (more than 500 observations). One approach to addressing these problems is the two-step clustering method. It is therefore important to study how accurately the two-step clustering method predicts the number of clusters and classifies cluster membership, especially for data containing outliers. When the data contain a small proportion of outliers (1%), the method gives more accurate results than when the data contain a large proportion of outliers (5% or 15%). The outlier-handling level applied to data containing outliers must be greater than the proportion of outliers in the data. The two-step clustering method is highly accurate in recovering the actual number of population clusters when the data contain no outliers, especially when most of the variables are numerical and the rest are categorical. Clustering villages in Indonesia by village progress and backwardness factors with the two-step clustering method yields an optimal solution of seven clusters.