Optimizing Province Clustering Based on the Sustainable Development Goals Using Hierarchical Clustering, K-Medoids, and DBSCAN

Saputri, Ferista Wahyu


Date

2024

Author

Saputri, Ferista Wahyu

Soleh, Agus Mohamad

Alamudi, Aam


Abstract

Cluster analysis is a statistical technique for identifying meaningful clusters among multivariate data objects based on a similarity measure. Beyond the choice of similarity measure, choosing an appropriate clustering method is also fundamental, because the characteristics of the data can affect clustering performance. High-dimensional data pose the additional challenge of the curse of dimensionality, which dimensionality reduction is one way to address. This study evaluates two dimensionality reduction methods, principal component analysis (PCA) and t-SNE, and compares three clustering approaches: hierarchical clustering, partitioning with the K-Medoids algorithm, and density-based clustering with the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. The case study clusters provinces by their achievement of the sustainable development goals (SDGs). Clustering quality is assessed with three validity indices. The results show that t-SNE combined with either single-linkage hierarchical clustering or DBSCAN gives the best clustering, with the highest Silhouette score (0.56), the lowest Davies-Bouldin index (0.47), and the highest Calinski-Harabasz index (75.04) among the algorithms compared. This clustering yields seven clusters of provinces with distinct SDG achievement characteristics. Two clusters, made up of provinces in eastern Indonesia, show low SDG achievement and require extra attention tailored to their characteristics.
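The three validity indices named in the abstract can be illustrated on toy data. The following is a minimal sketch in plain Python, not the authors' implementation: the eight 2-D points, the two labelings, and all function names are hypothetical, and the formulas are the standard textbook definitions with Euclidean distance, which the abstract does not confirm as the exact variants used in the study.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette(points, labels):
    """Mean silhouette width; higher means tighter, better separated clusters."""
    clusters = group_by_label(points, labels)
    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the sum includes the zero distance from the point to itself)
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            fmean(dist(point, q) for q in members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return fmean(widths)


def davies_bouldin(points, labels):
    """Mean worst-case ratio of within-cluster scatter to centroid separation; lower is better."""
    clusters = group_by_label(points, labels)
    centers = {k: centroid(m) for k, m in clusters.items()}
    scatter = {k: fmean(dist(p, centers[k]) for p in m) for k, m in clusters.items()}
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return fmean(worst)


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion; higher is better."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(m) * dist(centroid(m), overall) ** 2 for m in clusters.values())
    within = sum(dist(p, centroid(m)) ** 2 for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two compact, well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
good = [0, 0, 0, 0, 1, 1, 1, 1]  # labeling that matches the two groups
poor = [0, 1, 0, 1, 0, 1, 0, 1]  # labeling that mixes the two groups

for name, labels in (("good", good), ("poor", poor)):
    print(
        name,
        round(silhouette(points, labels), 3),
        round(davies_bouldin(points, labels), 3),
        round(calinski_harabasz(points, labels), 3),
    )
```

On this toy data the labeling that matches the two groups scores a higher Silhouette value, a lower Davies-Bouldin index, and a higher Calinski-Harabasz index than the mixed labeling, which is the same direction of comparison the abstract uses to rank the clustering results.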

URI

http://repository.ipb.ac.id/handle/123456789/153683

Collections

UT - Statistics and Data Sciences