Integration of KNN and SVD Methods for Handling Missing Values in Rainfall Data
Abstract
Rainfall data are essential for implementing environmental policies such as flood mitigation and water resource management. This study uses rainfall data from five BMKG observation stations in West Java. Station-based data are actual, directly measured observations and therefore support more accurate analysis, but they frequently contain missing values caused by factors such as human error and machine malfunction, which reduce the accuracy of any analysis built on them. This study addresses missing values in West Java rainfall data by integrating two single imputation techniques, K-Nearest Neighbors (KNN) and Singular Value Decomposition (SVD), to overcome the limitations of each method used alone.
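The abstract does not specify the KNN distance, the value of k, or the SVD variant, so the following is only a minimal sketch of the two single imputers under stated assumptions. It assumes rainfall is arranged as a days × stations matrix with NaN marking gaps; KNN imputation fills a day's gaps with the mean of the k nearest complete days (Euclidean distance over the stations that day does observe); and SVD imputation is an iterative rank-r reconstruction in the style of "hard impute". The function names `knn_impute` and `svd_impute` and their defaults are hypothetical.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaN gaps in X (rows = days, columns = stations).

    Each incomplete day is compared with every complete day using the
    Euclidean distance over the stations it does observe; its missing
    stations are filled with the mean of the k nearest complete days.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # donor days with no gaps
    if len(complete) == 0:
        raise ValueError("KNN imputation needs at least one complete row")
    k = min(k, len(complete))
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        obs = ~np.isnan(X[i])
        # Distance over observed stations only (all zeros if none observed).
        d = np.sqrt(((complete[:, obs] - X[i, obs]) ** 2).sum(axis=1))
        nearest = complete[np.argsort(d, kind="stable")[:k]]
        filled[i, ~obs] = nearest[:, ~obs].mean(axis=0)
    return filled


def svd_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Iterative low-rank SVD imputation (a "hard impute" style scheme).

    Gaps start at their column (station) mean; each pass replaces them with
    the matching entries of the best rank-`rank` approximation, while the
    observed values stay fixed. Assumes every column has an observed value.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        updated = np.where(mask, low_rank, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol * max(np.linalg.norm(filled), 1.0):
            break
    return filled


# Toy values for illustration only (not the study's data).
X = np.array([
    [10.0, 12.0, 11.0],
    [0.0, 1.0, 0.5],
    [20.0, 22.0, 21.0],
    [10.5, np.nan, 11.5],
    [np.nan, 0.0, 1.0],
])
print(knn_impute(X, k=1)[3, 1])  # 12.0, copied from the nearest complete day (row 0)
print(svd_impute(X, rank=1)[3, 1])
```

KNN borrows values from similar days, while SVD exploits the correlation structure shared across stations; this complementarity is the usual motivation for combining the two.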
The dataset was divided into training and testing subsets with proportions of 95:5, 90:10, 80:20, 70:30, and 60:40. The integrated KNN-SVD method outperformed both standalone KNN and standalone SVD. The Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) were lower for the 95:5 split than for any other proportion, and the integrated KNN-SVD method achieved the lowest values overall, with an MAE of 7.35 and an RMSE of 13.22. Imputing missing values with the integrated KNN-SVD model using the Weighted Linear Combination (WLC) approach therefore performed better than imputation with either single model.
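A minimal sketch of the evaluation and combination steps follows. It assumes the training:testing split holds out a random fraction of the observed entries, and that the WLC weights are inversely proportional to each model's error; the abstract does not say how the split was drawn or how the weights were chosen. The names `holdout_mask`, `wlc_weights`, and `wlc_combine` are hypothetical.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def holdout_mask(observed, test_fraction, seed=0):
    """Randomly mark `test_fraction` of the observed entries as test entries.

    For example, test_fraction=0.05 corresponds to a 95:5 training:testing split.
    """
    observed = np.asarray(observed, dtype=bool)
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(observed)
    n_test = int(round(test_fraction * idx.size))
    test = np.zeros(observed.shape, dtype=bool)
    test.flat[rng.choice(idx, size=n_test, replace=False)] = True
    return test


def wlc_weights(errors):
    """Assumed scheme: weight_i proportional to 1 / error_i, summing to 1."""
    inv = 1.0 / np.asarray(errors, float)  # errors must be strictly positive
    return inv / inv.sum()


def wlc_combine(predictions, weights):
    """Weighted linear combination: sum over models of weight_i * prediction_i."""
    P = np.asarray(predictions, float)  # shape (n_models, n_points)
    return np.asarray(weights, float) @ P


# Toy numbers for illustration only (not the study's data).
y = np.array([0.0, 5.0, 12.0, 30.0])
knn_pred = np.array([1.0, 4.0, 14.0, 27.0])
svd_pred = np.array([0.0, 7.0, 11.0, 33.0])
w = wlc_weights([mae(y, knn_pred), mae(y, svd_pred)])  # [6/13, 7/13]
combined = wlc_combine([knn_pred, svd_pred], w)
print(mae(y, knn_pred), mae(y, svd_pred), mae(y, combined))  # 1.75 1.5 ~0.423
```

In a real evaluation the weights should be estimated on data kept separate from the test entries used to report MAE and RMSE; the toy example reuses the same four points only to show the arithmetic. Inverse-MAE weighting is one simple choice; inverse-RMSE weights or weights tuned by a search over a validation set are equally plausible.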
