Performance Comparison of Best Subset-Bayesian Model Averaging and Regularization Methods for Hotspot Prediction in Kalimantan
Abstract
The forest area in Kalimantan continues to shrink because of forest and land fires (kebakaran hutan dan lahan, karhutla). Previous studies have shown that forest and land fire prevention in Kalimantan can be carried out by predicting the number of hotspots from climate indicators using machine learning. This study aims to build regularized regression models and a Bayesian Model Averaging (BMA) model based on polynomial regression models obtained from best subset selection. The best model is selected using the RMSE and R^2 evaluation metrics. The effect of normalization and standardization on the constructed models is also measured. The results identify a best combination of six predictor variables, and the best model is BMA, which combines six polynomial regression models. Applying data normalization and standardization changes the parameter values of all models but only slightly affects the model testing results. Therefore, the BMA model fitted to the original data is preferred because its coefficients are closer to zero while its performance is nearly the same.
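The workflow summarized in the abstract (candidate polynomial regression models over predictor subsets, combined by BMA and scored with RMSE and R^2) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the helper names (`poly_design`, `bma_fit`, `bma_predict`) are hypothetical, the polynomial degree and subset size are arbitrary, and the model weights use the common BIC approximation to posterior model probabilities, which may differ from the weighting used in the study.

```python
import numpy as np
from itertools import combinations

def poly_design(X, cols, degree=2):
    """Design matrix: intercept plus powers 1..degree of the chosen columns (no cross terms)."""
    parts = [np.ones((X.shape[0], 1))]
    for d in range(1, degree + 1):
        parts.append(X[:, cols] ** d)
    return np.hstack(parts)

def bic(A, y, beta):
    """Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)."""
    n, k = A.shape
    rss = float(np.sum((y - A @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    ss_res = float(np.sum((y - y_hat) ** 2))
    ss_tot = float(np.sum((y - np.mean(y)) ** 2))
    return 1.0 - ss_res / ss_tot

def bma_fit(X, y, subset_size=2, degree=2):
    """Fit one polynomial regression per predictor subset and weight the fits by BIC."""
    models = []
    for cols in combinations(range(X.shape[1]), subset_size):
        cols = list(cols)
        A = poly_design(X, cols, degree)
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
        models.append((cols, beta, bic(A, y, beta)))
    bics = np.array([m[2] for m in models])
    # Assumed weighting: w_m proportional to exp(-BIC_m / 2), shifted for numerical stability.
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    return models, weights

def bma_predict(models, weights, X, degree=2):
    """Weighted average of the candidate models' predictions."""
    preds = np.stack([poly_design(X, cols, degree) @ beta for cols, beta, _ in models])
    return weights @ preds

# Synthetic stand-in for climate indicators (X) and hotspot counts (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

models, weights = bma_fit(X_train, y_train)
y_hat = bma_predict(models, weights, X_test)
print("RMSE:", rmse(y_test, y_hat))
print("R^2:", r2(y_test, y_hat))
```

To probe the effect of normalization or standardization in the same spirit, one could rescale `X_train` and `X_test` with statistics computed on the training split only, refit, and compare both the coefficients and the test metrics against the unscaled fit.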