Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/169556
Title: Model Prediksi Spasial-Temporal Konsentrasi PM2.5 Menggunakan Stacking Regressor dan Ensemble Learning di Provinsi Riau
Other Titles: Spatial-Temporal Model for Predicting PM2.5 Concentrations Using Stacking Regressor and Ensemble Learning in Riau Province
Authors: Sitanggang, Imas Sukaesih
Syaufina, Lailan
Jaya, I Nengah Surati
Unik, Mitra
Issue Date: 2025
Publisher: IPB University
Abstract: This study develops a spatial-temporal model for predicting fine particulate matter (PM2.5) concentrations in Riau Province, Indonesia, using an ensemble learning approach based on a stacking regressor. The model is intended to provide accurate estimates in areas that are prone to forest and peatland fires and have limited air quality monitoring infrastructure. The data come from two sources: (1) in situ measurements from the BMKG Pekanbaru Air Quality Monitoring Station (PM2.5 and the meteorological parameters temperature, air pressure, humidity, rainfall, and wind speed); and (2) MODIS satellite data retrieved through Google Earth Engine (GEE), comprising NDVI, LST, ET, and AOD. The dataset contains 171 variables at a spatial resolution of 1 km² and daily temporal resolution for the period 1 March 2022–29 February 2024. The sources were integrated through projection system transformation and resampling, followed by feature engineering with lag features and rolling windows to capture the temporal dynamics of PM2.5. Two scenarios were designed to test spatial sensitivity: an aggregated approach using the mean values of the spatial parameters, and a detailed spatial approach that captures local variation. Seven base algorithms were evaluated in each scenario. The three best models (Random Forest, Gradient Boosting, and XGBoost) were used as base learners in a stacking regressor architecture with RidgeCV as the meta-learner. Variables were selected using feature importance, and hyperparameters were tuned by grid search. In the final evaluation, the stacking regressor achieved R² = 0.84, MAE = 4.70 µg/m³, and MSE = 35.53 (µg/m³)². External validation using data from 1 March 2024 to 31 March 2025 demonstrated generalisation capability, including for the 'Unhealthy' category (55.4–150.4 µg/m³).
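The lag features and rolling windows mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which lags or window lengths were used, so the lag of 1 day, the window of 3 days, the helper names `lag_feature` and `rolling_mean`, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean

def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where no history exists."""
    if lag <= 0:
        return list(series)
    return [None] * lag + list(series[:-lag])

def rolling_mean(series, window):
    """Mean of the `window` values before each step.

    The current value is excluded so the feature never leaks the target.
    """
    out = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(mean(series[i - window:i]))
    return out

# Made-up daily PM2.5 values in µg/m³ (not data from the study).
pm25 = [12.0, 15.0, 30.0, 58.0, 41.0, 22.0]

lag1 = lag_feature(pm25, 1)     # assumed lag of 1 day
roll3 = rolling_mean(pm25, 3)   # assumed window of 3 days

# Keep only rows where every engineered feature is defined.
rows = [
    {"pm25": y, "lag1": l1, "roll3": r3}
    for y, l1, r3 in zip(pm25, lag1, roll3)
    if l1 is not None and r3 is not None
]

for row in rows:
    print(row)
```

Each row pairs the current day's concentration with information from earlier days only, which is the general form of input a regressor would need to learn temporal dynamics.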
The model was also tested on a 1 km × 1 km grid within a 30 km radius of the BMKG Pekanbaru station, covering four administrative regions (Pekanbaru City, Kampar Regency, Siak Regency, and Pelalawan Regency) and comprising 2,927 grid cells in the UTM Zone 47N projection. In this configuration, stacking with RidgeCV as the meta-learner outperformed the individual models, with MAE = 1.82 µg/m³ and MSE = 5.01 (µg/m³)², and produced a PM2.5 distribution map representative of spatial conditions on the ground. The novel contributions of this research are the integration of a comprehensive multi-sensor dataset (171 variables), described as the most complete representation for the study area; feature engineering (lag features, rolling windows) to capture non-linear interactions among environmental factors; and a stacking regressor architecture with a RidgeCV meta-learner, which improves accuracy while reducing overfitting. The findings provide a scientific basis for developing location-based early warning systems and air pollution mitigation policies in fire-prone areas such as Riau. Recommendations for further development include integrating additional multi-sensor data, applying transfer learning to other regions, and using edge computing for real-time applications.
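The gridding scheme can be sketched as follows. This is only an illustration under stated assumptions: it works in local offsets (km east and north of the station) rather than real UTM Zone 47N coordinates, and it keeps a cell when its centre lies within the radius. The abstract does not say which inclusion rule or grid alignment was used, so the resulting cell count need not equal the 2,927 cells reported. The function name `grid_cells` is hypothetical.

```python
import math

CELL_KM = 1.0     # grid resolution stated in the abstract
RADIUS_KM = 30.0  # radius around the station stated in the abstract

def grid_cells(radius_km=RADIUS_KM, cell_km=CELL_KM):
    """Return (x, y) cell centres in km relative to the station.

    Assumption: a cell is kept when its centre is within the radius.
    """
    n = math.ceil(radius_km / cell_km)
    cells = []
    for i in range(-n, n):
        for j in range(-n, n):
            x = (i + 0.5) * cell_km  # centre of the cell along the east axis
            y = (j + 0.5) * cell_km  # centre of the cell along the north axis
            if math.hypot(x, y) <= radius_km:
                cells.append((x, y))
    return cells

cells = grid_cells()
print(len(cells), "cells; circle area is about",
      round(math.pi * RADIUS_KM ** 2), "km2")
```

A trained model would then be evaluated once per cell and per day, using the predictor values resampled to that cell, to produce a map of estimated PM2.5.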
URI: http://repository.ipb.ac.id/handle/123456789/169556
Appears in Collections: DT - School of Data Science, Mathematic and Informatics

Files in This Item:
File                                                       Description          Size       Format     Access
cover_G661190231_7ab238317448458aaa64a57e1ba26de9.pdf      Cover                523.63 kB  Adobe PDF
fulltext_G661190231_0b845f7a60174852bc83237a7e472de7.pdf   Fulltext             4.02 MB    Adobe PDF  Restricted Access
lampiran_G661190231_e9658263e7f7437695e3f4debc1b7163.pdf   Lampiran (Appendix)  1.53 MB    Adobe PDF  Restricted Access

