Pemodelan Machine Learning terhadap Data Citra Satelit untuk Memprediksi Fase Pertumbuhan Padi

Tamara, Novian

Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/107246

Title:	Pemodelan Machine Learning terhadap Data Citra Satelit untuk Memprediksi Fase Pertumbuhan Padi
Other Titles:	Machine Learning Modeling of Satellite Imagery Data for Predicting Paddy Growth Phases
Authors:	Wigena, Aji Hamim; Sartono, Bagus; Tamara, Novian
Issue Date:	2021
Publisher:	IPB University
Abstract:	Paddy plays an important role in Indonesian society. Accurate paddy data can help the government plan, execute programs, and make the right decisions. Although Statistics Indonesia (BPS) provides paddy data based on the Area Sampling Frame (ASF) method, some ASF sample points still have observations that were not recorded by survey officers. The aim of this research was to build the best classification model to predict these unrecorded ASF paddy observations. The labels were the 2018 ASF paddy data for Serang District, and the features were derived from Landsat-8 and Sentinel-2 satellite imagery. The research also aimed to determine which feature set performed better: one based on a combination of Landsat-8 and Sentinel-2 imagery, or one based on Landsat-8 imagery alone. The performance of two machine learning techniques, random forest and Support Vector Machine (SVM), was also compared. A 2-stage classification model was developed. The stage 1 model aimed to identify paddy fields; its classes were paddy field, non-paddy field, and non-crop field. The stage 2 model aimed to classify paddy growth phases; its classes were early vegetative, late vegetative, generative, harvesting, land preparation, and crop failure. The data were split by time: the training data covered January to September 2018 and the testing data covered October to December 2018. Class imbalance was handled with the SMOTE+TL technique (SMOTE combined with Tomek links). Before the classification models were built, 1,239 features were generated during feature engineering.
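SMOTE+TL combines SMOTE oversampling with the removal of Tomek links. The pure-Python sketch below illustrates the idea under several assumptions that the abstract does not state: numeric feature vectors, Euclidean distance, every smaller class oversampled up to the majority-class count, and both samples of each Tomek link removed. Libraries such as imbalanced-learn's `SMOTETomek` expose these choices as parameters. This is not the author's implementation, and the toy data are invented.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic samples for one class (needs >= 2 samples).

    Each synthetic point lies on the segment between a randomly chosen
    sample and one of its k nearest same-class neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, of opposite-class mutual nearest neighbours."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    return {(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]}


def smote_tomek(X, y, k=5, seed=0):
    """Oversample each smaller class to the majority count, then drop Tomek links."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k, rng)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    # Assumption of this sketch: remove both samples of every link.
    drop = {idx for pair in tomek_links(X_out, y_out) for idx in pair}
    kept = [i for i in range(len(X_out)) if i not in drop]
    return [X_out[i] for i in kept], [y_out[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical toy data: 5 samples of class "a", 2 of class "b".
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
         (5.0, 5.0), (5.1, 5.0)]
    y = ["a"] * 5 + ["b"] * 2
    X_res, y_res = smote_tomek(X, y)
    print(Counter(y_res))
```

Some variants remove only the majority-class member of each Tomek link instead of both samples. Either way, the cleaning step runs after oversampling so that it can also discard synthetic points that landed in overlapping regions.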
These 1,239 features were obtained by applying 38 types of aggregation to 18 Sentinel-2 spectral indices and 37 types of aggregation to 15 Landsat-8 spectral indices (18 × 38 + 15 × 37 = 684 + 555 = 1,239). Feature selection was carried out independently for each stage, so the stage 1 and stage 2 models could use different features. Features were selected by stepwise forward selection and backward elimination, maximizing the model's Matthews correlation coefficient (MCC). To avoid overfitting, the spectral indices underlying the selected features were not allowed to be strongly correlated with one another. The selected features were used to build the RFL8, RFL8S2, SVML8, and SVML8S2 models for both stage 1 and stage 2. The prefix RF or SVM indicates that the model used random forest or a support vector machine, and the suffix L8 or L8S2 indicates that the features came from Landsat-8 only or from a combination of Landsat-8 and Sentinel-2. The model with the highest MCC was chosen as the best model. The results showed that RFL8S2 was the best classification model for both stage 1 and stage 2; in this case, random forest outperformed SVM at both classification stages. Adding Sentinel-2-based features to the Landsat-8-based paddy growth phase classification model increased the MCC by 6%. In stage 1, the RFL8S2 model predicted paddy field, non-paddy field, and non-crop field with an accuracy of 0.95 and an MCC of 0.84. In stage 2, the RFL8S2 model predicted the paddy growth phases with an accuracy of 0.87 and an MCC of 0.73. However, the model still had difficulty predicting the land preparation and crop failure classes, because these classes had few samples and their physical appearance resembles that of other classes.
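Stepwise forward selection with backward elimination can be sketched as a greedy search over feature subsets. `stepwise_select` is a hypothetical helper, not the author's code. Its `score` argument stands in for the MCC of a model trained on a candidate subset, and its `allowed` argument stands in for the rule that the spectral indices behind the selected features must not be strongly correlated. The feature names (assumed to follow an `<index>_<aggregation>` pattern), the weights, and the "correlated" index pair in the example are all invented for illustration.

```python
from typing import Callable, FrozenSet, Sequence


def stepwise_select(
    candidates: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    allowed: Callable[[str, FrozenSet[str]], bool] = lambda f, sel: True,
    max_iter: int = 100,
) -> FrozenSet[str]:
    """Stepwise forward selection with backward elimination.

    Each round adds the admissible feature that raises `score` the most,
    then removes features one at a time while a removal raises `score`.
    The search stops when a round changes nothing.
    """
    selected: FrozenSet[str] = frozenset()
    best = float("-inf")
    for _ in range(max_iter):
        improved = False
        # Forward step: evaluate every admissible feature not yet selected.
        additions = [f for f in candidates
                     if f not in selected and allowed(f, selected)]
        if additions:
            f_add = max(additions, key=lambda f: score(selected | {f}))
            s = score(selected | {f_add})
            if s > best:
                selected, best, improved = selected | {f_add}, s, True
        # Backward step: drop the feature whose removal helps most, if any.
        while len(selected) > 1:
            f_drop = max(sorted(selected), key=lambda f: score(selected - {f}))
            s = score(selected - {f_drop})
            if s <= best:
                break
            selected, best, improved = selected - {f_drop}, s, True
        if not improved:
            break
    return selected


# Hypothetical example: a toy additive score in place of a model's MCC.
WEIGHTS = {"ndvi_mean": 0.5, "evi_mean": 0.4, "ndwi_mean": 0.2, "noise_max": -0.1}
CORRELATED = {frozenset({"ndvi", "evi"})}  # assumed strongly correlated index pair


def index_of(feature: str) -> str:
    return feature.split("_")[0]


def not_correlated(feature: str, selected: FrozenSet[str]) -> bool:
    return all(frozenset({index_of(feature), index_of(g)}) not in CORRELATED
               for g in selected)


chosen = stepwise_select(list(WEIGHTS), lambda fs: sum(WEIGHTS[f] for f in fs),
                         not_correlated)
print(sorted(chosen))  # ['ndvi_mean', 'ndwi_mean']
```

Because every accepted change strictly increases the score and there are finitely many subsets, the loop terminates. `max_iter` is only a safety guard. In a real pipeline, each `score` call would train and evaluate a classifier, so the number of calls dominates the cost.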
Testing showed that the 2-stage RFL8S2 classification model performed well in estimating the original classes of the ASF data. With a global accuracy of 0.84, the model predicted the actual class for 84% of the testing data. Most classes were detected well: the model had high sensitivity for 4 classes, medium sensitivity for 3 classes, and low sensitivity for 2 classes. The four high-sensitivity classes, which the model detected easily, were harvest, rice field, non-rice field, and generative. Despite these strengths, the classification model produced in this study could not predict the paddy growth phases for a complete area when much of the satellite imagery was cloud-covered.
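The abstract evaluates the models with global accuracy, per-class sensitivity, and MCC. The plain-Python sketch below shows one way to compute these from a confusion matrix. It assumes the usual multiclass generalization of MCC (the form used, for example, by scikit-learn's `matthews_corrcoef`). The abstract does not state which MCC variant was used or the cut-offs for high, medium, and low sensitivity, so the sketch applies none. The class names in the example are illustrative only.

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    pos = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[pos[t]][pos[p]] += 1
    return cm


def global_accuracy(cm):
    """Share of samples on the diagonal (predicted class == actual class)."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))


def sensitivity(cm, labels):
    """Per-class sensitivity (recall): correct predictions / actual count."""
    return {lab: (cm[k][k] / sum(cm[k]) if sum(cm[k]) else float("nan"))
            for k, lab in enumerate(labels)}


def mcc(cm):
    """Multiclass Matthews correlation coefficient from a confusion matrix."""
    s = sum(map(sum, cm))                       # total samples
    c = sum(cm[k][k] for k in range(len(cm)))   # correctly classified samples
    t = [sum(row) for row in cm]                # actual count per class
    p = [sum(col) for col in zip(*cm)]          # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


if __name__ == "__main__":
    labels = ["generative", "harvest"]  # hypothetical two-class example
    y_true = ["generative"] * 6 + ["harvest"] * 6
    y_pred = ["generative"] * 5 + ["harvest"] + ["generative"] * 2 + ["harvest"] * 4
    cm = confusion_matrix(y_true, y_pred, labels)  # [[5, 1], [2, 4]]
    print(global_accuracy(cm), sensitivity(cm, labels), mcc(cm))
```

For two classes this formula reduces to the familiar binary MCC, $(TP \cdot TN - FP \cdot FN)/\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}$. Unlike accuracy, MCC stays low when a model ignores small classes, which suits imbalanced data such as the land preparation and crop failure classes.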
URI:	http://repository.ipb.ac.id/handle/123456789/107246
Appears in Collections:	MT - Mathematics and Natural Science

Files in This Item:

File	Description	Size	Format	Access
Cover.pdf	Cover	2.85 MB	Adobe PDF	Restricted
G152184474_Novian Tamara.pdf	Full text	9.03 MB	Adobe PDF	Restricted
Lampiran.pdf	Appendix	762.96 kB	Adobe PDF	Restricted
