Anomaly Detection in Data on the Number of International Tourists in Indonesia Using the Copula-Based Outlier Detection Method
Abstract
Anomalies are observations that deviate from historical patterns, and they can arise from dependence between variables. Copula-based outlier detection (COPOD) handles multiple variables by combining empirical marginal distributions with the tail dependence structure to identify anomalies. This study aims to detect anomalies in the number of international tourists in Indonesia while accounting for inflation and the rupiah exchange rate, and to evaluate how handling those anomalies affects the forecasting performance of a long short-term memory (LSTM) model. The study uses monthly data from January 2000 to December 2025 obtained from CEIC Data, Statistics Indonesia (Badan Pusat Statistik), and Bank Indonesia. The analysis comprises data exploration, feature engineering based on the data structure and lag identification, anomaly detection with COPOD, and LSTM forecasting. Detection was performed on twelve variables produced by feature engineering and identified eleven periods as anomalies. The model fitted after anomaly handling gave more accurate forecasts, with a mean absolute percentage error of 7.494%, a root mean square error of 99,233, and a correlation of 0.864 on the test data. This effectiveness cannot be separated from the accuracy of the anomaly detection itself. Future research could add further relevant variables and extend the feature engineering.
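To make the scoring idea concrete, the following is a minimal sketch of a COPOD-style score built from empirical marginal distributions, assuming a simplified variant in which each observation receives the largest of its left-tail, right-tail, and skewness-corrected negative log-probability sums. It is not the implementation used in this study: the function names `ecdf`, `skewness`, and `copod_scores` are hypothetical, and the data are synthetic stand-ins for the engineered variables.

```python
import numpy as np


def ecdf(col):
    """Empirical CDF at each observation: share of values <= that value."""
    sorted_col = np.sort(col)
    return np.searchsorted(sorted_col, col, side="right") / len(col)


def skewness(col):
    """Sample skewness (third standardized moment); assumes a non-constant column."""
    centered = col - col.mean()
    return np.mean(centered ** 3) / np.mean(centered ** 2) ** 1.5


def copod_scores(X):
    """Simplified COPOD-style anomaly score for each row of X (higher = more anomalous)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Left-tail and right-tail empirical probabilities, one column per variable.
    left = np.column_stack([ecdf(X[:, j]) for j in range(n_features)])
    right = np.column_stack([ecdf(-X[:, j]) for j in range(n_features)])
    # Skewness correction: use the left tail for negatively skewed variables,
    # the right tail otherwise.
    skew = np.array([skewness(X[:, j]) for j in range(n_features)])
    corrected = np.where(skew < 0, left, right)
    # Sum negative log tail probabilities across variables, then take the
    # largest of the three sums for each observation.
    sums = [(-np.log(p)).sum(axis=1) for p in (left, right, corrected)]
    return np.maximum.reduce(sums)


# Synthetic example (assumption): 200 periods, 3 variables, one planted extreme row.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[17] = [8.0, 8.0, 8.0]
scores = copod_scores(X)
print("most anomalous row:", int(np.argmax(scores)))
```

In practice a threshold on the scores, or a fixed contamination share, would be needed to decide how many periods to flag. Library implementations such as `pyod.models.copod.COPOD` combine the tail probabilities somewhat differently from this sketch.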

