Comparison of Logistic Regression and Random Forest Methods for Predicting Churn among Car Insurance Policyholders
Abstract
This study compares logistic regression and random forest models in predicting
churn among car insurance policyholders, using undersampling at proportions of
30%, 50%, and 70%. The random forest model with 30% undersampling achieved the
highest accuracy (99.07%) and AUC (99.91%), while the highest sensitivity
(98.85%) was observed for the random forest model trained without undersampling.
The highest F1-score (99.29%) was obtained from the random forest model with 70%
undersampling. In the logistic regression model, the variables with the greatest
influence on policyholders' churn decisions, based on odds ratio values, were
employment status (retired), monthly insurance payment amount (category
500–600), and highest level of education (doctoral degree). In the random forest
model, the most important variables based on mean decrease in Gini were policy
renewal offer type, employment status, and highest level of education. Based on
the SHAP value analysis, variables such as employment status (retired), policy
renewal offer type (type 4), and monthly payment amount (500–600) consistently
increased the likelihood of churn.
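The abstract does not define how the 30%, 50%, and 70% undersampling proportions are measured. The sketch below assumes random undersampling in which the proportion is the share of churners (the minority class) in the resampled training set; the function name `undersample` and its `seed` parameter are hypothetical. Under the other common reading, keeping 30% of the non-churners, only the `n_keep` computation would change.

```python
import random


def undersample(rows, labels, minority_share, seed=0):
    """Randomly drop majority-class rows until the minority class makes up
    `minority_share` of the returned sample.

    ASSUMPTION: an undersampling "proportion" of 30%, 50%, or 70% is read
    here as the minority (churn) share of the resampled training set.
    """
    if not 0 < minority_share < 1:
        raise ValueError("minority_share must lie strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Majority rows to keep so that minority / (minority + kept) = minority_share.
    n_keep = round(len(minority) * (1 - minority_share) / minority_share)
    n_keep = min(n_keep, len(majority))  # cannot keep more rows than exist
    # Sorting the kept indices preserves the original row order.
    kept = sorted(rng.sample(majority, n_keep) + minority)
    return [rows[i] for i in kept], [labels[i] for i in kept]


if __name__ == "__main__":
    labels = [1] * 100 + [0] * 900  # toy data: 10% churners
    rows = list(range(len(labels)))
    for share in (0.3, 0.5, 0.7):
        _, y = undersample(rows, labels, share)
        print(share, len(y), sum(y))
```

Undersampling of this kind is applied to the training split only, so that the test set keeps the original class balance when accuracy, sensitivity, and the other metrics are computed.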
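For reference, the evaluation metrics reported above can be computed from true labels, predicted labels, and predicted scores as in the following stdlib-only sketch. It assumes churn is coded as the positive class 1, so sensitivity is recall on churners; the function names are hypothetical. AUC is computed as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner (ties count as one half), which equals the area under the ROC curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with churn coded as the positive class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on churners), precision, and F1-score."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}


def auc(y_true, scores):
    """AUC as the Mann-Whitney probability that a churner outscores a
    non-churner; tied scores count as half a win. Runs in O(n_pos * n_neg)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
    print(classification_metrics(y_true, y_pred))
    print(auc(y_true, scores))  # 8 of 9 churner/non-churner pairs ranked correctly
```

Because sensitivity counts only churners while accuracy counts every policyholder, the two can rank the same set of models differently, which is consistent with the best accuracy and the best sensitivity coming from different models above.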
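The odds ratio ranking for the logistic regression model rests on the standard link between a fitted coefficient and its odds ratio. Here $p$ is the probability of churn and $x_j$ is a dummy variable for one category, such as retired employment status, measured against a reference category (which reference categories the study used is not stated, so they are an assumption here):

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
\mathrm{OR}_j = \frac{\operatorname{odds}(x_j = 1)}{\operatorname{odds}(x_j = 0)} = e^{\beta_j}
```

An odds ratio above 1 means the category is associated with higher odds of churn than its reference category, holding the other variables fixed.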
Collections
- UT - Actuaria
