Please use this identifier to cite or link to this item:
http://repository.ipb.ac.id/handle/123456789/163361

| Title: | Perbandingan Metode Regresi Logistik dan Random Forest dalam Memprediksi Churn pada Pemegang Polis Asuransi Mobil |
| Other Titles: | Comparison of Logistic Regression and Random Forest Methods in Predicting Churn Among Auto Insurance Policyholders |
| Authors: | Suharjo, Budi; Ruhiyat; Alya, Hana |
| Issue Date: | 2025 |
| Publisher: | IPB University |
| Abstract: | This study compares logistic regression and random forest models for predicting churn among car insurance policyholders, using undersampling at proportions of 30%, 50%, and 70%. The random forest model after 30% undersampling achieved the highest accuracy (99.07%) and the highest AUC (99.91%), while the highest sensitivity (98.85%) was observed in the random forest model before undersampling. The highest F1-score (99.29%) was obtained from the random forest model after 70% undersampling. In the logistic regression model, the most influential variables affecting policyholders' churn decisions, based on odds ratio values, were employment status (retired), monthly payment amount (category 500–600), and highest level of education (doctoral degree). In the random forest model, the most important variables, based on mean decrease in Gini, were type of policy renewal offer, employment status, and highest level of education. Based on the SHAP value analysis, employment status (retired), policy renewal offer type (type 4), and monthly payment amount (500–600) consistently increased the likelihood of churn. |
| URI: | http://repository.ipb.ac.id/handle/123456789/163361 |
| Appears in Collections: | UT - Actuaria |
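
The abstract names random undersampling and three confusion-matrix metrics (accuracy, sensitivity, F1-score) without defining them. The sketch below is one way to illustrate both ideas in plain Python; it is not the authors' code. It assumes that "proportion" means the share of the minority (churn) class after undersampling, which the abstract does not state, and the function names `undersample` and `classification_metrics` are hypothetical. AUC is left out because it needs predicted probabilities rather than hard labels.

```python
import random


def undersample(labels, minority_share, seed=0):
    """Return indices of a randomly undersampled dataset.

    ASSUMPTION: `minority_share` is the target fraction of minority-class
    rows after undersampling (e.g. 0.3, 0.5, 0.7). The source abstract
    does not define its "proportion", so this reading is a guess.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Majority rows to keep so that minority / total is about minority_share.
    keep = round(len(minority) * (1 - minority_share) / minority_share)
    keep = min(keep, len(majority))
    return sorted(minority + rng.sample(majority, keep))


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), and F1 for binary labels (1 = churn)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


if __name__ == "__main__":
    # Toy data: 30 churners and 170 non-churners (made-up numbers).
    labels = [1] * 30 + [0] * 170
    for share in (0.3, 0.5, 0.7):
        idx = undersample(labels, share)
        churn = sum(labels[i] for i in idx)
        print(share, len(idx), round(churn / len(idx), 3))
```

Under this reading, a higher proportion keeps fewer majority-class rows, so the training set shrinks as the proportion grows. A library such as imbalanced-learn would normally handle the resampling in practice; the hand-written version here only makes the arithmetic visible.
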
Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| cover_G5402211069_af26211011684ff692fd32834fe05acf.pdf | Cover | 826.84 kB | Adobe PDF |
| fulltext_G5402211069_24ab0d6696604d049e12d522d94ae9ac.pdf (Restricted Access) | Full text | 5.46 MB | Adobe PDF |
| lampiran_G5402211069_adaefcd403cf43de9c4051670477a7f3.pdf (Restricted Access) | Appendix | 2.45 MB | Adobe PDF |