
dc.contributor.advisor: Suharjo, Budi
dc.contributor.advisor: Ruhiyat
dc.contributor.author: Alya, Hana
dc.date.accessioned: 2025-07-01T02:32:18Z
dc.date.available: 2025-07-01T02:32:18Z
dc.date.issued: 2025
dc.identifier.uri: http://repository.ipb.ac.id/handle/123456789/163361
dc.description.abstract: This study compares logistic regression and random forest models for predicting churn among car insurance policyholders, with undersampling applied at proportions of 30%, 50%, and 70%. The random forest model with 30% undersampling achieved the highest accuracy (99.07%) and AUC (99.91%), while the highest sensitivity (98.85%) was observed in the random forest model before undersampling. The highest F1-score (99.29%) was obtained from the random forest model with 70% undersampling. In the logistic regression model, the most influential variables affecting policyholders' churn decisions, based on odds ratio values, were employment status (retired), monthly insurance payment amount (category 500–600), and highest level of education (doctoral degree). In the random forest model, the most important variables based on mean decrease in Gini were type of policy renewal offer, employment status, and highest level of education. Based on the SHAP value analysis, variables such as employment status (retired), policy renewal offer type (type 4), and monthly payment amount (500–600) consistently increased the likelihood of churn.
dc.description.sponsorship:
dc.language.iso: id
dc.publisher: IPB University [id]
dc.title: Perbandingan Metode Regresi Logistik dan Random Forest dalam Memprediksi Churn pada Pemegang Polis Asuransi Mobil [id]
dc.title.alternative: Comparison of Logistic Regression and Random Forest Methods in Predicting Churn Among Auto Insurance Policyholders
dc.type: Skripsi (undergraduate thesis)
dc.subject.keyword: random forest [id]
dc.subject.keyword: regresi logistik (logistic regression) [id]
dc.subject.keyword: undersampling [id]
dc.subject.keyword: data churn (churn data) [id]
dc.subject.keyword: metode supervised learning (supervised learning methods) [id]
