Klasifikasi imbalanced data menggunakan weighted k-nearest neighbor pada data debitur kartu kredit bank

Syahidah, Aisyah

Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/71870

Title:	Klasifikasi imbalanced data menggunakan weighted k-nearest neighbor pada data debitur kartu kredit bank
Other Titles:	Classification of imbalanced data using weighted k-nearest neighbor in data bank credit card debtors
Authors:	Kustiyo, Aziz; Syahidah, Aisyah
Issue Date:	2014
Publisher:	Bogor Agricultural University (IPB)
Abstract:	Credit risk management aims to minimize potential losses from non-performing loans. Analysis of existing data on problem debtors can serve as a model for qualifying future credit applications. Bank debtor data are most likely imbalanced, because good debtors dominate bad ones. Classification on such data is suboptimal because the class with more records has a disproportionately large influence on the result. This research aims to develop a classification model for credit card debtor data using the weighted k-nearest neighbor algorithm together with sampling methods intended to improve classification quality on imbalanced data. The sampling methods used are oversampling and undersampling. Random oversampling yields the best F-measure, 86.51%. Duplication oversampling yields the best recall, 100%.
Keywords:	imbalanced data, oversampling, undersampling, weighted k-nearest neighbor
URI:	http://repository.ipb.ac.id/handle/123456789/71870
Appears in Collections:	UT - Computer Science

Files in This Item:

File	Description	Size	Format
G14asy.pdf (Restricted Access)	Full Text	550.93 kB	Adobe PDF
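
The abstract names three ingredients: oversampling of the minority class, a weighted k-nearest neighbor vote, and evaluation by recall and F-measure. The sketch below illustrates how these pieces could fit together. It is not the thesis implementation: the full text is restricted, so the inverse-distance weighting, the balancing rule in random_oversample, the function names, and the toy data are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    matches the size of the largest one (assumed balancing rule)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def weighted_knn_predict(X_train, y_train, query, k=3, eps=1e-9):
    """Vote among the k nearest neighbors, each weighted by 1 / (distance + eps).
    Inverse-distance weighting is an assumption, not the thesis's stated scheme."""
    neighbors = sorted(
        (math.dist(row, query), label) for row, label in zip(X_train, y_train)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=votes.get)


def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure (harmonic mean) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical toy data: label 0 = good debtor (majority), 1 = bad debtor.
    X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (5, 5), (5, 6)]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = random_oversample(X, y)
    queries = [(4.5, 5.0), (0.1, 0.1)]
    predictions = [weighted_knn_predict(X_bal, y_bal, q) for q in queries]
    print(predictions)
    print(precision_recall_f([1, 0], predictions))
```

Oversampling is applied to the training data only; evaluating on duplicated rows would inflate recall and F-measure. Undersampling, the other method named in the abstract, would instead discard majority-class rows until the classes are balanced.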
