Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/67070
Title: Classifier Learning For Imbalanced Dataset Using Modified SMOTEBoost Algorithm And Its Application On Credit Scorecard Modeling
Authors: Sartono, Bagus
Sumertajaya, I Made
Kurnia, Rifan
Issue Date: 2013
Abstract: In many real cases, imbalanced class problems are often encountered. A dataset is called imbalanced if one or more classes dominate the whole dataset as majority classes while the other classes become minorities. In statistical classification, class imbalance is a serious problem: it can make the model less sensitive in classifying minority-class objects. In many cases, misclassifying minority-class objects causes a bigger problem than misclassifying majority-class objects. For instance, in credit scorecards, accepting a “bad” applicant is much riskier than rejecting a “good” applicant. This study discusses a hybrid technique between SMOTE and boosting, called SMOTEBoost, to overcome the imbalanced class issue. SMOTE generates synthetic samples based on the objects’ characteristics and their k nearest neighbors. Synthetic sample generation follows a different procedure for numerical and categorical variables: Euclidean distance is used for numerical variables, while the mode of the values can simply be used for categorical variables. Boosting is a general method for improving the performance of any learning algorithm. In theory, boosting can significantly reduce the error of any “weak” learning algorithm that consistently generates classifiers only slightly better than random guessing. A popular variant of boosting is “AdaBoost”, an abbreviation of adaptive boosting. SMOTEBoost combines SMOTE with the boosting algorithm. The purpose of this combination is to create a powerful model for classifying imbalanced datasets without sacrificing overall accuracy. A decision tree, built with the CART algorithm, is used in each boosting iteration. The analysis results show that the combination of SMOTE and boosting gives significantly better performance than CART alone.
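The SMOTE step for numerical variables described above (interpolating between a minority object and one of its k nearest neighbors under Euclidean distance) can be sketched roughly as follows. This is an illustrative implementation assuming a purely numerical feature matrix, not the exact procedure used in the thesis:

```python
import numpy as np

def smote_numeric(X_minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between each
    minority object and a randomly chosen one of its k nearest neighbors."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between minority objects.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # exclude each point itself
    neighbors = np.argsort(d, axis=1)[:, :k]  # indices of k nearest neighbors
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(n)                   # pick a minority object at random
        j = rng.choice(neighbors[i])          # one of its k nearest neighbors
        gap = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X[i] + gap * (X[j] - X[i]))
    return np.array(synthetic)
```

For a categorical variable, the interpolation step would instead take the mode of the neighbors' values, as the abstract notes.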
While SMOTE lets the classifier improve its performance on the minority class, the boosting procedure improves the predictive accuracy of any classifier by focusing on difficult objects. In comparison, SMOTEBoost produces better separation between the good and bad classes, represented by higher performance measures (KS statistic, area under the ROC curve, and accuracy) than CART. It also turns out that SMOTEBoost keeps sensitivity and specificity in balance, and this is what distinguishes SMOTEBoost from CART: SMOTEBoost performs well in predicting the minority class (bad) while maintaining good performance in predicting the majority class (good), whereas CART only performs well on the majority class. Besides having better performance than CART, SMOTEBoost also maintains stable performance across different bad rates. The stability assessment shows that SMOTEBoost gives good and stable performance even when the bad rate is set significantly lower than the good rate.
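The separation measures mentioned above can be sketched as follows; this is an illustrative implementation of the KS statistic (maximum gap between the empirical score distributions of the two classes) and of sensitivity/specificity, assuming scores and binary labels as inputs, not the exact evaluation code of the thesis:

```python
import numpy as np

def ks_statistic(scores_good, scores_bad):
    """KS statistic: maximum absolute gap between the empirical CDFs of
    the scores of the good class and the bad class."""
    g, b = np.sort(scores_good), np.sort(scores_bad)
    thresholds = np.concatenate([g, b])
    cdf_good = np.searchsorted(g, thresholds, side="right") / len(g)
    cdf_bad = np.searchsorted(b, thresholds, side="right") / len(b)
    return float(np.max(np.abs(cdf_good - cdf_bad)))

def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = recall on the positive (minority, 'bad') class;
    specificity = recall on the negative (majority, 'good') class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == positive, y_true != positive
    sens = float(np.mean(y_pred[pos] == positive))
    spec = float(np.mean(y_pred[neg] != positive))
    return sens, spec
```

A perfectly separating scorecard (every bad applicant scored above every good one) yields a KS statistic of 1.0; the balanced sensitivity/specificity that the abstract attributes to SMOTEBoost would show up as two values of similar magnitude.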
URI: http://repository.ipb.ac.id/handle/123456789/67070
Appears in Collections:MT - Mathematics and Natural Science

Files in This Item:
File: 2013rku.pdf (Restricted Access)
Description: Fulltext
Size: 2.33 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.