Overcoming Imbalanced and Overlapping Data in Multiclass Classification
Date
2024
Author
Siahaan, Dessy Rotua Natalina
Fitrianto, Anwar
Notodiputro, Khairil Anwar
Abstract
Classification models suffer when the dataset contains imbalanced and overlapping data. Imbalanced data tends to produce models that classify the majority class well but the minority class badly. Meanwhile, overlapping data makes classification difficult because the classes share similar characteristics. These two conditions are challenging on their own and even more so when they occur together, and the problem becomes more complicated still in a multiclass classification setting. A Multiple Classifier System (MCS) model, a combination of Sequential Logistic Regression (LR) and K-Nearest Neighbour (KNN), is used to handle these problems. The Synthetic Minority Oversampling Technique (SMOTE) is also applied to balance the dataset, and the One Versus One (OVO) decomposition technique supports the multiclass classification process. Simulated data covering 18 scenarios show that the MCS-SMOTE model handles these problems and performs well. Empirical data on poverty in Jawa Barat in 2021 are also used to assess the model's performance. Measured by accuracy, F1 score, and G-Mean, its performance is superior to that of the others.
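The abstract reports G-Mean alongside accuracy, and a small sketch may help show why accuracy alone can mislead on imbalanced multiclass data. The sketch below assumes G-Mean is the geometric mean of per-class recall, which is a common multiclass definition; the record does not say which variant the authors use. The class labels, class sizes, and predictions are hypothetical toy values, not data from the study.

```python
from collections import Counter
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def accuracy(y_true, y_pred):
    """Fraction of observations whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition, see lead-in)."""
    recalls = per_class_recall(y_true, y_pred).values()
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical imbalanced labels: 90 of class "a", 8 of "b", 2 of "c".
y_true = ["a"] * 90 + ["b"] * 8 + ["c"] * 2

# A degenerate classifier that always predicts the majority class.
y_majority = ["a"] * 100

# A classifier that gives up some majority-class hits to recover minorities:
# 80/90 of "a", 6/8 of "b", and 1/2 of "c" are correct.
y_balanced = (["a"] * 80 + ["b"] * 10) + (["b"] * 6 + ["a"] * 2) + (["c"] + ["a"])

print("majority-only:", accuracy(y_true, y_majority), g_mean(y_true, y_majority))
print("balanced     :", accuracy(y_true, y_balanced), g_mean(y_true, y_balanced))
```

In this toy case the majority-only classifier has the higher accuracy, yet its G-Mean is zero because two classes are never recovered. This is the kind of failure the abstract attributes to imbalanced data.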
