Please use this identifier to cite or link to this item:
http://repository.ipb.ac.id/handle/123456789/171079

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Wijayanto, Hari | |
| dc.contributor.advisor | Angraini, Yenni | |
| dc.contributor.author | Fitri, Zafira Ilma | |
| dc.date.accessioned | 2025-09-15T06:15:19Z | |
| dc.date.available | 2025-09-15T06:15:19Z | |
| dc.date.issued | 2025 | |
| dc.identifier.uri | http://repository.ipb.ac.id/handle/123456789/171079 | |
| dc.description.abstract | Permasalahan class imbalanced dataset pada klasifikasi menimbulkan tantangan serius karena algoritma cenderung lebih fokus pada kelas mayoritas dan mengabaikan kelas minoritas. Penelitian ini dilakukan untuk mengkaji penerapan teknik synthetic oversampling dan algoritma ensemble classification Random Forest serta XGBoost pada data tidak seimbang, serta mengeksplorasi peran struktur dan skala peubah terhadap performa model. Data yang digunakan merupakan data risiko kredit dengan 32.581 observasi, yang dikonstruksi menjadi tiga tipe, yaitu data campuran numerik–kategorik (Data 1), data murni kategorik (Data 2), dan data murni numerik (Data 3). Analisis meliputi tahap preprocessing, pembagian data secara stratified, penerapan synthetic oversampling (SMOTE, Borderline-SMOTE, ADASYN), pembangunan model dengan Random Forest dan XGBoost, serta evaluasi menggunakan metrik balanced accuracy, precision, recall, dan F1-score. Hasil penelitian menunjukkan bahwa synthetic oversampling dapat memperbaiki representasi kelas minoritas, tetapi efektivitasnya sangat bergantung pada karakteristik data dan algoritma yang digunakan. Random Forest cenderung lebih stabil pada data kategorik, sedangkan XGBoost lebih unggul pada data numerik dan campuran dengan bantuan balancing internal. Oleh karena itu, pemilihan strategi balancing dan algoritma perlu disesuaikan dengan struktur data agar diperoleh hasil klasifikasi optimal pada kondisi class imbalanced dataset. | |
| dc.description.abstract | Class imbalance poses a serious challenge in classification because algorithms tend to focus on the majority class and neglect the minority class. This study examines the application of synthetic oversampling techniques and two ensemble classification algorithms, Random Forest and XGBoost, to imbalanced data, and explores how variable structure and scale affect model performance. The dataset is a credit risk dataset with 32,581 observations, constructed into three types: mixed numerical–categorical data (Data 1), purely categorical data (Data 2), and purely numerical data (Data 3). The analysis comprised preprocessing, stratified data splitting, synthetic oversampling (SMOTE, Borderline-SMOTE, ADASYN), model construction with Random Forest and XGBoost, and evaluation using balanced accuracy, precision, recall, and F1-score. The results show that synthetic oversampling can improve the representation of the minority class, but its effectiveness depends strongly on the characteristics of the data and on the algorithm applied. Random Forest tends to be more stable on categorical data, while XGBoost performs better on numerical and mixed data with the support of internal balancing. The balancing strategy and algorithm should therefore be chosen to suit the data structure in order to achieve optimal classification results under class imbalance. | |
| dc.description.sponsorship | | |
| dc.language.iso | id | |
| dc.publisher | IPB University | id |
| dc.title | Kajian Penerapan Metode Synthetic Oversampling dan Ensemble Classification pada Class Imbalanced Dataset | id |
| dc.title.alternative | A Study on Application of Synthetic Oversampling and Ensemble Classification Methods on Class Imbalanced Dataset | |
| dc.type | Skripsi (undergraduate thesis) | |
| dc.subject.keyword | class imbalanced dataset | id |
| dc.subject.keyword | ensemble classification | id |
| dc.subject.keyword | synthetic oversampling | id |
| Appears in Collections: | UT - Statistics and Data Sciences | |
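
The abstract above names synthetic oversampling and balanced accuracy without showing how either works. The following is a minimal pure-Python sketch of both ideas, not the thesis's implementation (the record does not say which libraries were used). The function names `smote_sample` and `balanced_accuracy`, the toy points, and the parameters `k`, `n_new`, and `seed` are all assumptions made for illustration. Borderline-SMOTE and ADASYN differ mainly in which minority points they pick as the base, which this sketch does not model.

```python
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally
    regardless of how many observations it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Toy usage (hypothetical data, unrelated to the credit risk dataset).
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_sample(minority_points, k=2, n_new=5)
score = balanced_accuracy([0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 0])
```

Because each synthetic point lies on a segment between two existing minority points, oversampling of this kind only makes sense for numerical features, which may be one reason the abstract reports that effectiveness depends on the data structure. Balanced accuracy is used instead of plain accuracy because a model that always predicts the majority class would otherwise score well.
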
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| cover_G1401211054_875cd44e0de34a20bc9216afa111b1f6.pdf | Cover | 625.4 kB | Adobe PDF | View/Open |
| fulltext_G1401211054_a918058149ab437398257d507dceffa6.pdf Restricted Access | Fulltext | 1.19 MB | Adobe PDF | View/Open |
| lampiran_G1401211054_d5023b4a11234ee2bb0e92187a1f46ad.pdf Restricted Access | Appendix | 645.87 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.