Addressing Class Imbalance in Under-5 Mortality in ASEAN: A Study of SMOTEBoost, RUSBoost, SMOTE, and BRF
Date
2025
Author
Fulazzaky, Tahira
Saefuddin, Asep
Soleh, Agus Mohamad
Abstract
This research evaluated several machine learning techniques for handling class imbalance and predicting under-five child mortality in Southeast Asia. The primary objective was to identify the best model among SMOTEBoost, RUSBoost, SMOTE-Random Forest (SMOTE-RF), and Balanced Random Forest (BRF) for imbalanced class data. A further objective was to apply the most effective algorithm to data from Myanmar, the Philippines, and Cambodia in order to identify the key risk factors contributing to under-five child mortality in these countries. The study followed a two-phase approach. In the first phase, 13 datasets with varying levels of class imbalance were analyzed to determine the best method for handling imbalanced data. The four methods were compared against baseline models such as Random Forest (RF) and AdaBoost. Pre-processing involved data cleaning, label encoding, and splitting each dataset into training and testing subsets. Hyperparameter tuning was conducted using randomized search cross-validation. The models were evaluated on balanced accuracy, recall, and computational efficiency. In the second phase, the best-performing algorithm from the first phase was applied to empirical data from the Demographic and Health Surveys (DHS) Program. The data covered under-five child mortality in Southeast Asia, specifically Myanmar, the Philippines, and Cambodia, where mortality rates exceed the SDG 3.2 target of 25 deaths per 1,000 live births. Feature importance analysis was conducted to identify significant risk factors associated with under-five child mortality.
These factors included maternal characteristics, socio-economic status, and environmental conditions. The findings showed that Balanced Random Forest (BRF) outperformed the other methods, achieving the highest balanced accuracy and recall together with the best computational efficiency. BRF was particularly effective for datasets with highly and moderately imbalanced classes, and its performance remained stable across the levels of class imbalance tested. This indicates that BRF is a reliable and robust method for handling imbalanced datasets, especially in health-related applications where class imbalance is common. Its computational efficiency also makes it suitable for large-scale applications, particularly in resource-limited environments. Feature importance analysis revealed several critical factors contributing to under-five child mortality in Southeast Asia. The total number of children ever born emerged as the most significant predictor, indicating that larger family sizes were associated with higher mortality risk. Maternal age at first birth and birth order number (BORD) also played crucial roles, highlighting the importance of maternal health and the sequence of births. Socio-economic factors, such as the wealth index and the education level of the husband/partner, significantly influenced child mortality. Environmental factors such as access to clean drinking water were also critical to child survival, underscoring the importance of basic infrastructure in improving child health outcomes. In conclusion, this research identified Balanced Random Forest (BRF) as the most effective model for predicting under-five child mortality in Southeast Asia, particularly when dealing with imbalanced datasets.
The study also highlighted several key risk factors, such as maternal age, birth order, socio-economic status, and environmental conditions, which provide valuable insights for public health interventions. Targeted policies that address these risk factors, such as improving maternal health, increasing access to education, and enhancing living conditions, can better align public health strategies with the goal of reducing child mortality in the region. The use of BRF provides a robust and reliable assessment of these predictors, making it a powerful tool for health-related data analysis in resource-constrained settings.
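
As an illustration of two ideas the abstract relies on, the following is a minimal sketch, not the authors' implementation. It shows (1) the balanced bootstrap step commonly described for Balanced Random Forest, where each tree is trained on a sample with equal numbers of rows per class, and (2) why balanced accuracy and recall are more informative than plain accuracy on imbalanced data. The function names, the toy labels, and the 5% minority rate are assumptions made for illustration only; they are not taken from the study or from any library.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return row indices for one balanced bootstrap sample (hypothetical helper).

    Draws, with replacement, as many rows from every class as the
    smallest class contains, so each class is equally represented.
    """
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    n_min = min(len(rows) for rows in by_class.values())
    sample = []
    for rows in by_class.values():
        sample.extend(rng.choices(rows, k=n_min))
    return sample


def recall(y_true, y_pred, positive):
    """Fraction of rows of class `positive` that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (assumed, not study data): 1 = minority class, 0 = majority class.
y_true = [1] * 5 + [0] * 95

# One balanced bootstrap sample: equal counts per class.
rng = random.Random(0)
sample = balanced_bootstrap(y_true, rng)
sample_counts = Counter(y_true[i] for i in sample)

# A degenerate classifier that always predicts the majority class.
y_majority = [0] * len(y_true)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)

print("rows per class in bootstrap sample:", dict(sample_counts))
print("plain accuracy:", plain_accuracy)
print("balanced accuracy:", balanced_accuracy(y_true, y_majority))
print("minority recall:", recall(y_true, y_majority, positive=1))
```

In this toy case the always-majority classifier looks strong on plain accuracy while missing every minority case, which is the situation balanced accuracy and recall are meant to expose. In practice a maintained implementation such as the imbalanced-learn package would normally be used rather than hand-written sampling.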