A Study of 3Trees Methods to Enhance Hierarchical Mixed-Effects Models for Analyzing Per Capita Expenditure and Poverty Status
Date
2026
Author
Asrirawan
Notodiputro, Khairil Anwar
Susetyo, Budi
Oktarina, Sachnaz Desta
Abstract
BPS-Statistics Indonesia produces household welfare indicators from the National Socioeconomic Survey (SUSENAS). The survey uses a two-stage stratified sampling design: census blocks are first selected with probability proportional to size (PPS), and households are then randomly selected within the chosen blocks. This design produces a hierarchical data structure in which households are nested within census blocks, which are in turn nested within villages, districts, and provinces. Consequently, analyses of household welfare outcomes such as per capita expenditure and poverty status must account for the combined influence of individual (household-level) and contextual (regional) factors.
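To make the two-stage design concrete, the sketch below draws census blocks by systematic PPS and then takes a simple random sample of households within each selected block. It is a hypothetical illustration, not the BPS procedure: the function names, the use of systematic PPS, and simple random sampling in the second stage are assumptions, and stratification is ignored (in practice the same steps would be applied within each stratum).

```python
import random

def systematic_pps(sizes, n, rng):
    """Return indices of n units drawn by systematic PPS from the list `sizes`."""
    total = sum(sizes)
    interval = total / n                      # sampling interval on the cumulative-size scale
    start = rng.random() * interval           # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cum, i = [], 0.0, 0
    for idx, size in enumerate(sizes):
        # The last unit absorbs any floating-point remainder at the top of the range.
        upper = cum + size if idx < len(sizes) - 1 else float("inf")
        while i < n and points[i] < upper:
            selected.append(idx)              # a unit larger than the interval can be hit twice
            i += 1
        cum += size
    return selected

def two_stage_sample(block_sizes, n_blocks, m_households, seed=None):
    """Stage 1: PPS selection of blocks; stage 2: SRS of households in each block."""
    rng = random.Random(seed)
    sample = []
    for block in systematic_pps(block_sizes, n_blocks, rng):
        k = min(m_households, block_sizes[block])
        households = rng.sample(range(block_sizes[block]), k)
        sample.append((block, households))
    return sample
```

For example, `two_stage_sample([120, 80, 150, 60, 90, 200, 110, 70], n_blocks=3, m_households=10, seed=1)` returns three `(block, household_ids)` pairs. Larger blocks are more likely to be chosen, and households within a block share that block's context, which is the nesting the models must accommodate.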
This study focuses on extending the hierarchical mixed-effects model with three trees (3Trees) using machine learning approaches, and on applying it to per capita expenditure and poverty data in West Java. Hierarchical mixed-effects models have long been one of the main approaches for analyzing data with nested structures, such as households within regions, students within schools, or patients within health facilities. However, conventional mixed-effects models have several limitations, particularly their reliance on linearity assumptions and their limited ability to capture nonlinear relationships and cross-level interactions.
With the advancement of machine learning, tree-based and ensemble methods have increasingly been used to extend the capabilities of mixed-effects models. One recent approach is the 3Trees model, which combines a linear component with three trees: an individual-level tree, a group-level tree, and a cross-level tree. Although this model introduces a flexible additive structure, its reliance on the Classification and Regression Trees (CART) algorithm still poses several challenges, including the risk of overfitting, variable selection bias, convergence to locally optimal trees, and the restriction to normally distributed (Gaussian) responses.
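As a rough illustration of the additive structure described above, the model can be written schematically as follows. The notation is assumed here and is not necessarily the study's exact specification: x_ij are covariates of household i in group j, w_j are group-level covariates, T1, T2, and T3 are the individual-level, group-level, and cross-level trees, and b_j is a random intercept. The first line is the Gaussian case handled by 3Trees. The last line is one plausible binary analogue under a GLMM framing, with a logit link assumed for illustration.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + T_1(\mathbf{x}_{ij}) + T_2(\mathbf{w}_j) + T_3(\mathbf{x}_{ij}, \mathbf{w}_j) + b_j + \varepsilon_{ij}, \\
b_j &\sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\operatorname{logit} \Pr(y_{ij} = 1 \mid b_j) &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + T_1(\mathbf{x}_{ij}) + T_2(\mathbf{w}_j) + T_3(\mathbf{x}_{ij}, \mathbf{w}_j) + b_j .
\end{aligned}
```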
This research proposes three main developments. First, it modifies the 3Trees algorithm by replacing CART with single-tree methods based on Conditional Inference Trees (CTree) and Evolutionary Trees (EvTree). Second, it develops an ensemble-based version using Random Forests (RF), called 3Trees-RF. Third, it extends the framework into a Generalized Hierarchical Mixed-effects Model with Trees (G3Trees) to handle non-Gaussian responses, particularly binary outcomes. These developments aim not only to address the technical limitations of the existing model but also to broaden its applicability to more complex socioeconomic phenomena.
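CTree reduces variable selection bias by choosing the splitting variable with a test of association before searching for a split point. Covariates with many candidate cut points therefore gain no advantage. The sketch below is a simplified, hypothetical illustration of that variable-selection step only. It uses the absolute Pearson correlation as a permutation-test statistic for numeric covariates, with a Bonferroni adjustment. The actual CTree procedure uses a more general family of linear test statistics and also handles categorical covariates. All function names are illustrative.

```python
import random
import statistics

def abs_corr(x, y):
    """Absolute Pearson correlation (0.0 if either variable is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def permutation_pvalue(x, y, n_perm, rng):
    """Monte Carlo p-value for H0: x and y are independent."""
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)                      # permuting y breaks any x-y association
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_split_variable(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Pick the covariate with the smallest Bonferroni-adjusted p-value.

    Returns (None, p_values) when no covariate is significant, which in a
    conditional inference tree would stop splitting at this node.
    """
    rng = random.Random(seed)
    k = len(covariates)
    adjusted = {name: min(1.0, k * permutation_pvalue(x, y, n_perm, rng))
                for name, x in covariates.items()}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return None, adjusted
    return best, adjusted
```

Once a variable is selected this way, the split point would be chosen in a separate step. That separation is what distinguishes this approach from CART's exhaustive search over all variables and cut points at once.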
To evaluate the proposed models, both simulation studies and empirical analyses are conducted. The simulation studies assess model performance under various scenarios, including different functional relationships between the response and the covariates, different fixed- and random-effects structures, correlations, and interaction effects. Performance is evaluated using Mean Square Error (MSE), Predictive Mean Square Error (PMSE), clusMSE, clusPMSE, PMAD, clusPMAD, PMCR, clusPMCR, and the bias of the fixed- and random-effect parameter estimates. The empirical analyses apply the models to two real-world cases: predicting household per capita expenditure and classifying poverty status in West Java.
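The sketch below shows how an observation-level error measure and a cluster-level counterpart can differ. MSE averages squared errors over all observations, and PMSE applies the same formula to held-out data. The abstract does not define clusMSE, so the version here, an unweighted mean of within-cluster MSEs that gives every cluster equal weight, is a hypothetical reading and not necessarily the study's definition.

```python
from collections import defaultdict

def mse(y, y_hat):
    """Average squared error; PMSE is the same quantity computed on held-out data."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def clus_mse(y, y_hat, clusters):
    """Hypothetical cluster-level MSE: unweighted mean of within-cluster MSEs."""
    sq_errors = defaultdict(list)
    for a, b, g in zip(y, y_hat, clusters):
        sq_errors[g].append((a - b) ** 2)
    return sum(sum(v) / len(v) for v in sq_errors.values()) / len(sq_errors)

y = [1, 2, 3, 4, 5]
y_hat = [1, 2, 3, 4, 7]
clusters = ["a", "a", "a", "a", "b"]
print(mse(y, y_hat))                 # 0.8
print(clus_mse(y, y_hat, clusters))  # 2.0
```

Under this reading, a single badly predicted small cluster weighs far more in the cluster-level measure than in the pooled MSE. This is one reason to report both.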
Simulation results indicate that the 3Trees-EvTree model effectively addresses the weaknesses of CART, particularly in terms of variable selection bias and the risk of local optima, and yields better predictive performance than both 3Trees-CART and conventional linear mixed models. Meanwhile, the 3Trees-RF model demonstrates the best predictive accuracy, as measured by MSE, PMSE, clusMSE, and clusPMSE, while also offering computational efficiency and enhanced model interpretability through variable importance measures. These advantages make 3Trees-RF a more adaptive alternative for hierarchical data with nonlinear relationships and complex interactions. Empirical applications further show that 3Trees-RF consistently provides more accurate per capita expenditure predictions than classical mixed models or single-tree methods. This finding suggests that combining mixed-effects structures with ensemble methods produces more reliable estimates to support socioeconomic policy formulation.
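Variable importance in random forests is often measured by permutation: shuffle one covariate, re-predict, and record how much the error increases. The sketch below is a generic, model-agnostic version that accepts any prediction function. It is an illustration under that assumption, not the importance computation used in the study. Random forest implementations commonly compute this on out-of-bag observations, whereas the sketch simply uses whatever data is passed in.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is randomly permuted."""
    rng = random.Random(seed)

    def mse(rows):
        preds = [predict(r) for r in rows]
        return sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)                  # break the link between feature j and y
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            increases.append(mse(X_perm) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A covariate that the model ignores gets an importance of zero, because permuting it leaves every prediction unchanged. A covariate the model relies on gets a positive score.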
The final development is the G3Trees model for binary responses, applied to poverty status classification in West Java. This model integrates the Generalized Linear Mixed Model (GLMM) framework with three CART-based trees, and its strength lies in identifying complex nonlinear patterns and interactions. Simulation and empirical evaluations show that G3Trees improves classification performance, as measured by PMAD, PMCR, clusPMAD, clusPMCR, accuracy, sensitivity, specificity, F1-score, ROC curves, and AUC, while identifying individual- and group-level poverty determinants and capturing cross-level interactions that conventional models fail to detect. This model therefore offers a more comprehensive analytical tool for socioeconomic problems involving non-Gaussian data.
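The standard classification measures listed above can be computed from predicted poverty probabilities as in the minimal sketch below. The 0.5 threshold and the function name are assumptions for illustration. AUC is computed here as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. This equals the area under the ROC curve. The study-specific measures (PMAD, PMCR, and their cluster versions) are not reproduced because the abstract does not define them.

```python
def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")

def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, F1, and AUC for a binary outcome."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC via the Mann-Whitney formulation: compare every positive with every negative.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": _safe_div(tp + tn, len(y_true)),
        "sensitivity": _safe_div(tp, tp + fn),   # true positive rate
        "specificity": _safe_div(tn, tn + fp),   # true negative rate
        "f1": _safe_div(2 * tp, 2 * tp + fp + fn),
        "auc": _safe_div(wins, len(pos) * len(neg)),
    }

m = classification_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.6, 0.2, 0.1])
print(m["auc"])  # 0.888... (8 of 9 positive-negative pairs correctly ordered)
```

Threshold-based measures such as accuracy and sensitivity change with the chosen cutoff, while AUC summarizes ranking quality across all cutoffs. This is why both kinds are typically reported together for poverty classification.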
The implications of this research are twofold. From an academic perspective, the proposed models enrich the methodological literature on statistical and machine learning approaches for hierarchical data, providing more adaptive alternatives to conventional models. From a practical perspective, the empirical findings provide a stronger analytical basis for evidence-based policymaking, particularly in poverty alleviation and welfare improvement. Overall, this study bridges classical statistical approaches and modern machine learning techniques by integrating mixed-effects modeling with tree-based methods.
