Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/133461
Title: A Study of Generalized Mixed-Effects Trees Model for Classification of Household Poverty in West Java
Authors: Notodiputro, Khairil Anwar
Sadik, Kusman
Rahmawati, Fardilla
Issue Date: 2023
Publisher: IPB University
Abstract: Combining machine learning with statistical modeling makes it possible to handle both fixed and random effects. This thesis examines how fixed and random effects can be represented using a statistical machine-learning technique. The generalized mixed-effects trees (GMET) model is a tree-based mixed-effects model for response variables from the exponential family of distributions. GMET generalizes the generalized linear mixed model (GLMM) in that its fixed-effect part is not required to be linear. Both models can be used for classification involving fixed and random effects. In this research, GMET and GLMM were applied to both simulated and empirical data to identify the data conditions under which each method is suitable. The simulated data were generated under different response-generation scenarios (a mixed-effects tree structure, a linear fixed-effect function, and a non-linear fixed-effect function), with different values (small and large) of the random-effect variance and of the fixed-effect coefficients. GMET and GLMM were also used to analyze empirical data in order to identify the variables that distinguish non-poor from poor households, as well as the impact of regional typology on the classification of household poverty status in West Java Province in 2019. In the simulation study, an analysis of variance was carried out for each performance measure: the predictive mean absolute deviation (PMAD) and the predictive misclassification rate (PMCR). For PMAD, the analysis of variance found a significant interaction among the four factors (p-value less than 0.05).
This means that the effects of the method, the generation scenario, the fixed effects, and the random effects depend on one another and cannot be separated. The relative performance of the GMET and GLMM algorithms therefore depends on the particular combination of response-generation scenario, fixed effect, and random effect. In the analysis of variance for PMCR, the three-factor interaction among the generation, fixed-effect, and random-effect scenarios was significant at the 5% level. This indicates that the performance of the GMET and GLMM models could not be distinguished, since the method is not among the interacting factors. This is in line with the classification of household poverty status in the empirical study, where there is no visible difference in performance between the two models. Across all performance measures (PMCR, AUC, sensitivity, specificity, precision, recall, F1-score, accuracy, and MCC), the absolute differences between the GMET and GLMM models in identifying household poverty status tend to be small. This is consistent with the simulation results for PMCR, which show that the performance of the two models cannot be separated. Thus, in classifying household poverty status, the GMET model performs similarly to the GLMM model. The results show that GMET performs comparably across the different response-generation scenarios; however, it performs better when the fixed-effect values and the variance of the random effects are large. When applied to the empirical data, the GMET method describes the fixed and random effects and classifies household poverty status quite well, based on the area under the curve (AUC).
The study also found that the fixed effects with a strong influence on classifying poor households are the number of household members, land ownership, the main type of fuel used for cooking, and the main source of drinking water. These variables may merit the government's attention in efforts to reduce the socioeconomic inequality that leads to poverty. In addition, including regional typology as a random effect in the model contributed to explaining the variation in household poverty status. The GMET method can compete with other approaches, since the research indicates that fixed effects in mixed models do not always have to be linear and that the method can be applied to grouped data structures.
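The two simulation measures named in the abstract, PMAD and PMCR, can be illustrated with a minimal Python sketch. The abstract does not give their formulas, so the definitions here are assumptions: PMAD is taken as the mean absolute difference between true and predicted success probabilities, and PMCR as the share of observations whose predicted class (using a hypothetical 0.5 cut-off) differs from the observed class. The function names `pmad` and `pmcr` are illustrative and do not come from the thesis.

```python
def pmad(true_probs, predicted_probs):
    """Assumed PMAD: mean absolute deviation between true and predicted probabilities."""
    if not true_probs or len(true_probs) != len(predicted_probs):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - q) for p, q in zip(true_probs, predicted_probs))
    return total / len(true_probs)


def pmcr(true_labels, predicted_probs, threshold=0.5):
    """Assumed PMCR: proportion of observations assigned to the wrong class.

    The 0.5 threshold is an illustrative default, not a value from the thesis.
    """
    if not true_labels or len(true_labels) != len(predicted_probs):
        raise ValueError("inputs must be non-empty and of equal length")
    # Convert predicted probabilities to class labels (1 = poor, 0 = non-poor).
    predicted_labels = [1 if q >= threshold else 0 for q in predicted_probs]
    errors = sum(1 for y, y_hat in zip(true_labels, predicted_labels) if y != y_hat)
    return errors / len(true_labels)
```

Under these assumptions, lower values of both measures indicate better predictive performance, and PMAD can only be computed in a simulation, where the true probabilities are known.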
URI: http://repository.ipb.ac.id/handle/123456789/133461
Appears in Collections: MT - Mathematics and Natural Science

Files in This Item:
File: Cover, Statement Page, Abstract, Approval Page, Foreword, and Table of Contents.pdf; Description: Cover; Size: 4.36 MB; Format: Adobe PDF; Access: Restricted
File: G1501211067_Fardilla Rahmawati.pdf; Description: Full Text; Size: 13.76 MB; Format: Adobe PDF; Access: Restricted
File: Attachment.pdf; Description: Attachment; Size: 5.07 MB; Format: Adobe PDF; Access: Restricted

