Enhancing Multivariate Linear Mixed Model with Two-Level Random Effects for Longitudinal Data and Its Application to PISA and National Examination Scores

Santi, Vera Maya

Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/155402

Title:	Enhancing Multivariate Linear Mixed Model with Two-Level Random Effects for Longitudinal Data and Its Application to PISA and National Examination Scores
Other Titles:
Authors:	Notodiputro, Khairil Anwar; Indahwati; Sartono, Bagus; Santi, Vera Maya
Issue Date:	2024
Publisher:	IPB University
Abstract:	Extending cross-sectional data analysis to multivariate responses becomes essential when examining interconnected, multidimensional phenomena. Including multivariate responses enables researchers to examine relationships and dependencies among numerous variables simultaneously. The need to analyze multivariate responses in cross-sectional data arises when researchers seek to understand complex patterns, dependencies, and interactions among various factors within a specific population. In practice, observations are frequently taken multiple times on the same subjects, a situation commonly known as repeated measurements: a research subject is evaluated at different observation points in time. Several researchers have devised methods for analyzing such longitudinal data, of which the linear mixed model (LMM) is the most prevalent. A further challenge in longitudinal data analysis arises when the response variables are multivariate, that is, when multiple responses are observed in the survey. One approach is to model these responses simultaneously using a multivariate linear mixed model (MLMM). The complexity of real-world data sets increases further when the random effects are multilevel or hierarchical. Substantive research questions arising in longitudinal data structures with multivariate responses and multilevel random effects often concern interrelated changes among the response variables. An example of hierarchical data with repeated observations is the PISA scores and national examination (UN) data for senior high school (SMA) students. These assessments are conducted repeatedly, which makes it possible to observe trends in the average PISA scores of Indonesian students and in the average national examination scores at the senior high school level. The PISA score data come from a survey conducted by the OECD and are designed to include a representative sample of the target population.
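As a reference point for the models named above, a standard textbook formulation of the LMM (the Laird–Ware form) and its multivariate and two-level extensions can be written as follows. This is a generic sketch, not necessarily the exact specification used in the dissertation; all symbols are chosen for illustration.

```latex
% LMM for subject i with n_i repeated measurements
\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{Z}_i\mathbf{b}_i + \boldsymbol{\varepsilon}_i,
\qquad \mathbf{b}_i \sim N(\mathbf{0}, \mathbf{D}), \quad
\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}_i), \quad
\mathbf{b}_i \perp \boldsymbol{\varepsilon}_i

% MLMM (illustrative): stack r responses,
%   y_i = (y_{i1}^T, ..., y_{ir}^T)^T,
% with X_i and Z_i typically block-diagonal over responses, and
% D, Sigma_i carrying the covariance between responses.

% Two-level random effects (illustrative): unit j nested in cluster i
\mathbf{y}_{ij} = \mathbf{X}_{ij}\boldsymbol{\beta}
  + \mathbf{Z}^{(2)}_{ij}\mathbf{u}_i
  + \mathbf{Z}^{(1)}_{ij}\mathbf{v}_{ij}
  + \boldsymbol{\varepsilon}_{ij},
\qquad \mathbf{u}_i \sim N(\mathbf{0}, \mathbf{D}_2), \quad
\mathbf{v}_{ij} \sim N(\mathbf{0}, \mathbf{D}_1)
```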
The OECD survey covers three test subjects: reading literacy, mathematics, and science. Meanwhile, the average SMA national examination scores are sourced from the Basic Education Data (DAPODIK). Cross-sectional studies have also been conducted to analyze the PISA score data. Here, it is crucial to capture whether there is any association among students' PISA scores across the three test subjects, and whether the school acts as a random effect. In addition, a longitudinal study has been conducted on the average national examination (UN) scores of senior high school students majoring in science in West Java Province. In this longitudinal study, it is essential to capture trends over time in the average UN scores for six subjects: Indonesian language, English language, mathematics, physics, chemistry, and biology. Another important aspect is to determine whether schools contribute as a random effect to the response variables. Empirical analyses using MLMM have been conducted for both the cross-sectional and the longitudinal data structures. Although PISA score data have been the subject of several studies in Indonesia, simultaneous analysis of all three PISA scores using MLMM has not been done before. Similarly, MLMM has not previously been used to analyze simultaneously the average UN scores of science-track senior high school students in West Java with a longitudinal data structure and added random effects. The results indicate that adding random effects to the model can reduce the standard errors of the fixed-parameter estimates. In addition, for the PISA score data, schools as random effects contribute significantly to the variance. This finding is consistent with the empirical study of the average UN scores of science-track senior high school students in West Java.
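The share of variance attributed to schools is commonly summarized by the variance components and the intraclass correlation (ICC). The sketch below is a minimal, univariate illustration, assuming a balanced one-way random-intercept model y_ij = mu + u_i + e_ij with schools as the only grouping factor. It uses the classical ANOVA (method-of-moments) estimator, a simpler technique swapped in for illustration; it is not the MLMM/REML fit used in the dissertation. For balanced data of this kind, the ANOVA estimates coincide with REML whenever the between-school estimate is non-negative. The simulated data, function names, and parameter values are all hypothetical.

```python
import random
import statistics


def simulate_balanced(n_schools, n_per_school, sigma_u, sigma_e, mu=0.0, seed=1):
    """Hypothetical data: y_ij = mu + u_i + e_ij, u_i ~ N(0, sigma_u^2), e_ij ~ N(0, sigma_e^2)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_schools):
        u = rng.gauss(0.0, sigma_u)  # school-level random intercept
        data.append([mu + u + rng.gauss(0.0, sigma_e) for _ in range(n_per_school)])
    return data


def anova_variance_components(groups):
    """ANOVA estimates (sigma2_u, sigma2_e) for a balanced one-way random-effects model."""
    g = len(groups)
    n = len(groups[0])  # balanced: every school has the same number of observations
    means = [statistics.fmean(grp) for grp in groups]
    grand = statistics.fmean(means)  # equals the overall mean because the design is balanced
    ssb = n * sum((m - grand) ** 2 for m in means)  # between-school sum of squares
    ssw = sum((y - m) ** 2 for grp, m in zip(groups, means) for y in grp)  # within-school SS
    msb = ssb / (g - 1)
    msw = ssw / (g * (n - 1))
    sigma2_e = msw
    sigma2_u = max((msb - msw) / n, 0.0)  # truncate a negative estimate at zero
    return sigma2_u, sigma2_e


def icc(sigma2_u, sigma2_e):
    """Proportion of total variance attributable to schools."""
    return sigma2_u / (sigma2_u + sigma2_e)


groups = simulate_balanced(300, 20, sigma_u=1.0, sigma_e=1.0)
s2u, s2e = anova_variance_components(groups)
print(f"school variance = {s2u:.3f}, residual variance = {s2e:.3f}, ICC = {icc(s2u, s2e):.3f}")
```

With the true values used here (both variances equal to 1), the estimated ICC should land near 0.5; a value well above zero is what "schools contribute significantly to the variance" looks like in this simplified setting.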
The estimated variance components show significant variation between schools in the average UN scores for the six subjects. To handle complex data structures involving multiple outcomes and two-level random effects in longitudinal data, the MMLMM (Multivariate Multilevel Linear Mixed Model) has been developed. In this model, the random effects are assumed to follow a normal distribution, and measurement time is treated as neither a fixed effect nor a random effect. The fixed effects are estimated by Maximum Likelihood Estimation (MLE), while the variance components are estimated by Restricted Maximum Likelihood (REML). The performance of MMLMM is assessed through simulation studies using two properties of the estimators: relative bias and the root mean square error of prediction (RMSEp). The results show only a slight relative bias for all parameter estimators, and the relative bias decreases toward 0 as the sample size increases. The RMSEp value also becomes smaller as the sample size increases, so the proposed model performs better with large samples. This indicates that the proposed model produces fixed-parameter estimates that are unbiased, have minimum variance, and are consistent. Based on the simulation results, the performance of MMLMM is superior to that of MLMM. A subsequent empirical study applies MMLMM to the average national examination scores of natural-science-track senior high schools in West Java Province. The standard errors of nearly all estimated parameters from MMLMM are smaller than those from MLMM, and the proposed model yields much lower AIC and BIC values than the conventional MLMM. This is strong evidence that MMLMM outperforms MLMM.
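The evaluation criteria named in this paragraph (relative bias, RMSEp, AIC, and BIC) can be illustrated with a small sketch. It assumes the usual definitions: relative bias = (mean of estimates − θ)/θ, RMSE = sqrt(mean((θ̂ − θ)²)), AIC = −2 log L + 2k, and BIC = −2 log L + k log n. The abstract does not define RMSEp exactly, so treating it as the RMSE of the estimator is an assumption. The data-generating model is a hypothetical univariate two-level random-intercept model, y = β0 + u (school) + v (unit nested in school) + e, which is far simpler than MMLMM. The intercept is estimated by the sample mean, which for a balanced design like this one coincides with the GLS estimator of β0. All function names and parameter values are illustrative.

```python
import math
import random
import statistics


def relative_bias(estimates, true_value):
    """(mean of the Monte Carlo estimates - true value) / true value."""
    return (statistics.fmean(estimates) - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean square error of the estimates around the true value."""
    return math.sqrt(statistics.fmean((e - true_value) ** 2 for e in estimates))


def aic(log_lik, k):
    """Akaike information criterion; k = number of estimated parameters."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Bayesian information criterion; n = sample size (conventions vary across software)."""
    return -2.0 * log_lik + k * math.log(n)


def simulate_intercept_estimates(n_schools, reps, beta0=50.0, sigma_u=2.0, sigma_v=1.0,
                                 sigma_e=1.0, n_sub=3, n_time=4, seed=2024):
    """Hypothetical two-level design; returns one estimate of beta0 per replicate."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        total, count = 0.0, 0
        for _ in range(n_schools):
            u = rng.gauss(0.0, sigma_u)  # higher-level random effect (e.g. school)
            for _ in range(n_sub):
                v = rng.gauss(0.0, sigma_v)  # lower-level random effect nested in the school
                for _ in range(n_time):
                    total += beta0 + u + v + rng.gauss(0.0, sigma_e)
                    count += 1
        estimates.append(total / count)  # balanced design: sample mean estimates beta0
    return estimates


results = {}
for g in (10, 40, 160):
    est = simulate_intercept_estimates(g, reps=200)
    results[g] = (relative_bias(est, 50.0), rmse(est, 50.0))
    print(f"schools={g:4d}  relative bias={results[g][0]:+.5f}  RMSE={results[g][1]:.3f}")
```

Running it should show a relative bias close to zero and an RMSE that shrinks as the number of schools grows, which mirrors the qualitative pattern the abstract reports; a real evaluation of MMLMM would refit the full multivariate model in every replicate. The `aic` and `bic` helpers simply take a maximized log-likelihood from whatever software fits the models, and how n and ML versus REML likelihoods enter these criteria differs between packages.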
In this dissertation, analytical estimation was carried out only for the fixed-effect parameters of the model. A challenge for future studies is to develop computational algorithms and simulations that enable simultaneous estimation of both fixed and random effects, and to complete the analytical estimation of the variance components.
URI:	http://repository.ipb.ac.id/handle/123456789/155402
Appears in Collections:	DT - Mathematics and Natural Science

Files in This Item:

File	Description	Size	Format	Access
cover_G161180021_ee5fe76d85c2436c9bbd3db1dd903a9a.pdf	Cover	2.59 MB	Adobe PDF
fulltext_G161180021_3c39fd8176884a659373220a3ab2c141.pdf	Full text	11.47 MB	Adobe PDF	Restricted Access
lampiran_G161180021_8a644fda151a42fe8b28292e41c3e939.pdf	Appendix	2.77 MB	Adobe PDF	Restricted Access
