Mixed effect models for high-dimensional longitudinal data with latent variables
Date
2022
Author
Angraini, Yenni
Saefuddin, Asep
Notodiputro, Khairil Anwar
Toharudin, Toni
Abstract
A data set consisting of repeated measurements on a large number of outcomes
and covariates is known as high-dimensional longitudinal data. Such data may
involve several effects of interest that cannot be measured directly, known as
latent factors or latent variables. During the last decade, multivariate or
high-dimensional responses in longitudinal data have become a major research
issue.
The analysis of high-dimensional longitudinal data is complicated by the
complex correlation structures between outcomes. To understand changes in the
outcome variables over time, it is not enough to account for the correlation
within individuals and for the explanatory variables; the complex correlation
structures between outcomes also need to be considered. One approach commonly
used to handle high-dimensional longitudinal data is simultaneous equation
modeling. A well-known simultaneous equation modeling method is the structural
equation model (SEM). This approach has several appealing modeling capabilities
and can be used for high-dimensional longitudinal data. Under the SEM framework,
the continuous-time SEM (CT-SEM) was developed to avoid some issues associated
with autoregressive and cross-lagged models in SEM. Another simultaneous
equation modeling method, the latent factor linear mixed model (LFLMM), combines
factor analysis and multivariate analysis to handle high-dimensional
longitudinal data. Factor analysis is used to reduce the dimension of the
outcomes, and a multivariate linear mixed model is used to study the
longitudinal trends of the resulting latent factors.
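As an illustration only, a two-part model of this kind is often written as a measurement equation linking the observed outcomes to a few latent factors, and a linear mixed model for those factors. The notation below is assumed for this sketch and is not taken from the dissertation: y_ij is the vector of observed outcomes for subject i at occasion j, eta_ij the (much shorter) vector of latent factors, Lambda the factor loading matrix, beta the fixed effects, and b_i the subject-specific random effects with covariance matrix D.

```latex
\begin{aligned}
y_{ij}    &= \mu + \Lambda\,\eta_{ij} + \varepsilon_{ij},
          & \varepsilon_{ij} &\sim N(0, \Psi), \\
\eta_{ij} &= X_{ij}\,\beta + Z_{ij}\,b_i + e_{ij},
          & b_i &\sim N(0, D), \quad e_{ij} \sim N(0, \Sigma).
\end{aligned}
```

In this formulation, the off-diagonal blocks of D describe how the subject-level trajectories of different latent factors are correlated.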
One example of high-dimensional longitudinal data is the General Election
Study. This study is carried out repeatedly to observe trends in political
attitudes and behavior over time in Belgium. The data contain political
information, knowledge, perceptions, and party preferences, as well as the level
of participation in politics. One of the most interesting questions to study
with these data is how the political attitudes and behavior of respondents
change over time. The relationships among the changes in these outcomes are also
important to analyze. The General Election Study in Belgium was designed to
include a representative sample of the target population, the Belgian
electorate, so that accurate estimates about the population could be made. The
sampling design was created by the Institute for Social and Political Opinion
(ISPO) and the Inter-university Center for Political Opinion Research (PIOP).
The Belgian data set contains three subsamples: Flanders (Dutch-speaking),
Wallonia (French-speaking), and the Brussels Capital Region (Dutch- and
French-speaking). Several studies of the Belgian data have examined the
relationships between the latent variables Individualism (I), Ethnocentrism (E),
and Authoritarianism (A) in Flanders. Both cross-sectional and longitudinal
studies have been carried out. In such cases, it is critical to capture the
trend of the latent variables over time and, more importantly, to determine
whether there is any association between the development of Nationalism (N),
Ethnocentrism (E), Individualism (I), and Authoritarianism (A) in Belgium.
An empirical analysis using CT-SEM has been carried out to examine the
interdependencies among the four latent variables mentioned above on the basis
of the General Election Studies for Belgium in 1991, 1995, and 1999
(Interuniversitair steunpunt politieke-opinieonderzoek, 1991, 1995, 1999).
Although the four variables have been the subject of several studies in
Flanders, a longitudinal analysis of all four concepts and their relationships
using continuous-time (CT) analysis had not been performed. The CT-SEM analysis
found reciprocal effects between A and E and between E and I, as well as a
unidirectional effect of A on I. The findings also revealed relatively small but
significant effects of both I and E on N, but no effect of A on N or of N on any
of the other variables.
Like the CT-SEM method, the latent factor linear mixed model (LFLMM) is a
common method for analyzing change in high-dimensional longitudinal data.
Analyzing the change in the previously mentioned latent variables I, N, E, and A
in Flanders, Belgium, is of interest because there are fears that Belgium may
fall apart as a nation. The modeling was carried out in two stages. The first
stage involved modeling Individualism (I), Nationalism (N), and Ethnocentrism
(E); in the second stage, Authoritarianism (A) was added to the model. The
results showed that I, N, and A increased over time while E decreased over time.
The correlations of the random effects in the LFLMM yielded several interesting
findings, including positive correlations between E and A, I and E, and I and A.
Apart from the advantages of the LFLMM method, disadvantages related to the
assumptions and performance of the EM algorithm used to estimate the model
parameters were identified. One disadvantage is that the EM algorithm does not
automatically produce standard errors. This dissertation used an extension of
the EM algorithm, the Supplemented EM algorithm, and a simulation study to
investigate the computational aspects of the algorithm in the latent factor
linear mixed model (LFLMM) for producing the standard errors of the estimators
of the fixed effects. We also calculated the variance matrix of beta using the
second moment as a benchmark against which to compare the asymptotic variance
matrix of beta from Supplemented EM. Both the second moment and Supplemented EM
produce symmetrical results, and the variance estimates of beta become smaller
as the number of subjects in the simulation increases. The algorithm was applied
to the data on political attitudes and behavior in Flanders, Belgium. It was
also applied to Belgian data involving cohorts from Flanders and Wallonia. It
was found that all latent variables are positively correlated over time, as
indicated by the correlation matrix of the random effects.
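The idea behind Supplemented EM can be sketched on a deliberately simple toy problem. This is not the LFLMM and not the dissertation's implementation. The assumptions are: normal data with known variance `sigma2`, some values missing completely at random, and a single parameter (the mean). All function and variable names are hypothetical. Supplemented EM inflates the complete-data variance by the rate of convergence of EM, V = V_complete / (1 - DM), where DM is the derivative of the EM update at the converged estimate, approximated here by a finite difference.

```python
import numpy as np

def em_step(mu, y_obs, n_mis):
    """One EM update for the mean when n_mis values are missing."""
    n = len(y_obs) + n_mis
    # E-step: each missing value has conditional expectation mu.
    # M-step: mean of the completed data.
    return (y_obs.sum() + n_mis * mu) / n

def sem_variance(y_obs, n_mis, sigma2, tol=1e-12, eps=1e-4):
    """Toy Supplemented EM: return (mu_hat, DM, variance of mu_hat)."""
    n = len(y_obs) + n_mis
    mu = 0.0
    for _ in range(10_000):
        new_mu = em_step(mu, y_obs, n_mis)
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    # Rate of convergence DM: slope of the EM map at the estimate.
    dm = (em_step(mu + eps, y_obs, n_mis) - em_step(mu, y_obs, n_mis)) / eps
    v_complete = sigma2 / n            # variance if no data were missing
    v_sem = v_complete / (1.0 - dm)    # inflated for the missing information
    return mu, dm, v_sem

# Hypothetical simulation: the variance shrinks as the sample grows.
rng = np.random.default_rng(0)
sigma2 = 4.0
results = {}
for n in (50, 200, 800):
    y = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=n)
    n_mis = n // 4                     # pretend the first quarter is missing
    mu_hat, dm, v_sem = sem_variance(y[n_mis:], n_mis, sigma2)
    results[n] = v_sem
    print(n, round(mu_hat, 4), round(dm, 4), v_sem)
```

In this toy case the answer is known in closed form (`sigma2` divided by the number of observed values), which plays the same benchmarking role that the second-moment variance plays in the simulation study described above.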