| dc.description.abstract | Advancements in developing prediction models based on the characteristics of data are critical
as they influence the accuracy of predictions across various fields. These advancements include
models designed for data constrained by specific maximum and minimum values, such as the
Four Parameter Beta distribution, which is recognized for its flexibility in accommodating
diverse shapes, skewness, and heavy tails. Next, prediction models should also consider the
variability of data that is commonly found in conditions where data is affected by geographical
differences, temporal fluctuations, environmental variability, and socio-economic disparities.
Furthermore, the development of prediction models should also account for the complexity of
relationships that may occur when applying different datasets. Particularly in agriculture,
developing prediction models can benefit from the integration of survey data and satellite data.
Understanding and modelling these complex relationships is key to improving prediction
accuracy.
Therefore, we have developed three studies that discuss the methods and models suitable for
predicting response data characterized by a Four Parameter Beta distribution, high variability,
and inherent complex relationships within datasets. This study specifically highlights the
significance of these advancements in predicting paddy productivity, with a particular focus on
their application to the development of Area Yield Index (AYI) crop insurance for paddy.
Accurate predictions of paddy productivity are essential for determining insurance premiums
and assessing risks.
The first study in Chapter 3 develops the Four Parameter Beta GLMM model by implementing
a transformation process. The transformation maps the actual response variable y, defined on the
interval (a, b), to y* on the interval (0, 1). This process enables us to model and predict data
by applying beta GLM or GLMM models. We use a GLMM when the response variable
is measured within groups/areas (i = 1, 2, …, q; j = 1, 2, …, n_i), or apply a GLM if the data
structure is more straightforward. Results show that the GLMM performs better than the GLM
approach, indicating that both random and fixed effects are needed for predicting paddy
productivity. Even though the results of the Four Parameter Beta GLMM are promising,
the transformation process can cause bias in parameter estimates and complications in the
interpretation of coefficient values.
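As a rough illustration of the transformation step described above, the sketch below rescales a bounded response to the unit interval and back. It assumes a simple linear (min-max) rescaling with known bounds; the function names, the symbols `lower` and `upper`, and the numeric bounds are illustrative assumptions, not details taken from the thesis.

```python
def to_unit_interval(y, lower, upper):
    """Map y in the open interval (lower, upper) to (0, 1)."""
    if not lower < y < upper:
        raise ValueError("y must lie strictly inside (lower, upper)")
    return (y - lower) / (upper - lower)


def from_unit_interval(y_star, lower, upper):
    """Map a value in (0, 1) back to the original scale."""
    return lower + y_star * (upper - lower)


# Hypothetical productivity bounds in tonnes per hectare (illustrative only).
lower, upper = 2.0, 10.0
y_star = to_unit_interval(6.0, lower, upper)
y_back = from_unit_interval(y_star, lower, upper)
```

Predictions from a beta GLM or GLMM fitted to the rescaled response would be mapped back with `from_unit_interval`, which is one reason coefficients on the transformed scale are harder to interpret directly.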
In the next study, presented in Chapter 4, we extend the Four Parameter Beta regression model of
Zhou and Huang (2022) by introducing a random effect within the
model. The model was developed under a Bayesian approach using the Stan package in R.
Simulation studies showed that the parameter estimates of the model are
relatively unbiased, except for the estimate of the precision parameter. Furthermore, the empirical study shows
that the proposed Four Parameter Beta GLMM predictions are more accurate than Zhou and
Huang’s benchmark model.
In the third study, in Chapter 5, we further improve prediction accuracy by addressing
more complex data. For example, when farmer survey data and satellite data are integrated in our
model, both linear and nonlinear relationships emerge. Thus, the Four Parameter Beta GLMM
has been further developed into a Generalized Mixed Effect Tree (GMET) and a Generalized
Mixed Effect Random Forest (GMERF). In the empirical case study, these models proved to be
more suitable for predicting paddy productivity compared to the Four Parameter Beta GLMM
when using satellite data and farmer surveys.
At the end of Chapter 5, we also evaluated the prediction accuracy of the developed models
and selected the best model. After calibrating the best model to empirical data, extensive
bootstrap studies were performed to estimate the pure premium and Value at Risk (VaR) of AYI. It was shown
that designing AYI at the district level is more appropriate when productivity varies among areas.
Consideration must also be given to defining the benchmark productivity when there is evidence
that the distribution of paddy productivity follows a Four Parameter Beta distribution. The use
of satellite data in the model has proven beneficial, as it provides valuable, large-scale, and
temporally consistent information. It is also more efficient than conducting massive
field surveys. However, satellite data may still need to be combined with survey data to capture
localized, context-specific factors that satellites alone might not fully address.
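To make the premium calculation more concrete, the following is a minimal sketch of how a pure premium and VaR could be estimated by resampling predicted yields. It assumes a generic area-yield design in which the indemnity is the shortfall of the area yield below a trigger (coverage level times benchmark productivity); the coverage level, yields, and function names are hypothetical and do not reproduce the thesis's bootstrap procedure.

```python
import random


def ayi_indemnity(area_yield, benchmark, coverage=0.8):
    """Indemnity per hectare: shortfall of area yield below the trigger."""
    trigger = coverage * benchmark
    return max(0.0, trigger - area_yield)


def bootstrap_premium_and_var(yields, benchmark, coverage=0.8,
                              n_boot=5000, level=0.95, seed=0):
    """Resample yields with replacement; return (pure premium, VaR)."""
    rng = random.Random(seed)
    losses = sorted(
        ayi_indemnity(rng.choice(yields), benchmark, coverage)
        for _ in range(n_boot)
    )
    pure_premium = sum(losses) / n_boot          # expected indemnity
    var_index = min(n_boot - 1, int(level * n_boot))
    return pure_premium, losses[var_index]       # empirical loss quantile


# Hypothetical predicted district yields (tonnes per hectare).
predicted_yields = [3.2, 4.1, 4.8, 5.0, 5.5, 6.1]
premium, var_95 = bootstrap_premium_and_var(predicted_yields, benchmark=5.0)
```

In this sketch the benchmark productivity directly sets the trigger, so a benchmark that ignores the shape of the yield distribution would shift both the premium and the VaR.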
Typically, AYI premiums are calculated using average historical productivity data, which lacks
flexibility and does not account for dynamic factors like climate or pest outbreaks. In contrast,
predictive models such as the Four Parameter Beta GLMM, GMET, and GMERF offer a more
refined, data-driven approach to estimating paddy productivity, improving adaptability,
accuracy, and responsiveness to agricultural changes. This enhances risk assessment and
leads to more effective insurance products. Consequently, farmers can be assured of fair and
adequate compensation in cases of crop failure, while insurers maintain financial stability. | |