
      Multilabel Classification: Methods Comparison, Performance Improvement, And Model Explainability

      Date
      2024
      Author
      Prasetyo, Teguh
      Susetyo, Budi
      Kurnia, Anang
      Abstract
      Optimizing tax revenue in Indonesia is difficult because of obstacles such as tax evasion and tax avoidance. These obstacles are closely related to an organization's compliance with tax regulations, known as the Taxpayers Risk Profile. However, this mechanism does not accurately detect tax avoidance and tax evasion risks. To overcome this limitation, this study uses multilabel classification, a machine learning approach that assigns a single observation to one or more labels at once. We compare Problem Transformation methods (Binary Relevance and Label Powerset), Algorithm Adaptation methods (ML-kNN and ML-ARAM), and Ensemble methods (Label Space Partitioning and RAkELd). Based on the performance comparison, the deep learning-based ML-ARAM method performs best, with an average F1-score of 95.5% and a Hamming loss of 7.4%. We also examine the feature importance of the best model to reduce the feature dimensions and identify the dominant factors that encourage a taxpayer entity to engage in tax avoidance or tax evasion. The findings improve the accuracy of tax avoidance and tax evasion risk detection in Risk Profiles using machine learning methods, helping to maximize tax revenue in Indonesia.

      Multilabel classification models built with deep learning-based methods predict more accurately but are difficult for researchers to interpret and understand, because they use more complex algorithms than probabilistic methods. Even when prediction accuracy is good, data conditions such as class imbalance also affect model performance. These two issues are challenges in implementing classification models, especially multilabel classification (MLC), which has characteristics distinct from other classification models. To address them, this study handles class imbalance with a combination of stratification and resampling methods, and applies the SHAP (SHapley Additive exPlanations) method to obtain an understandable interpretation of the deep learning-based MLC method. The study uses empirical tax avoidance and tax evasion data from the Indonesian Ministry of Finance from 2018 to 2022. We apply the deep learning-based MLC method ML-ARAM to several combinations of stratification and resampling methods. We measure model performance using F1-score and Hamming loss, and we interpret and explain the model using feature importance values based on the SHAP method. We found that combining stratification and resampling increased prediction accuracy on data with class imbalance, compared with the untreated initial condition or with either method alone. Furthermore, we generated an understandable interpretation of the MLC model using the SHAP method, including the dominant explanatory variables (features) and their influence on the prediction for each observation, based on the feature importance values. The findings offer an alternative way to address class imbalance in MLC data by combining stratification and resampling, together with an understandable interpretation of the MLC model. In this case, the interpretation of the deep learning-based model is helpful for decision-making and for improving risk profile policies at the Ministry of Finance of Indonesia.
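
      For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of how Hamming loss, a per-label averaged F1-score, and the Label Powerset transformation can be computed on a label indicator matrix. It is not taken from the thesis: the two-label toy data (read as "tax avoidance" and "tax evasion"), the function names, and the choice of macro averaging for the "average F1-score" are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Labels = Sequence[Sequence[int]]  # one row per observation, one 0/1 column per label


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1 (assumed averaging; the abstract does not say)."""
    n_labels = len(y_true[0])
    scores: List[float] = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def label_powerset(y: Labels) -> List[int]:
    """Label Powerset: map each distinct label combination to one class id."""
    classes: Dict[Tuple[int, ...], int] = {}
    return [classes.setdefault(tuple(row), len(classes)) for row in y]


# Hypothetical toy data: columns are [tax avoidance, tax evasion].
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 0], [0, 1], [1, 0], [0, 0]]

print(hamming_loss(y_true, y_pred))  # 1 wrong slot out of 8
print(macro_f1(y_true, y_pred))      # mean of F1 for label 0 and label 1
print(label_powerset(y_true))        # four distinct combinations, four class ids
```

      Binary Relevance, by contrast, would train one independent binary classifier per column of the same matrix, whereas Label Powerset trains a single multiclass classifier on the class ids returned above.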
      URI
      http://repository.ipb.ac.id/handle/123456789/156306
      Collections
      • MT - Mathematics and Natural Science [4143]

      Copyright © 2020 Library of IPB University
      All rights reserved
      Contact Us | Send Feedback
      Indonesia DSpace Group 
      IPB University Scientific Repository
      UIN Syarif Hidayatullah Institutional Repository
      Universitas Jember Digital Repository
        

       
