
      Klasifikasi Tweet Berkecenderungan Bunuh Diri Menggunakan Random Forest (Classification of Suicidal Tweets Using Random Forest)

      View/Open
      Cover (747.7Kb)
      Fulltext (1.694Mb)
      Appendix (160.9Kb)
      Date
      2024
      Author
      Audina, Alifya
      Anisa, Rahma
      Aidi, Muhammad Nur
      Abstract
      Suicide is one of the leading causes of death globally. Twitter data can be used to identify internet users with suicidal tendencies. Random forest is a popular classification method for high-dimensional data and for text analysis. This study applies the random forest method to predict suicidal tweets and to identify the words that matter most in classifying them. The data consist of 10,006 tweets with a label proportion of 30:70. Random forest models were fitted while tuning the hyperparameters ntree and mtry with 10-fold cross-validation, and both oversampling and undersampling were applied. The random forest model with oversampling, ntree = 50, and mtry = 176 classified suicidal tweets with a sensitivity of 0.594 and an F1-score of 0.649. Based on mean decrease accuracy, the word ‘nyerah’ (colloquial Indonesian for "give up") was the most important word for classifying suicidal tweets in this study.
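      The abstract reports results after oversampling and summarizes them with sensitivity and F1-score. The sketch below illustrates those two ideas only, using the Python standard library; it is not the thesis's code, it does not fit a random forest, and the function names, the toy tweets, and the toy predictions are all hypothetical. A common precaution, not stated in the abstract, is to oversample only the training portion of each cross-validation fold.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def sensitivity_and_f1(y_true, y_pred, positive=1):
    """Return (sensitivity, F1) for the class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, f1


# Toy data (made up): 3 positive and 7 negative rows, echoing a 30:70 split.
rows = [f"tweet_{i}" for i in range(10)]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))

# Toy predictions (made up) scored against the original labels.
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
sens, f1 = sensitivity_and_f1(labels, y_pred)
print(round(sens, 3), round(f1, 3))
```

      Sensitivity is the share of truly positive tweets that the model flags, so it is the metric most directly tied to missed cases; F1 balances it against precision.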
       
      URI
      http://repository.ipb.ac.id/handle/123456789/154373
      Collections
      • UT - Statistics and Data Sciences [2260]

      Copyright © 2020 Library of IPB University
      All rights reserved
      Indonesia DSpace Group 
      IPB University Scientific Repository
      UIN Syarif Hidayatullah Institutional Repository
      Universitas Jember Digital Repository
        

       
