Protein Secondary Structure Prediction Using a Hidden Markov Model on Imbalanced Data
Abstract
This research aimed to predict protein secondary structure using a Hidden Markov Model (HMM). A total of 780 data records were used, with 600 as training data and 180 as testing data. The training data contained 394,052 protein secondary structure labels: 152,782 alpha-helix (H), 82,355 beta-sheet (B), and 158,915 coil (C). These proportions show that the data are imbalanced, so random oversampling was used to increase the smallest class until it equaled the largest class. The results show that the HMM can be applied to predict the secondary structure of proteins. With oversampling, the Q3 score was 45.49% for the training data and 43.21% for the testing data. Without oversampling, the Q3 score was 43.50% for the training data and 43.19% for the testing data.
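The class imbalance, the random oversampling step, and the Q3 score mentioned in the abstract can be illustrated with a minimal Python sketch. From the reported counts, the classes make up roughly 38.8% (H), 20.9% (B), and 40.3% (C) of the training labels. The function names `oversample_smallest` and `q3_score` are hypothetical, and the sketch assumes each labeled item can be duplicated independently and that Q3 is the percentage of positions predicted correctly; the study's actual procedure may differ.

```python
import random
from collections import Counter

# Class counts reported in the abstract for the training data.
counts = {"H": 152782, "B": 82355, "C": 158915}
total = sum(counts.values())
# Percentage share of each class, rounded to two decimals.
shares = {state: round(100 * n / total, 2) for state, n in counts.items()}


def oversample_smallest(samples, labels, seed=0):
    """Randomly duplicate items of the smallest class until it matches the largest.

    Assumption: items are treated as independent samples; this is an
    illustration, not necessarily the procedure used in the study.
    """
    rng = random.Random(seed)
    tally = Counter(labels)
    smallest = min(tally, key=tally.get)
    target = max(tally.values())
    # Indices of items that belong to the smallest class.
    pool = [i for i, y in enumerate(labels) if y == smallest]
    # Draw duplicates with replacement until the class reaches the target size.
    extra = [rng.choice(pool) for _ in range(target - tally[smallest])]
    idx = list(range(len(labels))) + extra
    return [samples[i] for i in idx], [labels[i] for i in idx]


def q3_score(true_states, predicted_states):
    """Percentage of positions whose predicted state equals the true state."""
    if len(true_states) != len(predicted_states) or len(true_states) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_states, predicted_states))
    return 100 * correct / len(true_states)
```

Only the smallest class is enlarged here, matching the description in the abstract; the remaining classes keep their original sizes.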