Compound-Protein Interaction Prediction for Anti-COVID-19 Drug Repurposing Using a Convolutional Neural Network

Safitri, Bella Anggita

Date

2021

Author

Safitri, Bella Anggita

Wijaya, Sony Hartono

Abstract

COVID-19 causes health problems such as fever, dry cough, respiratory disorders, and even death. Traditional drug discovery requires substantial resources, so a computational approach offers an efficient way to screen potential compounds by predicting compound-protein interactions. The deep learning model used in this study is a Convolutional Neural Network (CNN). The CNN results were compared with Support Vector Machine (SVM) and Naive Bayes (NB) models, using Amino Acid Composition (AAC) and Dipeptide Composition (DC) as protein feature representations. The study also examined the effect of feature selection with ANOVA on the models. Performance in predicting compound-protein interactions was measured with accuracy, precision, recall, F-measure, and AUROC. Models using the DC protein representation performed better than those using AAC. The best performance came from a CNN with ANOVA feature selection, using the PubChem fingerprint as the compound representation and DC as the protein representation: accuracy 0.9475, recall 0.9687, precision 0.9679, F-measure 0.9683, and AUROC 0.9751.
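The abstract names the protein encodings and the feature-selection step without defining them. The following is a minimal pure-Python sketch under stated assumptions, not the thesis's code. It treats AAC as the relative frequency of each of the 20 standard amino acids (20 features), DC as the relative frequency of each of the 400 ordered adjacent residue pairs (400 features), and ANOVA feature selection as ranking each feature by its one-way ANOVA F statistic across the interaction labels and keeping the top k. The function names (`aac`, `dc`, `anova_f`, `select_top_k`) are hypothetical. The handling of non-standard residues and the value of k are also assumptions, since neither is stated here. scikit-learn's `SelectKBest(f_classif, k=...)` ranks features by the same kind of F statistic.

```python
from itertools import product
from typing import Dict, List, Sequence

# The 20 standard amino acids (one-letter codes); other symbols are ignored (assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs of standard amino acids: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def aac(sequence: str) -> List[float]:
    """Amino Acid Composition: relative frequency of each of the 20 residues."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dc(sequence: str) -> List[float]:
    """Dipeptide Composition: relative frequency of each of the 400 adjacent pairs."""
    seq = sequence.upper()
    # Slide a window of length 2; skip pairs that contain a non-standard residue.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts: Dict[str, int] = dict.fromkeys(DIPEPTIDES, 0)
    for p in pairs:
        counts[p] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F statistic of a single feature across the class labels."""
    groups: Dict[int, List[float]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (means[y] - grand_mean) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        # Perfectly separated (or constant) feature: avoid division by zero.
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Return the sorted column indices of the k features with the highest F score."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

For example, `dc("AAC")` gives 0.5 for both "AA" and "AC" and 0 for the other 398 pairs. This helps explain why DC carries more information than AAC: it preserves local residue order, which AAC discards. In a compound-protein setting, one plausible layout (assumed here, not taken from the thesis) concatenates the compound's PubChem fingerprint bits with the protein's DC vector into one row of `X` before calling `select_top_k`.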

URI

http://repository.ipb.ac.id/handle/123456789/108557

Collections

UT - Computer Science