Comparison of Feature Selection Methods in a Spam Filter Using Multinomial Naïve Bayes Classification
Abstract
Large volumes of unwanted email, known as spam, can freely reach the inbox. Spam filter software is therefore built to classify spam and non-spam email (ham) automatically. Naïve Bayes is frequently used as the classification method because it is simple and easy to implement. With a large vocabulary, the multinomial Naïve Bayes model classifies documents better than the multivariate Bernoulli model. Feature selection is needed to improve the accuracy of the classification model and to make computation more efficient. This study compares three feature selection methods: inverse document frequency (IDF), mutual information (MI), and chi-square. In terms of accuracy, the results show that MI is the best feature selection method, reaching 93.77% accuracy with a vocabulary of 9507 terms as identifiers.
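As an illustration of how the three scores named in the abstract can rank vocabulary terms, the following is a minimal sketch using the common textbook definitions of IDF, MI, and chi-square over document counts. The abstract does not state which exact variants, smoothing, or thresholds the study used, so the formulas, the function names (`idf`, `mutual_information`, `chi_square`, `contingency`, `top_k_terms`), and the toy corpus are all assumptions made for illustration only.

```python
import math


def idf(n_docs, doc_freq):
    """Inverse document frequency of a term (natural log, no smoothing)."""
    return math.log(n_docs / doc_freq)


def mutual_information(n11, n10, n01, n00):
    """MI in bits between term presence and class membership.

    n11: docs with the term, in the class    n10: with the term, not in class
    n01: docs without the term, in the class n00: without the term, not in class
    """
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),
        (n10, n11 + n10, n10 + n00),
        (n01, n01 + n00, n11 + n01),
        (n00, n01 + n00, n10 + n00),
    )
    total = 0.0
    for n_tc, n_t, n_c in cells:
        if n_tc > 0:  # empty cells contribute nothing (0 * log 0 := 0)
            total += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return total


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of the same 2x2 term/class table."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0


def contingency(docs, labels, term, positive="spam"):
    """Count the 2x2 table for one term; each doc is a set of tokens."""
    n11 = n10 = n01 = n00 = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == positive
        if has_term and in_class:
            n11 += 1
        elif has_term:
            n10 += 1
        elif in_class:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def top_k_terms(docs, labels, k, score=mutual_information):
    """Keep the k highest-scoring vocabulary terms under the given score."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(
        vocab, key=lambda t: score(*contingency(docs, labels, t)), reverse=True
    )
    return ranked[:k]


# Hypothetical toy corpus, not data from the study.
docs = [{"free", "money"}, {"free", "offer"},
        {"meeting", "monday"}, {"project", "meeting"}]
labels = ["spam", "spam", "ham", "ham"]
selected = top_k_terms(docs, labels, k=2)
```

In this sketch a term scores highly when its presence or absence separates spam from ham, so terms found only in one class outrank terms spread across both. The selected terms would then form the reduced vocabulary passed to a multinomial Naïve Bayes classifier.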