Text Document Classification with Minor Component Analysis
Abstract
Document classification can improve the information retrieval process by reducing search time and increasing the relevance of the results. Many classification algorithms have been developed, such as the Naïve Bayes classifier, Nearest Neighbor, Principal Component Analysis (PCA), and Minor Component Analysis (MCA). This research investigates the performance of MCA in classifying text documents written in Bahasa Indonesia. MCA has been applied to image classification but has not been widely used for text classification. The dataset used in this research contains 750 documents from Media Indonesia Online in five classes: economics, education, crime, environment, and badminton. This research also examines how two preprocessing steps, stemming and stop-word removal (stoplist), affect classification performance. The experimental results show that MCA achieves more than 90% accuracy and that the preprocessing methods do not have a significant effect on performance.
Keywords: text document classification, minor component analysis.
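The abstract does not say how MCA is turned into a classifier, so the sketch below shows one common subspace-style formulation. It is an assumption for illustration and not necessarily the method used in the thesis. Each document becomes a unit-length term-frequency vector. For each class, the eigenvectors of the class covariance matrix with the smallest eigenvalues (here, every eigenvector except the top `n_principal`) are kept as that class's minor components. A new document is assigned to the class onto whose minor subspace its centered vector projects with the smallest norm, because a document that fits a class should vary little in the directions where that class's training documents barely vary. Stemming is omitted. The stoplist is a hypothetical handful of Indonesian function words. The names `tokenize`, `tf_vector`, `MinorComponentClassifier`, and `n_principal`, and the toy sentences, are illustrative only.

```python
import numpy as np

# Hypothetical, tiny Indonesian stoplist for illustration only; the study's
# actual stoplist and stemmer are not reproduced here.
STOPWORDS = {"dan", "yang", "di", "ke", "dari", "ini", "itu"}


def tokenize(text, use_stoplist=True):
    """Lowercase and split on whitespace; optionally drop stop words (no stemming)."""
    tokens = text.lower().split()
    if use_stoplist:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def build_vocab(token_lists):
    """Map each distinct term in the training documents to a column index."""
    terms = sorted({t for tokens in token_lists for t in tokens})
    return {t: i for i, t in enumerate(terms)}


def tf_vector(tokens, vocab):
    """Unit-length term-frequency vector; terms outside the vocabulary are ignored."""
    v = np.zeros(len(vocab))
    for t in tokens:
        if t in vocab:
            v[vocab[t]] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


class MinorComponentClassifier:
    """Per-class minor-subspace classifier (illustrative sketch, not the thesis code)."""

    def __init__(self, n_principal=1):
        # Number of largest-variance directions per class that are NOT minor components.
        self.n_principal = n_principal
        self.models = {}

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        for label in sorted(set(y)):
            Xc = X[[i for i, lab in enumerate(y) if lab == label]]
            mean = Xc.mean(axis=0)
            cov = np.cov(Xc, rowvar=False, bias=True)
            # eigh returns eigenvalues in ascending order, so the leading
            # columns are the minor components (directions of least variance).
            _, vecs = np.linalg.eigh(cov)
            n_minor = vecs.shape[1] - self.n_principal
            self.models[label] = (mean, vecs[:, :n_minor])
        return self

    def residual(self, x, label):
        """Norm of the projection of (x - class mean) onto the class's minor subspace."""
        mean, minor = self.models[label]
        return float(np.linalg.norm(minor.T @ (np.asarray(x, dtype=float) - mean)))

    def predict(self, x):
        """Pick the class whose minor subspace captures the least of the document."""
        return min(self.models, key=lambda label: self.residual(x, label))


if __name__ == "__main__":
    # Toy two-class example with made-up sentences (not from the study's dataset).
    train = [
        ("harga saham naik", "economics"),
        ("harga saham turun", "economics"),
        ("saham dan harga naik", "economics"),
        ("pemain bulutangkis menang", "badminton"),
        ("pemain bulutangkis kalah", "badminton"),
        ("bulutangkis pemain juara", "badminton"),
    ]
    token_lists = [tokenize(text) for text, _ in train]
    vocab = build_vocab(token_lists)
    X = np.array([tf_vector(tokens, vocab) for tokens in token_lists])
    labels = [label for _, label in train]
    clf = MinorComponentClassifier(n_principal=1).fit(X, labels)
    print(clf.predict(tf_vector(tokenize("harga saham turun tajam"), vocab)))
```

Calling `tokenize(text, use_stoplist=False)` instead of the default is the kind of switch the study's preprocessing comparison varies. A faithful replication would also add an Indonesian stemmer and tune `n_principal` on held-out data.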
Collections
- UT - Computer Science