Concept clustering of Indonesian-language documents using bisecting K-means
Abstract
In recent years, the volume of text documents available on the Internet, in digital libraries, from news sources, and on company-wide intranets has grown tremendously. This growth has increased interest in methods that can efficiently categorize documents and retrieve relevant information. Recently, techniques based on dimensionality reduction have been explored for capturing the concepts present in a collection of documents; concept indexing (CI) is one such dimensionality reduction algorithm. In this research, we investigate concept indexing as a way to interpret the concepts in Indonesian documents, and we cluster the documents using bisecting K-means. The results show that concept-based document clustering is achievable and that it increased the F-measure up to 38% compared with word-based clustering.
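The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the implementation used in this research. It assumes unit-length term vectors, cosine similarity, a "split the largest cluster" rule for bisecting K-means, and concept indexing in the sense of projecting each document onto cluster centroids. The helper names (`two_means`, `bisecting_kmeans`, `concept_index`) and the toy term counts are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def centroid(vectors, members):
    # Unit-length mean direction of the member vectors.
    dim = len(vectors[0])
    return normalize([sum(vectors[i][d] for i in members) for d in range(dim)])

def two_means(vectors, members, iters=20):
    # Deterministic seeding (an assumption of this sketch): the member least
    # similar to the cluster centroid, then the member least similar to it.
    center = centroid(vectors, members)
    seed_a = min(members, key=lambda i: dot(vectors[i], center))
    seed_b = min(members, key=lambda i: dot(vectors[i], vectors[seed_a]))
    c0, c1 = vectors[seed_a], vectors[seed_b]
    left = right = None
    for _ in range(iters):
        new_left = [i for i in members if dot(vectors[i], c0) >= dot(vectors[i], c1)]
        new_right = [i for i in members if dot(vectors[i], c0) < dot(vectors[i], c1)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged
        left, right = new_left, new_right
        c0, c1 = centroid(vectors, left), centroid(vectors, right)
    if left is None:
        # Members are indistinguishable: fall back to an arbitrary halving.
        half = len(members) // 2
        left, right = members[:half], members[half:]
    return left, right

def bisecting_kmeans(vectors, k):
    # Start with one cluster and repeatedly bisect the largest one.
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        clusters.sort(key=len)
        target = clusters.pop()
        if len(target) < 2:
            clusters.append(target)
            break
        clusters.extend(two_means(vectors, target))
    return clusters

def concept_index(vectors, clusters):
    # Each cluster centroid acts as one "concept" axis; a document's reduced
    # representation is its similarity to every centroid.
    axes = [centroid(vectors, c) for c in clusters]
    return [[dot(v, axis) for axis in axes] for v in vectors]

# Toy term-count rows (hypothetical data): 6 documents over 4 terms.
term_counts = [
    [3, 1, 0, 0], [2, 2, 0, 0], [4, 0, 1, 0],
    [0, 0, 3, 1], [0, 1, 2, 2], [0, 0, 1, 4],
]
docs = [normalize(row) for row in term_counts]
clusters = bisecting_kmeans(docs, k=2)
concepts = concept_index(docs, clusters)
print(clusters)
print(concepts[0])
```

In this sketch the reduced vectors in `concepts` have one coordinate per cluster, so they could be clustered again in place of the original word-based vectors. Real Indonesian text would first need tokenization, stop-word removal, and term weighting, none of which is shown here.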