Indonesian-Language Document Search Engine Using Latent Semantic Indexing with Global Weighting
Abstract
Users increasingly prefer search engines that match on word meaning rather than on exact wording. This preference stems from the synonymy and polysemy that arise in word choice. One technique for addressing these problems is Latent Semantic Indexing (LSI), which can find relevant documents even when the query words do not appear in them. Currently, the TF-IDF term weighting algorithm is widely applied in search engines. Xia and Chai (2011) stated that, in a document collection, a term with higher frequency and a hypo-dispersion distribution usually carries less information. The purpose of this research is to implement LSI using the Singular Value Decomposition (SVD) method with a term-distribution-based global term weight. The research used 1000 Indonesian agricultural documents. The search engine using LSI with the term-distribution-based global term weight achieved its highest average precision of about 40.47%. The test results also showed that LSI with the term-distribution-based global term weight gives better accuracy than LSI with TF-IDF.
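The LSI pipeline summarized above can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not give the formula for the term-distribution-based global weight of Xia and Chai (2011), so plain IDF is used as a stand-in (the baseline the study compares against); the function names, the toy English corpus, and the choice k = 2 are hypothetical and are not taken from the research.

```python
import numpy as np


def build_term_document_matrix(docs):
    """Return (vocab, A), where A[i, j] counts term i in document j."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, tokens in enumerate(tokenized):
        for t in tokens:
            A[index[t], j] += 1.0
    return vocab, A


def idf_global_weight(A):
    """Stand-in global weight (assumption): plain IDF, log(N / df) per term."""
    n_docs = A.shape[1]
    df = np.count_nonzero(A, axis=1)
    return np.log(n_docs / df)


def lsi_fit(A, g, k):
    """Scale each term row by its global weight, then keep the top-k SVD factors."""
    W = A * g[:, None]
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T  # rows of the last item are document vectors


def lsi_query_scores(query, vocab, g, U_k, s_k, doc_vecs):
    """Fold the query into the latent space and return one cosine score per document."""
    index = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for t in query.lower().split():
        if t in index:
            q[index[t]] += 1.0
    q_hat = ((q * g) @ U_k) / s_k  # q^T U_k S_k^{-1}
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    return (doc_vecs @ q_hat) / np.maximum(norms, 1e-12)


# Hypothetical toy corpus: three crop documents and two livestock documents.
docs = [
    "rice paddy harvest",
    "paddy harvest irrigation",
    "rice irrigation",
    "cattle feed dairy",
    "dairy cattle milk",
]
vocab, A = build_term_document_matrix(docs)
g = idf_global_weight(A)
U_k, s_k, doc_vecs = lsi_fit(A, g, k=2)
sims = lsi_query_scores("rice", vocab, g, U_k, s_k, doc_vecs)
print(np.round(sims, 3))
```

In this toy corpus the second document never contains the word "rice", yet it shares a latent dimension with the documents that do, so it should score well above the two livestock documents. Swapping `idf_global_weight` for a different global weight leaves the rest of the pipeline unchanged, which is the comparison the abstract describes.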