Distribution-Based Term Weighting Method for Information Retrieval of Indonesian-Language Documents
Abstract
Term weighting plays an important role in document search and strongly influences the precision and recall of a search engine. The TF-IDF term weighting algorithm is currently widely applied in language models to build search engine systems. Because term frequency is not the only discriminator that should be considered when assigning each term a weight that reflects its importance, term weighting algorithms based on term distribution have been developed. Within a single document, a term with higher frequency and a distribution closer to hypo-dispersion usually carries more semantic information and should be given a higher weight. On the other hand, across a collection of documents, a term with higher frequency and a hypo-dispersed distribution usually carries less information. This research implements distribution-based term weighting, using a Local Term Weight Algorithm and a Global Term Weight Algorithm, for documents in the Indonesian language. The result of this research is a search engine with an average precision of 84.8%.
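The abstract does not give the formulas for the local and global weights, so the following Python sketch only illustrates the idea and should not be read as the algorithm used in this research. It assumes that "hypo-dispersion" means a term is spread evenly across parts, and it uses normalized entropy as a stand-in evenness measure. The function names `evenness`, `local_weight`, and `global_weight`, the `segments` parameter, and the way frequency and evenness are combined are all hypothetical.

```python
import math


def evenness(counts):
    """Normalized entropy of a term's counts across parts (assumed measure).

    Returns 1.0 when the term is spread perfectly evenly and 0.0 when all
    occurrences fall in a single part or the term does not occur.
    """
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)


def local_weight(tokens, term, segments=4):
    """Hypothetical local weight: relative frequency boosted by evenness
    of the term across equal-sized segments of one document."""
    if not tokens:
        return 0.0
    size = max(1, math.ceil(len(tokens) / segments))
    counts = [tokens[i:i + size].count(term)
              for i in range(0, len(tokens), size)]
    tf = sum(counts) / len(tokens)
    return tf * (1.0 + evenness(counts))


def global_weight(docs, term):
    """Hypothetical global weight: lower when the term is spread evenly
    across the documents of the collection."""
    counts = [doc.count(term) for doc in docs]
    if sum(counts) == 0:
        return 0.0
    return 1.0 - evenness(counts)


# Same frequency, different spread inside one document.
spread = local_weight(["a", "b", "a", "b", "a", "b", "a", "b"], "a")
clustered = local_weight(["a", "a", "a", "a", "b", "b", "b", "b"], "a")

# "a" occurs evenly in every document, "z" in only one.
collection = [["a", "x"], ["a", "y"], ["a", "z", "z"]]
common = global_weight(collection, "a")
rare = global_weight(collection, "z")
```

Under these assumptions, the evenly spread term receives the larger local weight (`spread` exceeds `clustered`), while the term that appears evenly in every document receives the smaller global weight (`common` is below `rare`), which mirrors the two observations in the abstract. A final term weight could then be formed by combining the two, for example as a product, though that choice is also an assumption.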