Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/68276
Title: Web Document Clustering Through Metafile Generation for Digraph Structuring Using Document Index Graph Algorithm
Authors: Nurdiati, Sri
Silalahi, Bib Paruhum
Budi
Issue Date: 2014
Abstract: The growing volume of data, especially text documents, and its effect on the accuracy of search results and information retrieval have driven the development and use of data management and analysis techniques. One such technique splits a collection of documents into groups so that the documents in each group share a topic and are related to each other. A document grouping method is therefore needed to facilitate information retrieval according to user needs. Clustering is a technique that can be used to discover linkages between documents: it separates a set of documents into several groups, or clusters, by calculating the similarity between documents. Clustered documents help users find the information they need and speed up access to it. The scope of this research covers: 1) test and training documents taken from the Reuters-21578 newswire collection; 2) an algorithm that generates output in metafile form, to be used as input for representing the digraph structure. The research method comprises a literature study, data preprocessing, implementation of the Document Index Graph (DIG) algorithm, generation of the metafile for digraph construction, digraph representation, and analysis of the clustering results. In addition to the three core processes of tokenization, stop-word removal, and stemming, the data preprocessing stage applies a dimensionality reduction mechanism, which determines the document frequency threshold value before the clustering process. The results of data preprocessing are then passed to the DIG algorithm implementation. The algorithm calculates the weight of words that appear often in the document being processed, yielding a bag of words that each appear more than 20 times. This output is written to a metafile that serves as input for digraph structuring and representation. The research analyzes the results by calculating precision, recall, and accuracy percentages for the clustering results. The DIG algorithm implementation, combined with the dimensionality reduction mechanism in the data preprocessing stage, is able to produce an accuracy above 70%.
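The preprocessing stage described in the abstract (tokenization, stop-word removal, stemming, then a document frequency threshold) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the stop-word list, the crude suffix-stripping `naive_stem` helper (a stand-in for a real stemmer), the function names, and the `min_df` parameter are all assumptions. It also assumes the threshold is applied to the number of documents containing a term, which the abstract does not state explicitly.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; the actual list is not given).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def naive_stem(token):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize into lowercase words, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def filter_by_document_frequency(docs, min_df):
    """Keep only terms that occur in at least min_df documents."""
    tokenized = [preprocess(d) for d in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vocabulary = {term for term, count in df.items() if count >= min_df}
    reduced = [[t for t in tokens if t in vocabulary] for tokens in tokenized]
    return reduced, vocabulary


docs = [
    "Oil prices rising in the markets",
    "Markets react to oil exports",
    "Wheat exports increased",
]
reduced, vocabulary = filter_by_document_frequency(docs, min_df=2)
print(sorted(vocabulary))  # ['export', 'market', 'oil']
```

The toy corpus uses `min_df=2` so that the filter has a visible effect; a threshold near the value of 20 mentioned in the abstract would only make sense on a collection the size of Reuters-21578.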
URI: http://repository.ipb.ac.id/handle/123456789/68276
Appears in Collections: MT - Mathematics and Natural Science

Files in This Item:
File                             Description     Size       Format     Access
2014bud.pdf                      Fulltext        1.15 MB    Adobe PDF  Restricted Access
BAB I Pendahuluan.pdf            Fulltext        497.08 kB  Adobe PDF  Restricted Access
BAB II Tinjauan Pustaka.pdf      BAB II          437.67 kB  Adobe PDF  Restricted Access
BAB III Metode.pdf               BAB III         405.74 kB  Adobe PDF  Restricted Access
BAB IV Hasil dan Pembahasan.pdf  BAB IV          427.15 kB  Adobe PDF  Restricted Access
BAB V Kesimpulan dan Saran.pdf   BAB V           277.68 kB  Adobe PDF  Restricted Access
Cover.pdf                        cover           279.82 kB  Adobe PDF  Restricted Access
Daftar Pustaka.pdf               Daftar Pustaka  413.02 kB  Adobe PDF  Restricted Access
Lampiran.pdf                     Lampiran        357.3 kB   Adobe PDF  Restricted Access
Ringkasan.pdf                    Ringkasan       286.62 kB  Adobe PDF  Restricted Access

