Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/68276
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Nurdiati, Sri |
dc.contributor.advisor | Silalahi, Bib Paruhum |
dc.contributor.author | Budi |
dc.date.accessioned | 2014-03-17T03:02:40Z |
dc.date.available | 2014-03-17T03:02:40Z |
dc.date.issued | 2014 |
dc.identifier.uri | http://repository.ipb.ac.id/handle/123456789/68276 |
dc.description.abstract | The growing volume of data, especially text documents, and its implications for the accuracy of search results and information retrieval have led to the development and use of data management and analysis techniques. One such technique splits a collection of documents into groups so that the documents within a group share the same topic and are related to each other. A method for grouping documents is therefore needed to facilitate the retrieval of information according to user needs. Clustering is a technique that can be used to discover linkages between documents: it separates a set of documents into several groups, or clusters, by calculating the similarity between documents. Clustered documents help users find the information they need and speed up access to that information. The scope of this research is: 1) the test and training documents are taken from the REUTERS newswire-21578 collection; 2) the algorithm generates output in the form of a metafile, which is used as input to represent the digraph structure. The research method consists of a literature study, data preprocessing, implementation of the Document Index Graph (DIG) algorithm, generation of the metafile for digraph construction, digraph representation, and analysis of the clustering results. In addition to the three core processes of tokenization, stop-word removal, and stemming, the data preprocessing stage includes a dimensionality reduction mechanism, which determines the document frequency threshold values before the clustering process. The preprocessed data are then passed to the implementation of the DIG algorithm, which calculates the weights of words that appear often in the documents being processed. The result is a bag of words that appear more than 20 times. This output is written into a metafile that is used as input for digraph structuring and representation. The results are analyzed by calculating precision, recall, and accuracy percentages for the clustering result. The DIG algorithm implementation, using the dimensionality reduction mechanism in the data preprocessing stage, achieves an accuracy above 70%. | en
dc.language.iso | id |
dc.title | Web Document Clustering Through Metafile Generation for Digraph Structuring Using Document Index Graph Algorithm | en
dc.subject.keyword | Document Index Graph | en
dc.subject.keyword | Clustering | en
dc.subject.keyword | REUTERS dataset | en
dc.subject.keyword | metafile | en
dc.subject.keyword | digraph | en
Appears in Collections:MT - Mathematics and Natural Science
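
Illustrative note: the preprocessing stage described in the abstract (tokenization, stop-word removal, and a document frequency threshold applied before clustering) might look roughly like the Python sketch below. This is not the thesis implementation. The function names, the tiny stop-word list, the sample documents, and the threshold value `min_df=2` are all assumptions made for illustration; stemming is omitted, and the abstract does not specify exactly how its threshold of 20 occurrences is counted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the thesis's list is not given).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize and drop stop words (stemming is omitted in this sketch)."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def document_frequency(docs_tokens):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # a set, so each document counts a term once
    return df


def reduce_vocabulary(docs, min_df):
    """Keep only terms whose document frequency is at least min_df."""
    docs_tokens = [preprocess(d) for d in docs]
    df = document_frequency(docs_tokens)
    keep = {term for term, count in df.items() if count >= min_df}
    return [[tok for tok in tokens if tok in keep] for tokens in docs_tokens]


# Hypothetical sample documents, not taken from the REUTERS collection.
docs = [
    "Oil prices rise in the market",
    "The market reacts to oil supply",
    "Wheat exports rise",
]
reduced = reduce_vocabulary(docs, min_df=2)
print(reduced)
```

Raising `min_df` shrinks the vocabulary further, which is the sense in which a document frequency threshold acts as dimensionality reduction before the clustering step.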

Files in This Item:
File | Description | Size | Format
2014bud.pdf (Restricted Access) | Fulltext | 1.15 MB | Adobe PDF
BAB I Pendahuluan.pdf (Restricted Access) | Fulltext | 497.08 kB | Adobe PDF
BAB II Tinjauan Pustaka.pdf (Restricted Access) | Chapter II | 437.67 kB | Adobe PDF
BAB III Metode.pdf (Restricted Access) | Chapter III | 405.74 kB | Adobe PDF
BAB IV Hasil dan Pembahasan.pdf (Restricted Access) | Chapter IV | 427.15 kB | Adobe PDF
BAB V Kesimpulan dan Saran.pdf (Restricted Access) | Chapter V | 277.68 kB | Adobe PDF
Cover.pdf (Restricted Access) | Cover | 279.82 kB | Adobe PDF
Daftar Pustaka.pdf (Restricted Access) | Bibliography | 413.02 kB | Adobe PDF
Lampiran.pdf (Restricted Access) | Appendix | 357.3 kB | Adobe PDF
Ringkasan.pdf (Restricted Access) | Summary | 286.62 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.