Web Document Clustering Through Metafile Generation for Digraph Structuring Using Document Index Graph Algorithm

Budi

Please use this identifier to cite or link to this item: http://repository.ipb.ac.id/handle/123456789/68276

Title:	Web Document Clustering Through Metafile Generation for Digraph Structuring Using Document Index Graph Algorithm
Authors:	Nurdiati, Sri; Silalahi, Bib Paruhum; Budi
Issue Date:	2014
Abstract:	Nowadays, the growing volume of data, especially text documents, and its implications for the accuracy of search results and information retrieval have driven the development and use of data management and analysis techniques. Such techniques split a document collection into groups so that the documents in each group share the same topic and are related to one another. A document grouping method is therefore needed to make it easier to retrieve information that meets user needs. Clustering is a technique that can be used to discover links between documents: it separates a set of documents into several groups, or clusters, by calculating the similarity between documents. Clustered documents help users find the information they need and speed up access to that information. The scope of this research is: 1) training and test documents drawn from the Reuters-21578 newswire collection; 2) an algorithm that generates its output as a metafile, which is then used as input to represent the structure of digraphs. The research method comprises a literature study, data preprocessing, implementation of the Document Index Graph (DIG) algorithm, metafile generation for digraph construction, digraph representation, and analysis of the clustering results. In addition to the three core processes of tokenization, stop-word removal, and stemming, the data preprocessing stage includes a dimensional reduction mechanism, which determines the document frequency threshold values before the clustering process. The preprocessed data are then passed to the DIG algorithm, which calculates the weights of words that appear frequently in the documents being processed. The result is a bag of words containing the terms that appear more than 20 times.
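The preprocessing and term-selection steps described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the stop-word list, the naive suffix-stripping stemmer (a stand-in for a real stemmer such as Porter's), the helper names, and the `min_df`/`max_df_ratio` thresholds are all hypothetical. Raw term counts stand in for the DIG word weights, which the abstract does not define; only the "more than 20 occurrences" cut-off comes from the abstract.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; the thesis's list is not given.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is",
              "are", "was", "were", "it", "its", "by", "with", "at", "as", "said"}

# Naive suffix stripping, a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ings", "ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenization, stop-word removal and stemming for one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def reduce_dimensions(docs, min_df=2, max_df_ratio=1.0):
    """Dimensional reduction: keep terms whose document frequency (number of
    documents containing the term) is at least min_df and at most
    max_df_ratio of the collection. Both thresholds are hypothetical."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    n = len(docs)
    vocab = {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}
    return [[t for t in tokens if t in vocab] for tokens in docs]

def frequent_terms(tokens, threshold=20):
    """Bag of words restricted to terms that occur more than `threshold` times."""
    counts = Counter(tokens)
    return {t: c for t, c in counts.items() if c > threshold}

if __name__ == "__main__":
    docs = [preprocess(d) for d in ["The traders were trading oil.",
                                    "Oil and gas prices rose.",
                                    "Wheat exports fell."]]
    print(docs)
    print(reduce_dimensions(docs, min_df=2))
```

Swapping in a complete stop-word list and an established stemmer changes only `STOP_WORDS` and `stem`; the document-frequency filter and the frequency cut-off stay the same.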
The resulting bag of words is written into a metafile that serves as input for digraph structuring and representation. The clustering results are analyzed by calculating their precision, recall, and accuracy. Implementing the DIG algorithm with a dimensional reduction mechanism in the data preprocessing stage yields a clustering accuracy above 70%.
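The metafile and evaluation steps can be sketched in the same hedged spirit. The digraph below assumes the usual DIG formulation, in which each unique term is a node and a directed edge links a term to the term that directly follows it within a sentence. The tab-separated metafile layout and all function names are hypothetical, since the abstract does not specify them. The evaluation maps each cluster to its majority class and computes per-cluster precision and recall plus an overall accuracy (the fraction of documents that belong to their cluster's majority class); this is one common way to score clusters against labeled Reuters categories, not necessarily the exact definition used in the thesis.

```python
from collections import Counter, defaultdict

def build_digraph(sentences):
    """DIG-style digraph (assumed formulation): nodes are unique terms and an
    edge u -> v is added whenever v directly follows u in a sentence.
    Edge weights count how often each adjacency occurs."""
    edges = Counter()
    for sentence in sentences:
        for u, v in zip(sentence, sentence[1:]):
            edges[(u, v)] += 1
    return edges

def format_metafile(edges):
    """Hypothetical metafile layout: one 'source<TAB>target<TAB>weight' line per edge."""
    return [f"{u}\t{v}\t{w}" for (u, v), w in sorted(edges.items())]

def write_metafile(edges, path):
    """Write the (hypothetical) metafile to disk, one edge per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(format_metafile(edges)) + "\n")

def evaluate(clusters, labels):
    """Map each cluster to its majority class, then report per-cluster
    precision and recall and the overall accuracy."""
    class_sizes = Counter(labels)
    members = defaultdict(list)
    for cluster, label in zip(clusters, labels):
        members[cluster].append(label)
    report, correct = {}, 0
    for cluster, member_labels in members.items():
        majority, hits = Counter(member_labels).most_common(1)[0]
        report[cluster] = {
            "class": majority,
            "precision": hits / len(member_labels),
            "recall": hits / class_sizes[majority],
        }
        correct += hits
    return report, correct / len(labels)

if __name__ == "__main__":
    graph = build_digraph([["oil", "price", "rise"], ["oil", "price", "fall"]])
    print("\n".join(format_metafile(graph)))
    report, accuracy = evaluate([0, 0, 0, 1, 1, 1],
                                ["acq", "acq", "earn", "earn", "earn", "acq"])
    print(report, accuracy)
```

Keeping the metafile as a plain edge list means any graph tool that reads weighted edge lists could rebuild the digraph from it; a different layout would only require changing `format_metafile`.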
URI:	http://repository.ipb.ac.id/handle/123456789/68276
Appears in Collections:	MT - Mathematics and Natural Science

Files in This Item:

File	Description	Size	Format	Access
2014bud.pdf	Fulltext	1.15 MB	Adobe PDF	Restricted Access
BAB I Pendahuluan.pdf (Chapter I: Introduction)	Fulltext	497.08 kB	Adobe PDF	Restricted Access
BAB II Tinjauan Pustaka.pdf (Chapter II: Literature Review)	Chapter II	437.67 kB	Adobe PDF	Restricted Access
BAB III Metode.pdf (Chapter III: Methods)	Chapter III	405.74 kB	Adobe PDF	Restricted Access
BAB IV Hasil dan Pembahasan.pdf (Chapter IV: Results and Discussion)	Chapter IV	427.15 kB	Adobe PDF	Restricted Access
BAB V Kesimpulan dan Saran.pdf (Chapter V: Conclusions and Suggestions)	Chapter V	277.68 kB	Adobe PDF	Restricted Access
Cover.pdf	Cover	279.82 kB	Adobe PDF	Restricted Access
Daftar Pustaka.pdf (References)	References	413.02 kB	Adobe PDF	Restricted Access
Lampiran.pdf (Appendix)	Appendix	357.3 kB	Adobe PDF	Restricted Access
Ringkasan.pdf (Summary)	Summary	286.62 kB	Adobe PDF	Restricted Access
