View Item 
      •   IPB Repository
      • Dissertations and Theses
      • Master Theses
      • MT - Mathematics and Natural Science
      • View Item
      •   IPB Repository
      • Dissertations and Theses
      • Master Theses
      • MT - Mathematics and Natural Science
      • View Item
      JavaScript is disabled for your browser. Some features of this site may not work without it.

      Web Document Clustering Through Metafile Generation for Digraph Structuring Using Document Index Graph Algorithm

      Thumbnail
      View/Open
      Fulltext (1.126Mb)
      Fulltext (497.0Kb)
      BAB II (437.6Kb)
      BAB III (405.7Kb)
      BAB IV (427.1Kb)
      BAB V (277.6Kb)
      cover (279.8Kb)
      Daftar Pustaka (413.0Kb)
      Lampiran (357.3Kb)
      Ringkasan (286.6Kb)
      Date
      2014
      Author
      Budi
      Nurdiati, Sri
      Silalahi, Bib Paruhum
      Metadata
      Show full item record
      Abstract
      Nowaday, the increased volume of data, especially on text documents and their implications for the issue of the accuracy of the search results and information retrieval has led to the development and the use of data management and analysis techniques. The technique is used to split the document into different groups so that the documents contained in a group will contain the same topic and related to each other. Therefore we need a method of grouping documents in order to facilitate the retrieval of information according to user needs. Clustering is a technique that can be used to discover linkages between documents. This technique separates a set of documents into several groups or clusters by calculating the similarity between documents. Documents that have been clustered, will help users finding the information needed and will increase the speed of access to that information. The scope of this research consists of : 1) the test and training documents using REUTERS newswire-21578; 2) algorithm generates output in metafile form that will be used as input to represent the structure of digraphs. Research methods perform literature studies, data preprocessing, implementation of Document Index Graph (DIG) algorithm, generating the metafile for digraphs construction, digraphs representation, and analysis of clustering result. Instead of three core processes tokenization , stop-word removal and stemming, data preprocessing stage is concerned with dimentional reduction mechanism. Dimentional reduction will determine the document frequency threshold values before clustering process. The results of data preprocessing will be followed by the implementation of the DIG algorithm. The algorithm calculates the weight of words that often appears in the document being processed. The results bring a bag of words that frequently appear more than 20 times. The output of this result is written into a metafile that will be used as input for the digraph structuring and representation. This research analyzes the results by calculating precision, recall and accuracy percentage on clustering result. DIG algorithm implementations using dimentional reduction mechanism through data preprocessing stage is able to produce an accuracy above 70 %.
      URI
      http://repository.ipb.ac.id/handle/123456789/68276
      Collections
      • MT - Mathematics and Natural Science [4143]

      Copyright © 2020 Library of IPB University
      All rights reserved
      Contact Us | Send Feedback
      Indonesia DSpace Group 
      IPB University Scientific Repository
      UIN Syarif Hidayatullah Institutional Repository
      Universitas Jember Digital Repository
        

       

      Browse

      All of IPB RepositoryCollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

      My Account

      Login

      Application

      google store

      Copyright © 2020 Library of IPB University
      All rights reserved
      Contact Us | Send Feedback
      Indonesia DSpace Group 
      IPB University Scientific Repository
      UIN Syarif Hidayatullah Institutional Repository
      Universitas Jember Digital Repository