Title: Visualisasi Data Berkelompok dengan Analisis Komponen Utama Kernel (Visualization of Classified Data with Kernel Principal Component Analysis)
Authors: Siswadi
Bakhtiar, Toni
Lobo, Sonna Ariyanto
Issue Date: 2015
Publisher: IPB (Bogor Agricultural University)
Abstract: Visualization is the technique of creating images, diagrams, or animations to present information. It is often used in exploratory data analysis to summarize the main characteristics of the data. Its main goal is to communicate information clearly and effectively to users through graphs, plots, and tables, making the data more accessible, understandable, and usable. Classified data of up to three dimensions can be visualized directly by plotting the data objects in the original space, but data with more than three dimensions cannot be visualized directly. Such data can instead be visualized with Principal Component Analysis (PCA), which reduces the data to new data of smaller dimension, in this case a 2-dimensional space, that can be plotted. The new variables that represent the data are called principal components. However, PCA cannot model highly complex data with nonlinear relationships among the variables. Kernel PCA was therefore developed as a generalization of ordinary PCA. Kernel PCA is a nonlinear form of ordinary PCA that implicitly maps the original data, whose variables are related nonlinearly, into a high-dimensional feature space. In that space the relationships among the variables are linear, so ordinary PCA can be applied. This study aimed to compare PCA and Kernel PCA in how well their visualizations separate the data. The Kernel PCA in this study uses the Gaussian kernel function with parameter σ. How well the Gaussian kernel maps classified data into a feature space in which the classes are separated depends strongly on σ. Therefore, before applying Kernel PCA, we need to assess methods for selecting σ so that the visualization of the classified data minimizes misclassification.
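The PCA and Kernel PCA projections described in the abstract can be sketched as below. This is a minimal NumPy sketch, not the thesis's implementation; it assumes the Gaussian kernel is parameterized as k(x, y) = exp(-||x - y||^2 / (2σ^2)) (the thesis may use a different scaling of σ). The function names and the 4-dimensional two-sphere toy data are hypothetical illustrations.

```python
import numpy as np


def gaussian_kernel_matrix(X, sigma):
    """Return K with K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) (assumed form)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def linear_pca(X, n_components=2):
    """Ordinary PCA: project centered data onto the leading right singular vectors."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kernel_pca(X, sigma, n_components=2):
    """Gaussian-kernel PCA scores of the training points."""
    n = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    # Centering K corresponds to centering the implicit feature vectors phi(x_i).
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    lambdas, alphas = eigvals[top], eigvecs[:, top]
    # With unit-norm alphas, the score of x_k on component m is sqrt(lambda_m) * alpha_km.
    return alphas * np.sqrt(np.maximum(lambdas, 0.0))


# Hypothetical toy data: two classes on concentric spheres in 4 dimensions.
rng = np.random.default_rng(0)
directions = rng.normal(size=(100, 4))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
labels = np.repeat([0, 1], 50)
X = directions * np.where(labels == 0, 1.0, 3.0)[:, None]

Z_pca = linear_pca(X)              # 100 x 2 scores for a scatter plot
Z_kpca = kernel_pca(X, sigma=1.0)  # 100 x 2 scores for a scatter plot
```

Plotting `Z_pca` and `Z_kpca` colored by `labels` gives the kind of side-by-side comparison the study describes; the linear projection cannot unfold the nested spheres, while the kernel projection depends on the chosen σ.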
Classification with Fisher linear discriminant analysis was used to show the separation of the data in the feature space. An empirical study of two synthetic data sets (the Wang Synthetic data set and the Synthetic Control Chart Time Series data set) and two real-world data sets (the Iris data set and the Wine data set) showed that Kernel PCA visualizes the separation of the data better than PCA. For Kernel PCA, the value of the Gaussian kernel parameter σ that yields a visualization minimizing misclassification is proposed to be selected in the interval $\sigma \in \left[\min \lVert \mathbf{x}_i - \mathbf{x}_j \rVert,\ \max \lVert \mathbf{x}_i - \mathbf{x}_j \rVert\right];\ i \neq j$, where $\mathbf{x}_i$ and $\mathbf{x}_j$ are data objects.
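The σ interval and the Fisher discriminant check can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: it assumes the interval is bounded by the smallest and largest Euclidean distance between pairs of distinct data objects (our reading of the bounds), and it measures separation with Fisher's rule (project onto the discriminant directions, assign each point to the nearest projected class mean) using the apparent (resubstitution) error rate. The helper names `sigma_interval` and `fisher_lda_error` are hypothetical.

```python
import numpy as np


def sigma_interval(X):
    """Smallest and largest Euclidean distance over pairs i != j (assumed bounds)."""
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    i, j = np.triu_indices(n, k=1)  # each unordered pair once, i < j
    pair_d = dists[i, j]
    return float(pair_d.min()), float(pair_d.max())


def fisher_lda_error(Z, labels):
    """Apparent error rate of Fisher's linear discriminant rule on scores Z."""
    classes = np.unique(labels)
    overall_mean = Z.mean(axis=0)
    p = Z.shape[1]
    W = np.zeros((p, p))  # within-class scatter
    B = np.zeros((p, p))  # between-class scatter
    means = []
    for c in classes:
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        means.append(mc)
        W += (Zc - mc).T @ (Zc - mc)
        d = (mc - overall_mean)[:, None]
        B += len(Zc) * (d @ d.T)
    # Solve B a = lambda W a through the Cholesky factor W = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(W))
    eigvals, U = np.linalg.eigh(L_inv @ B @ L_inv.T)
    r = min(len(classes) - 1, p)
    A = L_inv.T @ U[:, np.argsort(eigvals)[::-1][:r]]  # discriminant directions
    # Assign each point to the class whose projected mean is nearest.
    proj = Z @ A
    proj_means = np.array(means) @ A
    dists = np.linalg.norm(proj[:, None, :] - proj_means[None, :, :], axis=-1)
    predicted = classes[np.argmin(dists, axis=1)]
    return float(np.mean(predicted != labels))


# Pair distances here are 5, 1 and sqrt(18), so the interval is (1.0, 5.0).
lo, hi = sigma_interval(np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]))

# Hypothetical 2-D scores: three tight, well-separated clusters.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
labels = np.repeat([0, 1, 2], 20)
Z = centers[labels] + rng.normal(scale=0.1, size=(60, 2))
err = fisher_lda_error(Z, labels)
```

To mimic the selection procedure, one could compute Kernel PCA scores for several σ values inside `[lo, hi]` and keep the σ with the smallest error. If the data contain duplicate objects, the lower bound is 0, which is not a valid σ. A cross-validated error estimate may be preferable to the apparent error used here.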
Appears in Collections: MT - Agriculture Technology

Files in This Item: one file (restricted access), 19.73 MB, Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.