Clustering metagenome fragments using growing self organizing map
Date
2013
Author
Overbeek, Marlinda Vasty
Kusuma, Wisnu Ananta
Buono, Agus
Abstract
Metagenomics is the study of microbes in large communities, and it allows culture-independent analysis. Samples taken directly from the environment are difficult to assemble because they contain a mixture of microorganisms. When sample complexity is very high and the sample comes from a high-diversity environment, assembling DNA sequences becomes harder because interspecies chimeras can occur. Fragments are commonly grouped using supervised learning, which depends on the availability of training data. For that reason, this research used unsupervised learning to cluster the metagenome fragments. In addition, clustering is usually applied to longer fragments (≥ 8 kbp) from small communities (fewer than 100 microorganisms). The purpose of this research is to analyze the effectiveness and efficiency of the Growing Self Organizing Map (GSOM) for clustering metagenome fragments from a large community. As features, we used trinucleotide and tetranucleotide frequencies, and a combination of oligonucleotide frequencies that takes don't-care positions into account, called the spaced k-mer frequency. Feature extraction therefore used both k-mers and spaced k-mers. Across the tested combinations of learning rate and neighborhood size, the spaced k-mer frequency gave the best result among the oligonucleotide frequency features. The parameter combinations were tested with a map size of [10 10] and a training length of 10 epochs. Mapping metagenome fragments with the spaced k-mer frequency gave a quantization error of 0.665, a topographic error of 0.06, and an error percentage of 13.07%. With map sizes in the range [100 – 500], 300 – 5000 map units, and a training length of 10 epochs, the best training was obtained at map size [100 150] with 300 map units. The training time was 51 minutes and the error percentage was 6.43%.
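The k-mer and spaced k-mer features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `spaced_kmer_frequencies`, the mask notation ('1' for a position that is read, '0' for a don't-care position), and the choice to skip words containing ambiguous bases are all assumptions made for illustration; the abstract does not state which masks were used.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def spaced_kmer_frequencies(sequence, pattern):
    """Relative frequency of every spaced k-mer in `sequence`.

    `pattern` is a hypothetical mask: '1' marks a position that is read,
    '0' marks a don't-care position. "111" is an ordinary trinucleotide.
    """
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)
    seq = sequence.upper()
    counts = Counter()
    # Slide a window of the mask's length over the fragment.
    for start in range(len(seq) - span + 1):
        window = seq[start:start + span]
        word = "".join(window[i] for i in keep)
        # Assumption: words with ambiguous bases (e.g. N) are skipped.
        if all(base in ALPHABET for base in word):
            counts[word] += 1
    total = sum(counts.values())
    # Fixed-order vector over all 4^w words, w = number of read positions.
    words = ["".join(p) for p in product(ALPHABET, repeat=len(keep))]
    return {w: (counts[w] / total if total else 0.0) for w in words}
```

Calling the function with the masks "111" and "1111" gives 64-dimensional trinucleotide and 256-dimensional tetranucleotide vectors; a mask such as "1101" (again hypothetical) gives a spaced variant. The values of several such dictionaries can be concatenated into one feature vector per fragment before training a map.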