Clinical and Anthropometric Screening for Whole Genome Sequencing in Healthy Sundanese Individuals
Date
2025
Author
Jamilah, Richa Wijayanti
Andrianto, Dimas
Pratama, Rahadian
Abstract
Indonesia has very high genetic diversity, shaped by a complex migration history and an archipelagic geography. Nevertheless, local genetic references representing Indonesia's indigenous populations, including the Sundanese, remain limited. The Sundanese are one of the largest ethnic groups in Indonesia. In ethnolinguistics the Sundanese have been the basis of many studies, yet they remain underrepresented in genomic studies. Despite the size and diversity of the population, Whole Genome Sequencing applications and databases in Indonesia, particularly for Sundanese individuals, are not yet adequately available. This study presents pre-sequencing phenotypic and clinical data together with post-sequencing Whole Genome Sequencing data for further analysis as a future Indonesian genetic reference. The sequencing data were analyzed and compared with other world populations to identify the genetic distinctiveness of the Sundanese population and to evaluate its position within the global population structure using Principal Component Analysis. The study aims to lay an initial foundation for a Sundanese genetic reference through Whole Genome Sequencing (WGS) of healthy Sundanese individuals.
The research comprised three stages. The first stage was the application for ethical clearance to the IPB Human Research Ethics Committee, followed by recruitment of study respondents. The second stage covered DNA extraction and the sequencing run on a PromethION 2 Solo sequencer from Oxford Nanopore Technologies. The third stage was bioinformatic analysis: quality checking, alignment and genotyping against the reference genome, followed by variant calling, annotation of variant types, and clinical annotation against the ClinVar database to determine the consequences of the variants and their possible biological implications. Population structure was assessed by projection in a Principal Component Analysis of 99 samples from 10 populations available in a global database.
The study obtained one subject, from Sukabumi, West Java, whose family is of Sundanese descent back to the grandparents' generation. The subject met the inclusion criteria: no history of degenerative disease and no family history of autosomal mutations such as hemophilia, thalassemia, and polydactyly. Anthropometric and clinical values were within normal ranges: body weight 55.30 kg, height 171.80 cm, and body mass index (BMI) 18.70, and, from blood tests with a Glucocheck 3in1 device, fasting blood glucose 69 mg/dL, cholesterol 141 mg/dL, and uric acid 3.3 mg/dL. The anthropometric nasal index was 123.23, classed as platyrrhine (broad nose), with a nasal angle of 45°. Duplicate extraction of the blood sample with the Favorgen blood DNA extraction kit gave a concentration of 28 ng/µL in each replicate, with eluate volumes of 48 µL and 49 µL. After adapter ligation and clean-up, the measured DNA concentration was 30 ng/µL in 25 µL of eluate, so the total DNA available for priming was 750 ng, which meets the minimum amount recommended for the kit used.
Alignment to the reference genome gave more than 2 million reads with an N50 of 46 kb, meaning that half of all bases come from reads of 46 kb or longer. Coverage was 33.90x, which is high, and 100% of the sample mapped to the reference genome. Variant calling for Sundanese sample S01 reported 596,799 genetic variants in total, with 473,269 Single Nucleotide Variants (SNVs) and 124,094 insertions-deletions (INDELs), and a transition-to-transversion (Ti/Tv) ratio of 2.07. ClinVar annotation matched 316 of the detected variants: 252 SNVs, 22 duplications, 20 deletions, 18 microsatellites, and 4 insertions. Principal Component Analysis placed the Sundanese sample overlapping the Asian cluster of the reference. The Sundanese population structure is close to Southeast Asian populations such as the Cambodian, Vietnamese, and Singapore Malay populations. A second Principal Component Analysis, excluding the Papuan population, placed the Sundanese sample apart from other Asian populations. In conclusion, this study carried out Whole Genome Sequencing analysis of a subject characterized as healthy on clinical parameters and anthropometric measurements, identified genetic variants and their clinical significance, and showed that the Sundanese individual occupies a distinct position while remaining closely related to other Southeast Asian populations.
Indonesia is a highly genetically diverse country with a complex history of migration and an archipelagic geography. However, genetic references representing indigenous populations, including the Sundanese, are limited. The Sundanese are one of the largest ethnic groups in Indonesia. Although the Sundanese people have been the basis for numerous ethnolinguistic studies, this population remains underrepresented in genomic studies. Despite the size and diversity of the population, Whole Genome Sequencing applications and databases remain scarce in Indonesia, particularly for Sundanese individuals. This study presents pre-sequencing phenotypic and clinical data together with post-sequencing Whole Genome Sequencing data for further analysis and as a future genetic reference for Indonesia. We analyzed the sequencing data and compared them with those of other populations worldwide to identify the Sundanese population's genetic distinctiveness and to evaluate its position within the global population structure using Principal Component Analysis. The goal of this study is to lay a preliminary basis for a genetic reference for the Sundanese population by performing Whole Genome Sequencing (WGS) on healthy Sundanese individuals.
The research methodology comprised three stages. The first stage was obtaining ethical clearance from the IPB Human Research Ethics Committee, followed by the recruitment of study participants. The second stage covered DNA extraction and sequencing of the sample on a PromethION 2 Solo sequencer from Oxford Nanopore Technologies. The third stage was bioinformatic analysis: quality control, alignment, and genotyping against the reference genome, followed by variant calling, annotation of variant types, and clinical annotation against the ClinVar database to determine the consequences of the variants and their potential biological implications. Population structure was assessed by projection in a Principal Component Analysis of 99 samples from 10 populations available in a global database.
The study recruited a single subject from Sukabumi, West Java, whose family is Sundanese back to the grandparents' generation. The subject met the established inclusion criteria: no history of degenerative disease and no family history of autosomal mutations such as hemophilia, thalassemia, and polydactyly. The subject's anthropometric and clinical values were within normal ranges. Body weight was 55.30 kg, height 171.80 cm, and body mass index (BMI) 18.70. Blood tests with a Glucocheck 3in1 device gave a fasting blood glucose level of 69 mg/dL, a cholesterol level of 141 mg/dL, and a uric acid level of 3.3 mg/dL. The anthropometric nasal index was 123.23, which classifies the nose as platyrrhine (broad), with a nasal angle of 45°.
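The two derived anthropometric values in this paragraph can be checked with a few lines of Python. This is only a sketch: the BMI formula is the usual weight over height squared, and the nasal index is assumed here to follow the common definition of nasal width divided by nasal height, times 100. The abstract does not report the underlying nasal width and height, so the values passed to `nasal_index` below are hypothetical.

```python
def body_mass_index(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def nasal_index(nasal_width: float, nasal_height: float) -> float:
    """Nasal index = 100 * nasal width / nasal height (same unit for both).

    Assumed definition; the abstract only reports the final index.
    """
    if nasal_height <= 0:
        raise ValueError("nasal height must be positive")
    return 100.0 * nasal_width / nasal_height


if __name__ == "__main__":
    # Weight and height as reported for the subject.
    print(f"BMI: {body_mass_index(55.30, 171.80):.2f}")
    # Hypothetical measurements in millimetres, for illustration only.
    print(f"Nasal index: {nasal_index(40.0, 50.0):.2f}")
```

With the reported weight and height the BMI comes out at roughly 18.7, in line with the value given in the abstract.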
Duplicate extraction of the blood sample with the Favorgen blood DNA extraction kit gave a concentration of 28 ng/µL in each replicate, with eluate volumes of 48 µL and 49 µL. After the adapter ligation and clean-up steps, the DNA concentration measured on a Quantus Fluorometer was 30 ng/µL in 25 µL of eluate. The total DNA available for priming was therefore 750 ng, which meets the minimum amount recommended for the kit used.
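The DNA totals follow from concentration times volume. The short sketch below reproduces that arithmetic; the function name is illustrative, and the 750 ng total is consistent only with an eluate volume expressed in microlitres.

```python
def dna_mass_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in ng from concentration (ng/µL) and volume (µL)."""
    if concentration_ng_per_ul < 0 or volume_ul < 0:
        raise ValueError("concentration and volume must be non-negative")
    return concentration_ng_per_ul * volume_ul


if __name__ == "__main__":
    # Duplicate extractions: 28 ng/µL in 48 µL and in 49 µL.
    for volume in (48, 49):
        print(f"extraction ({volume} µL): {dna_mass_ng(28, volume):.0f} ng")
    # Library after adapter ligation and clean-up: 30 ng/µL in 25 µL.
    print(f"library: {dna_mass_ng(30, 25):.0f} ng")
```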
Alignment to the reference genome yielded more than 2 million reads with an N50 of 46 kb. An N50 of 46 kb means that half of all sequenced bases come from reads of 46 kb or longer. Coverage was 33.9x, which is high, and 100% of the sample mapped to the reference genome. Variant calling for the Sundanese sample S01 reported a total of 596,799 genetic variants, with 473,269 single nucleotide variants (SNVs) and 124,094 insertion-deletion variants (INDELs), and a transition-to-transversion (Ti/Tv) ratio of 2.07. Among the ClinVar-annotated variants, 27.8% were of uncertain significance, 1.6% were pathogenic, and 0.9% were likely pathogenic. ClinVar annotation matched 316 variants in total: 252 SNVs, 22 duplications, 20 deletions, 18 microsatellites, and 4 insertions. Principal Component Analysis (PCA) showed the Sundanese sample overlapping the Asian cluster of the reference populations.
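To make the two summary statistics above concrete, the following sketch computes a read-length N50 and a Ti/Tv ratio on toy data. It illustrates the definitions only and is not the pipeline used in the study; the function names and the input values are assumptions made for the example.

```python
from typing import Iterable, Tuple

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def read_n50(lengths: Iterable[int]) -> int:
    """Largest length L such that reads of length >= L hold at least
    half of all sequenced bases."""
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_of_bases:
            return length
    raise ValueError("no reads supplied")


def ti_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> float:
    """Ratio of transitions to transversions over (ref, alt) SNV pairs."""
    transitions = transversions = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution
        if (ref, alt) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


if __name__ == "__main__":
    toy_lengths_kb = [10, 20, 30, 40, 50]  # toy read lengths, 150 kb in total
    print("N50:", read_n50(toy_lengths_kb), "kb")
    toy_snvs = [("A", "G"), ("C", "T"), ("A", "C")]
    print("Ti/Tv:", ti_tv_ratio(toy_snvs))
```

In the toy read set, the 50 kb and 40 kb reads together hold 90 of the 150 kb, which is the first point at which half of the bases is reached, so the N50 is 40 kb.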
The anthropometric nasal index classified the Sundanese subject's nose as broad. Library preparation used a total of 750 ng of DNA, and alignment to the GRCh38 reference genome gave a total of 2,215,390 reads and a mean coverage of 33.9x. The Sundanese sample genome mapped 100% to the reference genome. Sample S01 carries 473,269 SNVs and 124,094 INDELs. Pathogenic variants account for 1.6% of the ClinVar-annotated variants. The Sundanese population structure is close to Southeast Asian populations such as the Cambodian, Vietnamese, and Singapore Malay populations. The second Principal Component Analysis, which excluded the Papuan population, placed the Sundanese sample apart from other Asian populations. In conclusion, this study carried out Whole Genome Sequencing analysis of a subject characterized as healthy on clinical parameters and anthropometric measurements, identified genetic variants and their clinical significance, and showed that the Sundanese individual occupies a distinct position while remaining closely related to other populations in the Southeast Asian region.
