1
|
Khan SY, Ali M, Lee MCW, Ma Z, Biswas P, Khan AA, Naeem MA, Riazuddin S, Riazuddin S, Ayyagari R, Hejtmancik JF, Riazuddin SA. Whole genome sequencing data of multiple individuals of Pakistani descent. Sci Data 2020; 7:350. [PMID: 33051442 PMCID: PMC7555865 DOI: 10.1038/s41597-020-00664-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2020] [Accepted: 09/02/2020] [Indexed: 11/25/2022] Open
Abstract
Here we report whole genome sequencing of four individuals (H3, H4, H5, and H6) from a family of Pakistani descent. Whole genome sequencing yielded 1084.92, 894.73, 1068.62, and 1005.77 million mapped reads corresponding to 162.73, 134.21, 160.29, and 150.86 Gb sequence data and 52.49x, 43.29x, 51.70x, and 48.66x average coverage for H3, H4, H5, and H6, respectively. We identified 3,529,659, 3,478,495, 3,407,895, and 3,426,862 variants in the genomes of H3, H4, H5, and H6, respectively, including 1,668,024 variants common in the four genomes. Further, we identified 42,422, 39,824, 28,599, and 35,206 novel variants in the genomes of H3, H4, H5, and H6, respectively. A major fraction of the variants identified in the four genomes reside within the intergenic regions of the genome. Single nucleotide polymorphism (SNP) genotype-based comparative analysis with ethnic populations of the 1000 Genomes database linked the ancestry of all four genomes with the South Asian populations, which was further supported by mitochondria-based haplogroup analysis. In conclusion, we report whole genome sequencing of four individuals of Pakistani descent.
Measurement(s): SNV, genome. Technology Type(s): whole genome sequencing, DNA sequencing. Factor Type(s): individual. Sample Characteristic - Organism: Homo sapiens. Sample Characteristic - Location: Pakistan.
Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12642761
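The read counts, sequence yields and coverage values reported in this abstract are mutually consistent, and a short calculation can illustrate the relationship. The sketch below is only illustrative: the 150-bp read length and the ~3.1 Gb reference length are assumptions inferred from the reported figures, not values stated in the abstract, and the function names are hypothetical.

```python
# Assumptions (not stated in the abstract): 150-bp reads and a ~3.1 Gb reference.
READ_LENGTH_BP = 150
GENOME_SIZE_GB = 3.1

# Mapped reads (in millions) as reported in the abstract.
mapped_reads_millions = {"H3": 1084.92, "H4": 894.73, "H5": 1068.62, "H6": 1005.77}


def yield_gb(reads_millions, read_length_bp=READ_LENGTH_BP):
    """Sequence yield in gigabases from a read count given in millions."""
    return reads_millions * 1e6 * read_length_bp / 1e9


def mean_coverage(gb, genome_size_gb=GENOME_SIZE_GB):
    """Average depth: total sequenced bases divided by reference length."""
    return gb / genome_size_gb


for sample, reads in mapped_reads_millions.items():
    gb = yield_gb(reads)
    print(f"{sample}: {gb:.2f} Gb, about {mean_coverage(gb):.1f}x")
```

Under these assumptions the computed yields and depths land within rounding of the reported values, which suggests the coverage figures were derived in roughly this way.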
Affiliation(s)
- Shahid Y Khan
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Muhammad Ali
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
| | - Mei-Chong W Lee
- Department of Computer Science, San José State University, San José, CA, 95192, USA
| | - Zhiwei Ma
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Pooja Biswas
- Shiley Eye Institute, University of California San Diego, La Jolla, CA, 92093, USA
| | - Asma A Khan
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, 53700, Pakistan
| | - Muhammad Asif Naeem
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, 53700, Pakistan
| | - Saima Riazuddin
- Department of Otorhinolaryngology-Head & Neck Surgery, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Sheikh Riazuddin
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, 53700, Pakistan; Allama Iqbal Medical College, University of Health Sciences, Lahore, 54550, Pakistan; Department of Molecular Biology, Shaheed Zulfiqar Ali Bhutto Medical University, Islamabad, 44080, Pakistan
| | - Radha Ayyagari
- Shiley Eye Institute, University of California San Diego, La Jolla, CA, 92093, USA
| | - J Fielding Hejtmancik
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - S Amer Riazuddin
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA.
| |
|
2
|
AlSafar HS, Al-Ali M, Elbait GD, Al-Maini MH, Ruta D, Peramo B, Henschel A, Tay GK. Introducing the first whole genomes of nationals from the United Arab Emirates. Sci Rep 2019; 9:14725. [PMID: 31604968 PMCID: PMC6789106 DOI: 10.1038/s41598-019-50876-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/16/2018] [Accepted: 09/20/2019] [Indexed: 12/30/2022] Open
Abstract
Whole Genome Sequencing (WGS) provides an in-depth description of genome variation. In the era of large-scale population genome projects, the assembly of ethnic-specific genomes combined with mapping human reference genomes of underrepresented populations has improved the understanding of human diversity and disease associations. In this study, for the first time, whole genome sequences of two nationals of the United Arab Emirates (UAE) at >27X coverage are reported. The two Emirati individuals were predominantly of Central/South Asian ancestry. An in-house customized pipeline using BWA and Picard, followed by the GATK tools, was used to map the raw data from the whole genome sequences of both individuals. A total of 3,994,521 variants (3,350,574 Single Nucleotide Polymorphisms (SNPs) and 643,947 indels) were identified for the first individual, the UAE S001 sample. A similar number of variants, 4,031,580 (3,373,501 SNPs and 658,079 indels), were identified for UAE S002. Variants that are associated with diabetes, hypertension, increased cholesterol levels, and obesity were also identified in these individuals. These whole genome sequences have provided a starting point for constructing a UAE reference panel, which will lead to improvements in the delivery of precision medicine, quality of life for affected individuals and a reduction in healthcare costs. The information compiled will likely lead to the identification of target genes that could potentially lead to the development of novel therapeutic modalities.
Affiliation(s)
- Habiba S AlSafar
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Mariam Al-Ali
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Gihan Daw Elbait
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | | | - Dymitr Ruta
- Etisalat-British Telecom Innovation Center, Abu Dhabi, United Arab Emirates
| | | | - Andreas Henschel
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Computer Science, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates
| | - Guan K Tay
- Center of Biotechnology, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; Department of Biomedical Engineering, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; College of Medicine and Health Sciences, Khalifa University of Science and Technology, Abu Dhabi, United Arab Emirates; School of Psychiatry and Clinical Neurosciences, University of Western Australia, Nedlands, Australia; School of Medical and Health Sciences, Edith Cowan University, Joondalup, Australia.
| |
|
3
|
Sivasubbu S, Scaria V. Genomics of rare genetic diseases-experiences from India. Hum Genomics 2019; 14:52. [PMID: 31554517 PMCID: PMC6760067 DOI: 10.1186/s40246-019-0215-5] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/03/2019] [Accepted: 06/26/2019] [Indexed: 12/15/2022] Open
Abstract
Home to a culturally heterogeneous population, India is also a melting pot of genetic diversity. The population architecture, characterized by multiple endogamous groups with specific marriage patterns, including the widely prevalent practice of consanguinity, not only makes the Indian population distinct from the rest of the world but also provides a unique advantage and niche to understand genetic diseases. Centuries of genetic isolation of population groups have amplified the founder effects, contributing to a high prevalence of recessive alleles, which translates into genetic diseases, including rare genetic diseases in India. Rare genetic diseases are becoming a public health concern in India because a large population size of close to a billion people would essentially translate to a huge disease burden for even the rarest of the rare diseases. Genomics-based approaches have been demonstrated to accelerate the diagnosis of rare genetic diseases and reduce the socio-economic burden. The Genomics for Understanding Rare Diseases: India Alliance Network (GUaRDIAN) stands for providing genomic solutions for rare diseases in India. The consortium aims to establish a unique collaborative framework in health care planning, implementation, and delivery in the specific area of rare genetic diseases. It is a nation-wide collaborative research initiative catering to rare diseases across multiple cohorts, with over 240 clinician/scientist collaborators across 70 major medical/research centers. Within the GUaRDIAN framework, clinicians refer rare disease patients and generate whole genome or exome datasets, followed by computational analysis of the data to identify the causal pathogenic variations. The outcomes of GUaRDIAN are being translated as community services through a suitable platform providing low-cost diagnostic assays in India.
In addition to GUaRDIAN, several genomic investigations of diseased and healthy populations are being undertaken in the country to solve the rare disease dilemma. In summary, rare diseases contribute to a significant disease burden in India. Genomics-based solutions can enable accelerated diagnosis and management of rare diseases. We discuss how a collaborative research initiative such as GUaRDIAN can provide a nation-wide framework to cater to the rare disease community of India.
Affiliation(s)
| | - Sridhar Sivasubbu
- CSIR Institute of Genomics and Integrative Biology, Delhi, 110025, India.
| | - Vinod Scaria
- CSIR Institute of Genomics and Integrative Biology, Delhi, 110025, India.
| |
|
4
|
Khan SY, Kabir F, M'Hamdi O, Jiao X, Naeem MA, Khan SN, Riazuddin S, Hejtmancik JF, Riazuddin SA. Whole genome sequencing data for two individuals of Pakistani descent. Sci Data 2018; 5:180174. [PMID: 30204152 PMCID: PMC6137601 DOI: 10.1038/sdata.2018.174] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/22/2018] [Accepted: 07/11/2018] [Indexed: 12/30/2022] Open
Abstract
Here we report next-generation based whole genome sequencing of two individuals (H1 and H2) from a family of Pakistani descent. The genomic DNA was used to prepare paired-end libraries for whole-genome sequencing. Deep sequencing yielded 706.49 and 778.12 million mapped reads corresponding to 70.64 and 77.81 Gb sequence data and 23× and 25× average coverage for H1 and H2, respectively. Notably, a total of 448,544 and 470,683 novel variants, not present in the single nucleotide polymorphism database (dbSNP), were identified in H1 and H2, respectively. Comparative analysis identified 2,415,852 variants common in both genomes including 240,181 variants absent in the dbSNP. Principal component analysis linked the ancestry of both genomes with South Asian populations. In conclusion, we report whole genome sequences of two individuals from a family of Pakistani descent.
Affiliation(s)
- Shahid Y Khan
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Firoz Kabir
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| | - Oussama M'Hamdi
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Xiaodong Jiao
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - Muhammad Asif Naeem
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore 53700, Pakistan
| | - Shaheen N Khan
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore 53700, Pakistan
| | - Sheikh Riazuddin
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore 53700, Pakistan
| | - J Fielding Hejtmancik
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - S Amer Riazuddin
- The Wilmer Eye Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
| |
|
5
|
Malhotra S, Singh S, Sarkar S. Whole genome variant analysis in three ethnically diverse Indians. Genes Genomics 2018; 40:497-510. [PMID: 29892955 DOI: 10.1007/s13258-018-0650-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/01/2017] [Accepted: 01/02/2018] [Indexed: 12/21/2022]
Abstract
India represents an amazing confluence of geographically, linguistically and socially disparate ethnic populations (Indian Genome Variation Consortium, J Genet 87:3-20, 2008). Understanding the genetic diversity of the Indian population remains a daunting task. In this paper we present a detailed analysis of genomic variations (high-depth coverage (~30×) using the Illumina HiSeq 2000 platform) from three healthy Indian male individuals, one from each of three geographically delineated regions and linguistic phyla, viz. the high-altitude region of Ladakh (Tibeto-Burman linguistic phylum), the sub-mountainous region of Kumaun (Indo-European linguistic phylum) and the sea-level region of Telangana (Dravidian linguistic phylum), for probing the extent of genetic diversity in our population. The sequencing analysis provided high-quality data (~95% of the total reads aligned to the human reference genome for each sample) and very good alignment quality (>80% of the filtered mapped reads had a quality score of 60). A total of 4.3, 3.7 and 4.3 million single nucleotide variations were identified in the genomes of the high-altitude, sub-mountainous and sea-level individuals, respectively, by comparing with the human reference genome. Approximately 17.3%, 18.2% and 17.4% of the variants were unique to the three genomes, respectively. The study identified many novel variations in the three diverse genomes (132,970 in the Ladakh, 112,317 in the Kumaun and 128,881 in the Telangana individual) and is an important resource for creating a baseline and a comprehensive catalogue of human genomic variation across the Indian as well as the Asian continent.
Affiliation(s)
- Seema Malhotra
- Defence Institute of Physiology and Allied Sciences (DIPAS), Defence Research and Development Organization, Ministry of Defence, Government of India, Lucknow Road, Delhi, 110054, India
| | - Sayar Singh
- Defence Institute of Physiology and Allied Sciences (DIPAS), Defence Research and Development Organization, Ministry of Defence, Government of India, Lucknow Road, Delhi, 110054, India
| | - Soma Sarkar
- Defence Institute of Physiology and Allied Sciences (DIPAS), Defence Research and Development Organization, Ministry of Defence, Government of India, Lucknow Road, Delhi, 110054, India.
| |
|
6
|
Hariprakash JM, Vellarikkal SK, Verma A, Ranawat AS, Jayarajan R, Ravi R, Kumar A, Dixit V, Sivadas A, Kashyap AK, Senthivel V, Sehgal P, Mahadevan V, Scaria V, Sivasubbu S. SAGE: a comprehensive resource of genetic variants integrating South Asian whole genomes and exomes. Database: The Journal of Biological Databases and Curation 2018; 2018:1-10. [PMID: 30184194 PMCID: PMC6146123 DOI: 10.1093/database/bay080] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 01/05/2018] [Accepted: 07/03/2018] [Indexed: 11/20/2022]
Abstract
South Asia is home to ~20% of the world population and is characterized by distinct ethnic, linguistic, cultural and genetic lineages. Only limited representative samples from the region have found their place in large population-scale international genome projects. The recent availability of genome-scale data from multiple populations and datasets from South Asian countries in the public domain motivated us to integrate the data into a comprehensive resource. In the present study, we have integrated a total of six datasets encompassing 1213 human exomes and genomes to create a compendium of 154 814 557 genetic variants, adding a total of 69 059 255 novel variants. The variants were systematically annotated using public resources and, along with the allele frequencies, are available as a browsable online resource, South Asian Genomes and Exomes (SAGE). As a proof-of-principle application of the data and resource for genetic epidemiology, we have analyzed the pathogenic genetic variants causing retinitis pigmentosa. Our analysis reveals the genetic landscape of the disease and suggests that a subset of genetic variants is highly prevalent in South Asia.
Affiliation(s)
- Judith Mary Hariprakash
- GN Ramachandran Knowledge Center for Genome Informatics, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Shamsudheen Karuthedath Vellarikkal
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Ankit Verma
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Anop Singh Ranawat
- GN Ramachandran Knowledge Center for Genome Informatics, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Rijith Jayarajan
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Rowmika Ravi
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Anoop Kumar
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Vishal Dixit
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Ambily Sivadas
- GN Ramachandran Knowledge Center for Genome Informatics, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Atul Kumar Kashyap
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Vigneshwar Senthivel
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Paras Sehgal
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Vijayalakshmi Mahadevan
- School of Chemical & Biotechnology, Shanmugha Arts, Science, Technology and Research Academy (SASTRA) University, Thanjavur, Tamil Nadu 613402, India
| | - Vinod Scaria
- GN Ramachandran Knowledge Center for Genome Informatics, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| | - Sridhar Sivasubbu
- Genomics & Molecular Medicine, Council of Scientific and Industrial Research (CSIR) Institute of Genomics & Integrative Biology, Mathura Road, Delhi 110025, India
| |
|
7
|
Biobanks: Will the Idea Change Indian Life? Asian Bioeth Rev 2017. [DOI: 10.1007/s41649-017-0032-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022] Open
|
8
|
Bhartiya D, Scaria V. Genomic variations in non-coding RNAs: Structure, function and regulation. Genomics 2016; 107:59-68. [DOI: 10.1016/j.ygeno.2016.01.005] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 06/08/2015] [Revised: 01/05/2016] [Accepted: 01/08/2016] [Indexed: 01/05/2023]
|
9
|
Periwal V, Patowary A, Vellarikkal SK, Gupta A, Singh M, Mittal A, Jeyapaul S, Chauhan RK, Singh AV, Singh PK, Garg P, Katoch VM, Katoch K, Chauhan DS, Sivasubbu S, Scaria V. Comparative whole-genome analysis of clinical isolates reveals characteristic architecture of Mycobacterium tuberculosis pangenome. PLoS One 2015; 10:e0122979. [PMID: 25853708 PMCID: PMC4390332 DOI: 10.1371/journal.pone.0122979] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 08/08/2014] [Accepted: 02/26/2015] [Indexed: 11/18/2022] Open
Abstract
The tubercle complex consists of closely related mycobacterium species which appear to be variants of a single species. Comparative genome analysis of different strains could provide useful clues and insights into the genetic diversity of the species. We integrated genome assemblies of 96 strains from the Mycobacterium tuberculosis complex (MTBC), which included 8 Indian clinical isolates sequenced and assembled in this study, to understand its pangenome architecture. We predicted genes for all 96 strains and clustered their respective CDSs into homologous gene clusters (HGCs) to reveal the hard-core, soft-core and accessory genome components of MTBC. The hard-core (HGCs shared amongst 100% of the strains) comprised 2,066 gene clusters, whereas the soft-core (HGCs shared amongst at least 95% of the strains) comprised 3,374 gene clusters. The change in the core and accessory genome components when observed as a function of their size revealed that MTBC has an open pangenome. We identified 74 HGCs that were absent from reference strains H37Rv and H37Ra but were present in most of the clinical isolates. We report PCR validation of 9 candidate genes, showing that 7 genes were completely absent from H37Rv and H37Ra, whereas 2 genes shared partial homology with them, attributable to probable insertion and deletion events. The pangenome approach is a promising tool for studying strain-specific genetic differences occurring within species. We also suggest that, since selecting appropriate target genes for typing purposes requires the expected target gene to be present in all isolates being typed, estimating the core component of the species becomes a subject of prime importance.
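The hard-core, soft-core and accessory categories defined in this abstract can be illustrated with a small counting sketch. It assumes a presence table mapping each HGC to the set of strains that carry it; the function name, the toy strain labels and the cluster identifiers are hypothetical and are not taken from the study. As in the abstract's definitions, the soft-core count here includes the hard-core clusters.

```python
def summarize_pangenome(presence, n_strains, soft_core_fraction=0.95):
    """Count hard-core, soft-core and accessory homologous gene clusters.

    presence: dict mapping a cluster id to the set of strains carrying it.
    Hard-core: present in every strain. Soft-core: present in at least
    soft_core_fraction of strains (so it includes the hard-core).
    Accessory: everything below the soft-core threshold.
    """
    hard_core = soft_core = accessory = 0
    for strains in presence.values():
        n = len(strains)
        if n == n_strains:
            hard_core += 1
        if n >= soft_core_fraction * n_strains:
            soft_core += 1
        else:
            accessory += 1
    return {"hard_core": hard_core, "soft_core": soft_core, "accessory": accessory}


# Toy data: 20 hypothetical strains and three hypothetical clusters.
strains = [f"s{i}" for i in range(20)]
presence = {
    "HGC_A": set(strains),       # in all 20 strains
    "HGC_B": set(strains[:19]),  # in 19 of 20 strains (95%)
    "HGC_C": set(strains[:10]),  # in 10 of 20 strains
}
summary = summarize_pangenome(presence, n_strains=len(strains))
print(summary)
```

With 96 strains, as in the study, the 95% threshold would correspond to presence in at least 92 strains.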
Affiliation(s)
- Vinita Periwal
- GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
- Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India
| | - Ashok Patowary
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| | - Shamsudheen Karuthedath Vellarikkal
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
- Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India
| | - Anju Gupta
- Open Source Drug Discovery Unit, Council of Scientific and Industrial Research (CSIR), Anusandhan Bhavan, 2 Rafi Marg, New Delhi 110001, India
| | - Meghna Singh
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
- Academy of Scientific & Innovative Research (AcSIR), 2, Rafi Marg, Anusandhan Bhawan, New Delhi 110001, India
| | - Ashish Mittal
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| | - Shamini Jeyapaul
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| | - Rajendra Kumar Chauhan
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| | - Ajay Vir Singh
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Pravin Kumar Singh
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Parul Garg
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Viswa Mohan Katoch
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Kiran Katoch
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Devendra Singh Chauhan
- National JALMA Institute of Leprosy and other Mycobacterial Diseases, Post Box No.101,Tajganj, Agra-282001, India
| | - Sridhar Sivasubbu
- Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| | - Vinod Scaria
- GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi—110007, India
| |
|
10
|
Ilyas M, Kim JS, Cooper J, Shin YA, Kim HM, Cho YS, Hwang S, Kim H, Moon J, Chung O, Jun J, Rastogi A, Song S, Ko J, Manica A, Rahman Z, Husnain T, Bhak J. Whole genome sequencing of an ethnic Pathan (Pakhtun) from the north-west of Pakistan. BMC Genomics 2015; 16:172. [PMID: 25887915 PMCID: PMC4362645 DOI: 10.1186/s12864-015-1290-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/12/2014] [Accepted: 01/29/2015] [Indexed: 11/10/2022] Open
Abstract
Background: Pakistan covers a key geographic area in human history, being both part of the Indus River region that acted as one of the cradles of civilization and a link between Western Eurasia and Eastern Asia. This region is inhabited by a number of distinct ethnic groups, the largest being the Punjabi, Pathan (Pakhtuns), Sindhi, and Baloch. Results: We analyzed the first ethnic male Pathan genome by sequencing it to 29.7-fold coverage using the Illumina HiSeq2000 platform. A total of 3.8 million single nucleotide variations (SNVs) and 0.5 million small indels were identified by comparing with the human reference genome. Among the SNVs, 129,441 were novel, and 10,315 nonsynonymous SNVs were found in 5,344 genes. SNVs were annotated for health consequences and high risk diseases, as well as possible influences on drug efficacy. We confirmed that the Pathan genome presented here is representative of this ethnic group by comparing it to a panel of Central Asians from the HGDP-CEPH panels typed for ~650 k SNPs. The mtDNA (H2) and Y haplogroup (L1) of this individual were also typical of his geographic region of origin. Finally, we reconstruct the demographic history by PSMC, which highlights a recent increase in effective population size compatible with admixture between European and Asian lineages expected in this geographic region. Conclusions: We present a whole-genome sequence and analyses of an ethnic Pathan from the north-west province of Pakistan. It is a useful resource to understand genetic variation and human migration across the whole Asian continent.
Affiliation(s)
- Muhammad Ilyas
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, Pakistan; Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.
| | - Jong-Soo Kim
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Jesse Cooper
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Young-Ah Shin
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Hak-Min Kim
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea; The Genomics Institute, Biomedical Engineering Department, UNIST, Ulsan, Republic of Korea.
| | - Yun Sung Cho
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea; The Genomics Institute, Biomedical Engineering Department, UNIST, Ulsan, Republic of Korea.
| | - Seungwoo Hwang
- Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea.
| | - Hyunho Kim
- The Genomics Institute, Biomedical Engineering Department, UNIST, Ulsan, Republic of Korea.
| | - Jaewoo Moon
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Oksung Chung
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.
| | - JeHoon Jun
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.
| | - Achal Rastogi
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea.
| | - Sanghoon Song
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Junsu Ko
- Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea.
| | - Andrea Manica
- Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK.
| | - Ziaur Rahman
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, Pakistan.
| | - Tayyab Husnain
- National Centre of Excellence in Molecular Biology, University of the Punjab, Lahore, Pakistan.
| | - Jong Bhak
- Personal Genomics Institute, Genome Research Foundation, Suwon, Republic of Korea; Theragen Bio Institute, TheragenEtex, Suwon, Republic of Korea; The Genomics Institute, Biomedical Engineering Department, UNIST, Ulsan, Republic of Korea.
| |
|
11
|
Sequence and analysis of a whole genome from Kuwaiti population subgroup of Persian ancestry. BMC Genomics 2015; 16:92. [PMID: 25765185 PMCID: PMC4336699 DOI: 10.1186/s12864-015-1233-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 09/22/2014] [Accepted: 01/12/2015] [Indexed: 12/30/2022] Open
Abstract
Background: The 1000 Genomes Project paved the way for sequencing diverse human populations. New genome projects are being established to sequence underrepresented populations, helping in understanding human genetic diversity. The Kuwait Genome Project is an initiative to sequence individual genomes from the three subgroups of the Kuwaiti population, namely Saudi Arabian tribe, "tent-dwelling" Bedouin, and Persian, attributing their ancestry to different regions in the Arabian Peninsula and to modern-day Iran (West Asia). These subgroups are in line with settlement history and are confirmed by genetic studies. In this work, we report the whole genome sequence of a Kuwaiti native from the Persian subgroup at >37X coverage. Results: We document 3,573,824 SNPs, 404,090 insertions/deletions, and 11,138 structural variations. Out of the reported SNPs and indels, 85,939 are novel. We identify 295 'loss-of-function' and 2,314 'deleterious' coding variants, some of which carry homozygous genotypes in the sequenced genome; the associated phenotypes include pharmacogenomic traits such as greater triglyceride-lowering ability with fenofibrate treatment, and a requirement for high warfarin dosage to elicit an anticoagulation response. 6,328 non-coding SNPs associate with 811 phenotype traits: in congruence with the medical history of the participant for Type 2 diabetes and β-Thalassemia, and of the participant's family for migraine, 72 (of 159 known) Type 2 diabetes, 3 (of 4) β-Thalassemia, and 76 (of 169) migraine variants are seen in the genome. Intergenome comparisons based on shared disease-causing variants position the sequenced genome between Asian and European genomes, in congruence with the geographical location of the region. In comparison, bead arrays perform better than sequencing platforms at correctly calling genotypes in low-coverage regions of the sequenced genome; however, a novel SNP or indel near the genotyped position can lead to false calls on bead arrays.
Conclusions: We report, for the first time, a reference genome resource for the population of Persian ancestry. The resource provides a starting point for designing large-scale genetic studies in the Peninsula, including Kuwait, and in the Persian population. Such efforts on populations under-represented in global genome variation surveys help augment current knowledge on human genome diversity.
|
12
|
Giri AK, Khan NM, Basu A, Tandon N, Scaria V, Bharadwaj D. Pharmacogenetic landscape of clopidogrel in north Indians suggest distinct interpopulation differences in allele frequencies. Pharmacogenomics 2014; 15:643-53. [PMID: 24798721 DOI: 10.2217/pgs.13.241] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 01/17/2023]
Abstract
AIM Clopidogrel, a widely used antiplatelet drug, exhibits high interindividual variability, more than 80% of which could be explained by genetic polymorphisms. We built an allele frequency map of variants affecting clopidogrel response in north Indians. MATERIALS & METHODS We mined a cross-sectional population-scale genome-wide dataset of 2128 Indo-Europeans residing in north India for the presence of variants associated with the pharmacogenetics of clopidogrel. RESULTS Our analysis reveals significant differences in population-scale allele frequencies between Indians and the global population. Indians had a higher allele frequency for variants in the CYP2C9*2, CYP2C9*3 and P2RY1 genes and a lower frequency for variants in the ABCB1, CYP1A2, CYP2C19*2C, CYP3A5 and PON1 genes compared with the global population. Furthermore, based on our study we propose a model to explain the higher prevalence of clopidogrel metabolizers in north Indians. CONCLUSION This is the largest population-scale genetic epidemiology study providing a high-resolution map of variants associated with clopidogrel response, which could be valuable to clinicians for rationally planning appropriate dosage for therapy in resource-poor conditions based on population-level allele frequencies.
Affiliation(s)
- Anil K Giri
  - Genomics & Molecular Medicine Unit, CSIR-Institute of Genomics & Integrative Biology, Mathura Road, Delhi-110 020, India
|
13
|
|
14
|
Pharmacogenomics for Precision Medicine in the Era of Collaborative Co-creation and Crowdsourcing. CURRENT GENETIC MEDICINE REPORTS 2014. [DOI: 10.1007/s40142-014-0041-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
15
|
Ribeiro-dos-Santos AM, de Souza JES, Almeida R, Alencar DO, Barbosa MS, Gusmão L, Silva WA, de Souza SJ, Silva A, Ribeiro-dos-Santos Â, Darnet S, Santos S. High-throughput sequencing of a South American Amerindian. PLoS One 2013; 8:e83340. [PMID: 24386182 PMCID: PMC3875439 DOI: 10.1371/journal.pone.0083340] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 03/26/2013] [Accepted: 10/30/2013] [Indexed: 11/18/2022]
Abstract
The emergence of next-generation sequencing technologies allowed access to the vast amounts of information that are contained in the human genome. This information has contributed to the understanding of individual and population-based variability and improved the understanding of the evolutionary history of different human groups. However, the genome of a representative of the Amerindian populations had not been previously sequenced. Thus, the genome of an individual from a South American tribe was completely sequenced to further the understanding of the genetic variability of Amerindians. A total of 36.8 giga base pairs (Gbp) were sequenced and aligned with the human genome. These Gbp corresponded to 95.92% of the human genome with an estimated miscall rate of 0.0035 per sequenced bp. The data obtained from the alignment were used for SNP (single-nucleotide polymorphism) and INDEL (insertion-deletion) calling, which resulted in the identification of 502,017 polymorphisms, of which 32,275 were potentially new high-confidence SNPs and 33,795 new INDELs, specific to South Native American populations. The authenticity of the sample as a member of the South Native American populations was confirmed through the analysis of the uniparental (maternal and paternal) lineages. The autosomal comparison distinguished the investigated sample from other continental populations and revealed a close relation to the Eastern Asian populations and Aboriginal Australians. Although the findings did not rule out the classical model of the settlement of the Americas, they brought new insights into the understanding of human population history. The present study indicates a remarkable genetic variability in human populations that must still be identified and contributes to the understanding of the genetic variability of South Native American populations and of human population history.
Affiliation(s)
- Jorge Estefano Santana de Souza
  - Centro Regional de Hemoterapia, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
  - Institute of Bioinformatics and Biotechnology, São Paulo, São Paulo, Brazil
- Renan Almeida
  - Institute of Bioinformatics and Biotechnology, São Paulo, São Paulo, Brazil
- Dayse O. Alencar
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
- Leonor Gusmão
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
  - Institute of Molecular Pathology and Immunology, University of Porto, Porto, Portugal
- Wilson A. Silva
  - Centro Regional de Hemoterapia, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
  - Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, São Paulo, Brazil
- Sandro J. de Souza
  - Institute of Bioinformatics and Biotechnology, São Paulo, São Paulo, Brazil
  - Brain Institute, Universidade Federal do Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Artur Silva
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
- Sylvain Darnet
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
- Sidney Santos
  - Instituto de Ciências Biológicas, Universidade Federal do Pará, Belém, Pará, Brazil
|
16
|
|
17
|
Distinct Patterns of Genetic Variations in Potential Functional Elements in Long Noncoding RNAs. Hum Mutat 2013; 35:192-201. [DOI: 10.1002/humu.22472] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 05/27/2013] [Accepted: 10/14/2013] [Indexed: 01/09/2023]
|
18
|
Salleh MZ, Teh LK, Lee LS, Ismet RI, Patowary A, Joshi K, Pasha A, Ahmed AZ, Janor RM, Hamzah AS, Adam A, Yusoff K, Hoh BP, Hatta FHM, Ismail MI, Scaria V, Sivasubbu S. Systematic pharmacogenomics analysis of a Malay whole genome: proof of concept for personalized medicine. PLoS One 2013; 8:e71554. [PMID: 24009664 PMCID: PMC3751891 DOI: 10.1371/journal.pone.0071554] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 11/29/2012] [Accepted: 07/01/2013] [Indexed: 01/06/2023]
Abstract
BACKGROUND With higher throughput and lower sequencing cost, second-generation sequencing technology has immense potential for translation into clinical practice and for the realization of pharmacogenomics-based patient care. The systematic analysis of whole genome sequences to assess patient-to-patient variability in pharmacokinetic and pharmacodynamic responses to drugs would be the next step in future medicine, in line with the vision of personalizing medicine. METHODS Genomic DNA obtained from a 55-year-old, self-declared healthy, anonymous male of Malay descent was sequenced. The subject's mother died of lung cancer, and the father had a history of schizophrenia and died at the age of 65 years. A systematic, intuitive computational workflow/pipeline integrating a custom algorithm in tandem with large datasets of variant annotations and gene functions was developed to identify genetic variations with pharmacogenomic impact. A comprehensive pathway map of drug transport, metabolism and action was used as a template to map non-synonymous variations with potential functional consequences. PRINCIPAL FINDINGS Over 3 million known variations and 100,898 novel variations in the Malay genome were identified. Further in-depth pharmacogenetics analysis revealed a total of 607 unique variants in 563 proteins, with the eventual identification of 4 drug transport genes, 2 drug metabolizing enzyme genes and 33 target genes harboring deleterious SNVs involved in pharmacological pathways, which could have a potential role in clinical settings. CONCLUSIONS The current study demonstrates the potential of personal genome sequencing for understanding functionally relevant variations with potential influence on drug transport, metabolism and differential therapeutic outcomes. These will be essential for realizing personalized medicine through the use of a comprehensive computational pipeline for systematic data mining and analysis.
Affiliation(s)
- Mohd Zaki Salleh
  - Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
  - Faculty of Pharmacy, Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Lay Kek Teh
  - Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
  - Faculty of Pharmacy, Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Lian Shien Lee
  - Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Rose Iszati Ismet
  - Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Ashok Patowary
  - GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Kandarp Joshi
  - Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Ayesha Pasha
  - Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Azni Zain Ahmed
  - Institute of Science, Universiti Teknologi MARA (UiTM) Malaysia, Shah Alam, Selangor, Malaysia
- Roziah Mohd Janor
  - Faculty of Computer and Mathematical Science, Universiti Teknologi MARA (UiTM) Malaysia, Shah Alam, Selangor, Malaysia
- Ahmad Sazali Hamzah
  - Institute of Science, Universiti Teknologi MARA (UiTM) Malaysia, Shah Alam, Selangor, Malaysia
- Aishah Adam
  - Faculty of Pharmacy, Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Khalid Yusoff
  - Faculty of Medicine, Universiti Teknologi MARA (UiTM) Malaysia, Sg Buloh, Selangor, Malaysia
- Boon Peng Hoh
  - Institute of Medical Molecular Biotechnology (IMMB), Faculty of Medicine, Universiti Teknologi MARA (UiTM) Malaysia, Sg Buloh, Selangor, Malaysia
- Mohamad Izwan Ismail
  - Integrative Pharmacogenomics Institute (iPROMISE), Universiti Teknologi MARA (UiTM) Malaysia, Puncak Alam, Selangor, Malaysia
- Vinod Scaria
  - GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
- Sridhar Sivasubbu
  - Genomics and Molecular Medicine, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
|
19
|
DeWoody JA, Abts KC, Fahey AL, Ji Y, Kimble SJA, Marra NJ, Wijayawardena BK, Willoughby JR. Of contigs and quagmires: next‐generation sequencing pitfalls associated with transcriptomic studies. Mol Ecol Resour 2013; 13:551-8. [PMID: 23615313 DOI: 10.1111/1755-0998.12107] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 12/21/2012] [Revised: 03/13/2013] [Accepted: 03/14/2013] [Indexed: 12/15/2022]
Affiliation(s)
- J. Andrew DeWoody
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Kendra C. Abts
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Anna L. Fahey
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Yanzhu Ji
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Steven J. A. Kimble
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Nicholas J. Marra
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
- Janna R. Willoughby
  - Department of Forestry & Natural Resources, Purdue University, West Lafayette, IN 47907, USA
|
20
|
Gupta R, Ratan A, Rajesh C, Chen R, Kim HL, Burhans R, Miller W, Santhosh S, Davuluri RV, Butte AJ, Schuster SC, Seshagiri S, Thomas G. Sequencing and analysis of a South Asian-Indian personal genome. BMC Genomics 2012; 13:440. [PMID: 22938532 PMCID: PMC3534380 DOI: 10.1186/1471-2164-13-440] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 03/21/2012] [Accepted: 08/18/2012] [Indexed: 02/08/2023]
Abstract
BACKGROUND With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. RESULTS We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. CONCLUSIONS This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population.
Affiliation(s)
- Ravi Gupta
  - SciGenom Labs Pvt Ltd., Plot 43A, SDF 3rd Floor CSEZ, Kakkanad, Cochin, Kerala, 682037, India
- Aakrosh Ratan
  - Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 310 Wartik Lab, University Park, Pennsylvania, 16802, USA
- Changanamkandath Rajesh
  - SciGenom Labs Pvt Ltd., Plot 43A, SDF 3rd Floor CSEZ, Kakkanad, Cochin, Kerala, 682037, India
- Rong Chen
  - Personalis, 1350 Willow Road, Suite 202, Menlo Park, CA, 94025, USA
- Hie Lim Kim
  - Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 310 Wartik Lab, University Park, Pennsylvania, 16802, USA
- Richard Burhans
  - Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 310 Wartik Lab, University Park, Pennsylvania, 16802, USA
- Webb Miller
  - Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 310 Wartik Lab, University Park, Pennsylvania, 16802, USA
- Sam Santhosh
  - SciGenom Labs Pvt Ltd., Plot 43A, SDF 3rd Floor CSEZ, Kakkanad, Cochin, Kerala, 682037, India
- Ramana V Davuluri
  - Center for Systems The Wistar Institute, Philadelphia, PA, 19104, USA
- Atul J Butte
  - Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Stephan C Schuster
  - Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 310 Wartik Lab, University Park, Pennsylvania, 16802, USA
  - Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, 60 Nanyang Drive, SBS-01N-27, Singapore, Singapore, 637551
- Somasekar Seshagiri
  - Department of Molecular Biology, Genentech Inc, 1 DNA Way, South San Francisco, CA, 94080, USA
- George Thomas
  - SciGenom Labs Pvt Ltd., Plot 43A, SDF 3rd Floor CSEZ, Kakkanad, Cochin, Kerala, 682037, India
|