52
O'Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, Bodily P, Tian L, Hakonarson H, Johnson WE, Wei Z, Wang K, Lyon GJ. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 2013; 5:28. [PMID: 23537139 PMCID: PMC3706896 DOI: 10.1186/gm432] [Citation(s) in RCA: 299] [Impact Index Per Article: 27.2] [Received: 12/04/2012] [Revised: 03/23/2013] [Accepted: 03/27/2013] [Indexed: 12/18/2022] Open
Abstract
Background: To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.
Methods: We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region being covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, on the MiSeq platform by amplicon sequencing with approximately 5000X mean coverage.
Results: SNV concordance between five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called as unique to each pipeline. Indel concordance was only 26.8% between three indel-calling pipelines, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. There were 11% of CG variants falling within targeted regions in exome sequencing that were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only and shared indels could be validated. Additionally, our analysis of two families (one with four individuals and the other with seven), demonstrated additional accuracy gained in variant discovery by having access to genetic data from a multi-generational family.
Conclusions: Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
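The concordance figures reported in this abstract can be illustrated with a small sketch. The abstract does not state the exact formula, so the code below assumes that concordance means the share of all distinct variants (the union across pipelines) that every pipeline called, and that a pipeline's unique fraction is the share of that union called by it alone. The names `concordance`, `unique_fraction`, and `toy_calls` are hypothetical, the data are invented toy values, and the indel left-normalization and 20 bp intervalization steps are not modelled.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(calls: Dict[str, Set[Variant]]) -> float:
    """Fraction of the union of all call sets that every pipeline called.

    Assumption: intersection over union; the paper may define it differently.
    """
    sets = list(calls.values())
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


def unique_fraction(calls: Dict[str, Set[Variant]], name: str) -> float:
    """Fraction of the union that only the named pipeline called."""
    others = set().union(*(s for n, s in calls.items() if n != name))
    union = others | calls[name]
    if not union:
        return 0.0
    return len(calls[name] - others) / len(union)


# Invented toy call sets, not data from the study.
v1 = ("chr1", 100, "A", "G")
v2 = ("chr1", 250, "C", "T")
v3 = ("chr2", 75, "G", "A")
v4 = ("chr3", 40, "T", "C")

toy_calls = {
    "pipeline_a": {v1, v2, v3},
    "pipeline_b": {v1, v2, v4},
    "pipeline_c": {v1, v2},
}

print(concordance(toy_calls))                    # 2 shared of 4 distinct variants
print(unique_fraction(toy_calls, "pipeline_a"))  # 1 unique of 4 distinct variants
```

Under this definition, adding more pipelines can only keep concordance the same or lower it, since the intersection can only shrink while the union can only grow.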
Affiliation(s)
- Jason O'Rawe
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA
- Tao Jiang
  - BGI-Shenzhen, Shenzhen 518000, China
- Yiyang Wu
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA
- Wei Wang
  - New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA
- Paul Bodily
  - Brigham Young University, N University Ave, Provo, 84606, USA
- Lifeng Tian
  - Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA
- Hakon Hakonarson
  - Children's Hospital of Philadelphia, Civic Center Blvd, Philadelphia, 19104, USA
- W Evan Johnson
  - Boston University School of Medicine, E Concord St, Boston, 02118, USA
- Zhi Wei
  - New Jersey Institute of Technology, Martin Luther King Jr. Blvd, Newark, 07103, USA
- Kai Wang
  - University of Southern California, 1501 San Pablo Street, Los Angeles, 90089, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA
- Gholson J Lyon
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, One Bungtown Rd, Cold Spring Harbor, 11724, USA; Stony Brook University, 100 Nicolls Rd, Stony Brook, 11794, USA; Utah Foundation for Biomedical Research, E 3300 S, Salt Lake City, 84106, USA
53
Lyon GJ, Segal JP. Practical, ethical and regulatory considerations for the evolving medical and research genomics landscape. Appl Transl Genom 2013; 2:34-40. [PMID: 27942444 PMCID: PMC5133337 DOI: 10.1016/j.atg.2013.02.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/11/2012] [Revised: 02/13/2013] [Accepted: 02/13/2013] [Indexed: 01/29/2023]
Abstract
Recent advances in sequencing technology are making possible the application of large-scale genomic analyses to individualized care, both in wellness and disease. However, a number of obstacles remain before genomic sequencing can become a routine part of clinical practice. One of the more significant and underappreciated is the lack of consensus regarding the proper environment and regulatory structure under which clinical genome sequencing and interpretation should be performed. The continued reliance on pure research vs. pure clinical models leads to problems for both research participants and patients in an era in which the lines between research and clinical practice are becoming increasingly blurred. Here, we discuss some of the ethical, regulatory and practical considerations that are emerging in the field of genomic medicine. We also propose that many of the cost and safety issues we are facing can be mitigated through expanded reliance on existing clinical regulatory frameworks and the implementation of distributive work-sharing strategies designed to leverage the strengths of our genomics centers and clinical interpretive teams.
Affiliation(s)
- Gholson J Lyon
  - Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, NY, United States; Utah Foundation for Biomedical Research, Salt Lake City, UT, United States
- Jeremy P Segal
  - New York Genome Center, New York City, NY, United States
54
Abstract
Recent advances in genetic testing technology have made chromosome microarray analysis (CMA) a first-tier clinical diagnostic test for Autism Spectrum Disorders (ASDs). Two main types of microarrays are available, single nucleotide polymorphism (SNP) arrays and array comparative genomic hybridization (aCGH), each with its own advantages and disadvantages in ASDs testing. Rare genetic variants, and copy number variants (CNVs) in particular, have been shown to play a major role in ASDs. More than 200 autism susceptibility genes have been identified to date, and complex patterns of inheritance, such as oligogenic heterozygosity, appear to contribute to the etiopathogenesis of ASDs. Incomplete penetrance and variable expressivity represent particular challenges in the interpretation of CMA testing of autistic individuals. This review aims to provide an overview of autism genetics for the practicing physician and gives hands-on advice on how to follow-up on abnormal CMA findings in individuals with neuropsychiatric disorders.
Affiliation(s)
- Karsten M Heil
  - Faculty of Medicine, University of Heidelberg, Im Neuenheimer Feld 134b, 69120 Heidelberg, Germany.
56
Torkamani A, Pham P, Libiger O, Bansal V, Zhang G, Scott-Van Zeeland AA, Tewhey R, Topol EJ, Schork NJ. Clinical implications of human population differences in genome-wide rates of functional genotypes. Front Genet 2012; 3:211. [PMID: 23125845 PMCID: PMC3485509 DOI: 10.3389/fgene.2012.00211] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 06/26/2012] [Accepted: 09/26/2012] [Indexed: 12/21/2022] Open
Abstract
There have been a number of recent successes in the use of whole genome sequencing and sophisticated bioinformatics techniques to identify pathogenic DNA sequence variants responsible for individual idiopathic congenital conditions. However, the success of this identification process is heavily influenced by the ancestry or genetic background of a patient with an idiopathic condition. This is so because potential pathogenic variants in a patient’s genome must be contrasted with variants in a reference set of genomes made up of other individuals’ genomes of the same ancestry as the patient. We explored the effect of ignoring the ancestries of both an individual patient and the individuals used to construct reference genomes. We pursued this exploration in two major steps. We first considered variation in the per-genome number and rates of likely functional derived (i.e., non-ancestral, based on the chimp genome) single nucleotide variants and small indels in 52 individual whole human genomes sampled from 10 different global populations. We took advantage of a suite of computational and bioinformatics techniques to predict the functional effect of over 24 million genomic variants, both coding and non-coding, across these genomes. We found that the typical human genome harbors ∼5.5–6.1 million total derived variants, of which ∼12,000 are likely to have a functional effect (∼5000 coding and ∼7000 non-coding). We also found that the rates of functional genotypes per the total number of genotypes in individual whole genomes differ dramatically between human populations. We then created tables showing how the use of comparator or reference genome panels comprised of genomes from individuals that do not have the same ancestral background as a patient can negatively impact pathogenic variant identification. Our results have important implications for clinical sequencing initiatives.
Affiliation(s)
- Ali Torkamani
  - The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
57
Sifrim A, Van Houdt JKJ, Tranchevent LC, Nowakowska B, Sakai R, Pavlopoulos GA, Devriendt K, Vermeesch JR, Moreau Y, Aerts J. Annotate-it: a Swiss-knife approach to annotation, analysis and interpretation of single nucleotide variation in human disease. Genome Med 2012; 4:73. [PMID: 23013645 PMCID: PMC3580443 DOI: 10.1186/gm374] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 08/31/2012] [Revised: 09/14/2012] [Accepted: 09/26/2012] [Indexed: 12/18/2022] Open
Abstract
The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org.
Affiliation(s)
- Alejandro Sifrim
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jeroen KJ Van Houdt
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Leon-Charles Tranchevent
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Beata Nowakowska
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Ryo Sakai
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Georgios A Pavlopoulos
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Koen Devriendt
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Joris R Vermeesch
  - KU Leuven, Centre for Human Genetics, University Hospital Gasthuisberg, Herestraat 49, 3000 Leuven, Belgium
- Yves Moreau
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
- Jan Aerts
  - KU Leuven, Department of Electrical Engineering-ESAT, SCD-SISTA, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
  - IBBT Future Health Department, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium