251
Alioto TS, Buchhalter I, Derdak S, Hutter B, Eldridge MD, Hovig E, Heisler LE, Beck TA, Simpson JT, Tonon L, Sertier AS, Patch AM, Jäger N, Ginsbach P, Drews R, Paramasivam N, Kabbe R, Chotewutmontri S, Diessl N, Previti C, Schmidt S, Brors B, Feuerbach L, Heinold M, Gröbner S, Korshunov A, Tarpey PS, Butler AP, Hinton J, Jones D, Menzies A, Raine K, Shepherd R, Stebbings L, Teague JW, Ribeca P, Giner FC, Beltran S, Raineri E, Dabad M, Heath SC, Gut M, Denroche RE, Harding NJ, Yamaguchi TN, Fujimoto A, Nakagawa H, Quesada V, Valdés-Mas R, Nakken S, Vodák D, Bower L, Lynch AG, Anderson CL, Waddell N, Pearson JV, Grimmond SM, Peto M, Spellman P, He M, Kandoth C, Lee S, Zhang J, Létourneau L, Ma S, Seth S, Torrents D, Xi L, Wheeler DA, López-Otín C, Campo E, Campbell PJ, Boutros PC, Puente XS, Gerhard DS, Pfister SM, McPherson JD, Hudson TJ, Schlesner M, Lichter P, Eils R, Jones DTW, Gut IG. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing. Nat Commun 2015; 6:10001. [PMID: 26647970 PMCID: PMC4682041 DOI: 10.1038/ncomms10001] [Citation(s) in RCA: 207] [Impact Index Per Article: 20.7] [Received: 06/16/2015] [Accepted: 10/23/2015] [Indexed: 12/13/2022] Open
Abstract
As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy.
Affiliation(s)
- Tyler S. Alioto
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Ivo Buchhalter
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
  - Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Sophia Derdak
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Barbara Hutter
  - Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Matthew D. Eldridge
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Eivind Hovig
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
  - Department of Informatics, University of Oslo, 0373 Oslo, Norway
- Lawrence E. Heisler
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Timothy A. Beck
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Jared T. Simpson
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Laurie Tonon
  - Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
- Anne-Sophie Sertier
  - Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
- Ann-Marie Patch
  - Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
- Natalie Jäger
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
  - Department of Genetics, Stanford University, Mail Stop-5120, Stanford, California 94305-5120, USA
- Philip Ginsbach
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Ruben Drews
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Nagarajan Paramasivam
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Rolf Kabbe
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Sasithorn Chotewutmontri
  - Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
- Nicolle Diessl
  - Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
- Christopher Previti
  - Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
- Sabine Schmidt
  - Genome and Proteome Core Facility, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
- Benedikt Brors
  - Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Lars Feuerbach
  - Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Michael Heinold
  - Division of Applied Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Susanne Gröbner
  - Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
- Andrey Korshunov
  - Department of Neuropathology, Heidelberg University Hospital, Im Neuenheimer Feld 224, Heidelberg 69120, Germany
- Adam P. Butler
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Jonathan Hinton
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- David Jones
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Andrew Menzies
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Keiran Raine
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Rebecca Shepherd
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Lucy Stebbings
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Jon W. Teague
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Paolo Ribeca
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Francesc Castro Giner
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Sergi Beltran
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Emanuele Raineri
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Marc Dabad
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Simon C. Heath
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Marta Gut
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
- Robert E. Denroche
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Nicholas J. Harding
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Takafumi N. Yamaguchi
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
- Akihiro Fujimoto
  - RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Hidewaki Nakagawa
  - RIKEN Center for Integrative Medical Sciences, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Víctor Quesada
  - Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
- Rafael Valdés-Mas
  - Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
- Sigve Nakken
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
- Daniel Vodák
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424 Oslo, Norway
  - The Bioinformatics Core Facility, Institute for Cancer Genetics and Informatics, Oslo University Hospital, 0310 Oslo, Norway
- Lawrence Bower
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Andrew G. Lynch
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Charlotte L. Anderson
  - Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
  - Victorian Life Sciences Computation Initiative, The University of Melbourne, Melbourne, Victoria 3053, Australia
- Nicola Waddell
  - Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
- John V. Pearson
  - Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
  - QIMR Berghofer Medical Research Institute, Brisbane, Queensland 4006, Australia
- Sean M. Grimmond
  - Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St Lucia, Brisbane, Queensland 4072, Australia
  - Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, Scotland G61 1QH, UK
- Myron Peto
  - Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
- Paul Spellman
  - Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon 97239-3098, USA
- Cyriac Kandoth
  - The Genome Institute, Washington University, St Louis, Missouri 63108, USA
- Semin Lee
  - Harvard Medical School, Boston, Massachusetts 02115, USA
- John Zhang
  - Harvard Medical School, Boston, Massachusetts 02115, USA
  - MD Anderson Cancer Center, Houston, Texas 77030, USA
- Singer Ma
  - Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
- Sahil Seth
  - MD Anderson Cancer Center, Houston, Texas 77030, USA
- David Torrents
  - IRB-BSC Joint Research Program on Computational Biology, Barcelona Supercomputing Center, 08034 Barcelona, Spain
- Liu Xi
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- David A. Wheeler
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA
- Carlos López-Otín
  - Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
- Elías Campo
  - Hematopathology Unit, Department of Pathology, Hospital Clinic, University of Barcelona, Institut d'Investigacions Biomèdiques August Pi i Sunyer, 08036 Barcelona, Spain
- Paul C. Boutros
  - Synergie Lyon Cancer Foundation, Centre Léon Bérard, Cheney C, 28 rue Laennec, Lyon 69373, France
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
- Xose S. Puente
  - Universidad de Oviedo—IUOPA, C/Fernando Bongera s/n, 33006 Oviedo, Spain
- Daniela S. Gerhard
  - National Cancer Institute, Office of Cancer Genomics, 31 Center Drive, 10A07, Bethesda, Maryland 20892-2580, USA
- Stefan M. Pfister
  - Department of Pediatric Hematology and Oncology, Heidelberg University Hospital, Im Neuenheimer Feld 430, Heidelberg 69120, Germany
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- John D. McPherson
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
- Thomas J. Hudson
  - Ontario Institute for Cancer Research, 661 University Avenue, Suite 510, Toronto, Ontario, Canada M5G 0A3
  - Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada M5G 1L7
  - Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Matthias Schlesner
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Peter Lichter
  - Division of Molecular Genetics, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
  - Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
- Roland Eils
  - Division of Theoretical Bioinformatics, German Cancer Research Center, Im Neuenheimer Feld 280, Heidelberg 69120, Germany
  - Heidelberg Center for Personalised Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Institute of Pharmacy and Molecular Biotechnology, University of Heidelberg, Heidelberg 69120, Germany
  - Bioquant Center, University of Heidelberg, Im Neuenheimer Feld 267, Heidelberg 69120, Germany
- David T. W. Jones
  - Division of Pediatric Neurooncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg 69120, Germany
- Ivo G. Gut
  - CNAG-CRG, Centre for Genomic Regulation, Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), 08002 Barcelona, Spain
252
Abstract
Summary: Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure.
253
Zhang N, McHale LK, Finer JJ. Isolation and characterization of "GmScream" promoters that regulate highly expressing soybean (Glycine max Merr.) genes. Plant Sci 2015; 241:189-98. [PMID: 26706070 DOI: 10.1016/j.plantsci.2015.10.010] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 08/05/2015] [Revised: 09/22/2015] [Accepted: 10/17/2015] [Indexed: 05/25/2023]
Abstract
To increase our understanding of the regulatory components that control gene expression, it is important to identify, isolate and characterize new promoters. In this study, a group of highly expressed soybean (Glycine max Merr.) genes, which we have named "GmScream", were first identified from RNA-Seq data. The promoter regions were then identified, cloned and fused with the coding region of the green fluorescent protein (gfp) gene, for introduction and analysis in different tissues using 3 tools for validation. Approximately half of the GmScream promoters identified showed levels of GFP expression comparable to or higher than the Cauliflower Mosaic Virus 35S (35S) promoter. Using transient expression in lima bean cotyledonary tissues, the strongest GmScream promoters gave over 6-fold higher expression than the 35S promoter while several other GmScream promoters showed 2- to 3-fold higher expression. The two highest expressing promoters, GmScreamM4 and GmScreamM8, regulated two different elongation factor 1A genes in soybean. In stably transformed soybean tissues, GFP driven by the GmScreamM4 or GmScreamM8 promoter exhibited constitutive high expression in most tissues with preferentially higher expression in proliferative embryogenic tissues, procambium, vascular tissues, root tips and young embryos. Using deletion analysis of the promoter, two proximal regions of the GmScreamM8 promoter were identified as contributing significantly to high levels of gene expression.
Affiliation(s)
- Ning Zhang
  - Department of Horticulture and Crop Science, The Ohio State University, 1680 Madison Ave., Wooster, OH 44691, USA
- Leah K McHale
  - Department of Horticulture and Crop Science, The Ohio State University, 2021 Coffey Rd, Columbus, OH 43210, USA
- John J Finer
  - Department of Horticulture and Crop Science, The Ohio State University, 1680 Madison Ave., Wooster, OH 44691, USA
254
Vo NS, Phan V. Improving variant calling by incorporating known genetic variants into read alignment. BMC Bioinformatics 2015. [PMCID: PMC4625095 DOI: 10.1186/1471-2105-16-s15-p18] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022] Open
255
Li MJ, Liu Z, Wang P, Wong MP, Nelson MR, Kocher JPA, Yeager M, Sham PC, Chanock SJ, Xia Z, Wang J. GWASdb v2: an update database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res 2015; 44:D869-76. [PMID: 26615194 PMCID: PMC4702921 DOI: 10.1093/nar/gkv1317] [Citation(s) in RCA: 142] [Impact Index Per Article: 14.2] [Received: 08/21/2015] [Accepted: 11/10/2015] [Indexed: 12/19/2022] Open
Abstract
Genome-wide association studies (GWASs), now as a routine approach to study single-nucleotide polymorphism (SNP)-trait association, have uncovered over ten thousand significant trait/disease associated SNPs (TASs). Here, we updated GWASdb (GWASdb v2, http://jjwanglab.org/gwasdb) which provides comprehensive data curation and knowledge integration for GWAS TASs. These updates include: (i) Up to August 2015, we collected 2479 unique publications from PubMed and other resources; (ii) We further curated moderate SNP-trait associations (P-value < 1.0×10^-3) from each original publication, and generated a total of 252 530 unique TASs in all GWASdb v2 collected studies; (iii) We manually mapped 1610 GWAS traits to 501 Human Phenotype Ontology (HPO) terms, 435 Disease Ontology (DO) terms and 228 Disease Ontology Lite (DOLite) terms. For each ontology term, we also predicted the putative causal genes; (iv) We curated the detailed sub-populations and related sample size for each study; (v) Importantly, we performed extensive function annotation for each TAS by incorporating gene-based information, ENCODE ChIP-seq assays, eQTL, population haplotype, functional prediction across multiple biological domains, evolutionary signals and disease-related annotation; (vi) Additionally, we compiled a SNP-drug response association dataset for 650 pharmacogenetic studies involving 257 drugs in this update; (vii) Last, we improved the user interface of website.
Affiliation(s)
- Mulin Jun Li
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Zipeng Liu
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Panwen Wang
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Maria P Wong
  - Department of Pathology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Matthew R Nelson
  - Quantitative Sciences, GlaxoSmithKline, Research Triangle Park, NC, USA
- Jean-Pierre A Kocher
  - Division of Biomedical Statistics and Informatics, Mayo Clinic College of Medicine, Rochester, MN, USA
- Meredith Yeager
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Pak Chung Sham
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - State Key Laboratory of Brain and Cognitive Sciences and Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Stephen J Chanock
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD, USA
- Zhengyuan Xia
  - Department of Anaesthesiology, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Junwen Wang
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - School of Biomedical Sciences, LKS Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
256
Field MA, Cho V, Andrews TD, Goodnow CC. Reliably Detecting Clinically Important Variants Requires Both Combined Variant Calls and Optimized Filtering Strategies. PLoS One 2015; 10:e0143199. [PMID: 26600436 PMCID: PMC4658170 DOI: 10.1371/journal.pone.0143199] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 06/08/2015] [Accepted: 11/02/2015] [Indexed: 12/21/2022] Open
Abstract
A diversity of tools is available for identification of variants from genome sequence data. Given the current complexity of incorporating external software into a genome analysis infrastructure, a tendency exists to rely on the results from a single tool alone. The quality of the output variant calls is highly variable however, depending on factors such as sequence library quality as well as the choice of short-read aligner, variant caller, and variant caller filtering strategy. Here we present a two-part study first using the high quality 'genome in a bottle' reference set to demonstrate the significant impact the choice of aligner, variant caller, and variant caller filtering strategy has on overall variant call quality and further how certain variant callers outperform others with increased sample contamination, an important consideration when analyzing sequenced cancer samples. This analysis confirms previous work showing that combining variant calls of multiple tools results in the best quality resultant variant set, for either specificity or sensitivity, depending on whether the intersection or union, of all variant calls is used respectively. Second, we analyze a melanoma cell line derived from a control lymphocyte sample to determine whether software choices affect the detection of clinically important melanoma risk-factor variants finding that only one of the three such variants is unanimously detected under all conditions. Finally, we describe a cogent strategy for implementing a clinical variant detection pipeline; a strategy that requires careful software selection, variant caller filtering optimizing, and combined variant calls in order to effectively minimize false negative variants. While implementing such features represents an increase in complexity and computation the results offer indisputable improvements in data quality.
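The intersection-versus-union trade-off described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the caller names, the toy variants, and the truth set are hypothetical and are not taken from the study, and real pipelines would first normalise variant representations before comparing call sets.

```python
# Hypothetical call sets from three variant callers.
# Each variant is represented as (chrom, pos, ref, alt).
calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr3", 7, "T", "C")},
}

# Hypothetical truth set (in the spirit of a reference call set).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "A")}


def combine(call_sets, mode):
    """Combine per-caller variant sets by intersection or union."""
    sets = list(call_sets.values())
    if not sets:
        raise ValueError("at least one call set is required")
    if mode == "intersection":
        return set.intersection(*sets)  # called by every caller
    if mode == "union":
        return set.union(*sets)  # called by at least one caller
    raise ValueError(f"unknown mode: {mode}")


def precision_recall(called, truth_set):
    """Return (precision, recall) of a call set against a truth set."""
    true_positives = len(called & truth_set)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


intersection = combine(calls, "intersection")
union = combine(calls, "union")

for name, combined in (("intersection", intersection), ("union", union)):
    p, r = precision_recall(combined, truth)
    print(f"{name}: {len(combined)} variants, precision={p:.2f}, recall={r:.2f}")
```

In this toy data the intersection keeps only the variant every caller agrees on, so it contains no false positives but misses true variants, while the union recovers every true variant at the cost of admitting a false positive.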
Affiliation(s)
- Matthew A. Field
  - Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
  - National Computational Infrastructure, Australian National University, Canberra, ACT, Australia
- Vicky Cho
  - Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
  - Australian Phenomics Facility, Australian National University, Canberra, ACT, Australia
- T. Daniel Andrews
  - Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
  - National Computational Infrastructure, Australian National University, Canberra, ACT, Australia
- Chris C. Goodnow
  - Department of Immunology, John Curtin School of Medical Research, Australian National University, Canberra, ACT, Australia
  - Immunogenomics Group, Immunology Research Program, Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
257
Monat C, Tranchant-Dubreuil C, Kougbeadjo A, Farcy C, Ortega-Abboud E, Amanzougarene S, Ravel S, Agbessi M, Orjuela-Bouniol J, Summo M, Sabot F. TOGGLE: toolbox for generic NGS analyses. BMC Bioinformatics 2015; 16:374. [PMID: 26552596 PMCID: PMC4640241 DOI: 10.1186/s12859-015-0795-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 05/14/2015] [Accepted: 10/24/2015] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND: The explosion of NGS (Next Generation Sequencing) sequence data requires a huge effort in Bioinformatics methods and analyses. The creation of dedicated, robust and reliable pipelines able to handle dozens of samples from raw FASTQ data to relevant biological data is a time-consuming task in all projects relying on NGS. To address this, we created a generic and modular toolbox for developing such pipelines.
RESULTS: TOGGLE (TOolbox for Generic nGs anaLysEs) is a suite of tools able to design pipelines that manage large sets of NGS softwares and utilities. Moreover, TOGGLE offers an easy way to manipulate the various options of the different softwares through the pipelines in using a single basic configuration file, which can be changed for each assay without having to change the code itself. We also describe one implementation of TOGGLE in a complete analysis pipeline designed for SNP discovery for large sets of genomic data, ready to use in different environments (from a single machine to HPC clusters).
CONCLUSION: TOGGLE speeds up the creation of robust pipelines with reliable log tracking and data flow, for a large range of analyses. Moreover, it enables Biologists to concentrate on the biological relevance of results, and change the experimental conditions easily. The whole code and test data are available at https://github.com/SouthGreenPlatform/TOGGLE.
Affiliation(s)
- Cécile Monat
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
- Ayité Kougbeadjo
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Cédric Farcy
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Enrique Ortega-Abboud
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- Sébastien Ravel
  - UMR-BGPI CIRAD TA A-54/K, Campus International de Baillarguet, Montpellier Cedex 5, F-34398, France.
- Mawussé Agbessi
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
- Maryline Summo
  - UMR AGAP CIRAD/INRA/SupAgro, TA A-108/03 - Avenue Agropolis, Montpellier Cedex 5, F-34398, France.
- François Sabot
  - UMR DIADE IRD/UM, 911 Avenue Agropolis, Montpellier Cedex 5, F-34934, France.
258
Perales C, Quer J, Gregori J, Esteban JI, Domingo E. Resistance of Hepatitis C Virus to Inhibitors: Complexity and Clinical Implications. Viruses 2015; 7:5746-66. [PMID: 26561827 PMCID: PMC4664975 DOI: 10.3390/v7112902] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.0] [Received: 09/22/2015] [Revised: 10/23/2015] [Accepted: 10/26/2015] [Indexed: 12/20/2022] Open
Abstract
Selection of inhibitor-resistant viral mutants is universal for viruses that display quasi-species dynamics, and hepatitis C virus (HCV) is no exception. Here we review recent results on drug resistance in HCV, with emphasis on resistance to the newly-developed, directly-acting antiviral agents, as they are increasingly employed in the clinic. We put the experimental observations in the context of quasi-species dynamics, in particular what the genetic and phenotypic barriers to resistance mean in terms of exploration of sequence space while HCV replicates in the liver of infected patients or in cell culture. Strategies to diminish the probability of viral breakthrough during treatment are briefly outlined.
Affiliation(s)
- Celia Perales
  - Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain.
  - Centro de Biologia Molecular "Severo Ochoa" (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain.
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), 08035 Barcelona, Spain.
- Josep Quer
  - Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain.
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), 08035 Barcelona, Spain.
  - Universitat Autònoma de Barcelona, Bellaterra 08193, Spain.
- Josep Gregori
  - Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain.
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), 08035 Barcelona, Spain.
  - Roche Diagnostics SL, 08174 Sant Cugat del Vallès, Spain.
- Juan Ignacio Esteban
  - Liver Unit, Internal Medicine, Laboratory of Malalties Hepàtiques, Vall d'Hebron Institut de Recerca-Hospital Universitari Vall d'Hebron (VHIR-HUVH), Universitat Autònoma de Barcelona, 08035 Barcelona, Spain.
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), 08035 Barcelona, Spain.
  - Universitat Autònoma de Barcelona, Bellaterra 08193, Spain.
- Esteban Domingo
  - Centro de Biologia Molecular "Severo Ochoa" (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain.
  - Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), 08035 Barcelona, Spain.
259
|
Konopka T, Nijman SMB. Comparison of genetic variants in matched samples using thesaurus annotation. Bioinformatics 2015; 32:657-63. [PMID: 26545822 PMCID: PMC4795618 DOI: 10.1093/bioinformatics/btv654] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/23/2015] [Accepted: 10/30/2015] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Calling changes in DNA, e.g. as a result of somatic events in cancer, requires analysis of multiple matched sequenced samples. Events in low-mappability regions of the human genome are difficult to encode in variant call files and have been under-reported as a result. However, they can be described accurately through thesaurus annotation, a technique that links multiple genomic loci together to explicate a single variant.
RESULTS: We here describe software and benchmarks for using thesaurus annotation to detect point changes in DNA from matched samples. In benchmarks on matched normal/tumor samples we show that the technique can recover between five and ten percent more true events than conventional approaches, while strictly limiting false discovery and being fully consistent with popular variant analysis workflows. We also demonstrate the utility of the approach for analysis of de novo mutations in parents/child families.
AVAILABILITY AND IMPLEMENTATION: Software performing thesaurus annotation is implemented in Java; it is available as source code on GitHub at GeneticThesaurus (https://github.com/tkonopka/GeneticThesaurus) and as an executable on SourceForge at geneticthesaurus (https://sourceforge.net/projects/geneticthesaurus). Mutation calling is implemented in an R package available on GitHub at RGeneticThesaurus (https://github.com/tkonopka/RGeneticThesaurus).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
CONTACT: tomasz.konopka@ludwig.ox.ac.uk.
Affiliation(s)
- Tomasz Konopka
  - Ludwig Institute for Cancer Research, University of Oxford, Oxford, UK
260
Kovatch P, Costa A, Giles Z, Fluder E, Cho HM, Mazurkova S. Big Omics Data Experience. SC ... CONFERENCE PROCEEDINGS. SC (CONFERENCE : SUPERCOMPUTING) 2015; 2015. [PMID: 30788464 DOI: 10.1145/2807591.2807595] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/14/2022]
Abstract
As personalized medicine becomes more integrated into healthcare, the rate at which human genomes are being sequenced is rising quickly together with a concomitant acceleration in compute and storage requirements. To achieve the most effective solution for genomic workloads without re-architecting the industry-standard software, we performed a rigorous analysis of usage statistics, benchmarks and available technologies to design a system for maximum throughput. We share our experiences designing a system optimized for the "Genome Analysis ToolKit (GATK) Best Practices" whole genome DNA and RNA pipeline based on an evaluation of compute, workload and I/O characteristics. The characteristics of genomic-based workloads are vastly different from those of traditional HPC workloads, requiring different configurations of the scheduler and the I/O subsystem to achieve reliability, performance and scalability. By understanding how our researchers and clinicians work, we were able to employ techniques not only to speed up their workflow yielding improved and repeatable performance, but also to make more efficient use of storage and compute resources.
Affiliation(s)
- Patricia Kovatch
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
- Anthony Costa
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
- Zachary Giles
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
- Eugene Fluder
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
- Hyung Min Cho
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
- Svetlana Mazurkova
  - Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place, New York, NY 10029, 212-241-6500
261
Gu W, Gurguis CI, Zhou JJ, Zhu Y, Ko EA, Ko JH, Wang T, Zhou T. Functional and Structural Consequence of Rare Exonic Single Nucleotide Polymorphisms: One Story, Two Tales. Genome Biol Evol 2015; 7:2929-40. [PMID: 26454016 PMCID: PMC4684694 DOI: 10.1093/gbe/evv191] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Accepted: 10/05/2015] [Indexed: 01/01/2023]
Abstract
Genetic variation arising from single nucleotide polymorphisms (SNPs) is ubiquitously found among human populations. While disease-causing variants are known in some cases, identifying functional or causative variants for most human diseases remains a challenging task. Rare SNPs, rather than common ones, are thought to be more important in the pathology of most human diseases. We propose that rare SNPs should be divided into two categories dependent on whether the minor alleles are derived or ancestral. Derived alleles are less likely to have been purified by evolutionary processes and may be more likely to induce deleterious effects. We therefore hypothesized that the rare SNPs with derived minor alleles would be more important for human diseases and predicted that these variants would have larger functional or structural consequences relative to the rare variants for which the minor alleles are ancestral. We systematically investigated the consequences of the exonic SNPs on protein function, mRNA structure, and translation. We found that the functional and structural consequences are more significant for the rare exonic variants for which the minor alleles are derived. However, this pattern is reversed when the minor alleles are ancestral. Thus, the rare exonic SNPs with derived minor alleles are more likely to be deleterious. Age estimation of rare SNPs confirms that these potentially deleterious SNPs are recently evolved in the human population. These results have important implications for understanding the function of genetic variations in human exonic regions and for prioritizing functional SNPs in genome-wide association studies of human diseases.
Affiliation(s)
- Wanjun Gu
  - Research Center for Learning Sciences, Southeast University, Nanjing, Jiangsu, China
- Jin J Zhou
  - Department of Epidemiology and Biostatistics, The University of Arizona
- Yihua Zhu
  - School of Biological Science and Medical Engineering, Southeast University, Nanjing, Jiangsu, China
  - College of Information Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Eun-A Ko
  - Department of Pharmacology, The University of Nevada School of Medicine, Reno
- Jae-Hong Ko
  - Department of Physiology, College of Medicine, Chung-Ang University, Seoul, South Korea
- Ting Wang
  - Department of Medicine, The University of Arizona
- Tong Zhou
  - Department of Medicine, The University of Arizona
262
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. The application of data-driven searches such as GWAS has not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In the past years, the development of several so-called next generation sequencing (NGS) platforms is offering the possibility of generating fast, inexpensive and accurate genomic information. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss potentials and challenges of NGS-based studies.
Affiliation(s)
- Marisa L R Cunha
  - Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel.: +31 20 5662824, Fax: +31 20 6968833
263
Thangam M, Gopal RK. CRCDA--Comprehensive resources for cancer NGS data analysis. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav092. [PMID: 26450948 PMCID: PMC4597977 DOI: 10.1093/database/bav092] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/27/2015] [Accepted: 08/31/2015] [Indexed: 12/24/2022]
Abstract
Next generation sequencing (NGS) innovations put a compelling landmark in life science and changed the direction of research in clinical oncology with its productivity to diagnose and treat cancer. The aim of our portal, comprehensive resources for cancer NGS data analysis (CRCDA), is to provide a collection of different NGS tools and pipelines under diverse classes, together with cancer pathways and databases and, furthermore, literature information from PubMed. The literature data was constrained to the 18 most common cancer types, such as breast cancer, colon cancer and other cancers that occur in the worldwide population. For convenience, NGS-cancer tools have been categorized into cancer genomics, cancer transcriptomics, cancer epigenomics, quality control and visualization. Pipelines for variant detection, quality control and data analysis are listed to provide an out-of-the-box solution for NGS data analysis, which may help researchers to overcome challenges in selecting and configuring individual tools for analysing exome, whole genome and transcriptome data. An extensive search page was developed that can be queried by using (i) type of data [literature, gene data and sequence read archive (SRA) data] and (ii) type of cancer (selected based on global incidence and accessibility of data). For each category of analysis, a variety of tools is available, and the biggest challenge is in searching for and using the right tool for the right application. The objective of the work is to collect the tools in each category available at various places and to arrange the tools and other data in a simple and user-friendly manner so that biologists and oncologists can find information more easily. To the best of our knowledge, we have collected and presented a comprehensive package of most of the resources available in cancer for NGS data analysis. Given these factors, we believe that this website will be a useful resource to the NGS research community working on cancer.
Database URL: http://bioinfo.au-kbc.org.in/ngs/ngshome.html.
Affiliation(s)
- Manonanthini Thangam
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
- Ramesh Kumar Gopal
  - AU-KBC Research Centre, MIT Campus of Anna University, Chromepet, Chennai, India
264
Song C, Castellanos-Rizaldos E, Bejar R, Ebert BL, Makrigiorgos GM. DMSO Increases Mutation Scanning Detection Sensitivity of High-Resolution Melting in Clinical Samples. Clin Chem 2015; 61:1354-62. [PMID: 26432802 DOI: 10.1373/clinchem.2015.245357] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/22/2015] [Accepted: 08/17/2015] [Indexed: 02/02/2023]
Abstract
BACKGROUND: Mutation scanning provides the simplest, lowest-cost method for identifying DNA variations on single PCR amplicons, and it may be performed before sequencing to avoid screening of noninformative wild-type samples. High-resolution melting (HRM) is the most commonly used method for mutation scanning. With PCR-HRM, however, mutations less abundant than approximately 3%-10% that can still be clinically significant may often be missed. Therefore, enhancing HRM detection sensitivity is important for mutation scanning and its clinical application.
METHODS: We used serial dilution of cell lines containing the TP53 exon 8 mutation to demonstrate the improvement in detection sensitivity for conventional-PCR-HRM in the presence of DMSO. We also conducted coamplification at lower denaturation temperature (COLD)-PCR with an extra step for cross-hybridization, followed by preferential denaturation and amplification at optimized critical temperature (full-COLD-PCR), to further enrich low-level mutations before HRM with or without DMSO, and we used droplet-digital PCR to derive the optimal conditions for mutation enrichment. Both conventional PCR-HRM and full-COLD-PCR-HRM with and without DMSO were used for mutation scanning of TP53 exon 8 in cancer samples containing known mutations and myelodysplastic syndrome samples with unknown mutations. Mutations in other genes were also examined.
RESULTS: The detection sensitivity of PCR-HRM scanning increases 2- to 5-fold in the presence of DMSO, depending on mutation type and sequence context, and can typically detect mutation abundance of approximately 1%. When mutation enrichment is applied during amplification with full-COLD-PCR followed by HRM in the presence of DMSO, mutations with 0.2%-0.3% abundance in TP53 exon 8 can be detected.
CONCLUSIONS: DMSO improves HRM mutation scanning sensitivity with saturating dyes. When full-COLD-PCR is used, followed by DMSO-HRM, the overall improvement is about 20-fold compared with conventional PCR-HRM.
Affiliation(s)
- Chen Song
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Elena Castellanos-Rizaldos
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- Rafael Bejar
  - Division of Hematology and Oncology, UCSD Moores Cancer Center, La Jolla, CA
- Benjamin L Ebert
  - Division of Hematology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
- G Mike Makrigiorgos
  - Department of Radiation Oncology, Dana-Farber Cancer Institute, Brigham and Women's Hospital, Harvard Medical School, Boston, MA
265
Li J, Drubay D, Michiels S, Gautheret D. Mining the coding and non-coding genome for cancer drivers. Cancer Lett 2015; 369:307-15. [PMID: 26433158 DOI: 10.1016/j.canlet.2015.09.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/03/2015] [Revised: 09/24/2015] [Accepted: 09/24/2015] [Indexed: 12/20/2022]
Abstract
Progress in next-generation sequencing provides unprecedented opportunities to fully characterize the spectrum of somatic mutations of cancer genomes. Given the large number of somatic mutations identified by such technologies, the prioritization of cancer-driving events is a consistent bottleneck. Most bioinformatics tools concentrate on driver mutations in the coding fraction of the genome, those causing changes in protein products. As more non-coding pathogenic variants are identified and characterized, the development of computational approaches to effectively prioritize cancer-driving variants within the non-coding fraction of human genome is becoming critical. After a short summary of methods for coding variant prioritization, we here review the highly diverse non-coding elements that may act as cancer drivers and describe recent methods that attempt to evaluate the deleteriousness of sequence variation in these elements. With such tools, the prioritization and identification of cancer-implicated regulatory elements and non-coding RNAs is becoming a reality.
Affiliation(s)
- Jia Li
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Université Paris-Sud, Université Paris-Saclay, 91198 Gif sur Yvette, France
- Damien Drubay
  - Service de Biostatistique et d'Epidemiologie, Gustave Roussy, Villejuif, France; INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, France
- Stefan Michiels
  - Service de Biostatistique et d'Epidemiologie, Gustave Roussy, Villejuif, France; INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, France
- Daniel Gautheret
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Université Paris-Sud, Université Paris-Saclay, 91198 Gif sur Yvette, France
266
SeqMule: automated pipeline for analysis of human exome/genome sequencing data. Sci Rep 2015; 5:14283. [PMID: 26381817 PMCID: PMC4585643 DOI: 10.1038/srep14283] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Received: 02/25/2015] [Accepted: 08/21/2015] [Indexed: 11/16/2022]
Abstract
Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
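The normalization/intersection step mentioned in this abstract can be illustrated with a minimal Python sketch. This is not SeqMule's implementation: the `Variant` tuple, the `consensus` helper and the `min_callers` threshold are hypothetical names chosen for illustration, and the calls are assumed to have already been normalized to a common representation.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """A normalized variant key (hypothetical type for this sketch)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus(call_sets: Iterable[Set[Variant]], min_callers: int) -> Set[Variant]:
    """Return the variants reported by at least `min_callers` of the call sets."""
    # Each call set is a set, so a caller contributes at most one vote per variant.
    counts = Counter(v for calls in call_sets for v in calls)
    return {v for v, n in counts.items() if n >= min_callers}


# Toy call sets from three hypothetical callers (made-up coordinates).
caller_a = {Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "C", "T")}
caller_b = {Variant("chr1", 100, "A", "G"), Variant("chr2", 42, "G", "A")}
caller_c = {Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "C", "T")}

# min_callers=3 is the strict intersection; min_callers=2 is a 2-of-3 vote.
strict = consensus([caller_a, caller_b, caller_c], min_callers=3)
majority = consensus([caller_a, caller_b, caller_c], min_callers=2)
```

Requiring agreement from every caller favours precision, while a lower threshold keeps more calls at the cost of more potential false positives.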
267
Wang X, Li X, Cheng Y, Sun X, Sun X, Self S, Kooperberg C, Dai JY. Copy number alterations detected by whole-exome and whole-genome sequencing of esophageal adenocarcinoma. Hum Genomics 2015; 9:22. [PMID: 26374103 PMCID: PMC4570720 DOI: 10.1186/s40246-015-0044-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/18/2015] [Accepted: 08/25/2015] [Indexed: 02/08/2023]
Abstract
Background: Esophageal adenocarcinoma (EA) is among the leading causes of cancer mortality, especially in developed countries. A high level of somatic copy number alterations (CNAs) accumulates over the decades in the progression from Barrett’s esophagus, the precursor lesion, to EA. Accurate identification of somatic CNAs is essential to understand cancer development. Many studies have been conducted for the detection of CNA in EA using microarrays. Next-generation sequencing (NGS) technologies are believed to have advantages in sensitivity and accuracy to detect CNA, yet no NGS-based CNA detection in EA has been reported.
Results: In this study, we analyzed whole-exome (WES) and whole-genome sequencing (WGS) data for detecting CNA from a published large-scale genomic study of EA. Two specific comparisons were conducted. First, the recurrent CNAs based on WGS and WES data from 145 EA samples were compared to those found in five previous microarray-based studies. We found that the majority of the previously identified regions were also detected in this study. Interestingly, some novel amplifications and deletions were discovered using the NGS data. In particular, SKI and PRKCZ detected in a deletion region are involved in transforming growth factor-β pathway, suggesting the potential utility of novel biomarkers for EA. Second, we compared CNAs detected in WGS and WES data from the same 15 EA samples. No large-scale CNA was identified statistically more frequently by WES or WGS, while more focal-scale CNAs were detected by WGS than by WES.
Conclusions: Our results suggest that NGS can replace microarrays to detect CNA in EA. WGS is superior to WES in that it can offer finer resolution for the detection, though if the interest is on recurrent CNAs, WES can be preferable to WGS for its cost-effectiveness.
Electronic supplementary material: The online version of this article (doi:10.1186/s40246-015-0044-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xiaoyu Wang
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Xiaohong Li
  - Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  - Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Yichen Cheng
  - Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Xin Sun
  - Institute of Occupational Health and Poison Control, Chinese Center for Disease Control and Prevention, Beijing, China.
- Xibin Sun
  - Henan Office for Cancer Research and Control, Henan Cancer Hospital, Zhengzhou, Henan, China.
- Steve Self
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- Charles Kooperberg
  - Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
- James Y Dai
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
  - Public Health Science Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
268
Hart SN, Maxwell KN, Thomas T, Ravichandran V, Wubberhorst B, Klein RJ, Schrader K, Szabo C, Weitzel JN, Neuhausen SL, Nathanson K, Offit K, Couch FJ, Vijai J. Collaborative science in the next-generation sequencing era: a viewpoint on how to combine exome sequencing data across sites to identify novel disease susceptibility genes. Brief Bioinform 2015; 17:672-7. [PMID: 26358132 DOI: 10.1093/bib/bbv075] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/10/2015] [Indexed: 11/14/2022]
Abstract
The purpose of this article is to inform readers about technical challenges that we encountered when assembling exome sequencing data from the 'Simplifying Complex Exomes' (SIMPLEXO) consortium, whose mandate is the discovery of novel genes predisposing to breast and ovarian cancers. Our motivation is to share these obstacles, and our solutions to them, as a means of communicating important technical details that should be discussed early in projects involving massively parallel sequencing.
269
Meerzaman D, Dunn BK, Lee M, Chen Q, Yan C, Ross S. The promise of omics-based approaches to cancer prevention. Semin Oncol 2015; 43:36-48. [PMID: 26970123 DOI: 10.1053/j.seminoncol.2015.09.004] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/15/2022]
Abstract
Cancer is a complex category of diseases caused in large part by genetic or genomic, transcriptomic, and epigenetic or epigenomic alterations in affected cells and the surrounding microenvironment. Carcinogenesis reflects the clonal expansion of cells that progressively acquire these genetic and epigenetic alterations, changes that, in turn, lead to modifications at the RNA level. Gradually advancing technology and, most recently, the advent of next-generation sequencing (NGS), combined with bioinformatics analytic tools, have revolutionized our ability to interrogate cancer cells. The ultimate goal is to apply these high-throughput technologies to the various aspects of clinical cancer care: cancer-risk assessment, diagnosis, as well as target identification for treatment and prevention. In this article, we emphasize how the knowledge gained through large-scale omics-oriented approaches, with a focus on variations at the level of nucleic acids, can inform the field of chemoprevention.
Affiliation(s)
- Daoud Meerzaman
  - Center for Biomedical Informatics & Information Technology, Computational Genomics and Bioinformatics Group, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Barbara K Dunn
  - Chemoprevention Agent Development Research Group, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Maxwell Lee
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Qingrong Chen
  - Center for Biomedical Informatics & Information Technology, Computational Genomics and Bioinformatics Group, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Chunhua Yan
  - Center for Biomedical Informatics & Information Technology, Computational Genomics and Bioinformatics Group, National Cancer Institute, National Institutes of Health, Rockville, MD 20852, USA
- Sharon Ross
  - Chemoprevention Agent Development Research Group, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
270
Pranckevičiene E, Rančelis T, Pranculis A, Kučinskas V. Challenges in exome analysis by LifeScope and its alternative computational pipelines. BMC Res Notes 2015; 8:421. [PMID: 26346699 PMCID: PMC4562342 DOI: 10.1186/s13104-015-1385-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/24/2015] [Accepted: 08/24/2015] [Indexed: 12/22/2022]
Abstract
Background: Every next generation sequencing (NGS) platform relies on proprietary and open source computational tools to analyze sequencing data. NGS tools for Illumina platforms are well documented, which is not the case with AB SOLiD systems. We applied several computational and variant calling pipelines to analyse targeted exome sequencing data obtained using the AB SOLiD 5500 system. Our investigated tools comprised the proprietary LifeScope pipeline in combination with open source color-space competent mapping programs and a variant caller. We present instrumental details of the pipelines that were used and a quantitative comparative analysis of variant lists generated by LifeScope’s pipeline versus open source tools.
Results: Sufficient coverage of targeted regions was achieved by all investigated pipelines. High variability was observed in identities of variants across the mapping programs. We observed less than 50% concordance of variant lists produced by approaches based on different mapping algorithms. We summarized different approaches with regard to coverage (DP) and quality (QUAL) properties of the variants provided by GATK and found that LifeScope’s computational pipeline is superior. Fusion of information on mapping profiles (pileup) at genomic positions of variants in several different alignments proved to be a useful strategy to assess questionable singleton variants.
Conclusions: We quantitatively supported the conclusion that LifeScope’s pipeline is superior for processing sequencing data obtained by the AB SOLiD 5500 system. Nevertheless, the use of alternative pipelines is encouraged because aggregation of information from other mapping and variant calling approaches helps to resolve questionable calls and increases the confidence of the call. It was noted that a coverage threshold for a variant to be considered for further analysis has to be chosen in a data-driven way to prevent a loss of important information.
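The abstract reports concordance between variant lists without stating the formula used. As a minimal sketch, assuming concordance is taken as the Jaccard index (shared variants divided by all distinct variants), it could be computed as follows. The `VariantKey` alias, the `concordance` function and the toy variant lists are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
VariantKey = Tuple[str, int, str, str]


def concordance(a: Set[VariantKey], b: Set[VariantKey]) -> float:
    """Jaccard index of two variant lists: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 1.0  # two empty lists agree trivially (a convention of this sketch)
    return len(a & b) / len(union)


# Toy variant lists from two hypothetical mapping approaches (made-up coordinates).
pipeline_x = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 7, "G", "C")}
pipeline_y = {("chr1", 100, "A", "G"), ("chr2", 50, "T", "A"), ("chr5", 9, "A", "C")}

score = concordance(pipeline_x, pipeline_y)  # 1 shared variant out of 5 distinct
```

Other definitions, such as the shared fraction relative to one of the two lists, give different numbers, so the definition should be stated alongside any reported concordance.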
Affiliation(s)
- Erinija Pranckevičiene
  - Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania.
- Tautvydas Rančelis
  - Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania.
- Aidas Pranculis
  - Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania.
- Vaidutis Kučinskas
  - Department of Human and Medical Genetics, Faculty of Medicine, Vilnius University, Santariskiu str. 2, LT-08661, Vilnius, Lithuania.
271
Correll M, Bailey AL, Sarikaya A, O'Connor DH, Gleicher M. LayerCake: a tool for the visual comparison of viral deep sequencing data. Bioinformatics 2015; 31:3522-8. [PMID: 26153515 DOI: 10.1093/bioinformatics/btv407] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/13/2015] [Accepted: 07/01/2015] [Indexed: 12/14/2022]
Abstract
MOTIVATION: The advent of next-generation sequencing (NGS) has created unprecedented opportunities to examine viral populations within individual hosts, among infected individuals and over time. Comparing sequence variability across viral genomes allows for the construction of complex population structures, the analysis of which can yield powerful biological insights. However, the simultaneous display of sequence variation, coverage depth and quality scores across thousands of bases presents a unique visualization challenge that has not been fully met by current NGS analysis tools.
RESULTS: Here, we present LayerCake, a self-contained visualization tool that allows for the rapid analysis of variation in viral NGS data. LayerCake enables the user to simultaneously visualize variations in multiple viral populations across entire genomes within a highly customizable framework, drawing attention to pertinent and interesting patterns of variation. We have successfully deployed LayerCake to assist with a variety of different genomics datasets.
AVAILABILITY AND IMPLEMENTATION: Program downloads and detailed instructions are available at http://graphics.cs.wisc.edu/WP/layercake under a modified MIT license. LayerCake is a cross-platform tool written in the Processing framework for Java.
CONTACT: mcorrell@cs.wisc.edu.
Affiliation(s)
- Adam L Bailey
- Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, Madison, WI 53706, USA
- David H O'Connor
- Department of Pathology and Laboratory Medicine, University of Wisconsin, Madison, Madison, WI 53706, USA
272
Olson ND, Lund SP, Colman RE, Foster JT, Sahl JW, Schupp JM, Keim P, Morrow JB, Salit ML, Zook JM. Best practices for evaluating single nucleotide variant calling methods for microbial genomics. Front Genet 2015. [PMID: 26217378 PMCID: PMC4493402 DOI: 10.3389/fgene.2015.00235] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Indexed: 12/30/2022] Open
Abstract
Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community’s focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards.
Affiliation(s)
- Nathan D Olson
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Steven P Lund
- Statistical Engineering Division, Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Rebecca E Colman
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Jeffrey T Foster
- Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jason W Sahl
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- James M Schupp
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA
- Paul Keim
- Division of Pathogen Genomics, Translational Genomics Research Institute, Flagstaff, AZ, USA; Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, AZ, USA
- Jayne B Morrow
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Marc L Salit
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA; Department of Bioengineering, Stanford University, Stanford, CA, USA
- Justin M Zook
- Biosystems and Biomaterials Division, Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
273
Oliveira TG, Mitne-Neto M, Cerdeira LT, Marsiglia JD, Arteaga-Fernandez E, Krieger JE, Pereira AC. A Variant Detection Pipeline for Inherited Cardiomyopathy–Associated Genes Using Next-Generation Sequencing. J Mol Diagn 2015; 17:420-30. [DOI: 10.1016/j.jmoldx.2015.02.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/27/2014] [Revised: 02/11/2015] [Accepted: 02/26/2015] [Indexed: 01/26/2023] Open
274
Tattini L, D'Aurizio R, Magi A. Detection of Genomic Structural Variants from Next-Generation Sequencing Data. Front Bioeng Biotechnol 2015; 3:92. [PMID: 26161383 PMCID: PMC4479793 DOI: 10.3389/fbioe.2015.00092] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Received: 12/10/2014] [Accepted: 06/10/2015] [Indexed: 01/16/2023] Open
Abstract
Structural variants are genomic rearrangements larger than 50 bp accounting for around 1% of the variation among human genomes. They impact on phenotypic diversity and play a role in various diseases including neurological/neurocognitive disorders and cancer development and progression. Dissecting structural variants from next-generation sequencing data presents several challenges and a number of approaches have been proposed in the literature. In this mini review, we describe and summarize the latest tools – and their underlying algorithms – designed for the analysis of whole-genome sequencing, whole-exome sequencing, custom captures, and amplicon sequencing data, pointing out the major advantages/drawbacks. We also report a summary of the most recent applications of third-generation sequencing platforms. This assessment provides a guided indication – with particular emphasis on human genetics and copy number variants – for researchers involved in the investigation of these genomic events.
Affiliation(s)
- Lorenzo Tattini
- Department of Neurosciences, Psychology, Pharmacology and Child Health, University of Florence, Florence, Italy
- Romina D'Aurizio
- Laboratory of Integrative Systems Medicine (LISM), Institute of Informatics and Telematics and Institute of Clinical Physiology, National Research Council, Pisa, Italy
- Alberto Magi
- Department of Clinical and Experimental Medicine, University of Florence, Florence, Italy
275
Tian R, Basu MK, Capriotti E. Computational methods and resources for the interpretation of genomic variants in cancer. BMC Genomics 2015; 16 Suppl 8:S7. [PMID: 26111056 PMCID: PMC4480958 DOI: 10.1186/1471-2164-16-s8-s7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022] Open
Abstract
The recent improvement of high-throughput sequencing technologies is having a strong impact on the detection of genetic variations associated with cancer. Several institutions worldwide have been sequencing the whole exomes and/or genomes of cancer patients in the thousands, thereby providing an invaluable collection of new somatic mutations in different cancer types. These initiatives promoted the development of methods and tools for the analysis of cancer genomes that are aimed at studying the relationship between genotype and phenotype in cancer. In this article we review the online resources and computational tools for the analysis of cancer genomes. First, we describe the available repositories of cancer genome data. Next, we provide an overview of the methods for the detection of genetic variation and computational tools for the prioritization of cancer-related genes and causative somatic variations. Finally, we discuss the future perspectives in cancer genomics, focusing on the impact of computational methods and quantitative approaches for defining personalized strategies to improve the diagnosis and treatment of cancer.
276
Mielczarek M, Szyda J. Review of alignment and SNP calling algorithms for next-generation sequencing data. J Appl Genet 2015; 57:71-9. [PMID: 26055432 DOI: 10.1007/s13353-015-0292-7] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Received: 11/06/2014] [Revised: 02/27/2015] [Accepted: 05/15/2015] [Indexed: 01/21/2023]
Abstract
Application of massively parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms, suffix tries and hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been subsequently developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
Affiliation(s)
- M Mielczarek
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland.
- J Szyda
- Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland.
277
Abstract
Molecular informatics (MI) is an evolving discipline that will support the dynamic landscape of molecular pathology and personalized medicine. MI provides a fertile ground for development of clinical solutions to bridge the gap between clinical informatics and bioinformatics. Rapid adoption of next generation sequencing (NGS) in the clinical arena has triggered major endeavors in MI that are expected to bring a paradigm shift in the practice of pathology. This brief review presents a broad overview of various aspects of MI, particularly in the context of NGS based testing.
Affiliation(s)
- Somak Roy
- Department of Pathology, Molecular and Genomic Pathology, University of Pittsburgh Medical Center, 3477 Euler way, Pittsburgh, PA 15213, USA.
278
Milner LC, Garrison NA, Cho MK, Altman RB, Hudgins L, Galli SJ, Lowe HJ, Schrijver I, Magnus DC. Genomics in the clinic: ethical and policy challenges in clinical next-generation sequencing programs at early adopter USA institutions. Per Med 2015; 12:269-282. [PMID: 29771644 DOI: 10.2217/pme.14.88] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]
Abstract
Next-generation sequencing (NGS) technologies are poised to revolutionize clinical diagnosis and treatment, but raise significant ethical and policy challenges. This review examines NGS program challenges through a synthesis of published literature, website and conference presentation content, and interviews at early-adopting institutions in the USA. Institutions are proactively addressing policy challenges related to the management and technical aspects of program development. However, ethical challenges related to patient-related aspects have not been fully addressed. These complex challenges present opportunities to develop comprehensive and standardized regulations across programs. Understanding the strengths, weaknesses and current practices of evolving NGS program approaches is an important consideration for institutions developing NGS services, policymakers regulating or funding NGS programs, and physicians and patients considering NGS services.
Affiliation(s)
- Lauren C Milner
- Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA, USA
- Nanibaa' A Garrison
- Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA, USA; Center for Biomedical Ethics & Society, Departments of Pediatrics & Anthropology, Vanderbilt University Medical Center, Nashville, TN, USA
- Mildred K Cho
- Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Russ B Altman
- Department of Bioengineering, Stanford University School of Medicine, Stanford, CA, USA; Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA; Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Louanne Hudgins
- Division of Medical Genetics, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
- Stephen J Galli
- Stanford Center for Genomics & Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA; Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA; Department of Microbiology & Immunology, Stanford University School of Medicine, Stanford, CA, USA
- Henry J Lowe
- Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Iris Schrijver
- Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA; Stanford Center for Genomics & Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA
- David C Magnus
- Stanford Center for Biomedical Ethics, Stanford University School of Medicine, Stanford, CA, USA; Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, USA
279
Clevenger J, Chavarro C, Pearl SA, Ozias-Akins P, Jackson SA. Single Nucleotide Polymorphism Identification in Polyploids: A Review, Example, and Recommendations. Mol Plant 2015; 8:831-46. [PMID: 25676455 DOI: 10.1016/j.molp.2015.02.002] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Received: 10/30/2014] [Revised: 01/21/2015] [Accepted: 02/01/2015] [Indexed: 05/23/2023]
Abstract
Understanding the relationship between genotype and phenotype is a major biological question and being able to predict phenotypes based on molecular genotypes is integral to molecular breeding. Whole-genome duplications have shaped the history of all flowering plants and present challenges to elucidating the relationship between genotype and phenotype, especially in neopolyploid species. Although single nucleotide polymorphisms (SNPs) have become popular tools for genetic mapping, discovery and application of SNPs in polyploids has been difficult. Here, we summarize common experimental approaches to SNP calling, highlighting recent polyploid successes. To examine the impact of software choice on these analyses, we called SNPs among five peanut genotypes using different alignment programs (BWA-mem and Bowtie 2) and variant callers (SAMtools, GATK, and Freebayes). Alignments produced by Bowtie 2 and BWA-mem and analyzed in SAMtools shared 24.5% concordant SNPs, and SAMtools, GATK, and Freebayes shared 1.4% concordant SNPs. A subsequent analysis of simulated Brassica napus chromosome 1A and 1C genotypes demonstrated that, of the three software programs, SAMtools performed with the highest sensitivity and specificity on Bowtie 2 alignments. These results, however, are likely to vary among species, and we therefore propose a series of best practices for SNP calling in polyploids.
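The concordance figures quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say how concordance was computed, so the definition used here (variants shared by all call sets divided by variants found in any call set) is an assumption, as are the `Variant` tuple layout, the `concordance` function name, and the toy coordinates, none of which come from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def concordance(call_sets: Iterable[Set[Variant]]) -> float:
    """Fraction of variants reported by every caller, relative to all variants reported.

    Assumed definition: |intersection of all sets| / |union of all sets|.
    """
    sets = list(call_sets)
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    shared = set.intersection(*sets)
    return len(shared) / len(union)


# Toy call sets standing in for three variant callers (invented data, not from the study).
samtools_calls = {("A01", 100, "A", "G"), ("A01", 250, "C", "T"), ("A02", 40, "G", "A")}
gatk_calls = {("A01", 100, "A", "G"), ("A02", 40, "G", "A"), ("A03", 7, "T", "C")}
freebayes_calls = {("A01", 100, "A", "G"), ("A03", 7, "T", "C")}

print(f"{concordance([samtools_calls, gatk_calls, freebayes_calls]):.2f}")  # prints 0.25
```

Keying variants on position and alleles together means two callers reporting different alternate alleles at the same site count as discordant; a position-only key would give a more lenient figure.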
Affiliation(s)
- Josh Clevenger
- Institute of Plant Breeding, Genetics & Genomics, University of Georgia, Tifton, GA 31793, USA
- Carolina Chavarro
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Stephanie A Pearl
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
- Peggy Ozias-Akins
- Institute of Plant Breeding, Genetics & Genomics, University of Georgia, Tifton, GA 31793, USA
- Scott A Jackson
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA 30602, USA
280
Betge J, Kerr G, Miersch T, Leible S, Erdmann G, Galata CL, Zhan T, Gaiser T, Post S, Ebert MP, Horisberger K, Boutros M. Amplicon sequencing of colorectal cancer: variant calling in frozen and formalin-fixed samples. PLoS One 2015; 10:e0127146. [PMID: 26010451 PMCID: PMC4444292 DOI: 10.1371/journal.pone.0127146] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 01/10/2015] [Accepted: 04/13/2015] [Indexed: 12/21/2022] Open
Abstract
Next generation sequencing (NGS) is an emerging technology becoming relevant for genotyping of clinical samples. Here, we assessed the stability of amplicon sequencing from formalin-fixed paraffin-embedded (FFPE) and paired frozen samples from colorectal cancer metastases with different analysis pipelines. 212 amplicon regions in 48 cancer related genes were sequenced with Illumina MiSeq using DNA isolated from resection specimens from 17 patients with colorectal cancer liver metastases. From ten of these patients, paired fresh frozen and routinely processed FFPE tissue was available for comparative study. Sample quality of FFPE tissues was determined by the amount of amplifiable DNA using qPCR; sequencing libraries were evaluated using a Bioanalyzer. Three bioinformatic pipelines were compared for analysis of amplicon sequencing data. Selected hot spot mutations were reviewed using Sanger sequencing. In the sequenced samples from 16 patients, 29 non-synonymous coding mutations were identified in eleven genes. Most frequent were mutations in TP53 (10), APC (7), PIK3CA (3) and KRAS (2). A high concordance of FFPE and paired frozen tissue samples was observed in ten matched samples, revealing 21 identical mutation calls and only two mutations differing. Comparison of these results with two other commonly used variant calling tools, however, showed high discrepancies. Hence, amplicon sequencing can potentially be used to identify hot spot mutations in colorectal cancer metastases in frozen and FFPE tissue. However, remarkable differences exist among results of different variant calling tools, which are not only related to DNA sample quality. Our study highlights the need for standardization and benchmarking of variant calling pipelines, which will be required for translational and clinical applications.
Affiliation(s)
- Johannes Betge
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Department of Medicine II, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Grainne Kerr
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Thilo Miersch
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Svenja Leible
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Gerrit Erdmann
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Christian L. Galata
- Department of Surgery, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Tianzuo Zhan
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Department of Medicine II, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Timo Gaiser
- Institute of Pathology, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Stefan Post
- Department of Surgery, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Matthias P. Ebert
- Department of Medicine II, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Karoline Horisberger
- Department of Surgery, University Hospital Mannheim, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Michael Boutros
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Department of Cell and Molecular Biology, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
281
Cui H, Dhroso A, Johnson N, Korkin D. The variation game: Cracking complex genetic disorders with NGS and omics data. Methods 2015; 79-80:18-31. [PMID: 25944472 DOI: 10.1016/j.ymeth.2015.04.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 12/13/2014] [Revised: 03/27/2015] [Accepted: 04/17/2015] [Indexed: 12/14/2022] Open
Abstract
Tremendous advances in Next Generation Sequencing (NGS) and high-throughput omics methods have brought us one step closer towards a mechanistic understanding of complex disease at the molecular level. In this review, we discuss four basic regulatory mechanisms implicated in complex genetic diseases, such as cancer, neurological disorders, heart disease, diabetes, and many others. The mechanisms, including genetic variations, copy-number variations, posttranscriptional variations, and epigenetic variations, can be detected using a variety of NGS methods. We propose that malfunctions detected in these mechanisms are not necessarily independent, since these malfunctions are often found associated with the same disease and targeting the same gene, group of genes, or functional pathway. As an example, we discuss possible rewiring effects of the cancer-associated genetic, structural, and posttranscriptional variations on the protein-protein interaction (PPI) network centered around the P53 protein. The review highlights the multi-layered complexity of common genetic disorders and suggests that integration of NGS and omics data is a critical step in developing new computational methods capable of deciphering this complexity.
Affiliation(s)
- Hongzhu Cui
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Andi Dhroso
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Nathan Johnson
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
- Dmitry Korkin
- Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States; Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, United States
282
Bertolini F, Ghionda MC, D’Alessandro E, Geraci C, Chiofalo V, Fontanesi L. A next generation semiconductor based sequencing approach for the identification of meat species in DNA mixtures. PLoS One 2015; 10:e0121701. [PMID: 25923709 PMCID: PMC4414512 DOI: 10.1371/journal.pone.0121701] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Received: 10/31/2014] [Accepted: 02/14/2015] [Indexed: 11/18/2022] Open
Abstract
The identification of the species of origin of meat and meat products is an important issue to prevent and detect fraud that might have economic, ethical and health implications. In this paper we evaluated the potential of the next generation semiconductor based sequencing technology (Ion Torrent Personal Genome Machine) for the identification of DNA from meat species (pig, horse, cattle, sheep, rabbit, chicken, turkey, pheasant, duck, goose and pigeon) as well as from human and rat in DNA mixtures through the sequencing of PCR products obtained from different pairs of universal primers that amplify 12S and 16S rRNA mitochondrial DNA genes. Six libraries were produced including PCR products obtained separately from 13 species or from DNA mixtures containing DNA from all species or only avian or only mammalian species at equimolar concentration or at 1:10 or 1:50 ratios for pig and horse DNA. Sequencing obtained a total of 33,294,511 called nucleotides, of which 29,109,688 (87.43%) had Q20 quality, in a total of 215,944 reads. Different alignment algorithms were used to assign the species based on sequence data. Error rate calculated after confirmation of the obtained sequences by Sanger sequencing ranged from 0.0003 to 0.02 for the different species. Correlation of the number of reads per species between different libraries was high for mammalian species (0.97) and lower for avian species (0.70). PCR competition limited the efficiency of amplification and sequencing for avian species for some primer pairs. Detection of low levels of pig and horse DNA was possible with reads obtained from different primer pairs. The sequencing of the products obtained from different universal PCR primers could be a useful strategy to overcome potential problems of amplification. Based on these results, the Ion Torrent technology can be applied for the identification of meat species in DNA mixtures.
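The Q20 percentage reported in this abstract can be checked with a short worked calculation. The sketch below assumes that "Q20" refers to the standard Phred quality scale, where Q = -10 log10(p) and Q20 therefore corresponds to an error probability of 0.01; the abstract itself does not spell this out, and the function names are illustrative only.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an error probability (Q = -10 * log10(p))."""
    return 10 ** (-q / 10)


def percent_passing(n_pass: int, n_total: int) -> float:
    """Percentage of called nucleotides that meet a quality threshold."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_pass / n_total


# Counts taken from the abstract: total called nucleotides and those reported with Q20.
total_called = 33_294_511
q20_called = 29_109_688

print(phred_to_error_prob(20))
print(round(percent_passing(q20_called, total_called), 2))  # prints 87.43
```

The computed value agrees with the 87.43% given in the abstract.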
Affiliation(s)
- Francesca Bertolini
- Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127, Bologna, Italy
- Marco Ciro Ghionda
- Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127, Bologna, Italy
- Department of Veterinary Sciences, Animal Production Unit, University of Messina, Polo Universitario dell'Annunziata, 98168, Messina, Italy
- Enrico D’Alessandro
- Department of Veterinary Sciences, Animal Production Unit, University of Messina, Polo Universitario dell'Annunziata, 98168, Messina, Italy
- Claudia Geraci
- Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127, Bologna, Italy
- Vincenzo Chiofalo
- Department of Veterinary Sciences, Animal Production Unit, University of Messina, Polo Universitario dell'Annunziata, 98168, Messina, Italy
- Meat Research Consortium, Polo Universitario dell’Annunziata, 98168, Messina, Italy
- Luca Fontanesi
- Department of Agricultural and Food Sciences, Division of Animal Sciences, University of Bologna, Viale Fanin 46, 40127, Bologna, Italy
283
English AC, Salerno WJ, Hampton OA, Gonzaga-Jauregui C, Ambreth S, Ritter DI, Beck CR, Davis CF, Dahdouli M, Ma S, Carroll A, Veeraraghavan N, Bruestle J, Drees B, Hastie A, Lam ET, White S, Mishra P, Wang M, Han Y, Zhang F, Stankiewicz P, Wheeler DA, Reid JG, Muzny DM, Rogers J, Sabo A, Worley KC, Lupski JR, Boerwinkle E, Gibbs RA. Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics 2015; 16:286. [PMID: 25886820 PMCID: PMC4490614 DOI: 10.1186/s12864-015-1479-3] [Citation(s) in RCA: 108] [Impact Index Per Article: 10.8] [Received: 09/24/2014] [Accepted: 03/23/2015] [Indexed: 01/19/2023] Open
Abstract
Background: Characterizing large genomic variants is essential to expanding the research and clinical applications of genome sequencing. While multiple data types and methods are available to detect these structural variants (SVs), they remain less characterized than smaller variants because of SV diversity, complexity, and size. These challenges are exacerbated by the experimental and computational demands of SV analysis. Here, we characterize the SV content of a personal genome with Parliament, a publicly available consensus SV-calling infrastructure that merges multiple data types and SV detection methods. Results: We demonstrate Parliament’s efficacy via integrated analyses of data from whole-genome array comparative genomic hybridization, short-read next-generation sequencing, long-read (Pacific BioSciences RSII), long-insert (Illumina Nextera), and whole-genome architecture (BioNano Irys) data from the personal genome of a single subject (HS1011). From this genome, Parliament identified 31,007 genomic loci between 100 bp and 1 Mbp that are inconsistent with the hg19 reference assembly. Of these loci, 9,777 are supported as putative SVs by hybrid local assembly, long-read PacBio data, or multi-source heuristics. These SVs span 59 Mbp of the reference genome (1.8%) and include 3,801 events identified only with long-read data. The HS1011 data and complete Parliament infrastructure, including a BAM-to-SV workflow, are available on the cloud-based service DNAnexus. Conclusions: HS1011 SV analysis reveals the limits and advantages of multiple sequencing technologies, specifically the impact of long-read SV discovery. With the full Parliament infrastructure, the HS1011 data constitute a public resource for novel SV discovery, software calibration, and personal genome structural variation analysis.
Affiliation(s)
- Adam C English
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- William J Salerno
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Oliver A Hampton
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Claudia Gonzaga-Jauregui
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Shruthi Ambreth
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Deborah I Ritter
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Christine R Beck
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Caleb F Davis
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Mahmoud Dahdouli
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Singer Ma
- DNAnexus, Mountain View, CA, 94040, USA.
- Becky Drees
- Spiral Genetics Inc, Seattle, WA, 98117, USA.
- Alex Hastie
- BioNano Genomics Inc, San Diego, CA, 92121, USA.
- Ernest T Lam
- BioNano Genomics Inc, San Diego, CA, 92121, USA.
- Simon White
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Pamela Mishra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Min Wang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Yi Han
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Feng Zhang
- Collaborative Innovation Center of Genetics and Development, School of Life Sciences, Fudan University, Shanghai, 200438, China.
- Pawel Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- David A Wheeler
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Jeffrey G Reid
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA.
- Donna M Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Jeffrey Rogers
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Aniko Sabo
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- Kim C Worley
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
- James R Lupski
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Pediatrics, Baylor College of Medicine, Houston, TX 77030, USA; Texas Children's Hospital, Houston, TX, 77030, USA.
- Eric Boerwinkle
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
|
284
|
|
285
|
Belcaid M, Toonen RJ. Demystifying computer science for molecular ecologists. Mol Ecol 2015; 24:2619-40. [PMID: 25824671 DOI: 10.1111/mec.13175] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 12/24/2014] [Revised: 03/23/2015] [Accepted: 03/25/2015] [Indexed: 11/30/2022]
Abstract
In this age of data-driven science and high-throughput biology, computational thinking is becoming an increasingly important skill for tackling both new and long-standing biological questions. However, despite its obvious importance and conspicuous integration into many areas of biology, computer science is still viewed as an obscure field that has, thus far, permeated into only a few of the biology curricula across the nation. A national survey has shown that lack of computational literacy in environmental sciences is the norm rather than the exception [Valle & Berdanier (2012) Bulletin of the Ecological Society of America, 93, 373-389]. In this article, we seek to introduce a few important concepts in computer science with the aim of providing a context-specific introduction aimed at research biologists. Our goal was to help biologists understand some of the most important mainstream computational concepts to better appreciate bioinformatics methods and trade-offs that are not obvious to the uninitiated.
Affiliation(s)
- Mahdi Belcaid
- The Hawai'i Institute of Marine Biology, P.O. Box 1346, Kane'ohe, HI, 96744, USA
- Robert J Toonen
- The Hawai'i Institute of Marine Biology, P.O. Box 1346, Kane'ohe, HI, 96744, USA
|
286
|
Kerzendorfer C, Konopka T, Nijman SMB. A thesaurus of genetic variation for interrogation of repetitive genomic regions. Nucleic Acids Res 2015; 43:e68. [PMID: 25820428 PMCID: PMC4446415 DOI: 10.1093/nar/gkv178] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/10/2014] [Accepted: 02/22/2015] [Indexed: 01/11/2023] Open
Abstract
Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
Affiliation(s)
- Claudia Kerzendorfer
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
- Tomasz Konopka
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
- Sebastian M B Nijman
- Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria
|
287
|
Leung WY, Marschall T, Paudel Y, Falquet L, Mei H, Schönhuth A, Maoz Moss TY. SV-AUTOPILOT: optimized, automated construction of structural variation discovery and benchmarking pipelines. BMC Genomics 2015; 16:238. [PMID: 25887570 PMCID: PMC4520269 DOI: 10.1186/s12864-015-1376-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2014] [Accepted: 02/21/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Many tools exist to predict structural variants (SVs), utilizing a variety of algorithms. However, they have largely been developed and tested on human germline or somatic (e.g. cancer) variation. It seems appropriate to exploit this wealth of technology available for humans also for other species. Objectives of this work included: a) Creating an automated, standardized pipeline for SV prediction. b) Identifying the best tool(s) for SV prediction through benchmarking. c) Providing a statistically sound method for merging SV calls. RESULTS The SV-AUTOPILOT meta-tool platform is an automated pipeline for standardization of SV prediction and SV tool development in paired-end next-generation sequencing (NGS) analysis. SV-AUTOPILOT comes in the form of a virtual machine, which includes all datasets, tools and algorithms presented here. The virtual machine easily allows one to add, replace and update genomes, SV callers and post-processing routines and therefore provides an easy, out-of-the-box environment for complex SV discovery tasks. SV-AUTOPILOT was used to make a direct comparison between 7 popular SV tools on the Arabidopsis thaliana genome using the Landsberg (Ler) ecotype as a standardized dataset. Recall and precision measurements suggest that Pindel and Clever were the most adaptable to this dataset across all size ranges while Delly performed well for SVs larger than 250 nucleotides. A novel, statistically-sound merging process, which can control the false discovery rate, reduced the false positive rate on the Arabidopsis benchmark dataset used here by >60%. CONCLUSION SV-AUTOPILOT provides a meta-tool platform for future SV tool development and the benchmarking of tools on other genomes using a standardized pipeline. It optimizes detection of SVs in non-human genomes using statistically robust merging. 
The benchmarking in this study has demonstrated the power of 7 different SV tools for analyzing different size classes and types of structural variants. The optional merge feature enriches the call set and reduces false positives providing added benefit to researchers planning to validate SVs. SV-AUTOPILOT is a powerful, new meta-tool for biologists as well as SV tool developers.
Affiliation(s)
- Wai Yi Leung
- Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands.
- Tobias Marschall
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany; Max Planck Institute for Informatics, Saarbrücken, Germany; Centrum Wiskunde and Informatica, Amsterdam, The Netherlands.
- Yogesh Paudel
- Animal Breeding and Genomics Centre, Wageningen University, Wageningen, The Netherlands.
- Laurent Falquet
- University of Fribourg and Swiss Institute of Bioinformatics, Fribourg, Switzerland.
- Hailiang Mei
- Sequencing Analysis Support Core, Leiden University Medical Center, Leiden, The Netherlands.
|
288
|
Cabañes FJ, Sanseverino W, Castellá G, Bragulat MR, Cigliano RA, Sánchez A. Rapid genome resequencing of an atoxigenic strain of Aspergillus carbonarius. Sci Rep 2015; 5:9086. [PMID: 25765923 PMCID: PMC4358045 DOI: 10.1038/srep09086] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 08/01/2014] [Accepted: 02/18/2015] [Indexed: 12/18/2022] Open
Abstract
In microorganisms, Ion Torrent sequencing technology has proved useful for whole-genome sequencing of bacterial genomes (5 Mbp). In our study, we used this technology for the first time to resequence a whole fungal genome (36 Mbp), that of a non-ochratoxin A producing strain of Aspergillus carbonarius. Ochratoxin A (OTA) is a potent nephrotoxin which is found mainly in cereals and their products, but it also occurs in a variety of common foods and beverages. Because this strain does not produce OTA, we focused some of the bioinformatics analyses on genes involved in OTA biosynthesis, using a reference genome of an OTA-producing strain of the same species. This study revealed that in the atoxigenic strain there is a high accumulation of nonsense and missense mutations in several genes. Importantly, a twofold increase in gene mutation ratio was observed in PKS- and NRPS-encoding genes, which are suggested to be involved in OTA biosynthesis.
Affiliation(s)
- F Javier Cabañes
- Veterinary Mycology Group, Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
- Gemma Castellá
- Veterinary Mycology Group, Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
- M Rosa Bragulat
- Veterinary Mycology Group, Department of Animal Health and Anatomy, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
- Armand Sánchez
- Departament de Genètica Animal, Centre de Recerca en AgriGenòmica (CRAG), CSIC-IRTA-UAB-UB, Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain
|
289
|
Hoffmann S, Stadler PF, Strimmer K. A simple data-adaptive probabilistic variant calling model. Algorithms Mol Biol 2015; 10:10. [PMID: 25788974 PMCID: PMC4363181 DOI: 10.1186/s13015-015-0037-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 05/21/2014] [Accepted: 01/11/2015] [Indexed: 11/30/2022] Open
Abstract
Background Several sources of noise obfuscate the identification of single nucleotide variation (SNV) in next generation sequencing data. For instance, errors may be introduced during library construction and sequencing steps. In addition, the reference genome and the algorithms used for the alignment of the reads are further critical factors determining the efficacy of variant calling methods. It is crucial to account for these factors in individual sequencing experiments. Results We introduce a simple data-adaptive model for variant calling. This model automatically adjusts to specific factors such as alignment errors. To achieve this, several characteristics are sampled from sites with low mismatch rates, and these are used to estimate empirical log-likelihoods. The likelihoods are then combined into a score that typically gives rise to a mixture distribution. From this we determine a decision threshold to separate potentially variant sites from the noisy background. Conclusions In simulations we show that our simple model is competitive with frequently used, much more complex SNV calling algorithms in terms of sensitivity and specificity. It performs particularly well in cases with low allele frequencies. The application to next-generation sequencing data reveals stark differences in the score distributions, indicating a strong influence of data-specific sources of noise. The proposed model is specifically designed to adjust to these differences.
|
290
|
Kimura K, Koike A. Ultrafast SNP analysis using the Burrows-Wheeler transform of short-read data. Bioinformatics 2015; 31:1577-83. [PMID: 25609790 DOI: 10.1093/bioinformatics/btv024] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 09/11/2014] [Accepted: 01/12/2015] [Indexed: 12/30/2022]
Abstract
MOTIVATION Sequence-variation analysis is conventionally performed on mapping results that are highly redundant and occasionally contain undesirable heuristic biases. A straightforward approach to single-nucleotide polymorphism (SNP) analysis, using the Burrows-Wheeler transform (BWT) of short-read data, is proposed. RESULTS The BWT makes it possible to simultaneously process collections of read fragments of the same sequences; accordingly, SNPs were found from the BWT much faster than from the mapping results. It took only a few minutes to find SNPs from the BWT (with supplementary data, the fragment depth of coverage [FDC]) using a desktop workstation in the case of human exome or transcriptome sequencing data, and 20 min using a dual-CPU server in the case of human genome sequencing data. The SNPs found with the proposed method largely agreed with those found by a time-consuming state-of-the-art tool, except in cases where the use of fragments of reads led to sensitivity loss or sequencing depth was not sufficient. These exceptions were predictable in advance on the basis of the minimum length for uniqueness (MLU) and FDC defined on the reference genome. Moreover, BWT and FDC were computed in less time than it took to get the mapping results, provided that the data were large enough.
Affiliation(s)
- Kouichi Kimura
- Biosystems Research Department, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan
- Asako Koike
- Biosystems Research Department, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan
|
291
|
Benjak A, Sala C, Hartkoorn RC. Whole-genome sequencing for comparative genomics and de novo genome assembly. Methods Mol Biol 2015; 1285:1-16. [PMID: 25779307 DOI: 10.1007/978-1-4939-2450-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 06/04/2023]
Abstract
Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Affiliation(s)
- Andrej Benjak
- École polytechnique fédérale de Lausanne (EPFL), Global Health Institute, Lausanne, CH-1015, Switzerland
|
292
|
B S, Dharshini AP, Kumar GR. NGS meta data analysis for identification of SNP and INDEL patterns in human airway transcriptome: A preliminary indicator for lung cancer. Appl Transl Genom 2014; 4:4-9. [PMID: 26937342 PMCID: PMC4745382 DOI: 10.1016/j.atg.2014.12.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/01/2014] [Revised: 12/05/2014] [Accepted: 12/08/2014] [Indexed: 12/22/2022]
Abstract
High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. It is also an efficient way to discover coding SNPs, and when multiple individuals with different genetic backgrounds are used, RNA-Seq is very effective for the identification of SNPs. The objective of this study was to perform SNP and INDEL discovery in the human airway transcriptome of healthy never smokers, healthy current smokers, smokers without lung cancer and smokers with lung cancer. Preliminary comparative analysis of these four data sets is expected to reveal SNP and INDEL patterns responsible for lung cancer. A total of 85,028 SNPs and 5738 INDELs in healthy never smokers, 32,671 SNPs and 1561 INDELs in healthy current smokers, 50,205 SNPs and 3008 INDELs in smokers without lung cancer and 51,299 SNPs and 3138 INDELs in smokers with lung cancer were identified. The SNPs and INDELs in genes previously reported as differentially expressed were also analyzed. Smokers were found to have SNPs at positions 62,186,542 and 62,190,293 in the SCGB1A1 gene and 180,017,251, 180,017,252, and 180,017,597 in the SCGB3A1 gene, and INDELs at position 35,871,168 in the NFKBIA gene and 180,017,797 in the SCGB3A1 gene. The SNPs identified in this study provide a resource for genetic studies in smokers and should contribute to the development of personalized medicine. This study is only preliminary, and more rigorous data analysis and wet-lab validation are required.
Affiliation(s)
- Sathya B
- Department of Bioinformatics, School of Bio Engineering, SRM University, Chennai 603203, India
- Akila Parvathy Dharshini
- Department of Bioinformatics, AU KBC Research Centre, Anna University, MIT Campus, Chennai 600044, India
- Gopal Ramesh Kumar
- Department of Bioinformatics, AU KBC Research Centre, Anna University, MIT Campus, Chennai 600044, India
|
293
|
Prevalence of mutations in a panel of breast cancer susceptibility genes in BRCA1/2-negative patients with early-onset breast cancer. Genet Med 2014; 17:630-8. [PMID: 25503501 PMCID: PMC4465412 DOI: 10.1038/gim.2014.176] [Citation(s) in RCA: 114] [Impact Index Per Article: 10.4] [Received: 09/24/2014] [Accepted: 11/03/2014] [Indexed: 01/06/2023] Open
Abstract
Purpose Clinical testing for germline variation in multiple cancer susceptibility genes is available using massively parallel sequencing. Limited information is available for pre-test genetic counseling regarding the spectrum of mutations and variants of uncertain significance (VUSs) in defined patient populations. Methods We performed massively parallel sequencing using targeted capture of 22 cancer susceptibility genes in 278 BRCA1/2 negative patients with early onset breast cancer (diagnosed under age 40). Results Thirty-one patients (11%) were found to have at least one deleterious or likely deleterious variant. Seven patients (2.5% overall) were found to have deleterious or likely deleterious variants in genes for which clinical guidelines exist for management, namely TP53 (4), CDKN2A (1), MSH2 (1), and MUTYH (double heterozygote). Twenty-four patients (8.6%) had deleterious or likely deleterious variants in a cancer susceptibility gene for which clinical guidelines are lacking, such as CHEK2 and ATM. Fifty-four patients (19%) had at least one VUS, and six patients were heterozygous for a variant in MUTYH. Conclusion These data demonstrate that massively parallel sequencing identifies reportable variants in known cancer susceptibility genes in over 30% of patients with early onset breast cancer. However, only rare patients (2.5%) have definitively actionable mutations given current clinical management guidelines.
|
294
|
Yang JF, Ding XF, Chen L, Mat WK, Xu MZ, Chen JF, Wang JM, Xu L, Poon WS, Kwong A, Leung GKK, Tan TC, Yu CH, Ke YB, Xu XY, Ke XY, Ma RC, Chan JC, Wan WQ, Zhang LW, Kumar Y, Tsang SY, Li S, Wang HY, Xue H. Copy number variation analysis based on AluScan sequences. J Clin Bioinforma 2014; 4:15. [PMID: 25558350 PMCID: PMC4273479 DOI: 10.1186/s13336-014-0015-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 09/03/2014] [Accepted: 11/12/2014] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND AluScan combines inter-Alu PCR using multiple Alu-based primers with opposite orientations and next-generation sequencing to capture a huge number of Alu-proximal genomic sequences for investigation. Its requirement of only sub-microgram quantities of DNA facilitates the examination of large numbers of samples. However, the special features of AluScan data rendered difficult the calling of copy number variation (CNV) directly using the calling algorithms designed for whole genome sequencing (WGS) or exome sequencing. RESULTS In this study, an AluScanCNV package has been assembled for efficient CNV calling from AluScan sequencing data employing a Geary-Hinkley transformation (GHT) of read-depth ratios between either paired test-control samples, or between test samples and a reference template constructed from reference samples, to call the localized CNVs, followed by use of a GISTIC-like algorithm to identify recurrent CNVs and circular binary segmentation (CBS) to reveal large extended CNVs. To evaluate the utility of CNVs called from AluScan data, the AluScans from 23 non-cancer and 38 cancer genomes were analyzed in this study. The glioma samples analyzed yielded the familiar extended copy-number losses on chromosomes 1p and 9. Also, the recurrent somatic CNVs identified from liver cancer samples were similar to those reported for liver cancer WGS with respect to a striking enrichment of copy-number gains in chromosomes 1q and 8q. When localized or recurrent CNV-features capable of distinguishing between liver and non-liver cancer samples were selected by correlation-based machine learning, a highly accurate separation of the liver and non-liver cancer classes was attained. CONCLUSIONS The results obtained from non-cancer and cancerous tissues indicated that the AluScanCNV package can be employed to call localized, recurrent and extended CNVs from AluScan sequences. 
Moreover, both the localized and recurrent CNVs identified by this method could be subjected to machine-learning selection to yield distinguishing CNV-features that were capable of separating between liver cancers and other types of cancers. Since the method is applicable to any human DNA sample with or without the availability of a paired control, it can also be employed to analyze the constitutional CNVs of individuals.
Affiliation(s)
- Jian-Feng Yang
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Xiao-Fan Ding
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Lei Chen
- National Center for Liver Cancer Research and Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China
- Wai-Kin Mat
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Michelle Zhi Xu
- Department of Oncology, Nanjing First Hospital, No. 68 Changle Road, Nanjing, 210006 China
- Jin-Fei Chen
- Department of Oncology, Nanjing First Hospital, No. 68 Changle Road, Nanjing, 210006 China
- Jian-Min Wang
- Department of Hematology, Changhai Hospital, Second Military Medical University, 174 Changhai Road, Shanghai, 200433 China
- Lin Xu
- Department of Thoracic Surgery, Jiangsu Key Laboratory of Molecular and Translational Cancer Research, Nanjing Medical University Affiliated Cancer Hospital, Cancer Institute of Jiangsu Province, Baiziting 42, Nanjing, 210009 China
- Wai-Sang Poon
- Division of Neurosurgery, Department of Surgery, Prince of Wales Hospital, Chinese University of Hong Kong, 30-32 Ngan Shing Street, Sha Tin, Hong Kong, China
- Ava Kwong
- Division of Neurosurgery, Department of Surgery, Li Ka Shing Faculty of Medicine, University of Hong Kong, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
- Gilberto Ka-Kit Leung
- Division of Neurosurgery, Department of Surgery, Li Ka Shing Faculty of Medicine, University of Hong Kong, Queen Mary Hospital, 102 Pokfulam Road, Hong Kong, China
- Tze-Ching Tan
- Department of Neurosurgery, Queen Elizabeth Hospital, 30 Gascoigne Road, Kowloon, Hong Kong, China
- Chi-Hung Yu
- Department of Neurosurgery, Queen Elizabeth Hospital, 30 Gascoigne Road, Kowloon, Hong Kong, China
- Yue-Bin Ke
- Shenzhen Center for Disease Control and Prevention, No 8 Longyuan Road, Nanshan district, Shenzhen City, 518055 China
- Xin-Yun Xu
- Shenzhen Center for Disease Control and Prevention, No 8 Longyuan Road, Nanshan district, Shenzhen City, 518055 China
- Xiao-Yan Ke
- Nanjing Brain Hospital and Nanjing Institute of Neuropsychiatry, Nanjing Medical University, Nanjing, 210029 China
- Ronald Cw Ma
- Department of Medicine and Therapeutics, 9th floor, Clinical Sciences Building, The Prince of Wales Hospital, Shatin, Hong Kong
- Juliana Cn Chan
- Department of Medicine and Therapeutics, 9th floor, Clinical Sciences Building, The Prince of Wales Hospital, Shatin, Hong Kong
- Wei-Qing Wan
- Department of Neurosurgery, Beijing Tiantan Hospital, 6 Tiantan Xili, Dongcheng District, Capital Medical University, Beijing, 100050 China
- Li-Wei Zhang
- Department of Neurosurgery, Beijing Tiantan Hospital, 6 Tiantan Xili, Dongcheng District, Capital Medical University, Beijing, 100050 China
- Yogesh Kumar
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Shui-Ying Tsang
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
- Shao Li
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST, Department of Automation, Tsinghua University, Beijing, 100084 China
- Hong-Yang Wang
- National Center for Liver Cancer Research and Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China; International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Hospital, 225 Changhai Road, Shanghai, 200438 China
- Hong Xue
- Division of Life Science and Applied Genomics Centre, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China
|
295
|
Shyr C, Tarailo-Graovac M, Gottlieb M, Lee JJY, van Karnebeek C, Wasserman WW. FLAGS, frequently mutated genes in public exomes. BMC Med Genomics 2014; 7:64. [PMID: 25466818 PMCID: PMC4267152 DOI: 10.1186/s12920-014-0064-y] [Citation(s) in RCA: 100] [Impact Index Per Article: 9.1] [Received: 06/16/2014] [Accepted: 10/24/2014] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Dramatic improvements in DNA-sequencing technologies and computational analyses have led to wide use of whole exome sequencing (WES) to identify the genetic basis of Mendelian disorders. More than 180 novel rare-disease-causing genes with Mendelian inheritance patterns have been discovered through sequencing the exomes of just a few unrelated individuals or family members. As rare/novel genetic variants continue to be uncovered, there is a major challenge in distinguishing true pathogenic variants from rare benign mutations. METHODS We used publicly available exome cohorts, together with the dbSNP database, to derive a list of genes (n = 100) that most frequently exhibit rare (<1%) non-synonymous/splice-site variants in general populations. We termed these genes FLAGS for FrequentLy mutAted GeneS and analyzed their properties. RESULTS Analysis of FLAGS revealed that these genes have significantly longer protein coding sequences, a greater number of paralogs and display less evolutionarily selective pressure than expected. FLAGS are more frequently reported in PubMed clinical literature and more frequently associated with diseased phenotypes compared to the set of human protein-coding genes. We demonstrated an overlap between FLAGS and the rare-disease causing genes recently discovered through WES studies (n = 10) and the need for replication studies and rigorous statistical and biological analyses when associating FLAGS to rare disease. Finally, we showed how FLAGS are applied in a disease-causing variant prioritization approach on exome data from a family affected by an unknown rare genetic disorder. CONCLUSIONS We showed that some genes are frequently affected by rare, likely functional variants in the general population, and are frequently observed in WES studies analyzing diverse rare phenotypes. We found that the rate at which genes accumulate rare mutations is beneficial information for prioritizing candidates. 
We provided a ranking system based on the mutation accumulation rates for prioritizing exome-captured human genes, and propose that clinical reports associating any disease/phenotype to FLAGS be evaluated with extra caution.
Collapse
Affiliation(s)
- Casper Shyr
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.
- Maja Tarailo-Graovac
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
- Michael Gottlieb
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada.
- Jessica J Y Lee
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Genome Science and Technology Graduate Program, University of British Columbia, Vancouver, BC, Canada.
- Clara van Karnebeek
- Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada; Division of Biochemical Diseases, BC Children's Hospital, Vancouver, BC, Canada; Department of Pediatrics, University of British Columbia, Vancouver, BC, Canada.
- Wyeth W Wasserman
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada; Treatable Intellectual Disability Endeavour in British Columbia, Vancouver, Canada.
|
296
|
Vo NS, Tran Q, Phan V. An integrated approach for SNP calling based on population of genomes. BMC Bioinformatics 2014. [PMCID: PMC4196081 DOI: 10.1186/1471-2105-15-s10-p30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
|
297
|
Olsen LR, Campos B, Barnkob MS, Winther O, Brusic V, Andersen MH. Bioinformatics for cancer immunotherapy target discovery. Cancer Immunol Immunother 2014; 63:1235-49. [PMID: 25344903 PMCID: PMC11029190 DOI: 10.1007/s00262-014-1627-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 12/02/2013] [Accepted: 10/08/2014] [Indexed: 12/13/2022]
Abstract
The mechanisms of immune response to cancer have been studied extensively and great effort has been invested into harnessing the therapeutic potential of the immune system. Immunotherapies have seen significant advances in the past 20 years, but the full potential of protective and therapeutic cancer immunotherapies has yet to be fulfilled. The insufficient efficacy of existing treatments can be attributed to a number of biological and technical issues. In this review, we detail the current limitations of immunotherapy target selection and design, and review computational methods to streamline therapy target discovery in a bioinformatics analysis pipeline. We describe specialized bioinformatics tools and databases for three main bottlenecks in immunotherapy target discovery: the cataloging of potentially antigenic proteins, the identification of potential HLA binders, and the selection of epitopes and co-targets for single-epitope and multi-epitope strategies. We provide examples of application to the well-known tumor antigen HER2 and suggest bioinformatics methods to ameliorate therapy resistance and ensure efficient and lasting control of tumors.
Affiliation(s)
- Lars Rønn Olsen
- Department of Biology, Bioinformatics Centre, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen, Denmark
|
298
|
Abstract
BACKGROUND Next generation sequencing (NGS)-based assays continue to redefine the field of genetic testing. Owing to the complexity of the data, bioinformatics has become a necessary component in any laboratory implementing a clinical NGS test. CONTENT The computational components of an NGS-based work flow can be conceptualized as primary, secondary, and tertiary analytics. Each of these components addresses a necessary step in the transformation of raw data into clinically actionable knowledge. Understanding the basic concepts of these analysis steps is important in assessing and addressing the informatics needs of a molecular diagnostics laboratory. Equally critical is a familiarity with the regulatory requirements addressing the bioinformatics analyses. These and other topics are covered in this review article. SUMMARY Bioinformatics has become an important component in clinical laboratories generating, analyzing, maintaining, and interpreting data from molecular genetics testing. Given the rapid adoption of NGS-based clinical testing, service providers must develop informatics work flows that adhere to the rigor of clinical laboratory standards, yet are flexible to changes as the chemistry and software for analyzing sequencing data mature.
Affiliation(s)
- Gavin R Oliver
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Steven N Hart
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Eric W Klee
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
|
299
|
Abbey DA, Funt J, Lurie-Weinberger MN, Thompson DA, Regev A, Myers CL, Berman J. YMAP: a pipeline for visualization of copy number variation and loss of heterozygosity in eukaryotic pathogens. Genome Med 2014; 6:100. [PMID: 25505934 PMCID: PMC4263066 DOI: 10.1186/s13073-014-0100-8] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 07/24/2014] [Accepted: 10/30/2014] [Indexed: 12/13/2022] Open
Abstract
The design of effective antimicrobial therapies for serious eukaryotic pathogens requires a clear understanding of their highly variable genomes. To facilitate analysis of copy number variations, single nucleotide polymorphisms and loss of heterozygosity events in these pathogens, we developed a pipeline for analyzing diverse genome-scale datasets from microarray, deep sequencing, and restriction site associated DNA sequence experiments for clinical and laboratory strains of Candida albicans, the most prevalent human fungal pathogen. The YMAP pipeline (http://lovelace.cs.umn.edu/Ymap/) automatically illustrates genome-wide information in a single intuitive figure and is readily modified for the analysis of other pathogens with small genomes.
Affiliation(s)
- Darren A Abbey
- Department of Genetics, Cell Biology and Development, University of Minnesota, 6-160 Jackson Hall, Minneapolis, MN 55415 USA
- Jason Funt
- Broad Institute of MIT and Harvard University, 415 Main Street, Cambridge, MA 02142 USA
- Mor N Lurie-Weinberger
- Department of Molecular Microbiology and Biotechnology, Tel Aviv University, 418 Britannia Building, Ramat Aviv, 69978 Israel
- Dawn A Thompson
- Broad Institute of MIT and Harvard University, 415 Main Street, Cambridge, MA 02142 USA
- Aviv Regev
- Broad Institute of MIT and Harvard University, 415 Main Street, Cambridge, MA 02142 USA
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, 200 Union St SE, Minneapolis, MN 55455 USA
- Judith Berman
- Department of Genetics, Cell Biology and Development, University of Minnesota, 6-160 Jackson Hall, Minneapolis, MN 55415 USA; Department of Molecular Microbiology and Biotechnology, Tel Aviv University, 418 Britannia Building, Ramat Aviv, 69978 Israel
|
300
|
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 2014; 9:e112963. [PMID: 25409509 PMCID: PMC4237348 DOI: 10.1371/journal.pone.0112963] [Citation(s) in RCA: 5826] [Impact Index Per Article: 529.6] [Received: 08/25/2014] [Accepted: 10/16/2014] [Indexed: 02/06/2023] Open
Abstract
Advances in modern sequencing technologies allow us to generate sufficient data to analyze hundreds of bacterial genomes from a single machine in a single day. This potential for sequencing massive numbers of genomes calls for fully automated methods to produce high-quality assemblies and variant calls. We introduce Pilon, a fully automated, all-in-one tool for correcting draft assemblies and calling sequence variants of multiple sizes, including very large insertions and deletions. Pilon works with many types of sequence data, but is particularly strong when supplied with paired end data from two Illumina libraries with small (e.g., 180 bp) and large (e.g., 3–5 Kb) inserts. Pilon significantly improves draft genome assemblies by correcting bases, fixing mis-assemblies and filling gaps. For both haploid and diploid genomes, Pilon produces more contiguous genomes with fewer errors, enabling identification of more biologically relevant genes. Furthermore, Pilon identifies small variants with high accuracy as compared to state-of-the-art tools and is unique in its ability to accurately identify large sequence variants including duplications and resolve large insertions. Pilon is being used to improve the assemblies of thousands of new genomes and to identify variants from thousands of clinically relevant bacterial strains. Pilon is freely available as open source software.
Affiliation(s)
- Bruce J. Walker
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (BJW); (AME)
- Thomas Abeel
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- VIB Department of Plant Systems Biology, Ghent University, Ghent, Belgium
- Terrance Shea
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Margaret Priest
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Amr Abouelliel
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sharadha Sakthikumar
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Christina A. Cuomo
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Qiandong Zeng
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Jennifer Wortman
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Sarah K. Young
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Ashlee M. Earl
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- * E-mail: (BJW); (AME)
|