1
Nguyen HTL, Kohl E, Bade J, Eng SE, Tosevska A, Al Shihabi A, Tebon PJ, Hong JJ, Dry S, Boutros PC, Panossian A, Gosline SJC, Soragni A. A platform for rapid patient-derived cutaneous neurofibroma organoid establishment and screening. Cell Reports Methods 2024; 4:100772. [PMID: 38744290 PMCID: PMC11133839 DOI: 10.1016/j.crmeth.2024.100772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 02/10/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Localized cutaneous neurofibromas (cNFs) are benign tumors that arise in the dermis of patients affected by neurofibromatosis type 1 syndrome. cNFs are benign lesions: they do not undergo malignant transformation or metastasize. Nevertheless, they can cover a significant proportion of the body, with some individuals developing hundreds to thousands of lesions. cNFs can cause pain, itching, and disfigurement resulting in substantial socio-emotional repercussions. Currently, surgery and laser desiccation are the sole treatment options but may result in scarring and potential regrowth from incomplete removal. To identify effective systemic therapies, we introduce an approach to establish and screen cNF organoids. We optimized conditions to support the ex vivo growth of genomically diverse cNFs. Patient-derived cNF organoids closely recapitulate cellular and molecular features of parental tumors as measured by immunohistopathology, methylation, RNA sequencing, and flow cytometry. Our cNF organoid platform enables rapid screening of hundreds of compounds in a patient- and tumor-specific manner.
Affiliation(s)
- Huyen Thi Lam Nguyen
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Emily Kohl
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jessica Bade
- Pacific Northwest National Laboratories, Seattle, WA, USA
- Stefan E Eng
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
- Anela Tosevska
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA
- Ahmad Al Shihabi
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pathology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Peyton J Tebon
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jenny J Hong
- Division of Hematology-Oncology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Sarah Dry
- Department of Pathology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Paul C Boutros
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA; Department of Urology, University of California, Los Angeles, Los Angeles, CA, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, USA
- Sara J C Gosline
- Pacific Northwest National Laboratories, Seattle, WA, USA; Department of Biomedical Engineering, Oregon Health and Sciences University, Portland, OR, USA.
- Alice Soragni
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA; Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California, Los Angeles, Los Angeles, CA, USA.
2
Kalleberg J, Rissman J, Schnabel RD. Overcoming Limitations to Deep Learning in Domesticated Animals with TrioTrain. bioRxiv 2024:2024.04.15.589602. [PMID: 38659907 PMCID: PMC11042298 DOI: 10.1101/2024.04.15.589602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Variant calling across diverse species remains challenging as most bioinformatics tools default to assumptions based on human genomes. DeepVariant (DV) excels without joint genotyping while offering fewer implementation barriers. However, the growing appeal of a "universal" algorithm has magnified the unknown impacts when used with non-human genomes. Here, we use bovine genomes to assess the limits of human-genome-trained models in other species. We introduce the first multi-species DV model that achieves a lower Mendelian Inheritance Error (MIE) rate during single-sample genotyping. Our novel approach, TrioTrain, automates extending DV for species without Genome In A Bottle (GIAB) resources and uses region shuffling to mitigate barriers for SLURM-based clusters. To offset imperfect truth labels for animal genomes, we remove Mendelian discordant variants before training, where models are tuned to genotype the offspring correctly. With TrioTrain, we use cattle, yak, and bison trios to build 30 model iterations across five phases. We observe remarkable performance across phases when testing the GIAB human trios with a mean SNP F1 score >0.990. In HG002, our phase 4 bovine model identifies more variants at a lower MIE rate than DeepTrio. In bovine F1-hybrid genomes, our model substantially reduces inheritance errors with a mean MIE rate of 0.03 percent. Although constrained by imperfect labels, we find that multi-species, trio-based training produces a robust variant calling model. Our research demonstrates that exclusively training with human genomes restricts the application of deep-learning approaches for comparative genomics.
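The Mendelian Inheritance Error (MIE) rate mentioned in the abstract above can be illustrated with a minimal sketch. It assumes diploid autosomal sites with fully called genotypes, and it counts any child genotype that cannot be formed from one maternal and one paternal allele as an error (so true de novo mutations would also be counted). The function names `is_mendelian_consistent` and `mie_rate` are hypothetical and are not taken from TrioTrain or DeepVariant.

```python
from itertools import product
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # two allele indices, e.g. (0, 1) for a heterozygote


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child genotype can be built from one allele of each parent."""
    possible = {tuple(sorted(pair)) for pair in product(mother, father)}
    return tuple(sorted(child)) in possible


def mie_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, mother, father) sites that violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return errors / len(trios)


# Toy sites, ordered as (child, mother, father); values are illustrative only.
sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: one allele from each parent
    ((1, 1), (0, 0), (0, 1)),  # violation: mother cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # violation: neither parent carries allele 1
]
rate = mie_rate(sites)
print(f"MIE rate: {rate:.2%}")
```

Real pipelines typically compute this from a merged trio VCF and handle missing calls, multi-allelic sites, and sex chromosomes, none of which are covered here.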
Affiliation(s)
- Jenna Kalleberg
- University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Jacob Rissman
- University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- Robert D Schnabel
- University of Missouri, Division of Animal Sciences, Columbia, MO, 65201 USA
- University of Missouri, Genetics Area Program, Columbia, MO, 65201 USA
3
Höjer P, Frick T, Siga H, Pourbozorgi P, Aghelpasand H, Martin M, Ahmadian A. BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Res 2023; 51:e114. [PMID: 37941142 PMCID: PMC10711428 DOI: 10.1093/nar/gkad1010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 10/04/2023] [Accepted: 10/18/2023] [Indexed: 11/10/2023] Open
Abstract
Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-seq and stLFR. Running BLR on DBS linked-reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
Affiliation(s)
- Pontus Höjer
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Tobias Frick
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Humam Siga
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Parham Pourbozorgi
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Hooman Aghelpasand
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Marcel Martin
- Stockholm University, Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, SE-171 65, Solna, Sweden
- Afshin Ahmadian
- Royal Institute of Technology (KTH), School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Gene Technology, Science for Life Laboratory, SE-171 65, Solna, Sweden
4
Godazandeh K, Van Olmen L, Van Oudenhove L, Lefever S, Bogaert C, Fant B. Methods behind neoantigen prediction for personalized anticancer vaccines. Methods Cell Biol 2023; 183:161-186. [PMID: 38548411 DOI: 10.1016/bs.mcb.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
Next to conventional cancer therapies, immunotherapies such as immune checkpoint inhibitors have broadened the cancer treatment landscape over the past decades. Recent advances in next generation sequencing and bioinformatics technologies have made it possible to identify a patient's own immunogenic neoantigens. These cancer neoantigens serve as important targets for personalized immunotherapy which has the benefit of being more active and effective in targeting cancer cells. This paper is a step-by-step guide discussing the different analyses and challenges encountered during in-silico neoantigen prediction. The protocol describes all the tools and steps required for the identification of immunogenic neoantigens.
5
Prodanov T, Bansal V. A multilocus approach for accurate variant calling in low-copy repeats using whole-genome sequencing. Bioinformatics 2023; 39:i279-i287. [PMID: 37387146 PMCID: PMC10311303 DOI: 10.1093/bioinformatics/btad268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
MOTIVATION: Low-copy repeats (LCRs) or segmental duplications are long segments of duplicated DNA that cover > 5% of the human genome. Existing tools for variant calling using short reads exhibit low accuracy in LCRs due to ambiguity in read mapping and extensive copy number variation. Variants in more than 150 genes overlapping LCRs are associated with risk for human diseases. METHODS: We describe a short-read variant calling method, ParascopyVC, that performs variant calling jointly across all repeat copies and utilizes reads independent of mapping quality in LCRs. To identify candidate variants, ParascopyVC aggregates reads mapped to different repeat copies and performs polyploid variant calling. Subsequently, paralogous sequence variants that can differentiate repeat copies are identified using population data and used for estimating the genotype of variants for each repeat copy. RESULTS: On simulated whole-genome sequence data, ParascopyVC achieved higher precision (0.997) and recall (0.807) than three state-of-the-art variant callers (best precision = 0.956 for DeepVariant and best recall = 0.738 for GATK) in 167 LCR regions. Benchmarking of ParascopyVC using the Genome in a Bottle high-confidence variant calls for the HG002 genome showed that it achieved a very high precision of 0.991 and a high recall of 0.909 across LCR regions, significantly better than FreeBayes (precision = 0.954 and recall = 0.822), GATK (precision = 0.888 and recall = 0.873) and DeepVariant (precision = 0.983 and recall = 0.861). ParascopyVC demonstrated a consistently higher accuracy (mean F1 = 0.947) than other callers (best F1 = 0.908) across seven human genomes. AVAILABILITY AND IMPLEMENTATION: ParascopyVC is implemented in Python and is freely available at https://github.com/tprodanov/ParascopyVC.
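The precision and recall pairs quoted in the abstract above can be combined into an F1 score, the harmonic mean of the two. The sketch below assumes the standard F1 definition; the helper name `f1_score` is illustrative and is not part of ParascopyVC. The inputs are the HG002 values from the abstract, so the outputs are per-genome figures and need not equal the mean F1 values reported across seven genomes.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# HG002 (precision, recall) pairs in LCR regions, as quoted in the abstract.
hg002 = {
    "ParascopyVC": (0.991, 0.909),
    "FreeBayes": (0.954, 0.822),
    "GATK": (0.888, 0.873),
    "DeepVariant": (0.983, 0.861),
}

f1_by_caller = {name: f1_score(p, r) for name, (p, r) in hg002.items()}

# Print callers from highest to lowest F1.
for name, f1 in sorted(f1_by_caller.items(), key=lambda kv: -kv[1]):
    print(f"{name}: F1 = {f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a caller with high precision but modest recall still scores well below its precision.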
Affiliation(s)
- Timofey Prodanov
- Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA 92093, United States
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Center for Digital Medicine, Heinrich Heine University, Düsseldorf 40225, Germany
- Vikas Bansal
- School of Medicine, University of California San Diego, La Jolla, CA 92093, United States
6
McConnell SC, Hernandez KM, Andrade J, de Jong JLO. Immune gene variation associated with chromosome-scale differences among individual zebrafish genomes. Sci Rep 2023; 13:7777. [PMID: 37179373 PMCID: PMC10183018 DOI: 10.1038/s41598-023-34467-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2023] [Accepted: 04/30/2023] [Indexed: 05/15/2023] Open
Abstract
Immune genes have evolved to maintain exceptional diversity, offering robust defense against pathogens. We performed genomic assembly to examine immune gene variation in zebrafish. Gene pathway analysis identified immune genes as significantly enriched among genes with evidence of positive selection. A large subset of genes was absent from analysis of coding sequences due to apparent lack of reads, prompting us to examine genes overlapping zero coverage regions (ZCRs), defined as 2 kb stretches without mapped reads. Immune genes were identified as highly enriched within ZCRs, including over 60% of major histocompatibility complex (MHC) genes and NOD-like receptor (NLR) genes, mediators of direct and indirect pathogen recognition. This variation was most highly concentrated throughout one arm of chromosome 4 carrying a large cluster of NLR genes, associated with large-scale structural variation covering more than half of the chromosome. Our genomic assemblies uncovered alternative haplotypes and distinct complements of immune genes among individual zebrafish, including the MHC Class II locus on chromosome 8 and the NLR gene cluster on chromosome 4. While previous studies have shown marked variation in NLR genes between vertebrate species, our study highlights extensive variation in NLR gene regions between individuals of the same species. Taken together, these findings provide evidence of immune gene variation on a scale previously unknown in other vertebrate species and raise questions about potential impact on immune function.
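The zero coverage region (ZCR) definition in the abstract above, stretches of at least 2 kb without mapped reads, can be sketched as a scan over per-base read depth. This is an illustrative reading of that definition, not the authors' pipeline: it assumes ZCRs are maximal runs of zero depth at least `min_len` bases long, and the function name `zero_coverage_regions` and the toy depth vector are hypothetical.

```python
from typing import Iterable, List, Tuple


def zero_coverage_regions(depth: Iterable[int], min_len: int = 2000) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth is 0 for >= min_len bases."""
    regions: List[Tuple[int, int]] = []
    run_start = None  # start of the current zero-depth run, if any
    pos = -1          # stays -1 when the input is empty
    for pos, d in enumerate(depth):
        if d == 0:
            if run_start is None:
                run_start = pos
        else:
            if run_start is not None and pos - run_start >= min_len:
                regions.append((run_start, pos))
            run_start = None
    # Close a zero-depth run that extends to the end of the sequence.
    end = pos + 1
    if run_start is not None and end - run_start >= min_len:
        regions.append((run_start, end))
    return regions


# Toy per-base depth: a 2500 bp gap (kept), a 1999 bp gap (too short), a 2000 bp gap (kept).
toy_depth = [5] * 100 + [0] * 2500 + [3] * 50 + [0] * 1999 + [7] * 10 + [0] * 2000
zcrs = zero_coverage_regions(toy_depth)
print(zcrs)
```

Genes overlapping the returned intervals could then be flagged with any interval-overlap tool; that step is outside this sketch.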
Affiliation(s)
- Sean C McConnell
- Section of Hematology-Oncology and Stem Cell Transplant, Department of Pediatrics, The University of Chicago, Chicago, IL, 60637, USA
- Kyle M Hernandez
- Center for Research Informatics, The University of Chicago, Chicago, IL, 60637, USA
- Department of Medicine, Computational Biomedicine and Biomedical Data Science, Center for Translational Data Science, The University of Chicago, Chicago, IL, 60637, USA
- Jorge Andrade
- Center for Research Informatics, The University of Chicago, Chicago, IL, 60637, USA
- Kite Pharma, Santa Monica, CA, 90404, USA
- Jill L O de Jong
- Section of Hematology-Oncology and Stem Cell Transplant, Department of Pediatrics, The University of Chicago, Chicago, IL, 60637, USA.
7
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
8
Ding Y, Owen M, Le J, Batalov S, Chau K, Kwon YH, Van Der Kraan L, Bezares-Orin Z, Zhu Z, Veeraraghavan N, Nahas S, Bainbridge M, Gleeson J, Baer RJ, Bandoli G, Chambers C, Kingsmore SF. Scalable, high quality, whole genome sequencing from archived, newborn, dried blood spots. NPJ Genom Med 2023; 8:5. [PMID: 36788231 PMCID: PMC9929090 DOI: 10.1038/s41525-023-00349-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2021] [Accepted: 01/05/2023] [Indexed: 02/16/2023] Open
Abstract
Universal newborn screening (NBS) is a highly successful public health intervention. Archived dried bloodspots (DBS) collected for NBS represent a rich resource for population genomic studies. To fully harness this resource in such studies, DBS must yield high-quality genomic DNA (gDNA) for whole genome sequencing (WGS). In this pilot study, we hypothesized that gDNA of sufficient quality and quantity for WGS could be extracted from archived DBS up to 20 years old without PCR (Polymerase Chain Reaction) amplification. We describe simple methods for gDNA extraction and WGS library preparation from several types of DBS. We tested these methods in DBS from 25 individuals who had previously undergone diagnostic, clinical WGS and 29 randomly selected DBS cards collected for NBS from the California State Biobank. While gDNA from DBS had significantly less yield than from EDTA blood from the same individuals, it was of sufficient quality and quantity for WGS without PCR. All DBS samples yielded WGS that met quality control metrics for high-confidence variant calling. Twenty-eight variants of various types that had been reported clinically in 19 samples were recapitulated in WGS from DBS. There were no significant effects of age or paper type on WGS quality. Archived DBS appear to be a suitable sample type for WGS in population genomic studies.
Affiliation(s)
- Yan Ding
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Mallory Owen
- Rady Children's Institute for Genomic Medicine, Rady Children's Hospital, San Diego, CA, 92123, USA.
- Jennie Le
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Sergey Batalov
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Kevin Chau
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Yong Hyun Kwon
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Lucita Van Der Kraan
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Zaira Bezares-Orin
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Zhanyang Zhu
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Narayanan Veeraraghavan
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Shareef Nahas
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Matthew Bainbridge
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA
- Joe Gleeson
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA; Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA
- Rebecca J. Baer
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA; California Preterm Birth Initiative, University of California San Francisco, San Francisco, CA USA
- Gretchen Bandoli
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA
- Christina Chambers
- Department of Pediatrics, University of California San Diego, La Jolla, CA 92093 USA
- Stephen F. Kingsmore
- Rady Children’s Institute for Genomic Medicine, Rady Children’s Hospital, San Diego, CA 92123 USA; Keck Graduate Institute, Claremont, CA 91711 USA
9
Prodanov T, Bansal V. Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing. Nat Commun 2022; 13:3221. [PMID: 35680869 PMCID: PMC9184528 DOI: 10.1038/s41467-022-30930-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2021] [Accepted: 05/20/2022] [Indexed: 11/09/2022] Open
Abstract
The human genome contains hundreds of low-copy repeats (LCRs) that are challenging to analyze using short-read sequencing technologies due to extensive copy number variation and ambiguity in read mapping. Copy number and sequence variants in more than 150 duplicated genes that overlap LCRs have been implicated in monogenic and complex human diseases. We describe a computational tool, Parascopy, for estimating the aggregate and paralog-specific copy number of duplicated genes using whole-genome sequencing (WGS). Parascopy is an efficient method that jointly analyzes reads mapped to different repeat copies without the need for global realignment. It leverages multiple samples to mitigate sequencing bias and to identify reliable paralogous sequence variants (PSVs) that differentiate repeat copies. Analysis of WGS data for 2504 individuals from diverse populations showed that Parascopy is robust to sequencing bias, has higher accuracy compared to existing methods and enables prioritization of pathogenic copy number changes in duplicated genes.
Affiliation(s)
- Timofey Prodanov
- Bioinformatics and Systems Biology Graduate Program, University of California, La Jolla, San Diego, CA, 92093, USA
- Vikas Bansal
- Department of Pediatrics, School of Medicine, University of California, La Jolla, San Diego, CA, 92093, USA.
10
Yang H, Gu F, Zhang L, Hua XS. Using generative adversarial networks for genome variant calling from low depth ONT sequencing data. Sci Rep 2022; 12:8725. [PMID: 35637238 PMCID: PMC9151722 DOI: 10.1038/s41598-022-12346-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/25/2021] [Accepted: 05/10/2022] [Indexed: 11/21/2022] Open
Abstract
Genome variant calling is a challenging yet critical task for subsequent studies. Almost all existing methods rely on high-depth DNA sequencing data, and their performance drops substantially on low-depth data. Using public human Oxford Nanopore (ONT) data from the Genome in a Bottle (GIAB) Consortium, we trained a generative adversarial network for low-depth variant calling. Our method, denoted LDV-Caller, can project high-depth sequencing information from low-depth data. It achieves a 94.25% F1 score on low-depth data, while the F1 score of the state-of-the-art method on data of two times higher depth is 94.49%. This could substantially reduce the cost of genome-wide sequencing. In addition, we validated the trained LDV-Caller model on 157 public severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) samples. The mean sequencing depth of these samples is 2982. LDV-Caller yields a 92.77% F1 score using only 22x sequencing depth, which demonstrates that our method has the potential to analyze different species with only low-depth sequencing data.
11
Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, Johanson E, Boja E, Maier EJ, Serang O, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Kyriakidis K, Malousi A, Shafin K, Pesout T, Jain M, Paten B, Chang PC, Kolesnikov A, Nattestad M, Baid G, Goel S, Yang H, Carroll A, Eveleigh R, Bourgey M, Bourque G, Li G, Ma C, Tang L, Du Y, Zhang S, Morata J, Tonda R, Parra G, Trotta JR, Brueffer C, Demirkaya-Budak S, Kabakci-Zorlu D, Turgut D, Kalay Ö, Budak G, Narcı K, Arslan E, Brown R, Johnson IJ, Dolgoborodov A, Semenyuk V, Jain A, Tetikol HS, Jain V, Ruehle M, Lajoie B, Roddey C, Catreux S, Mehio R, Ahsan MU, Liu Q, Wang K, Ebrahim Sahraeian SM, Fang LT, Mohiyuddin M, Hung C, Jain C, Feng H, Li Z, Chen L, Sedlazeck FJ, Zook JM. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. Cell Genomics 2022; 2:S2666-979X(22)00058-1. [PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/07/2020] [Revised: 11/01/2021] [Accepted: 04/08/2022] [Indexed: 11/19/2022]
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Affiliation(s)
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Elaine Johanson
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Emily Boja
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Ezekiel J. Maier
- Booz Allen Hamilton, 8283 Greensboro Drive, Mclean, VA 22102, USA
- Omar Serang
- DNAnexus, Inc., 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain
- Konstantinos Kyriakidis
- School of Pharmacy, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
| | - Andigoni Malousi
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
| | - Pi-Chuan Chang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | | | - Maria Nattestad
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Gunjan Baid
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Sidharth Goel
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Howard Yang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Andrew Carroll
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
| | - Robert Eveleigh
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Mathieu Bourgey
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Guillaume Bourque
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
| | - Gen Li
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - ChouXian Ma
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - LinQi Tang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - YuanPing Du
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - ShaoWei Zhang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
| | - Jordi Morata
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Raúl Tonda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Genís Parra
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Jean-Rémi Trotta
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Christian Brueffer
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
| | | | | | - Deniz Turgut
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Özem Kalay
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Gungor Budak
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Kübra Narcı
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | - Elif Arslan
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | | | | | | | | | - Amit Jain
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
| | | | | | | | | | | | | | | | - Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | | | - Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA 95050, USA
| | | | | | - Chirag Jain
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | | | | | | | - Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
| |
Collapse
|
12
|
Salatino A, Sookoian S, Pirola CJ. Computational Pipeline for Next-Generation Sequencing (NGS) Studies in Genetics of NASH. Methods Mol Biol 2022; 2455:203-222. [PMID: 35212996 DOI: 10.1007/978-1-0716-2128-8_16] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
High-throughput sequencing (HTS) technologies have helped expand current knowledge of the biology of complex diseases, including nonalcoholic fatty liver disease (NAFLD). Genome-wide association studies, whole-exome sequencing, and sequencing of entire genes are used to identify variants and/or mutations that predispose to the disease. Here, we present a tutorial to guide readers in managing high volumes of genetic data in the context of Next-Generation Sequencing (NGS) studies.
Affiliation(s)
- Adrian Salatino
  - School of Medicine, Institute of Medical Research A Lanari, University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
  - Department of Molecular Genetics and Biology of Complex Diseases, Institute of Medical Research (IDIM), National Scientific and Technical Research Council (CONICET)-University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- Silvia Sookoian
  - School of Medicine, Institute of Medical Research A Lanari, University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
  - Department of Clinical and Molecular Hepatology, Institute of Medical Research (IDIM), National Scientific and Technical Research Council (CONICET)-University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
- Carlos J Pirola
  - School of Medicine, Institute of Medical Research A Lanari, University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
  - Department of Molecular Genetics and Biology of Complex Diseases, Institute of Medical Research (IDIM), National Scientific and Technical Research Council (CONICET)-University of Buenos Aires, Ciudad Autónoma de Buenos Aires, Argentina
13
Yan B, Wang D, Vaisvila R, Sun Z, Ettwiller L. Methyl-SNP-seq reveals dual readouts of methylome and variome at molecule resolution while enabling target enrichment. Genome Res 2022; 32:2079-2091. [PMID: 36332968 PMCID: PMC9808626 DOI: 10.1101/gr.277080.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Accepted: 10/31/2022] [Indexed: 11/06/2022]
Abstract
Covalent modifications of genomic DNA are crucial for most organisms to survive. Amplicon-based high-throughput sequencing technologies erase all DNA modifications to retain only sequence information for the four canonical nucleobases, necessitating specialized technologies for ascertaining epigenetic information. To also capture base modification information, we developed Methyl-SNP-seq, a technology that takes advantage of the complementarity of the double helix to extract the methylation and original sequence information from a single DNA molecule. More specifically, Methyl-SNP-seq uses bisulfite conversion of one of the strands to identify cytosine methylation while retaining the original four-bases sequence information on the other strand. As both strands are locked together to link the dual readouts on a single paired-end read, Methyl-SNP-seq allows detecting the methylation status of any DNA even without a reference genome. Because one of the strands retains the original four nucleotide composition, Methyl-SNP-seq can also be used in conjunction with standard sequence-specific probes for targeted enrichment and amplification. We show the usefulness of this technology in a broad spectrum of applications ranging from allele-specific methylation analysis in humans to identification of methyltransferase specificity in complex bacterial communities.
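The dual readout described in this abstract can be illustrated with a minimal Python sketch. It assumes that the unconverted sequence and the bisulfite-converted copy of the same molecule have already been paired base by base on the same strand. The function name `call_methylation` and the example sequences are hypothetical and are not taken from the Methyl-SNP-seq software.

```python
from typing import Dict


def call_methylation(original: str, converted: str) -> Dict[int, bool]:
    """Return {position: is_methylated} for every cytosine in `original`.

    Assumption: `converted` is the bisulfite-treated copy of `original`,
    already aligned position by position.
    """
    if len(original) != len(converted):
        raise ValueError("sequences must have the same length")
    calls = {}
    for pos, (base, bs_base) in enumerate(zip(original.upper(), converted.upper())):
        if base != "C":
            continue  # only cytosines carry the methylation readout here
        if bs_base == "C":
            calls[pos] = True   # protected from conversion: methylated
        elif bs_base == "T":
            calls[pos] = False  # converted C -> T: unmethylated
        # any other base is treated as a sequencing error and skipped
    return calls


if __name__ == "__main__":
    # Hypothetical toy molecule: the C at index 1 stays C, the others read as T.
    print(call_methylation("ACGTCCGA", "ACGTTTGA"))
```

Because the original four-base sequence is read from the same molecule, a C-to-T difference can be attributed to conversion rather than to a genetic variant, which is the ambiguity that conventional bisulfite data cannot resolve without a reference.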
Affiliation(s)
- Bo Yan
  - New England Biolabs, Incorporated, Ipswich, Massachusetts 01938, USA
- Duan Wang
  - SLC Management, Wellesley Hills, Massachusetts 02481, USA
- Zhiyi Sun
  - New England Biolabs, Incorporated, Ipswich, Massachusetts 01938, USA
14
Shafin K, Pesout T, Chang PC, Nattestad M, Kolesnikov A, Goel S, Baid G, Kolmogorov M, Eizenga JM, Miga KH, Carnevali P, Jain M, Carroll A, Paten B. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods 2021; 18:1322-1332. [PMID: 34725481 PMCID: PMC8571015 DOI: 10.1038/s41592-021-01299-w] [Citation(s) in RCA: 114] [Impact Index Per Article: 38.0] [Received: 03/08/2021] [Accepted: 09/06/2021] [Indexed: 01/15/2023]
Abstract
Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
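To put the Q35+ and Q40+ figures in context, the sketch below converts between a quality value and a per-base error rate. It assumes the values in the abstract are Phred-scaled, Q = -10 log10(p), which is the usual convention for assembly consensus quality; the abstract itself does not spell this out, and the function names are illustrative.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Per-base error probability for a Phred-scaled quality value."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Phred-scaled quality value for a per-base error probability."""
    if not 0 < p <= 1:
        raise ValueError("error rate must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (35, 40):
        p = phred_to_error_rate(q)
        print(f"Q{q}: error rate {p:.2e}, about 1 error per {1 / p:,.0f} bases")
```

Under that assumption, Q40 corresponds to roughly one error per 10,000 bases and Q35 to roughly one per 3,200 bases.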
Affiliation(s)
- Trevor Pesout
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Karen H Miga
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Miten Jain
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
15
Lindner M, Gawehns F, Te Molder S, Visser ME, van Oers K, Laine VN. Performance of methods to detect genetic variants from bisulphite sequencing data in a non-model species. Mol Ecol Resour 2021; 22:834-846. [PMID: 34435438 PMCID: PMC9290141 DOI: 10.1111/1755-0998.13493] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2021] [Revised: 08/10/2021] [Accepted: 08/20/2021] [Indexed: 12/17/2022]
Abstract
The profiling of epigenetic marks like DNA methylation has become a central aspect of studies in evolution and ecology. Bisulphite sequencing is commonly used for assessing genome‐wide DNA methylation at single nucleotide resolution but these data can also provide information on genetic variants like single nucleotide polymorphisms (SNPs). However, bisulphite conversion causes unmethylated cytosines to appear as thymines, complicating the alignment and subsequent SNP calling. Several tools have been developed to overcome this challenge, but there is no independent evaluation of such tools for non‐model species, which often lack genomic references. Here, we used whole‐genome bisulphite sequencing (WGBS) data from four female great tits (Parus major) to evaluate the performance of seven tools for SNP calling from bisulphite sequencing data. We used SNPs from whole‐genome resequencing data of the same samples as baseline SNPs to assess common performance metrics like sensitivity, precision, and the number of true positive, false positive, and false negative SNPs for the full range of variant and genotype quality values. We found clear differences between the tools in either optimizing precision (bis‐snp), sensitivity (biscuit), or a compromise between both (all other tools). Overall, the choice of SNP caller strongly depends on which performance parameter should be maximized and whether ascertainment bias should be minimized to optimize downstream analysis, highlighting the need for studies that assess such differences.
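The performance metrics named in this abstract can be computed from two call sets as in the following sketch. It assumes each SNP is represented as a `(chromosome, position, alternate allele)` tuple and that a call counts as a true positive only when it matches a baseline SNP exactly. The function name and the toy call sets are hypothetical and are not the study's data.

```python
def evaluate_calls(called, baseline):
    """Compare called SNPs against baseline SNPs.

    Both arguments are iterables of hashable SNP keys, for example
    (chrom, pos, alt) tuples. Returns counts plus sensitivity and precision.
    """
    called, baseline = set(called), set(baseline)
    tp = len(called & baseline)   # called and present in the baseline
    fp = len(called - baseline)   # called but absent from the baseline
    fn = len(baseline - called)   # in the baseline but missed by the caller
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Hypothetical toy call sets, for illustration only.
    baseline = [("chr1", 100, "T"), ("chr1", 200, "G"),
                ("chr2", 50, "A"), ("chr2", 75, "C")]
    called = [("chr1", 100, "T"), ("chr1", 200, "G"), ("chr2", 60, "A")]
    print(evaluate_calls(called, baseline))
```

Repeating this calculation while raising a variant or genotype quality threshold on the called set traces out the precision and sensitivity trade-off that the abstract describes across tools.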
Affiliation(s)
- Melanie Lindner
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Fleur Gawehns
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Sebastiaan Te Molder
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Marcel E Visser
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
  - Chronobiology Unit, Groningen Institute for Evolutionary Life Sciences (GELIFES), University of Groningen, Groningen, The Netherlands
- Kees van Oers
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
- Veronika N Laine
  - Department of Animal Ecology, Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands
  - Finnish Museum of Natural History, University of Helsinki, Helsinki, Finland
16
Kovaka S, Fan Y, Ni B, Timp W, Schatz MC. Targeted nanopore sequencing by real-time mapping of raw electrical signal with UNCALLED. Nat Biotechnol 2021; 39:431-441. [PMID: 33257863 PMCID: PMC8567335 DOI: 10.1038/s41587-020-0731-9] [Citation(s) in RCA: 116] [Impact Index Per Article: 38.7] [Received: 02/07/2020] [Accepted: 10/07/2020] [Indexed: 02/07/2023]
Abstract
Conventional targeted sequencing methods eliminate many of the benefits of nanopore sequencing, such as the ability to accurately detect structural variants or epigenetic modifications. The ReadUntil method allows nanopore devices to selectively eject reads from pores in real time, which could enable purely computational targeted sequencing. However, this requires rapid identification of on-target reads while most mapping methods require computationally intensive basecalling. We present UNCALLED ( https://github.com/skovaka/UNCALLED ), an open source mapper that rapidly matches streaming of nanopore current signals to a reference sequence. UNCALLED probabilistically considers k-mers that could be represented by the signal and then prunes the candidates based on the reference encoded within a Ferragina-Manzini index. We used UNCALLED to deplete sequencing of known bacterial genomes within a metagenomics community, enriching the remaining species 4.46-fold. UNCALLED also enriched 148 human genes associated with hereditary cancers to 29.6× coverage using one MinION flowcell, enabling accurate detection of single-nucleotide polymorphisms, insertions and deletions, structural variants and methylation in these genes.
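For readers unfamiliar with the Ferragina-Manzini (FM) index mentioned in this abstract, the following is a minimal sketch of exact-match backward search over such an index. It is a textbook illustration, not UNCALLED's implementation: it does not model the probabilistic signal-to-k-mer step, it builds the suffix array naively (suitable for short toy references only), and all function names and the example reference are hypothetical.

```python
def build_fm_index(text):
    """Build a toy FM index (first-column offsets and occurrence table)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    text += "$"  # sentinel, sorts before A, C, G, T
    # Naive suffix array: fine for short examples, too slow for genomes.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # first[c]: number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, b in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (b == c)
    return first, occ, len(bwt)


def count_occurrences(index, pattern):
    """Count exact occurrences of `pattern` using backward search."""
    first, occ, n = index
    lo, hi = 0, n  # current suffix-array interval [lo, hi)
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0  # interval is empty: pattern does not occur
    return hi - lo


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTTACG")  # hypothetical toy reference
    for kmer in ("ACG", "CGT", "TT", "GGG"):
        print(kmer, count_occurrences(index, kmer))
```

Each pattern character narrows the suffix-array interval in constant time, which is why candidates that cannot be extended against the reference can be discarded cheaply as more signal arrives.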
Affiliation(s)
- Sam Kovaka
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Yunfan Fan
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Bohan Ni
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Winston Timp
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Michael C Schatz
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biology, Johns Hopkins University, Baltimore, MD, USA
  - Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
17
Kim JE, Choi J, Sung CO, Hong YS, Kim SY, Lee H, Kim TW, Kim JI. High prevalence of TP53 loss and whole-genome doubling in early-onset colorectal cancer. Exp Mol Med 2021; 53:446-456. [PMID: 33753878 PMCID: PMC8080557 DOI: 10.1038/s12276-021-00583-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/28/2020] [Revised: 12/10/2020] [Accepted: 12/22/2020] [Indexed: 02/01/2023]
Abstract
The global incidence of early-onset colorectal cancer (EO-CRC) is rapidly rising. However, the reason for this rise in incidence as well as the genomic characteristics of EO-CRC remain largely unknown. We performed whole-exome sequencing in 47 cases of EO-CRC and targeted deep sequencing in 833 cases of CRC. Mutational profiles of EO-CRC were compared with previously published large-scale studies. EO-CRC and The Cancer Genome Atlas (TCGA) data were further investigated according to copy number profiles and mutation timing. We classified colorectal cancer into three subgroups: the hypermutated group consisted of mutations in POLE and mismatch repair genes; the whole-genome doubling group had early functional loss of TP53 that led to whole-genome doubling and focal oncogene amplification; the genome-stable group had mutations in APC and KRAS, similar to conventional colon cancer. Among non-hypermutated samples, whole-genome doubling was more prevalent in early-onset than in late-onset disease (54% vs 38%, Fisher's exact P = 0.04). More than half of non-hypermutated EO-CRC cases involved early TP53 mutation and whole-genome doubling, which led to notable differences in mutation frequencies between age groups. Alternative carcinogenesis involving genomic instability via loss of TP53 may be related to the rise in EO-CRC.
Affiliation(s)
- Jeong Eun Kim
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Jaeyong Choi
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Korea
- Chang-Ohk Sung
  - Department of Pathology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
  - Asan Center for Cancer Genome Discovery, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Yong Sang Hong
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Sun Young Kim
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Hyunjung Lee
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Korea
- Tae Won Kim
  - Department of Oncology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Korea
- Jong-Il Kim
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Korea
  - Genomic Medicine Institute, Medical Research Center, Seoul National University, Seoul, Korea
  - Cancer Research Institute, Seoul National University College of Medicine, Seoul, Korea
18
Nachmanson D, Steward J, Yao H, Officer A, Jeong E, O'Keefe TJ, Hasteh F, Jepsen K, Hirst GL, Esserman LJ, Borowsky AD, Harismendy O. Mutational profiling of micro-dissected pre-malignant lesions from archived specimens. BMC Med Genomics 2020; 13:173. [PMID: 33208147 PMCID: PMC7672910 DOI: 10.1186/s12920-020-00820-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Systematic cancer screening has led to the increased detection of pre-malignant lesions (PMLs). The absence of reliable prognostic markers has led mostly to overtreatment, resulting in potentially unnecessary stress, or to insufficient treatment and avoidable progression. Importantly, most mutational profiling studies have relied on PMLs synchronous to invasive cancer, or were performed in patients without outcome information, hence limiting their utility for biomarker discovery. The limitations in comprehensive mutational profiling of PMLs are in large part due to significant technical and methodological challenges: most PML specimens are small, formalin-fixed and paraffin-embedded (FFPE), and lack matching normal DNA. METHODS: Using test DNA from a highly degraded FFPE specimen, multiple targeted sequencing approaches were evaluated, varying DNA input amount (3-200 ng), library preparation strategy (BE: Blunt-End, SS: Single-Strand, AT: A-Tailing) and target size (whole exome vs. cancer gene panel). Variants in high-input DNA from FFPE and mirrored frozen specimens were used for PML-specific variant calling training and testing, respectively. The resulting approach was applied to profile and compare multiple regions micro-dissected (mean area 5 mm²) from 3 breast ductal carcinoma in situ (DCIS). RESULTS: Using low-input FFPE DNA, BE and SS libraries resulted in 4.9- and 3.7-fold increases over AT libraries in the fraction of whole exome covered at 20x (BE: 87%, SS: 63%, AT: 17%). Compared to high-confidence somatic mutations from frozen specimens, PML-specific variant filtering increased recall (BE: 85%, SS: 80%, AT: 75%) and precision (BE: 93%, SS: 91%, AT: 84%) to levels expected from sampling variation. Copy number alterations were consistent across all tested approaches and only impacted by the design of the capture probe-set. Applied to DNA extracted from 9 micro-dissected regions (8 PML, 1 normal epithelium), the approach achieved comparable performance, illustrated the data adequacy to identify candidate driver events (GATA3 mutations, ERBB2 or FGFR1 gains, TP53 loss) and measure intra-lesion genetic heterogeneity. CONCLUSION: Alternate experimental and analytical strategies increased the accuracy of DNA sequencing from archived micro-dissected PML regions, supporting the deeper molecular characterization of early cancer lesions and achieving a critical milestone in the development of biology-informed prognostic markers and precision chemo-prevention strategies.
Affiliation(s)
- Daniela Nachmanson
  - Bioinformatics and Systems Biology Graduate Program - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Joseph Steward
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Huazhen Yao
  - Institute for Genomic Medicine - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Adam Officer
  - Bioinformatics and Systems Biology Graduate Program - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
  - Division of Biomedical Informatics, Department of Medicine - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Eliza Jeong
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Thomas J O'Keefe
  - Division of Breast Surgery and The Comprehensive Breast Health Center - UC San Diego School of Medicine, 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Farnaz Hasteh
  - Department of Pathology - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Kristen Jepsen
  - Institute for Genomic Medicine - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Gillian L Hirst
  - Helen Diller Family Comprehensive Cancer Center - UC San Francisco School of Medicine, 1450 3rd St, San Francisco, CA, 94158, USA
- Laura J Esserman
  - Helen Diller Family Comprehensive Cancer Center - UC San Francisco School of Medicine, 1450 3rd St, San Francisco, CA, 94158, USA
- Alexander D Borowsky
  - Department of Pathology and Laboratory Medicine - UC Davis Comprehensive Cancer Center, UC Davis School of Medicine, 2279 45th Street, Sacramento, CA, 95817, USA
- Olivier Harismendy
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
  - Division of Biomedical Informatics, Department of Medicine - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
19
Recurrent inversion toggling and great ape genome evolution. Nat Genet 2020; 52:849-858. [PMID: 32541924 PMCID: PMC7415573 DOI: 10.1038/s41588-020-0646-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/10/2019] [Accepted: 05/15/2020] [Indexed: 01/14/2023]
Abstract
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
20
Luo R, Wong CL, Wong YS, Tang CI, Liu CM, Leung CM, Lam TW. Exploring the limit of using a deep neural network on pileup data for germline variant calling. Nat Mach Intell 2020. [DOI: 10.1038/s42256-020-0167-4] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 12/16/2022]
21
Mohanty AK, Vuzman D, Francioli L, Cassa C, Toth-Petroczy A, Sunyaev S. novoCaller: a Bayesian network approach for de novo variant calling from pedigree and population sequence data. Bioinformatics 2020; 35:1174-1180. [PMID: 30169785 PMCID: PMC6449753 DOI: 10.1093/bioinformatics/bty749] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/22/2018] [Revised: 06/19/2018] [Accepted: 08/29/2018] [Indexed: 12/12/2022]
Abstract
MOTIVATION: De novo mutations (i.e. newly occurring mutations) are a predominant cause of sporadic dominant monogenic diseases and play a significant role in the genetics of complex disorders. De novo mutation studies also inform population genetics models and shed light on the biology of DNA replication and repair. Despite the broad interest, there is room for improvement with regard to the accuracy of de novo mutation calling. RESULTS: We designed novoCaller, a Bayesian variant calling algorithm that uses information from read-level data both in the pedigree and in unrelated samples. The method was extensively tested using large trio-sequencing studies, and it consistently achieved over 97% sensitivity. We applied the algorithm to 48 trio cases of suspected rare Mendelian disorders as part of the Brigham Genomic Medicine gene discovery initiative. Its application resulted in a significant reduction in the resources required for manual inspection and experimental validation of the calls. Three de novo variants were found in known genes associated with rare disorders, leading to rapid genetic diagnosis of the probands. Another 14 variants were found in genes that are likely to explain the phenotype, and could lead to novel disease-gene discovery. AVAILABILITY AND IMPLEMENTATION: Source code implemented in C++ and Python can be downloaded from https://github.com/bgm-cwg/novoCaller. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anwoy Kumar Mohanty
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Dana Vuzman
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Laurent Francioli
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Christopher Cassa
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Agnes Toth-Petroczy
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Shamil Sunyaev
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
22
Edge P, Bansal V. Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing. Nat Commun 2019; 10:4660. [PMID: 31604920 PMCID: PMC6788989 DOI: 10.1038/s41467-019-12493-y] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 02/27/2019] [Accepted: 09/10/2019] [Indexed: 12/30/2022]
Abstract
Whole-genome sequencing using sequencing technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address the limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads. Single-molecule sequencing (SMS) such as Pacific Biosciences and Oxford Nanopore generate long reads with high error rate. Here, the authors develop Longshot, a computational method that detects and phases single nucleotide variants (SNV) in diploid genomes using SMS data.
Affiliation(s)
- Peter Edge
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, 92093, USA
| | - Vikas Bansal
- Department of Pediatrics, School of Medicine, University of California, San Diego, La Jolla, California, 92093, USA.
| |
|
23
|
Statistical Binning for Barcoded Reads Improves Downstream Analyses. Cell Syst 2019; 7:219-226.e5. [PMID: 30138581 PMCID: PMC6214366 DOI: 10.1016/j.cels.2018.07.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/03/2018] [Revised: 05/03/2018] [Accepted: 07/10/2018] [Indexed: 12/30/2022]
Abstract
Sequencing technologies are capturing longer-range genomic information at lower error rates, enabling alignment to genomic regions that are inaccessible with short reads. However, many methods are unable to align reads to much of the genome, recognized as important in disease, and thus report erroneous results in downstream analyses. We introduce EMA, a novel two-tiered statistical binning model for barcoded read alignment, that first probabilistically maps reads to potentially multiple "read clouds" and then within clouds by newly exploiting the non-uniform read densities characteristic of barcoded read sequencing. EMA substantially improves downstream accuracy over existing methods, including phasing and genotyping on 10x data, with fewer false variant calls in nearly half the time. EMA effectively resolves particularly challenging alignments in genomic regions that contain nearby homologous elements, uncovering variants in the pharmacogenomically important CYP2D region, and clinically important genes C4 (schizophrenia) and AMY1A (obesity), which go undetected by existing methods. Our work provides a framework for future generation sequencing.
|
24
|
Chromosome Y-encoded antigens associate with acute graft-versus-host disease in sex-mismatched stem cell transplant. Blood Adv 2019; 2:2419-2429. [PMID: 30262602 DOI: 10.1182/bloodadvances.2018019513] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/06/2018] [Accepted: 08/21/2018] [Indexed: 12/22/2022]
Abstract
Allogeneic hematopoietic stem cell transplantation (allo-HCT) is a curative option for blood cancers, but the coupled effects of graft-versus-tumor and graft-versus-host disease (GVHD) limit its broader application. Outcomes improve with matching at HLAs, but other factors are required to explain residual risk of GVHD. In an effort to identify genetic associations outside the major histocompatibility complex, we conducted a genome-wide clinical outcomes study on 205 acute myeloid leukemia patients and their fully HLA-A-, HLA-B-, HLA-C-, HLA-DRB1-, and HLA-DQB1-matched (10/10) unrelated donors. HLA-DPB1 T-cell epitope permissibility mismatches were observed in less than half (45%) of acute GVHD cases, motivating a broader search for genetic factors affecting clinical outcomes. A novel bioinformatics workflow adapted from neoantigen discovery found no associations between acute GVHD and known, HLA-restricted minor histocompatibility antigens (MiHAs). These results were confirmed with microarray data from an additional 988 samples. On the other hand, Y-chromosome-encoded single-nucleotide polymorphisms in 4 genes (PCDH11Y, USP9Y, UTY, and NLGN4Y) did associate with acute GVHD in male patients with female donors. Males in this category with acute GVHD had more Y-encoded variant peptides per patient with higher predicted HLA-binding affinity than males without GVHD who matched X-paralogous alleles in their female donors. Methods and results described here have an immediate impact for allo-HCT, warranting further development and larger genomic studies where MiHAs are clinically relevant, including cancer immunotherapy, solid organ transplant, and pregnancy.
|
25
|
Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, De La Vega FM, Xiao C, Sherry S, Salit M. An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 2019; 37:561-566. [PMID: 30936564 PMCID: PMC6500473 DOI: 10.1038/s41587-019-0074-6] [Citation(s) in RCA: 187] [Impact Index Per Article: 37.4] [Received: 05/25/2018] [Accepted: 02/19/2019] [Indexed: 12/30/2022]
Abstract
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
Affiliation(s)
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Hemang Parikh
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Haynes Heaton
- 10x Genomics, Pleasanton, CA, USA
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | | | - Len Trigg
- Real Time Genomics, Hamilton, New Zealand
| | | | - Cory Y McLean
- Verily Life Sciences, South San Francisco, CA, USA
- Google Inc., Mountain View, CA, USA
| | - Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Stephen Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Marc Salit
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Joint Initiative for Metrology in Biology, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| |
|
26
|
Iacoangeli A, Al Khleifat A, Sproviero W, Shatunov A, Jones AR, Morgan SL, Pittman A, Dobson RJ, Newhouse SJ, Al-Chalabi A. DNAscan: personal computer compatible NGS analysis, annotation and visualisation. BMC Bioinformatics 2019; 20:213. [PMID: 31029080 PMCID: PMC6487045 DOI: 10.1186/s12859-019-2791-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/02/2018] [Accepted: 04/02/2019] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Next Generation Sequencing (NGS) is a commonly used technology for studying the genetic basis of biological processes and it underpins the aspirations of precision medicine. However, there are significant challenges when dealing with NGS data. Firstly, a huge number of bioinformatics tools for a wide range of uses exist, therefore it is challenging to design an analysis pipeline. Secondly, NGS analysis is computationally intensive, requiring expensive infrastructure, and many medical and research centres do not have adequate high performance computing facilities and cloud computing is not always an option due to privacy and ownership issues. Finally, the interpretation of the results is not trivial and most available pipelines lack the utilities to favour this crucial step. RESULTS: We have therefore developed a fast and efficient bioinformatics pipeline that allows for the analysis of DNA sequencing data, while requiring little computational effort and memory usage. DNAscan can analyse a whole exome sequencing sample in 1 h and a 40x whole genome sequencing sample in 13 h, on a midrange computer. The pipeline can look for single nucleotide variants, small indels, structural variants, repeat expansions and viral genetic material (or any other organism). Its results are annotated using a customisable variety of databases and are available for an on-the-fly visualisation with a local deployment of the gene.iobio platform. DNAscan is implemented in Python. Its code and documentation are available on GitHub (https://github.com/KHP-Informatics/DNAscan). Instructions for an easy and fast deployment with Docker and Singularity are also provided on GitHub. CONCLUSIONS: DNAscan is an extremely fast and computationally efficient pipeline for analysis, visualization and interpretation of NGS data. It is designed to provide a powerful and easy-to-use tool for applications in biomedical research and diagnostic medicine, at minimal computational cost. Its comprehensive approach will maximise the potential audience of users, bringing such analyses within the reach of non-specialist laboratories, and those from centres with limited funding available.
Affiliation(s)
- A Iacoangeli
- Department of Biostatistics and Health Informatics, King's College London, London, UK.
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK.
| | - A Al Khleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - W Sproviero
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - A Shatunov
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - A R Jones
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
| | - S L Morgan
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
| | - A Pittman
- Department of Molecular Neuroscience, UCL, Institute of Neurology, London, UK
| | - R J Dobson
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
| | - S J Newhouse
- Department of Biostatistics and Health Informatics, King's College London, London, UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, UK
- National Institute for Health Research (NIHR) Biomedical Research Centre and Dementia Unit at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
| | - A Al-Chalabi
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London, UK
- King's College Hospital, Bessemer Road, London, SE5 9RS, UK
| |
|
27
|
Liang Y, He L, Zhao Y, Hao Y, Zhou Y, Li M, Li C, Pu X, Wen Z. Comparative Analysis for the Performance of Variant Calling Pipelines on Detecting the de novo Mutations in Humans. Front Pharmacol 2019; 10:358. [PMID: 31105557 PMCID: PMC6499170 DOI: 10.3389/fphar.2019.00358] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 03/21/2019] [Indexed: 01/22/2023]
Abstract
Despite their low occurrence rate across the genome, de novo mutations have been shown to be deleterious and can lead to severe genetic diseases by affecting gene function. Because traditional family-based linkage approaches and genome-wide association studies are unsuitable for identifying de novo mutations, several pipelines have been proposed in recent years to detect them from whole-genome or whole-exome sequencing data, and these have been used for calling them in rare diseases. However, the performance of these variant calling pipelines in detecting de novo mutations remains unexplored. To facilitate the appropriate choice of pipeline and reduce the false positive rate, in this study we thoroughly evaluated the performance of commonly used trio calling methods in detecting de novo single-nucleotide variants (DNSNVs) by conducting a comparative analysis of the calling results. Our results showed that different pipelines have a specific tendency to detect DNSNVs in genomic regions with different GC contents. Additionally, to refine the calling results for a single pipeline, our proposed filter achieved satisfactory results, indicating that the read coverage at the mutation positions can be used as an effective index to identify high-confidence DNSNVs. Our findings should provide good support for the committees to choose an appropriate way to explore de novo mutations in rare diseases.
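The trio-based idea in this abstract can be illustrated with a small sketch: a site is a de novo candidate when the child is heterozygous and both parents are homozygous reference, and a read-coverage filter keeps only well-covered sites. The abstract does not give the authors' actual filter, so the `SiteCall` type, the `is_candidate_de_novo` helper, and the `min_depth=10` threshold below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteCall:
    """Genotype call for one sample at one site (hypothetical structure)."""
    genotype: str  # "0/0", "0/1", or "1/1"
    depth: int     # number of reads covering the site


def is_candidate_de_novo(child: SiteCall, father: SiteCall, mother: SiteCall,
                         min_depth: int = 10) -> bool:
    """Return True for a well-covered Mendelian violation.

    min_depth is an assumed threshold, not a value taken from the paper.
    """
    mendelian_violation = (
        child.genotype == "0/1"
        and father.genotype == "0/0"
        and mother.genotype == "0/0"
    )
    well_covered = min(child.depth, father.depth, mother.depth) >= min_depth
    return mendelian_violation and well_covered


# Illustrative calls only; these are not data from the study.
print(is_candidate_de_novo(SiteCall("0/1", 30), SiteCall("0/0", 25), SiteCall("0/0", 28)))
print(is_candidate_de_novo(SiteCall("0/1", 30), SiteCall("0/0", 4), SiteCall("0/0", 28)))
```

Requiring coverage in all three samples reflects the general concern that a parent with few reads may simply have an undetected alternate allele.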
Affiliation(s)
- Yu Liang
- College of Chemistry, Sichuan University, Chengdu, China
| | - Li He
- Biogas Appliance Quality Supervision and Inspection Center, Biogas Institute of Ministry of Agriculture, Chengdu, China
| | - Yiru Zhao
- College of Computer Science, Sichuan University, Chengdu, China
| | - Yinyi Hao
- College of Chemistry, Sichuan University, Chengdu, China
| | - Yifan Zhou
- College of Chemistry, Sichuan University, Chengdu, China
| | - Menglong Li
- College of Chemistry, Sichuan University, Chengdu, China
| | - Chuan Li
- College of Computer Science, Sichuan University, Chengdu, China
| | - Xuemei Pu
- College of Chemistry, Sichuan University, Chengdu, China
| | - Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, China
| |
|
28
|
Luo R, Sedlazeck FJ, Lam TW, Schatz MC. A multi-task convolutional deep neural network for variant calling in single molecule sequencing. Nat Commun 2019; 10:998. [PMID: 30824707 PMCID: PMC6397153 DOI: 10.1038/s41467-019-09025-z] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 04/29/2018] [Accepted: 02/15/2019] [Indexed: 12/22/2022]
Abstract
The accurate identification of DNA sequence variants is an important but challenging task in genomics. It is particularly difficult for single molecule sequencing, which has a per-nucleotide error rate of ~5-15%. Meeting this demand, we developed Clairvoyante, a multi-task five-layer convolutional neural network model for predicting variant type (SNP or indel), zygosity, alternative allele and indel length from aligned reads. For the well-characterized NA12878 human sample, Clairvoyante achieves F1-scores of 99.67%, 95.78%, and 90.53% on 1KP common variants, and 98.65%, 92.57%, and 87.26% for whole-genome analysis, using Illumina, PacBio, and Oxford Nanopore data, respectively. Training on a second human sample shows Clairvoyante is sample agnostic and finds variants in less than 2 h on a standard server. Furthermore, we present 3,135 variants that are missed using Illumina but supported independently by both PacBio and Oxford Nanopore reads. Clairvoyante is available open-source (https://github.com/aquaskyline/Clairvoyante), with modules to train, utilize and visualize the model.
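For readers unfamiliar with the F1-score quoted in this abstract, the sketch below shows how it is conventionally derived from true-positive, false-positive, and false-negative variant counts. The `f1_score` helper and the counts are hypothetical and are not taken from the Clairvoyante evaluation.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a variant callset."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only: precision 0.90, recall 0.75.
example_f1 = f1_score(true_positives=90, false_positives=10, false_negatives=30)
print(f"F1 = {example_f1:.4f}")
```

Because the harmonic mean is pulled toward the lower of its two inputs, a caller cannot reach a high F1 by trading away recall for precision or the reverse.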
Affiliation(s)
- Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China.
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA.
| | - Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, 77030, TX, USA
| | - Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, 999077, China
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, 21218, MD, USA
| |
|
29
|
The ketogenic diet influences taxonomic and functional composition of the gut microbiota in children with severe epilepsy. NPJ Biofilms Microbiomes 2019; 5:5. [PMID: 30701077 PMCID: PMC6344533 DOI: 10.1038/s41522-018-0073-2] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Received: 07/29/2018] [Accepted: 12/11/2018] [Indexed: 02/06/2023]
Abstract
The gut microbiota has been linked to various neurological disorders via the gut–brain axis. Diet influences the composition of the gut microbiota. The ketogenic diet (KD) is a high-fat, adequate-protein, low-carbohydrate diet established for treatment of therapy-resistant epilepsy in children. Its efficacy in reducing seizures has been confirmed, but the mechanisms remain elusive. The diet has also shown positive effects in a wide range of other diseases, including Alzheimer’s, depression, autism, cancer, and type 2 diabetes. We collected fecal samples from 12 children with therapy-resistant epilepsy before starting KD and after 3 months on the diet. Parents did not start KD and served as diet controls. Applying shotgun metagenomic DNA sequencing, both taxonomic and functional profiles were established. Here we report that alpha diversity is not changed significantly during the diet, but differences in both taxonomic and functional composition are detected. Relative abundance of bifidobacteria as well as E. rectale and Dialister is significantly diminished during the intervention. An increase in relative abundance of E. coli is observed on KD. Functional analysis revealed changes in 29 SEED subsystems including the reduction of seven pathways involved in carbohydrate metabolism. Decomposition of these shifts indicates that bifidobacteria and Escherichia are important contributors to the observed functional shifts. As the relative abundance of health-promoting, fiber-consuming bacteria decreases during KD, we raise concern about the effects of the diet on the gut microbiota and overall health. Further studies need to investigate whether these changes are necessary for the therapeutic effect of KD.
The ketogenic diet changes both the relative abundance of gut microbiota and their metabolic activities. The diet forces a shift from carbohydrates to ketones as a primary energy source and has demonstrated efficacy in reducing epileptic seizures in children. After animal models implicated gut microbiota in this amelioration, Stefanie Prast-Nielsen, of Sweden’s Karolinska Institutet, and her team sequenced microbial DNA of fecal samples from 12 children with epilepsy before and after 3 months on a ketogenic diet. Changes included reductions in the numbers of Bifidobacterium and an increase in Escherichia coli. Carbohydrate metabolism significantly changed after 3 months on the diet. Some reductions raise questions about the diet’s potential impact on gut and overall health. More studies are also needed to discern the mechanistic impact of these changes on seizure activity.
|
30
|
Cornejo OE, Yee MC, Dominguez V, Andrews M, Sockell A, Strandberg E, Livingstone D, Stack C, Romero A, Umaharan P, Royaert S, Tawari NR, Ng P, Gutierrez O, Phillips W, Mockaitis K, Bustamante CD, Motamayor JC. Population genomic analyses of the chocolate tree, Theobroma cacao L., provide insights into its domestication process. Commun Biol 2018; 1:167. [PMID: 30345393 PMCID: PMC6191438 DOI: 10.1038/s42003-018-0168-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 01/24/2018] [Accepted: 09/14/2018] [Indexed: 01/24/2023]
Abstract
Domestication has had a strong impact on the development of modern societies. We sequenced 200 genomes of the chocolate plant Theobroma cacao L. to show for the first time to our knowledge that a single population, the Criollo population, underwent strong domestication ~3600 years ago (95% CI: 2481-13,806 years ago). We also show that during the process of domestication, there was strong selection for genes involved in the metabolism of the colored protectants anthocyanins and the stimulant theobromine, as well as disease resistance genes. Our analyses show that domesticated populations of T. cacao (Criollo) maintain a higher proportion of high-frequency deleterious mutations. We also show for the first time the negative consequences of the increased accumulation of deleterious mutations during domestication on the fitness of individuals (significant reduction in kilograms of beans per hectare per year as Criollo ancestry increases, as estimated from a GLM, P = 0.000425).
Affiliation(s)
- Omar E Cornejo
- School of Biological Sciences, Washington State University, PO Box 644236, Heald Hall 429B, Pullman, Washington, 99164, USA
- Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr. Lane Bldg Room L331, Stanford, CA, 94305, USA
| | - Muh-Ching Yee
- Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr. Lane Bldg Room L331, Stanford, CA, 94305, USA
- Stanford Functional Genomics Facility, Stanford, CA, 94305, USA
| | - Victor Dominguez
- Department of Biology, Indiana University, 915 E. Third St, Bloomington, IN, 47405, USA
| | - Mary Andrews
- Department of Biology, Indiana University, 915 E. Third St, Bloomington, IN, 47405, USA
| | - Alexandra Sockell
- Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr. Lane Bldg Room L331, Stanford, CA, 94305, USA
| | - Erika Strandberg
- Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr. Lane Bldg Room L331, Stanford, CA, 94305, USA
- Biomedical Informatics Training Program, 1265 Welch Road, MSOB, X-215, MC 5479, Stanford, CA, 94305-5479, USA
| | - Donald Livingstone
- Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA
- United States Department of Agriculture-Agriculture Research Service, Subtropical Horticulture Research Station, 13601 Old Cutler Rd, Miami, FL, 33158, USA
| | - Conrad Stack
- Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA
| | - Alberto Romero
- Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA
| | - Pathmanathan Umaharan
- Cocoa Research Centre, The University of the West Indies, St. Augustine, Trinidad and Tobago
| | - Stefan Royaert
- Mars, Incorporated, 6885 Elm Street, McLean, VA, 22101, USA
| | - Nilesh R Tawari
- Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, Genome, #02-01, Singapore, 138672, Singapore
| | - Pauline Ng
- Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, Genome, #02-01, Singapore, 138672, Singapore
| | - Osman Gutierrez
- SHRS, USDA-ARS, 13601 Old Cutler Road, Miami, FL, 33158, USA
| | - Wilbert Phillips
- Programa de Mejoramiento de Cacao, CATIE, 7170, Turrialba, Costa Rica
| | - Keithanne Mockaitis
- Department of Biology, Indiana University, 915 E. Third St, Bloomington, IN, 47405, USA
- Pervasive Technology Institute, Indiana University, 2709 E. 10th St., Bloomington, IN, 47408, USA
| | - Carlos D Bustamante
- Department of Genetics, School of Medicine, Stanford University, 300 Pasteur Dr. Lane Bldg Room L331, Stanford, CA, 94305, USA
| | | |
|
31
|
Danecek P, McCarthy SA. BCFtools/csq: haplotype-aware variant consequences. Bioinformatics 2018; 33:2037-2039. [PMID: 28205675 PMCID: PMC5870570 DOI: 10.1093/bioinformatics/btx100] [Citation(s) in RCA: 208] [Impact Index Per Article: 34.7] [Received: 12/01/2016] [Accepted: 02/14/2017] [Indexed: 02/06/2023]
Abstract
Motivation: Prediction of functional variant consequences is an important part of sequencing pipelines, allowing the categorization and prioritization of genetic variants for follow up analysis. However, current predictors analyze variants as isolated events, which can lead to incorrect predictions when adjacent variants alter the same codon, or when a frame-shifting indel is followed by a frame-restoring indel. Exploiting known haplotype information when making consequence predictions can resolve these issues. Results: BCFtools/csq is a fast program for haplotype-aware consequence calling which can take into account known phase. Consequence predictions are changed for 501 of 5019 compound variants found in the 81.7M variants in the 1000 Genomes Project data, with an average of 139 compound variants per haplotype. Predictions match existing tools when run in localized mode, but the program is an order of magnitude faster and requires an order of magnitude less memory. Availability and Implementation: The program is freely available for commercial and non-commercial use in the BCFtools package which is available for download from http://samtools.github.io/bcftools. Supplementary information: Supplementary data are available at Bioinformatics online.
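The "adjacent variants alter the same codon" problem described in this abstract can be shown with a toy example. The sketch below is not BCFtools/csq code; the partial codon table and the `apply_variants` and `consequence` helpers are hypothetical and exist only to show why two SNVs on the same haplotype must be interpreted together.

```python
# Subset of the standard genetic code needed for this example.
CODON_TABLE = {"CAG": "Gln", "TAG": "Stop", "CTG": "Leu", "TTG": "Leu"}


def apply_variants(codon: str, variants: list[tuple[int, str]]) -> str:
    """Apply (offset, alternate base) substitutions to a three-base codon."""
    bases = list(codon)
    for offset, alt_base in variants:
        bases[offset] = alt_base
    return "".join(bases)


def consequence(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change as synonymous, stop_gained, or missense."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "Stop":
        return "stop_gained"
    return "missense"


ref_codon = "CAG"                         # glutamine
phased_variants = [(0, "T"), (1, "T")]    # two SNVs on the same haplotype

# Isolated view: each variant is interpreted as if the other were absent.
isolated = [consequence(ref_codon, apply_variants(ref_codon, [v])) for v in phased_variants]
# Haplotype-aware view: both substitutions are applied before translation.
joint = consequence(ref_codon, apply_variants(ref_codon, phased_variants))

print(isolated)  # ['stop_gained', 'missense']
print(joint)     # missense
```

Taken alone, the first SNV looks like a premature stop codon, yet the haplotype actually carries CAG to TTG, a glutamine-to-leucine missense change.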
Affiliation(s)
- Petr Danecek
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Shane A McCarthy
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| |
|
32
|
Forbes TA, Howden SE, Lawlor K, Phipson B, Maksimovic J, Hale L, Wilson S, Quinlan C, Ho G, Holman K, Bennetts B, Crawford J, Trnka P, Oshlack A, Patel C, Mallett A, Simons C, Little MH. Patient-iPSC-Derived Kidney Organoids Show Functional Validation of a Ciliopathic Renal Phenotype and Reveal Underlying Pathogenetic Mechanisms. Am J Hum Genet 2018; 102:816-831. [PMID: 29706353 DOI: 10.1016/j.ajhg.2018.03.014] [Citation(s) in RCA: 136] [Impact Index Per Article: 22.7] [Received: 12/21/2017] [Accepted: 03/05/2018] [Indexed: 02/07/2023]
Abstract
Despite the increasing diagnostic rate of genomic sequencing, the genetic basis of more than 50% of heritable kidney disease remains unresolved. Kidney organoids differentiated from induced pluripotent stem cells (iPSCs) of individuals affected by inherited renal disease represent a potential, but unvalidated, platform for the functional validation of novel gene variants and investigation of underlying pathogenetic mechanisms. In this study, trio whole-exome sequencing of a prospectively identified nephronophthisis (NPHP) proband and her parents identified compound-heterozygous variants in IFT140, a gene previously associated with NPHP-related ciliopathies. IFT140 plays a key role in retrograde intraflagellar transport, but the precise downstream cellular mechanisms responsible for disease presentation remain unknown. A one-step reprogramming and gene-editing protocol was used to derive both uncorrected proband iPSCs and isogenic gene-corrected iPSCs, which were differentiated to kidney organoids. Proband organoid tubules demonstrated shortened, club-shaped primary cilia, whereas gene correction rescued this phenotype. Differential expression analysis of epithelial cells isolated from organoids suggested downregulation of genes associated with apicobasal polarity, cell-cell junctions, and dynein motor assembly in proband epithelial cells. Matrigel cyst cultures confirmed a polarization defect in proband versus gene-corrected renal epithelium. As such, this study represents a "proof of concept" for using proband-derived iPSCs to model renal disease and illustrates dysfunctional cellular pathways beyond the primary cilium in the setting of IFT140 mutations, which are established for other NPHP genotypes.
|
33
|
Pizzino A, Whitehead M, Sabet Rasekh P, Murphy J, Helman G, Bloom M, Evans SH, Murnick JG, Conry J, Taft RJ, Simons C, Vanderver A, Adang LA. Mutations in SZT2 result in early-onset epileptic encephalopathy and leukoencephalopathy. Am J Med Genet A 2018; 176:1443-1448. [PMID: 29696782 DOI: 10.1002/ajmg.a.38717] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 07/16/2017] [Revised: 02/13/2018] [Accepted: 03/28/2018] [Indexed: 11/06/2022]
Abstract
Early-onset epileptic encephalopathies (EOEEs) are a genetically heterogeneous collection of severe epilepsies often associated with psychomotor regression. Mutations in SZT2, a known seizure threshold regulator gene, are a newly identified cause of EOEE. We present an individual with EOEE, macrocephaly, and developmental regression with compound heterozygous mutations in SZT2 as identified by whole exome sequencing. Serial imaging characterized the novel finding of progressive loss of central myelination. This case expands our clinical understanding of the SZT2-phenotype and emphasizes the role of this gene in the diagnostic investigation for EOEE and leukoencephalopathies.
Affiliation(s)
- Amy Pizzino
- Department of Neurology, Children's National Medical Center, Washington, DC
| | - Matthew Whitehead
- Department of Neuroradiology, The George Washington University School of Medicine, Washington, DC
- Department of Diagnostic Imaging and Radiology, Children's National Medical Center, Washington, DC
| | | | - Jennifer Murphy
- Undiagnosed Disease Program, National Human Genome Research Institute (NHGRI), Bethesda, Maryland
| | - Guy Helman
- Institute for Molecular Bioscience, University of Queensland, St. Lucia, Queensland, Australia
| | - Miriam Bloom
- Department of Pediatrics, Children's National Medical Center, Washington, DC
| | - Sarah H Evans
- Department of Neurology, Children's National Medical Center, Washington, DC
| | - John G Murnick
- Department of Diagnostic Imaging and Radiology, Children's National Medical Center, Washington, DC
| | - Joan Conry
- Department of Neurology, Children's National Medical Center, Washington, DC
| | - Ryan J Taft
- Undiagnosed Disease Program, National Human Genome Research Institute (NHGRI), Bethesda, Maryland
- Illumina, Inc., San Diego, California
| | - Cas Simons
- Undiagnosed Disease Program, National Human Genome Research Institute (NHGRI), Bethesda, Maryland
| | - Adeline Vanderver
- Department of Neurology, Children's National Medical Center, Washington, DC
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
- School of Medicine and Health Sciences, George Washington University, Washington, DC
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
| | - Laura A Adang
- Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
|
34
|
Choi Y, Chan AP, Kirkness E, Telenti A, Schork NJ. Comparison of phasing strategies for whole human genomes. PLoS Genet 2018; 14:e1007308. [PMID: 29621242 PMCID: PMC5903673 DOI: 10.1371/journal.pgen.1007308] [Citation(s) in RCA: 81] [Impact Index Per Article: 13.5] [Received: 06/07/2017] [Revised: 04/17/2018] [Accepted: 03/13/2018] [Indexed: 12/17/2022]
Abstract
Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not 'phase' the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available 'Genome-In-A-Bottle' (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. 
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
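The majority-voting idea and the switch error metric mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each tool's prediction is a list over the same heterozygous sites (0 or 1 for the allele placed on the first haplotype, `None` where unphased), and it treats the whole list as one phase block. The function names `consensus` and `switch_error_rate` are hypothetical.

```python
from collections import Counter
from typing import List, Optional

Phase = Optional[int]  # 0 or 1 at a heterozygous site; None means unphased


def flip(pred: List[Phase]) -> List[Phase]:
    """Swap the two haplotype labels (the labels themselves are arbitrary)."""
    return [None if p is None else 1 - p for p in pred]


def orient(pred: List[Phase], ref: List[Phase]) -> List[Phase]:
    """Flip pred if it agrees with ref at fewer than half of the shared sites."""
    shared = [(p, r) for p, r in zip(pred, ref) if p is not None and r is not None]
    agree = sum(p == r for p, r in shared)
    return pred if 2 * agree >= len(shared) else flip(pred)


def consensus(preds: List[List[Phase]]) -> List[Phase]:
    """Per-site majority vote after orienting every prediction to the first one.

    Sites with no calls, or with a tied vote, are left unphased (None).
    """
    oriented = [orient(p, preds[0]) for p in preds]
    result: List[Phase] = []
    for calls in zip(*oriented):
        votes = Counter(c for c in calls if c is not None).most_common(2)
        if not votes or (len(votes) == 2 and votes[0][1] == votes[1][1]):
            result.append(None)
        else:
            result.append(votes[0][0])
    return result


def switch_error_rate(pred: List[Phase], truth: List[Phase]) -> float:
    """Fraction of adjacent phased site pairs whose relative phase disagrees with truth."""
    match = [p == t for p, t in zip(pred, truth) if p is not None and t is not None]
    if len(match) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(match, match[1:]))
    return switches / (len(match) - 1)


# Three hypothetical tool outputs over five heterozygous sites.
predictions = [[0, 1, 1, 0, None], [1, 0, 0, 1, 0], [0, 1, 0, 0, 1]]
merged = consensus(predictions)
```

Because the switch error rate compares relative phase between neighbouring sites, a prediction with globally swapped haplotype labels scores the same as its unswapped form, which is why the orientation step only matters for the vote.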
Affiliation(s)
- Yongwook Choi
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Agnes P. Chan
- J. Craig Venter Institute, Rockville, Maryland, United States of America
| | - Ewen Kirkness
- Human Longevity, Inc., San Diego, California, United States of America
| | - Amalio Telenti
- J. Craig Venter Institute, La Jolla, California, United States of America
| | - Nicholas J. Schork
- J. Craig Venter Institute, La Jolla, California, United States of America
- University of California San Diego, La Jolla, California, United States of America
- The Translational Genomics Research Institute (TGen), Phoenix, Arizona, United States of America
|
35
|
Shringarpure SS, Mathias RA, Hernandez RD, O'Connor TD, Szpiech ZA, Torres R, De La Vega FM, Bustamante CD, Barnes KC, Taub MA. Using genotype array data to compare multi- and single-sample variant calls and improve variant call sets from deep coverage whole-genome sequencing data. Bioinformatics 2018; 33:1147-1153. [PMID: 28035032 PMCID: PMC5408850 DOI: 10.1093/bioinformatics/btw786] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/15/2016] [Accepted: 12/07/2016] [Indexed: 12/30/2022]
Abstract
Motivation: Variant calling from next-generation sequencing (NGS) data is susceptible to false positive calls due to sequencing, mapping and other errors. To better distinguish true from false positive calls, we present a method that uses genotype array data from the sequenced samples, rather than public data such as HapMap or dbSNP, to train an accurate classifier using Random Forests. We demonstrate our method on a set of variant calls obtained from 642 African-ancestry genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), sequenced to high depth (30X). Results: We have applied our classifier to compare call sets generated with different calling methods, including both single-sample and multi-sample callers. At a False Positive Rate of 5%, our method determines true positive rates of 97.5%, 95% and 99% on variant calls obtained using Illumina's single-sample caller CASAVA, Real Time Genomics' multisample variant caller, and the GATK UnifiedGenotyper, respectively. Since NGS sequencing data may be accompanied by genotype data for the same samples, either collected concurrently with sequencing or from a previous study, our method can be trained on each dataset to provide a more accurate computational validation of site calls compared to generic methods. Moreover, our method allows for adjustment based on allele frequency (e.g. a different set of criteria to determine quality for rare versus common variants) and thereby provides insight into sequencing characteristics that indicate call quality for variants of different frequencies. Availability and Implementation: Code is available on Github at: https://github.com/suyashss/variant_validation. Contacts: suyashs@stanford.edu or mtaub@jhsph.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
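The abstract reports true positive rates at a fixed 5% false positive rate. As a minimal sketch of how such an operating point can be read off classifier scores, and not the authors' code (see their repository for that), the following assumes each variant call has a classifier score and a 0/1 truth label; the function name `tpr_at_fpr` and the toy data are hypothetical.

```python
from typing import Sequence


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.05) -> float:
    """Highest true positive rate reachable while keeping FPR <= max_fpr.

    scores: classifier score per call (higher means more likely a true variant).
    labels: 1 for a true call, 0 for a false call (e.g. judged against array genotypes).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one false call")

    # Lower the score threshold step by step, from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Calls with identical scores cross the threshold together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / n_neg > max_fpr:
            break
        best = tp / n_pos
        i = j
    return best


# Toy example: six calls, three true and three false.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
rate = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
```

Comparing callers at one shared false positive rate, as the abstract does, puts their true positive rates on a common footing regardless of how each classifier's scores are scaled.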
Affiliation(s)
- Suyash S Shringarpure
- Departments of Genetics and Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Rasika A Mathias
- 23 and Me Inc, Mountain View, CA, USA.,Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Ryan D Hernandez
- Department of Epidemiology, Bloomberg School of Public Health, JHU, Baltimore, MD, USA.,Department of Bioengineering and Therapeutic Sciences.,Institute for Human Genetics
| | - Timothy D O'Connor
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, CA, USA.,Institute for Genome Sciences.,Program in Personalized and Genomic Medicine
| | - Zachary A Szpiech
- Department of Epidemiology, Bloomberg School of Public Health, JHU, Baltimore, MD, USA
| | - Raul Torres
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Francisco M De La Vega
- Departments of Genetics and Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Carlos D Bustamante
- Departments of Genetics and Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Kathleen C Barnes
- 23 and Me Inc, Mountain View, CA, USA.,Department of Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Margaret A Taub
- Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
|
36
|
A robust targeted sequencing approach for low input and variable quality DNA from clinical samples. NPJ Genom Med 2018; 3:2. [PMID: 29354287 PMCID: PMC5768874 DOI: 10.1038/s41525-017-0041-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/17/2017] [Revised: 11/27/2017] [Accepted: 12/05/2017] [Indexed: 02/07/2023]
Abstract
Next-generation deep sequencing of gene panels is being adopted as a diagnostic test to identify actionable mutations in cancer patient samples. However, clinical samples, such as formalin-fixed, paraffin-embedded specimens, frequently provide low quantities of degraded, poor quality DNA. To overcome these issues, many sequencing assays rely on extensive PCR amplification leading to an accumulation of bias and artifacts. Thus, there is a need for a targeted sequencing assay that performs well with DNA of low quality and quantity without relying on extensive PCR amplification. We evaluate the performance of a targeted sequencing assay based on Oligonucleotide Selective Sequencing, which permits the enrichment of genes and regions of interest and the identification of sequence variants from low amounts of damaged DNA. This assay utilizes a repair process adapted to clinical FFPE samples, followed by adaptor ligation to single stranded DNA and a primer-based capture technique. Our approach generates sequence libraries of high fidelity with reduced reliance on extensive PCR amplification; this facilitates the accurate assessment of copy number alterations in addition to delivering accurate single nucleotide variant and insertion/deletion detection. We apply this method to capture and sequence the exons of a panel of 130 cancer-related genes, from which we obtain high read coverage uniformity across the targeted regions at starting input DNA amounts as low as 10 ng per sample. We demonstrate the performance using a series of reference DNA samples, and by identifying sequence variants in DNA from matched clinical samples originating from different tissue types.
A new DNA sequencing technology enables comprehensive genetic analyses of poor-quality tumor samples. Hanlee Ji from Stanford University in California, USA, together with colleagues from a company he cofounded called TOMA Biosciences, tested the performance of a targeted sequencing assay known as oligonucleotide-selective sequencing (OS-Seq). They used the “in-solution” version of OS-Seq, which involves a pre-processing step to remove any damaged DNA and then sequences target regions of the genome to look for duplications, insertions or deletions of DNA segments. Using archival specimens (which often contain low quantities of degraded DNA) from patients with lung and colorectal cancer, the researchers showed they could detect sequence variants in a panel of 130 cancer-related genes. The findings suggest the OS-Seq assay could help inform treatment decisions for cancer patients, even with clinical specimens of low quality.
|
37
|
Abstract
PURPOSE OF REVIEW: Genome sequencing is now available as a clinical diagnostic test. Nongenetic specialists face a significant knowledge and translation gap regarding the processes necessary to generate and interpret clinical genome sequencing. The purpose of this review is to provide a primer on contemporary clinical genome sequencing for nongenetic specialists, describing the human genome project, current techniques and applications in genome sequencing, limitations of current technology, and techniques on the horizon. RECENT FINDINGS: As currently implemented, genome sequencing compares short pieces of an individual's genome with a reference sequence developed by the human genome project. Genome sequencing may be used for obtaining timely diagnostic information, cancer pharmacogenomics, or in clinical cases when previous genetic testing has not revealed a clear diagnosis. At present, the implementation of clinical genome sequencing is limited by the availability of clinicians qualified for interpretation, and current techniques used in clinical testing do not detect all types of genetic variation present in a single genome. SUMMARY: Clinicians considering a genetic diagnosis have a wide array of testing choices, which now includes genome sequencing. Although not a comprehensive test in its current form, genome sequencing offers more information than gene-panel or exome sequencing and has the potential to replace targeted single-gene or gene-panel testing in many clinical scenarios.
|
38
|
Shum BO, Henner I, Belluoccio D, Hinchcliffe MJ. Utility of NIST Whole-Genome Reference Materials for the Technical Validation of a Multigene Next-Generation Sequencing Test. J Mol Diagn 2017; 19:602-612. [DOI: 10.1016/j.jmoldx.2017.04.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/13/2017] [Revised: 04/10/2017] [Accepted: 04/11/2017] [Indexed: 01/04/2023]
|
39
|
Nafisinia M, Riley LG, Gold WA, Bhattacharya K, Broderick CR, Thorburn DR, Simons C, Christodoulou J. Compound heterozygous mutations in glycyl-tRNA synthetase (GARS) cause mitochondrial respiratory chain dysfunction. PLoS One 2017; 12:e0178125. [PMID: 28594869 PMCID: PMC5464557 DOI: 10.1371/journal.pone.0178125] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/11/2016] [Accepted: 05/07/2017] [Indexed: 01/13/2023]
Abstract
Glycyl-tRNA synthetase (GARS; OMIM 600287) is one of thirty-seven tRNA-synthetase genes that catalyses the synthesis of glycyl-tRNA, which is required to insert glycine into proteins within the cytosol and mitochondria. To date, eighteen mutations in GARS have been reported in patients with autosomal-dominant Charcot-Marie-Tooth disease type 2D (CMT2D; OMIM 601472), and/or distal spinal muscular atrophy type V (dSMA-V; OMIM 600794). In this study, we report a patient with clinical and biochemical features suggestive of a mitochondrial respiratory chain (MRC) disorder including mild left ventricular posterior wall hypertrophy, exercise intolerance, and lactic acidosis. Using whole exome sequencing we identified compound heterozygous novel variants, c.803C>T; p.(Thr268Ile) and c.1234C>T; p.(Arg412Cys), in GARS in the proband. Spectrophotometric evaluation of the MRC complexes showed reduced activity of Complex I, III and IV in patient skeletal muscle and reduced Complex I and IV activity in the patient liver, with Complex IV being the most severely affected in both tissues. Immunoblot analysis of GARS protein and subunits of the MRC enzyme complexes in patient fibroblast extracts showed significant reduction in GARS protein levels and Complex IV. Together these studies provide evidence that the identified compound heterozygous GARS variants may be the cause of the mitochondrial dysfunction in our patient.
Affiliation(s)
- Michael Nafisinia
- Genetic Metabolic Disorders Research Unit, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- Discipline of Child & Adolescent Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
| | - Lisa G. Riley
- Genetic Metabolic Disorders Research Unit, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- Discipline of Child & Adolescent Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
| | - Wendy A. Gold
- Genetic Metabolic Disorders Research Unit, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- Discipline of Child & Adolescent Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
| | - Kaustuv Bhattacharya
- Discipline of Child & Adolescent Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Discipline of Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Genetic Metabolic Disorders Service, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
| | - Carolyn R. Broderick
- Children’s Hospital Institute of Sports Medicine, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- School of Medical Sciences, UNSW, Sydney, New South Wales, Australia
| | - David R. Thorburn
- Murdoch Childrens Research Institute and Victorian Clinical Genetics Services, Royal Children’s Hospital, and Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
| | - Cas Simons
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland, Australia
| | - John Christodoulou
- Genetic Metabolic Disorders Research Unit, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- Discipline of Child & Adolescent Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Discipline of Genetic Medicine, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Genetic Metabolic Disorders Service, Western Sydney Genetics Program, The Children’s Hospital at Westmead, Sydney, New South Wales, Australia
- Murdoch Childrens Research Institute and Victorian Clinical Genetics Services, Royal Children’s Hospital, and Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
|
40
|
Huang AY, Zhang Z, Ye AY, Dou Y, Yan L, Yang X, Zhang Y, Wei L. MosaicHunter: accurate detection of postzygotic single-nucleotide mosaicism through next-generation sequencing of unpaired, trio, and paired samples. Nucleic Acids Res 2017; 45:e76. [PMID: 28132024 PMCID: PMC5449543 DOI: 10.1093/nar/gkx024] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 08/18/2016] [Revised: 12/24/2016] [Accepted: 01/26/2017] [Indexed: 02/07/2023]
Abstract
Genomic mosaicism arising from postzygotic mutations has long been associated with cancer and more recently with non-cancer diseases. It has also been detected in healthy individuals including healthy parents of children affected with genetic disorders, highlighting its critical role in the origin of genetic mutations. However, most existing software for the genome-wide identification of single-nucleotide mosaicisms (SNMs) requires a paired control tissue obtained from the same individual which is often unavailable for non-cancer individuals and sometimes missing in cancer studies. Here, we present MosaicHunter (http://mosaichunter.cbi.pku.edu.cn), a bioinformatics tool that can identify SNMs in whole-genome and whole-exome sequencing data of unpaired samples without matched controls using Bayesian genotypers. We evaluate the accuracy of MosaicHunter on both simulated and real data and demonstrate that it has improved performance compared with other somatic mutation callers. We further demonstrate that incorporating sequencing data of the parents can be an effective approach to significantly improve the accuracy of detecting SNMs in an individual when a matched control sample is unavailable. Finally, MosaicHunter also has a paired mode that can take advantage of matched control samples when available, making it a useful tool for detecting SNMs in both non-cancer and cancer studies.
Affiliation(s)
- August Yue Huang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- National Institute of Biological Sciences, Beijing 102206, People's Republic of China
| | - Zheng Zhang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- School of Life Sciences, Tsinghua-Peking Joint Center for Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
| | - Adam Yongxin Ye
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- Peking-Tsinghua Center for Life Sciences, Beijing, People's Republic of China
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, People's Republic of China
| | - Yanmei Dou
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
- National Institute of Biological Sciences, Beijing 102206, People's Republic of China
| | - Linlin Yan
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
| | - Xiaoxu Yang
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
| | - Yuehua Zhang
- Peking University First Hospital, Peking University, Beijing 100034, People's Republic of China
| | - Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, People's Republic of China
|
41
|
Huang G, Wang S, Wang X, You N. An empirical Bayes method for genotyping and SNP detection using multi-sample next-generation sequencing data. Bioinformatics 2016; 32:3240-3245. [DOI: 10.1093/bioinformatics/btw409] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/04/2016] [Accepted: 06/20/2016] [Indexed: 12/30/2022]
|
42
|
Vanderver A, Simons C, Helman G, Crawford J, Wolf NI, Bernard G, Pizzino A, Schmidt JL, Takanohashi A, Miller D, Khouzam A, Rajan V, Ramos E, Chowdhury S, Hambuch T, Ru K, Baillie GJ, Grimmond SM, Caldovic L, Devaney J, Bloom M, Evans SH, Murphy JLP, McNeill N, Fogel BL, Schiffmann R, van der Knaap MS, Taft RJ. Whole exome sequencing in patients with white matter abnormalities. Ann Neurol 2016; 79:1031-1037. [PMID: 27159321 DOI: 10.1002/ana.24650] [Citation(s) in RCA: 106] [Impact Index Per Article: 13.3] [Received: 10/19/2015] [Revised: 03/27/2016] [Accepted: 03/28/2016] [Indexed: 01/25/2023]
Abstract
Here we report whole exome sequencing (WES) on a cohort of 71 patients with persistently unresolved white matter abnormalities with a suspected diagnosis of leukodystrophy or genetic leukoencephalopathy. WES analyses were performed on trio, or greater, family groups. Diagnostic pathogenic variants were identified in 35% (25 of 71) of patients. Potentially pathogenic variants were identified in clinically relevant genes in a further 7% (5 of 71) of cases, giving a total yield of clinical diagnoses in 42% of individuals. These findings provide evidence that WES can substantially decrease the number of unresolved white matter cases. Ann Neurol 2016;79:1031-1037.
Affiliation(s)
- Adeline Vanderver
- Department of Neurology, Children's National Medical Center, Washington, DC.,Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC.,School of Medicine and Health Sciences, George Washington University, Washington, DC
| | - Cas Simons
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
| | - Guy Helman
- Department of Neurology, Children's National Medical Center, Washington, DC.,Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
| | - Joanna Crawford
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
| | - Nicole I Wolf
- Department of Child Neurology, VU University Medical Center and Neuroscience Campus Amsterdam, Amsterdam, the Netherlands
| | - Geneviève Bernard
- Departments of Pediatrics, Neurology, and Neurosurgery, Montreal Children's Hospital, McGill University Health Center, Montreal, Quebec, Canada
| | - Amy Pizzino
- Department of Neurology, Children's National Medical Center, Washington, DC
| | - Johanna L Schmidt
- Department of Neurology, Children's National Medical Center, Washington, DC.,Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
| | - Asako Takanohashi
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
| | - David Miller
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia.,University of Melbourne Centre for Cancer Research, University of Melbourne, Parkville, Victoria, Australia
| | | | | | | | | | | | - Kelin Ru
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
| | - Gregory J Baillie
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia
| | - Sean M Grimmond
- Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia.,University of Melbourne Centre for Cancer Research, University of Melbourne, Parkville, Victoria, Australia
| | - Ljubica Caldovic
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
| | - Joseph Devaney
- Center for Genetic Medicine Research, Children's National Medical Center, Washington, DC
| | - Miriam Bloom
- Department of Pediatrics, Children's National Medical Center, Washington, DC
| | - Sarah H Evans
- Department of Physical Medicine and Rehabilitation, Children's National Medical Center, Washington, DC
| | | | - Nathan McNeill
- Institute for Metabolic Disease, Baylor Research Institute, Dallas, TX
| | - Brent L Fogel
- Department of Neurology, Program in Neurogenetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA
| | | | | | - Marjo S van der Knaap
- Department of Child Neurology, VU University Medical Center and Neuroscience Campus Amsterdam, Amsterdam, the Netherlands.,Department of Functional Genomics, VU University, Amsterdam, the Netherlands
| | - Ryan J Taft
- School of Medicine and Health Sciences, George Washington University, Washington, DC.,Institute for Molecular Bioscience, University of Queensland, St Lucia, Queensland, Australia.,Illumina Inc, San Diego, CA
|
43
|
Sequence-based Association Analysis Reveals an MGST1 eQTL with Pleiotropic Effects on Bovine Milk Composition. Sci Rep 2016; 6:25376. [PMID: 27146958 PMCID: PMC4857175 DOI: 10.1038/srep25376] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 01/29/2016] [Accepted: 04/15/2016] [Indexed: 11/08/2022]
Abstract
The mammary gland is a prolific lipogenic organ, synthesising copious amounts of triglycerides for secretion into milk. The fat content of milk varies widely both between and within species, and recent independent genome-wide association studies have highlighted a milk fat percentage quantitative trait locus (QTL) of large effect on bovine chromosome 5. Although both EPS8 and MGST1 have been proposed to underlie these signals, the causative status of these genes has not been functionally confirmed. To investigate this QTL in detail, we report genome sequence-based imputation and association mapping in a population of 64,244 taurine cattle. This analysis reveals a cluster of 17 non-coding variants spanning MGST1 that are highly associated with milk fat percentage, and a range of other milk composition traits. Further, we exploit a high-depth mammary RNA sequence dataset to conduct expression QTL (eQTL) mapping in 375 lactating cows, revealing a strong MGST1 eQTL underpinning these effects. These data demonstrate the utility of DNA and RNA sequence-based association mapping, and implicate MGST1, a gene with no obvious mechanistic relationship to milk composition regulation, as causally involved in these processes.
|
44
|
Gyarmati P, Kjellander C, Aust C, Song Y, Öhrmalm L, Giske CG. Metagenomic analysis of bloodstream infections in patients with acute leukemia and therapy-induced neutropenia. Sci Rep 2016; 6:23532. [PMID: 26996149 PMCID: PMC4800731 DOI: 10.1038/srep23532] [Citation(s) in RCA: 58] [Impact Index Per Article: 7.3] [Received: 01/07/2016] [Accepted: 03/08/2016] [Indexed: 01/05/2023]
Abstract
Leukemic patients are often immunocompromised due to underlying conditions, comorbidities and the effects of chemotherapy, and are thus at risk of developing systemic infections. Bloodstream infection (BSI) is a severe complication in neutropenic patients, and is associated with increased mortality. BSI is routinely diagnosed with blood culture, which only detects culturable pathogens. We analyzed 27 blood samples from 9 patients with acute leukemia and suspected BSI at different time points of their antimicrobial treatment using shotgun metagenomics sequencing in order to detect unculturable and non-bacterial pathogens. Our findings confirm the presence of bacterial, fungal and viral pathogens alongside antimicrobial resistance genes. Decreased white blood cell (WBC) counts were associated with the presence of microbial DNA, and were inversely proportional to the number of sequencing reads. This study could indicate the use of high-throughput sequencing for personalized antimicrobial treatments in BSIs.
Affiliation(s)
- P Gyarmati
- Karolinska Institutet, Department of Laboratory Medicine, Alfred Nobels Allé 8, Stockholm, 17177 Sweden.,Karolinska University Hospital, Department of Clinical Microbiology L2:02, Stockholm, 17176 Sweden
| | - C Kjellander
- Karolinska Institutet, Department of Medicine, Division of Hematology, Stockholm, 17176 Sweden
| | - C Aust
- Karolinska Institutet, Department of Medicine, Solna, Infectious Diseases Unit, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, 17176 Sweden
| | - Y Song
- Royal Institute of Technology, Science for Life Laboratory, Stockholm, 17176 Sweden
| | - L Öhrmalm
- Karolinska Institutet, Department of Medicine, Solna, Infectious Diseases Unit, Center for Molecular Medicine, Karolinska University Hospital, Stockholm, 17176 Sweden
| | - C G Giske
- Karolinska Institutet, Department of Laboratory Medicine, Alfred Nobels Allé 8, Stockholm, 17177 Sweden.,Karolinska University Hospital, Department of Clinical Microbiology L2:02, Stockholm, 17176 Sweden
|
45
|
Narasimhan VM, Hunt KA, Mason D, Baker CL, Karczewski KJ, Barnes MR, Barnett AH, Bates C, Bellary S, Bockett NA, Giorda K, Griffiths CJ, Hemingway H, Jia Z, Kelly MA, Khawaja HA, Lek M, McCarthy S, McEachan R, O'Donnell-Luria A, Paigen K, Parisinos CA, Sheridan E, Southgate L, Tee L, Thomas M, Xue Y, Schnall-Levin M, Petkov PM, Tyler-Smith C, Maher ER, Trembath RC, MacArthur DG, Wright J, Durbin R, van Heel DA. Health and population effects of rare gene knockouts in adult humans with related parents. Science 2016; 352:474-7. [PMID: 26940866 DOI: 10.1126/science.aac8624] [Citation(s) in RCA: 202] [Impact Index Per Article: 25.3] [Received: 11/12/2015] [Accepted: 02/18/2016] [Indexed: 12/13/2022]
Abstract
Examining complete gene knockouts within a viable organism can inform on gene function. We sequenced the exomes of 3222 British adults of Pakistani heritage with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of function (knockouts) in 781 genes. We observed 13.7% fewer homozygous knockout genotypes than we expected, implying an average load of 1.6 recessive-lethal-equivalent loss-of-function (LOF) variants per adult. When genetic data were linked to the individuals' lifelong health records, we observed no significant relationship between gene knockouts and clinical consultation or prescription rate. In this data set, we identified a healthy PRDM9-knockout mother and performed phased genome sequencing on her, her child, and control individuals. Our results show that meiotic recombination sites are localized away from PRDM9-dependent hotspots. Thus, natural LOF variants inform on essential genetic loci and demonstrate PRDM9 redundancy in humans.
Affiliation(s)
- Karen A Hunt
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Dan Mason
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Christopher L Baker
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Konrad J Karczewski
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Michael R Barnes
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Anthony H Barnett
- Diabetes and Endocrine Centre, Heart of England NHS Foundation Trust and University of Birmingham, Birmingham B9 5SS, UK
- Chris Bates
- TPP, Mill House, Troy Road, Leeds LS18 5TN, UK
- Srikanth Bellary
- Aston Research Centre for Healthy Ageing, Aston University, Birmingham B4 7ET, UK
- Nicholas A Bockett
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Kristina Giorda
- 10X Genomics, 7068 Koll Center Parkway, Suite 415, Pleasanton, CA 94566, USA
- Christopher J Griffiths
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Harry Hemingway
- Farr Institute of Health Informatics Research, London NW1 2DA, UK. Institute of Health Informatics, University College London, London NW1 2DA, UK
- Zhilong Jia
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- M Ann Kelly
- School of Clinical and Experimental Medicine, University of Birmingham, Birmingham B15 2TT, UK
- Hajrah A Khawaja
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Shane McCarthy
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Rosie McEachan
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Anne O'Donnell-Luria
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Kenneth Paigen
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Constantinos A Parisinos
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Eamonn Sheridan
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Laura Southgate
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
- Louise Tee
- School of Clinical and Experimental Medicine, University of Birmingham, Birmingham B15 2TT, UK
- Mark Thomas
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Yali Xue
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Petko M Petkov
- Center for Genome Dynamics, The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Eamonn R Maher
- Department of Medical Genetics, University of Cambridge and National Institute for Health Research (NIHR) Cambridge Biomedical Research Centre, Box 238, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK. Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Richard C Trembath
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK. Faculty of Life Sciences and Medicine, King's College London, London SE1 1UL, UK
- Daniel G MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- John Wright
- Bradford Institute for Health Research, Bradford Teaching Hospitals National Health Service (NHS) Foundation Trust, Bradford BD9 6RJ, UK
- Richard Durbin
- Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- David A van Heel
- Blizard Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London E1 2AT, UK
|
46
|
Goldfeder RL, Priest JR, Zook JM, Grove ME, Waggott D, Wheeler MT, Salit M, Ashley EA. Medical implications of technical accuracy in genome sequencing. Genome Med 2016; 8:24. [PMID: 26932475 PMCID: PMC4774017 DOI: 10.1186/s13073-016-0269-0] [Citation(s) in RCA: 85] [Impact Index Per Article: 10.6] [Received: 08/21/2015] [Accepted: 01/21/2016] [Indexed: 12/31/2022]
Abstract
Background: As whole exome sequencing (WES) and whole genome sequencing (WGS) transition from research tools to clinical diagnostic tests, it is increasingly critical for sequencing methods and analysis pipelines to be technically accurate. The Genome in a Bottle Consortium has recently published a set of benchmark SNV, indel, and homozygous reference genotypes for the pilot whole genome NIST Reference Material based on the NA12878 genome. Methods: We examine the relationship between human genome complexity and genes/variants reported to be associated with human disease. Specifically, we map regions of medical relevance to benchmark regions of high or low confidence. We use benchmark data to assess the sensitivity and positive predictive value of two representative sequencing pipelines for specific classes of variation. Results: We observe that the accuracy of a variant call depends on the genomic region, variant type, and read depth, and varies by analytical pipeline. We find that most false negative WGS calls result from filtering while most false negative WES variants relate to poor coverage. We find that only 74.6% of the exonic bases in ClinVar and OMIM genes and 82.1% of the exonic bases in ACMG-reportable genes are found in high-confidence regions. Only 990 genes in the genome are found entirely within high-confidence regions while 593 of 3,300 ClinVar/OMIM genes have less than 50% of their total exonic base pairs in high-confidence regions. We find greater than 77% of the pathogenic or likely pathogenic SNVs currently in ClinVar fall within high-confidence regions. We identify sites that are prone to sequencing errors, including thousands present in publicly available variant databases. Finally, we examine the clinical impact of mandatory reporting of secondary findings, highlighting a false positive variant found in BRCA2.
Conclusions: Together, these data illustrate the importance of appropriate use and continued improvement of technical benchmarks to ensure accurate and judicious interpretation of next-generation DNA sequencing results in the clinical setting.
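As an illustration of the two metrics named in the Methods, the following is a minimal sketch of how sensitivity and positive predictive value can be computed by comparing a pipeline's calls against a benchmark call set. It is not the authors' code: the function name `sensitivity_and_ppv`, the tuple representation of a variant, and the toy variants are all hypothetical, and a real comparison would first restrict both sets to the benchmark's high-confidence regions and normalize variant representation.

```python
def sensitivity_and_ppv(calls, truth):
    """Compare pipeline calls to a benchmark set (hypothetical helper).

    Each variant is a hashable tuple such as (chrom, pos, ref, alt).
    Returns (sensitivity, positive predictive value).
    """
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)   # called and present in the benchmark
    fp = len(calls - truth)   # called but absent from the benchmark
    fn = len(truth - calls)   # in the benchmark but missed by the pipeline
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy data, invented for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),   # false positive
}

sens, ppv = sensitivity_and_ppv(calls, truth)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

Here three of the four benchmark variants are recovered and one of the four calls is spurious, so both metrics come out at 0.75. Computing them separately per variant class or genomic region would mirror the stratified assessment the abstract describes.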
Affiliation(s)
- Rachel L Goldfeder
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA
- James R Priest
- Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA; Department of Pediatrics, Stanford University, Stanford, CA, 94305, USA
- Justin M Zook
- Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Megan E Grove
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA
- Daryl Waggott
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA
- Matthew T Wheeler
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA
- Marc Salit
- Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Euan A Ashley
- Department of Medicine, Stanford University, Stanford, CA, 94305, USA; Stanford Center for Inherited Cardiovascular Disease, Stanford University, Stanford, CA, 94305, USA; Department of Genetics, Stanford University, Stanford, CA, 94305, USA
|
47
|
Haplotyping germline and cancer genomes with high-throughput linked-read sequencing. Nat Biotechnol 2016; 34:303-11. [PMID: 26829319 PMCID: PMC4786454 DOI: 10.1038/nbt.3432] [Citation(s) in RCA: 438] [Impact Index Per Article: 54.8] [Received: 05/16/2015] [Accepted: 11/12/2015] [Indexed: 01/13/2023]
Abstract
Haplotyping of human chromosomes is a prerequisite for cataloguing the full repertoire of genetic variation. We present a microfluidics-based, linked-read sequencing technology that can phase and haplotype germline and cancer genomes using nanograms of input DNA. This high-throughput platform prepares barcoded libraries for short-read sequencing and computationally reconstructs long-range haplotype and structural variant information. We generate haplotype blocks in a nuclear trio that are concordant with expected inheritance patterns and phase a set of structural variants. We also resolve the structure of the EML4-ALK gene fusion in the NCI-H2228 cancer cell line using phased exome sequencing. Finally, we assign genetic aberrations to specific megabase-scale haplotypes generated from whole-genome sequencing of a primary colorectal adenocarcinoma. This approach resolves haplotype information using up to 100 times less genomic DNA than some methods and enables the accurate detection of structural variants.
|
48
|
Konopka T, Nijman SMB. Comparison of genetic variants in matched samples using thesaurus annotation. Bioinformatics 2015; 32:657-63. [PMID: 26545822 PMCID: PMC4795618 DOI: 10.1093/bioinformatics/btv654] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/23/2015] [Accepted: 10/30/2015] [Indexed: 12/21/2022]
Abstract
MOTIVATION: Calling changes in DNA, e.g. as a result of somatic events in cancer, requires analysis of multiple matched sequenced samples. Events in low-mappability regions of the human genome are difficult to encode in variant call files and have been under-reported as a result. However, they can be described accurately through thesaurus annotation, a technique that links multiple genomic loci together to explicate a single variant. RESULTS: We here describe software and benchmarks for using thesaurus annotation to detect point changes in DNA from matched samples. In benchmarks on matched normal/tumor samples we show that the technique can recover between five and ten percent more true events than conventional approaches, while strictly limiting false discovery and being fully consistent with popular variant analysis workflows. We also demonstrate the utility of the approach for analysis of de novo mutations in parents/child families. AVAILABILITY AND IMPLEMENTATION: Software performing thesaurus annotation is implemented in Java; it is available as source code on GitHub at GeneticThesaurus (https://github.com/tkonopka/GeneticThesaurus) and as an executable on SourceForge at geneticthesaurus (https://sourceforge.net/projects/geneticthesaurus). Mutation calling is implemented in an R package available on GitHub at RGeneticThesaurus (https://github.com/tkonopka/RGeneticThesaurus). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: tomasz.konopka@ludwig.ox.ac.uk.
Affiliation(s)
- Tomasz Konopka
- Ludwig Institute for Cancer Research, University of Oxford, Oxford, UK
|
49
|
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. Data-driven searches such as GWAS have not identified genetic variants with implications for clinical care, and unexplained heritability remains. In recent years, the development of several so-called next-generation sequencing (NGS) platforms has made it possible to generate genomic information quickly, inexpensively and accurately. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss the potential and challenges of NGS-based studies.
Affiliation(s)
- Marisa L R Cunha
- Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands, Tel.: +31 20 5662824, Fax: +31 20 6968833
|
50
|
Chiang C, Layer RM, Faust GG, Lindberg MR, Rose DB, Garrison EP, Marth GT, Quinlan AR, Hall IM. SpeedSeq: ultra-fast personal genome analysis and interpretation. Nat Methods 2015; 12:966-8. [PMID: 26258291 PMCID: PMC4589466 DOI: 10.1038/nmeth.3505] [Citation(s) in RCA: 344] [Impact Index Per Article: 38.2] [Received: 11/12/2014] [Accepted: 05/28/2015] [Indexed: 12/11/2022]
Abstract
SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.
Affiliation(s)
- Colby Chiang
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Ryan M. Layer
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT, USA
- USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
- Gregory G. Faust
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Michael R. Lindberg
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
- David B. Rose
- Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, VA, USA
- Erik P. Garrison
- USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
- Wellcome Trust Sanger Institute, Hinxton, UK
- Gabor T. Marth
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT, USA
- USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
- Aaron R. Quinlan
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT, USA
- USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA
- Ira M. Hall
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
|