51
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022] Open
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by the volume of genomic data. To extract knowledge and patterns from these data, artificial intelligence, especially deep learning, has been instrumental. In the current review, we address the development and application of deep learning methods/models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in the later part of this review. Finally, we briefly discuss recent applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology and genomics scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
  - Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
- Mamoon Rashid
  - Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
52
Huang AY, Lee EA. Identification of Somatic Mutations From Bulk and Single-Cell Sequencing Data. Frontiers in Aging 2022; 2:800380. [PMID: 35822012 PMCID: PMC9261417 DOI: 10.3389/fragi.2021.800380] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/23/2021] [Accepted: 12/08/2021] [Indexed: 12/26/2022]
Abstract
Somatic mutations are DNA variants that occur after the fertilization of zygotes and accumulate during the developmental and aging processes in the human lifespan. Somatic mutations have long been known to cause cancer, and more recently have been implicated in a variety of non-cancer diseases. The patterns of somatic mutations, or mutational signatures, also shed light on the underlying mechanisms of the mutational process. Advances in next-generation sequencing over the decades have enabled genome-wide profiling of DNA variants in a high-throughput manner; however, unlike germline mutations, somatic mutations are carried only by a subset of the cell population. Thus, sensitive bioinformatic methods are required to distinguish mutant alleles from sequencing and base calling errors in bulk tissue samples. An alternative way to study somatic mutations, especially those present in an extremely small number of cells or even in a single cell, is to sequence single-cell genomes after whole-genome amplification (WGA); however, it is critical and technically challenging to exclude numerous technical artifacts arising during error-prone and uneven genome amplification in current WGA methods. To address these challenges, multiple bioinformatic tools have been developed. In this review, we summarize the latest progress in methods for identification of somatic mutations and the challenges that remain to be addressed in the future.
Affiliation(s)
- August Yue Huang
  - Division of Genetics and Genomics, Manton Center for Orphan Diseases, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
- Eunjung Alice Lee
  - Division of Genetics and Genomics, Manton Center for Orphan Diseases, Boston Children's Hospital, Boston, MA, United States
  - Department of Pediatrics, Harvard Medical School, Boston, MA, United States
53
Chan HT, Chin YM, Low SK. Circulating Tumor DNA-Based Genomic Profiling Assays in Adult Solid Tumors for Precision Oncology: Recent Advancements and Future Challenges. Cancers (Basel) 2022; 14:3275. [PMID: 35805046 PMCID: PMC9265547 DOI: 10.3390/cancers14133275] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/11/2022] [Revised: 06/30/2022] [Accepted: 07/02/2022] [Indexed: 12/04/2022] Open
Abstract
Genomic profiling using tumor biopsies remains the standard approach for the selection of approved molecular targeted therapies. However, this is often limited by its invasiveness, feasibility, and poor sample quality. Liquid biopsies provide a less invasive approach while capturing a contemporaneous and comprehensive tumor genomic profile. Recent advancements in the detection of circulating tumor DNA (ctDNA) from plasma samples at satisfactory sensitivity, specificity, and detection concordance to tumor tissues have facilitated the approval of ctDNA-based genomic profiling to be integrated into regular clinical practice. The recent approval of both single-gene and multigene assays to detect genetic biomarkers from plasma cell-free DNA (cfDNA) as companion diagnostic tools for molecular targeted therapies has transformed the therapeutic decision-making procedure for advanced solid tumors. Despite the increasing use of cfDNA-based molecular profiling, there is an ongoing debate about a 'plasma first' or 'tissue first' approach toward genomic testing for advanced solid malignancies. Both approaches present possible advantages and disadvantages, and these factors should be carefully considered to personalize and select the most appropriate genomic assay. This review focuses on the recent advancements of cfDNA-based genomic profiling assays in advanced solid tumors while highlighting the major challenges that should be tackled to formulate evidence-based guidelines in recommending the 'right assay for the right patient at the right time'.
Affiliation(s)
- Hiu Ting Chan
  - Project for Development of Liquid Biopsy Diagnosis, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo 135-8550, Japan
- Yoon Ming Chin
  - Project for Development of Liquid Biopsy Diagnosis, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo 135-8550, Japan
  - Cancer Precision Medicine, Inc., Kawasaki 213-0012, Japan
- Siew-Kee Low
  - Project for Development of Liquid Biopsy Diagnosis, Cancer Precision Medicine Center, Japanese Foundation for Cancer Research, Tokyo 135-8550, Japan
54
Wilcox JJS, Arca-Ruibal B, Samour J, Mateuta V, Idaghdour Y, Boissinot S. Linked-Read Sequencing of Eight Falcons Reveals a Unique Genomic Architecture in Flux. Genome Biol Evol 2022; 14:evac090. [PMID: 35700227 PMCID: PMC9214253 DOI: 10.1093/gbe/evac090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 05/27/2022] [Accepted: 06/06/2022] [Indexed: 11/12/2022] Open
Abstract
Falcons are diverse birds of cultural and economic importance. They have undergone major lineage-specific chromosomal rearrangements, resulting in greatly-reduced chromosome counts relative to other birds. Here, we use 10X Genomics linked reads to provide new high-contiguity genomes for two gyrfalcons, a saker falcon, a lanner falcon, three subspecies of peregrine falcons, and the common kestrel. Assisted by a transcriptome sequenced from 22 gyrfalcon tissues, we annotate these genomes for a variety of genomic features, estimate historical demography, and then investigate genomic equilibrium in the context of falcon-specific chromosomal rearrangements. We find that falcon genomes are not in AT-GC equilibrium with a bias in substitutions towards higher AT content; this bias is predominantly but not exclusively driven by hypermutability of CpG sites. Small indels and large structural variants were also biased towards insertions rather than deletions. Patterns of disequilibrium were linked to chromosomal rearrangements: falcons have lost GC content in regions that have fused to larger chromosomes from microchromosomes and gained GC content in regions of macrochromosomes that have translocated to microchromosomes. Inserted bases have accumulated on regions ancestrally belonging to microchromosomes, consistent with insertion-biased gene conversion. We also find an excess of interspersed repeats on regions of microchromosomes that have fused to macrochromosomes. Our results reveal that falcon genomes are in a state of flux. They further suggest that many of the key differences between microchromosomes and macrochromosomes are driven by differences in chromosome size, and indicate a clear role for recombination and biased-gene-conversion in determining genomic equilibrium.
Affiliation(s)
- Justin J S Wilcox
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Jaime Samour
  - Wildlife Management and Falcon Medicine and Breeding Consultancy, Abu Dhabi, United Arab Emirates
- Youssef Idaghdour
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
  - Biology Program, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
- Stéphane Boissinot
  - Center for Genomics & Systems Biology, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
  - Biology Program, New York University Abu Dhabi, Saadiyat Island, Abu Dhabi, United Arab Emirates
55
Arunachalam S, Szlachta K, Brady SW, Ma X, Ju B, Shaner B, Mulder HL, Easton J, Raphael BJ, Myers M, Tinkle C, Allen SJ, Orr BA, Wetmore CJ, Baker SJ, Zhang J. Convergent evolution and multi-wave clonal invasion in H3 K27-altered diffuse midline gliomas treated with a PDGFR inhibitor. Acta Neuropathol Commun 2022; 10:80. [PMID: 35642016 PMCID: PMC9153212 DOI: 10.1186/s40478-022-01381-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Accepted: 05/12/2022] [Indexed: 11/11/2022] Open
Abstract
The majority of diffuse midline gliomas, H3 K27-altered (DMG-H3 K27-a), are infiltrating pediatric brain tumors that arise in the pons and have no effective treatment. To understand how clonal evolution contributes to the tumor's invasive spread, we performed exome sequencing and SNP array profiling on 49 multi-region autopsy samples from 11 patients with pontine DMG-H3 K27-a enrolled in a phase I clinical trial of the PDGFR inhibitor crenolanib. For each patient, a phylogenetic tree was constructed by testing multiple possible clonal evolution models to select the one consistent with somatic mutations and copy number variations across all tumor regions. The tree was then used to deconvolute subclonal composition and prevalence at each tumor region to study convergent evolution and invasion patterns. Somatic variants in the PI3K pathway, a late event, are enriched in our cohort, affecting 70% of patients. Convergent evolution of PI3K at distinct phylogenetic branches was detected in 40% of the patients. Twenty-four (~50%) of the tumor regions were occupied by subclones of mixed lineages with varying molecular ages, indicating multiple waves of invasion across the pons and extrapontine regions. Subclones harboring a PDGFRA amplicon, including one that amplified a PDGFRA Y849C mutant allele, were detected in four patients; their presence in extrapontine tumor and normal brain samples implies their involvement in extrapontine invasion. Our study expands the current knowledge on tumor invasion patterns in DMG-H3 K27-a, which may inform the design of future clinical trials.
56
Li S, Zeng W, Ni X, Zhou Y, Stackpole ML, Noor ZS, Yuan Z, Neal A, Memarzadeh S, Garon EB, Dubinett SM, Li W, Zhou XJ. cfTrack: A Method of Exome-Wide Mutation Analysis of Cell-free DNA to Simultaneously Monitor the Full Spectrum of Cancer Treatment Outcomes Including MRD, Recurrence, and Evolution. Clin Cancer Res 2022; 28:1841-1853. [PMID: 35149536 PMCID: PMC9126584 DOI: 10.1158/1078-0432.ccr-21-1242] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/08/2021] [Revised: 10/19/2021] [Accepted: 02/09/2022] [Indexed: 01/19/2023]
Abstract
PURPOSE Cell-free DNA (cfDNA) offers a noninvasive approach to monitor cancer. Here we develop a method using whole-exome sequencing (WES) of cfDNA for simultaneously monitoring the full spectrum of cancer treatment outcomes, including minimal residual disease (MRD), recurrence, evolution, and second primary cancers. EXPERIMENTAL DESIGN Three simulation datasets were generated from 26 patients with cancer to benchmark the detection performance of MRD/recurrence and second primary cancers. For further validation, cfDNA samples (n = 76) from patients with cancer (n = 35) with six different cancer types were used for performance validation during various treatments. RESULTS We present a cfDNA-based cancer monitoring method, named cfTrack. Taking advantage of the broad genome coverage of WES data, cfTrack can sensitively detect MRD and cancer recurrence by integrating signals across known clonal tumor mutations of a patient. In addition, cfTrack detects tumor evolution and second primary cancers by de novo identifying emerging tumor mutations. A series of machine learning and statistical denoising techniques are applied to enhance the detection power. On the simulation data, cfTrack achieved an average AUC of 99% on the validation dataset and 100% on the independent dataset in detecting recurrence in samples with tumor fractions ≥0.05%. In addition, cfTrack yielded an average AUC of 88% in detecting second primary cancers in samples with tumor fractions ≥0.2%. On real data, cfTrack accurately monitors tumor evolution during treatment, which cannot be accomplished by previous methods. CONCLUSIONS Our results demonstrated that cfTrack can sensitively and specifically monitor the full spectrum of cancer treatment outcomes using exome-wide mutation analysis of cfDNA.
Affiliation(s)
- Shuo Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
  - Bioinformatics Interdepartmental Graduate Program, University of California at Los Angeles, Los Angeles, California
  - EarlyDiagnostics Inc., Los Angeles, California
- Weihua Zeng
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
- Xiaohui Ni
  - EarlyDiagnostics Inc., Los Angeles, California
- Yonggang Zhou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
- Mary L. Stackpole
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
  - Bioinformatics Interdepartmental Graduate Program, University of California at Los Angeles, Los Angeles, California
  - EarlyDiagnostics Inc., Los Angeles, California
- Zorawar S. Noor
  - Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California
- Zuyang Yuan
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
- Adam Neal
  - Department of Obstetrics and Gynecology, David Geffen School of Medicine at UCLA, Los Angeles, California
  - UCLA Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California Los Angeles, Los Angeles, California
- Sanaz Memarzadeh
  - Department of Obstetrics and Gynecology, David Geffen School of Medicine at UCLA, Los Angeles, California
  - UCLA Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, University of California Los Angeles, Los Angeles, California
  - UCLA Jonsson Comprehensive Cancer Center, University of California Los Angeles, Los Angeles, California
  - Molecular Biology Institute, University of California Los Angeles, Los Angeles, California
  - VA Greater Los Angeles Health Care System, Los Angeles, California
- Edward B. Garon
  - Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California
- Steven M. Dubinett
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
  - Department of Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California
  - VA Greater Los Angeles Health Care System, Los Angeles, California
  - Department of Pulmonary and Critical Care Medicine, David Geffen School of Medicine at UCLA, Los Angeles, California
  - Department of Molecular and Medical Pharmacology, David Geffen School of Medicine at UCLA, Los Angeles, California
  - Department of Microbiology, Immunology and Molecular Genetics, University of California at Los Angeles, Los Angeles, California
- Wenyuan Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
- Xianghong Jasmine Zhou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California
  - Corresponding Author: Xianghong Jasmine Zhou, Pathology and Laboratory Medicine, University of California, Los Angeles, CA 90095. Phone: 310–267–0363; E-mail:
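The abstract of entry 56 describes detecting MRD by integrating signals across a patient's known clonal tumor mutations. The sketch below illustrates that general idea only, and is not the cfTrack algorithm: it pools variant-supporting reads over all tracked sites and asks how surprising the pooled count is under a uniform background error rate. The function names, the per-site read counts, and the error rate of 1e-4 are all hypothetical choices made for illustration.

```python
def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    pmf = (1.0 - p) ** n          # P(X = 0)
    cdf = pmf
    for i in range(k - 1):        # accumulate P(X = 1) .. P(X = k - 1)
        pmf *= (n - i) / (i + 1) * p / (1.0 - p)
        cdf += pmf
    return max(0.0, 1.0 - cdf)


def pooled_signal(sites, error_rate):
    """Pool (alt_reads, depth) pairs over tracked sites.

    Returns the pooled alt count, the pooled depth, and the probability
    of seeing at least that many alt reads from background error alone.
    """
    alt = sum(a for a, _ in sites)
    depth = sum(d for _, d in sites)
    return alt, depth, binomial_tail(alt, depth, error_rate)


if __name__ == "__main__":
    # Hypothetical counts at four previously identified clonal mutations.
    tracked_sites = [(0, 1000), (1, 1200), (2, 900), (0, 1100)]
    alt, depth, p_value = pooled_signal(tracked_sites, error_rate=1e-4)
    print(f"pooled alt reads={alt}, pooled depth={depth}, p={p_value:.4g}")
```

No single site here carries more than two variant reads, yet the pooled count can still stand out against the assumed background, which is the motivation for aggregating across mutations. A real method would need site-specific error models rather than one shared rate.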
57
Yu L, Lopez G, Rassa J, Wang Y, Basavanhally T, Browne A, Huang CP, Dorsey L, Jen J, Hersey S. Direct comparison of circulating tumor DNA sequencing assays with targeted large gene panels. PLoS One 2022; 17:e0266889. [PMID: 35482763 PMCID: PMC9049497 DOI: 10.1371/journal.pone.0266889] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/03/2022] [Accepted: 03/29/2022] [Indexed: 11/24/2022] Open
Abstract
Next generation sequencing (NGS) assays with large targeted gene panels can comprehensively profile cancer somatic mutations in a tumor sample. Given the rapid adoption of such assays for circulating tumor DNA (ctDNA) analysis in clinical oncology, it is essential for the community to understand their analytical performance in liquid biopsy settings. Here, we directly compared five ctDNA NGS assays, most of which have a panel of 400 or more genes, with simulated samples harboring mutations relevant to solid tumors or myeloid malignancy. Our results indicate that the detection sensitivity and reproducibility of all five assays were 90% or higher when the mutations were at 0.5% or 1.0% allele frequency and the DNA input was the optimal 30 ng or 50 ng per the vendor's protocol. Performance decreased and varied dramatically when mutations were at a 0.1% allele frequency and/or when a lower genomic input of 10 ng DNA was used. Interestingly, one of the assays repeatedly showed a higher rate of false positivity than the others across two different sample sets. Multiple intrinsic technical factors pertaining to the NGS assays were further investigated. Notable differences among the assays were seen for depth of coverage and background noise, which profoundly impacted assay performance. The results derived from this study are highly informative and provide a framework to assess and select suitable assays for specific applications in cancer monitoring and potential clinical use.
Affiliation(s)
- Lizhi Yu
  - Translational Sciences and Diagnostics, Translation Medicine, Bristol Myers Squibb, Summit, New Jersey, United States of America
  - * E-mail:
- Gonzalo Lopez
  - Translational Bioinformatics, Informatics and Predictive Sciences, Bristol Myers Squibb, Summit, New Jersey, United States of America
- John Rassa
  - Translational Sciences and Diagnostics, Translation Medicine, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Yixin Wang
  - Translational Sciences and Diagnostics, Translation Medicine, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Tara Basavanhally
  - Translational Bioinformatics, Informatics and Predictive Sciences, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Andrew Browne
  - Translational Bioinformatics, Informatics and Predictive Sciences, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Chang-Pin Huang
  - Translational Research, Immuno-Oncology and Cell Therapy, Bristol Myers Squibb, Seattle, Washington, United States of America
- Lauren Dorsey
  - Translational Bioinformatics, Informatics and Predictive Sciences, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Jin Jen
  - Translational Bioinformatics, Informatics and Predictive Sciences, Bristol Myers Squibb, Summit, New Jersey, United States of America
- Sarah Hersey
  - Translational Sciences and Diagnostics, Translation Medicine, Bristol Myers Squibb, Summit, New Jersey, United States of America
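The abstract of entry 57 reports that performance dropped at 0.1% allele frequency and at 10 ng DNA input. A back-of-the-envelope calculation, which is not taken from the paper, suggests one reason: at low input and low allele frequency, very few mutant molecules are present to begin with. The sketch assumes roughly 3.3 pg of DNA per haploid human genome, perfect molecule recovery, and Poisson sampling; all function names are illustrative.

```python
import math

# Assumption: approximate mass of one haploid human genome, in picograms.
HAPLOID_GENOME_PG = 3.3


def genome_equivalents(dna_input_ng: float) -> float:
    """Approximate number of haploid genome copies in the input DNA."""
    return dna_input_ng * 1000.0 / HAPLOID_GENOME_PG


def expected_mutant_copies(dna_input_ng: float, allele_frequency: float) -> float:
    """Expected number of mutant copies of one locus in the input."""
    return genome_equivalents(dna_input_ng) * allele_frequency


def prob_at_least(k_min: int, mean: float) -> float:
    """P(X >= k_min) for X ~ Poisson(mean)."""
    term = math.exp(-mean)        # P(X = 0)
    below = 0.0
    for i in range(k_min):        # accumulate P(X = 0) .. P(X = k_min - 1)
        below += term
        term *= mean / (i + 1)
    return max(0.0, 1.0 - below)


if __name__ == "__main__":
    for ng in (10, 30, 50):
        for af in (0.001, 0.005, 0.01):
            copies = expected_mutant_copies(ng, af)
            print(f"{ng} ng at AF {af:.1%}: ~{copies:.1f} mutant copies, "
                  f"P(at least 3 present) = {prob_at_least(3, copies):.2f}")
```

Under these assumptions, 10 ng at 0.1% allele frequency corresponds to only about three mutant copies per locus before any losses in library preparation, so sampling noise alone limits sensitivity regardless of sequencing depth.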
58
Barsan V, Xia Y, Klein D, Gonzalez-Pena V, Youssef S, Inaba Y, Mahmud O, Natarajan S, Agarwal V, Pang Y, Autry R, Pui CH, Inaba H, Evans W, Gawad C. Simultaneous monitoring of disease and microbe dynamics through plasma DNA sequencing in pediatric patients with acute lymphoblastic leukemia. Science Advances 2022; 8:eabj1360. [PMID: 35442732 PMCID: PMC9020671 DOI: 10.1126/sciadv.abj1360] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 01/19/2022] [Indexed: 05/09/2023]
Abstract
Treatment of acute lymphoblastic leukemia (ALL) necessitates continuous risk assessment of leukemic disease burden and infections that arise in the setting of immunosuppression. This study was performed to assess the feasibility of a hybrid capture next-generation sequencing panel to longitudinally measure molecular leukemic disease clearance and microbial species abundance in 20 pediatric patients with ALL throughout induction chemotherapy. This proof of concept helps establish a technical and conceptual framework that we anticipate will be expanded and applied to additional patients with leukemia, as well as extended to additional cancer types. Molecular monitoring can help accelerate the attainment of insights into the temporal biology of host-microbe-leukemia interactions, including how those changes correlate with and alter anticancer therapy efficacy. We also anticipate that fewer invasive bone marrow examinations will be required, as these methods improve with standardization and are validated for clinical use.
Affiliation(s)
- Valentin Barsan
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
- Yuntao Xia
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
- David Klein
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
- Veronica Gonzalez-Pena
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Sarah Youssef
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Yuki Inaba
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Ousman Mahmud
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Sivaraman Natarajan
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Vibhu Agarwal
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
- Yakun Pang
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Robert Autry
  - Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Ching-Hon Pui
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Hiroto Inaba
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- William Evans
  - Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
- Charles Gawad
  - Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94304, USA
  - Department of Oncology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
  - Department of Computational Biology, St. Jude Children’s Research Hospital, Memphis, TN 38105, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
59
Wang D, Zhang Y, Li R, Li J, Zhang R. Consistency and reproducibility of large panel next-generation sequencing: Multi-laboratory assessment of somatic mutation detection on reference materials with mismatch repair and proofreading deficiency. J Adv Res 2022; 44:161-172. [PMID: 36725187 PMCID: PMC9937796 DOI: 10.1016/j.jare.2022.03.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Revised: 03/16/2022] [Accepted: 03/27/2022] [Indexed: 02/04/2023] Open
Abstract
INTRODUCTION Clinical precision oncology increasingly relies on accurate genome-wide profiling using large panel next generation sequencing; however, accurate and consistent detection of somatic mutations across individual platforms and pipelines remains an open problem. OBJECTIVES To obtain paired tumor-normal reference materials that can be effectively constructed and are interchangeable with clinical samples, and to evaluate the performance of 56 panels under routine testing conditions based on the reference samples. METHODS Genes involved in mismatch repair and DNA proofreading were knocked down using the CRISPR-Cas9 technology to accumulate somatic mutations in a defined GM12878 cell line. They were used as reference materials to comprehensively evaluate the reproducibility and accuracy of detection results of oncopanels and explore the potential influencing factors. RESULTS In total, 14 paired tumor-normal reference DNA samples from engineered cell lines were prepared, and a reference dataset comprising 168 somatic mutations in a high-confidence region of 1.8 Mb was generated. For mutations with an allele frequency (AF) of more than 5% in reference samples, 56 panels collectively reported 1306 errors, including 729 false negatives (FNs), 179 false positives (FPs) and 398 reproducibility errors. The performance metrics varied among panels, with precision and recall ranging from 0.773 to 1 and 0.683 to 1, respectively. Incorrect and inadequate filtering accounted for a large proportion of false discovery (including FNs and FPs), while low-quality detection, cross-contamination and other sequencing errors during the wet bench process were other sources of FNs and FPs. In addition, low AF (<5%) considerably influenced the reproducibility and comparability among panels.
CONCLUSIONS This study provided an integrated practice for developing reference standards to assess oncopanels in detecting somatic mutations and quantitatively revealed the sources of detection errors. It will promote optimization, validation, and quality control among laboratories with potential applicability in clinical use.
Affiliation(s)
- Duo Wang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Yuanfeng Zhang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Li
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Jinming Li
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Zhang
  - National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China
  - Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China
  - Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
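The abstract of entry 59 reports an error tally (729 FNs, 179 FPs, 398 reproducibility errors) and per-panel precision and recall. The short sketch below checks that the reported categories add up to the stated total of 1306 and shows how precision and recall follow from true-positive, false-positive, and false-negative counts. The per-panel counts in the usage section are hypothetical and are not taken from the study; only the three error totals and the figure of 168 reference mutations come from the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of reported mutations that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of reference mutations that were reported: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Error categories as reported in the abstract.
reported_errors = {
    "false_negatives": 729,
    "false_positives": 179,
    "reproducibility_errors": 398,
}
total_errors = sum(reported_errors.values())

if __name__ == "__main__":
    print(f"total reported errors: {total_errors}")
    # Hypothetical panel: 160 of the 168 reference mutations found,
    # plus 5 calls that are not in the reference set.
    tp, fp, fn = 160, 5, 8
    print(f"precision={precision(tp, fp):.3f}, recall={recall(tp, fn):.3f}")
```

Precision penalises false positives and recall penalises false negatives, so a panel can score 1 on one metric while performing poorly on the other, which is why the abstract reports both ranges.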
60
Gasparri F, Sarkar D, Bielickaite S, Poulsen MH, Hauser AS, Pless SA. P2X2 receptor subunit interfaces are missense variant hotspots where mutations tend to increase apparent ATP affinity. Br J Pharmacol 2022; 179:3859-3874. [PMID: 35285517 PMCID: PMC9314836 DOI: 10.1111/bph.15830] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2021] [Revised: 01/31/2022] [Accepted: 02/09/2022] [Indexed: 11/30/2022] Open
Abstract
Background and Purpose P2X receptors are trimeric ligand‐gated ion channels that open a cation‐selective pore in response to ATP binding to their large extracellular domain. The seven known P2X subtypes can assemble as homotrimeric or heterotrimeric complexes and contribute to numerous physiological functions, including nociception, inflammation and hearing. The overall structure of P2X receptors is well established, but little is known about the range and prevalence of human genetic variations and the functional implications of specific domains. Experimental Approach Here, we examine the impact of P2X2 receptor inter‐subunit interface missense variants identified in the human population or by structural predictions. We test both single and double mutants through electrophysiological and biochemical approaches. Key Results We demonstrate that predicted extracellular domain inter‐subunit interfaces display a higher‐than‐expected density of missense variations and that the majority of mutations that disrupt putative inter‐subunit interactions result in channels with higher apparent ATP affinity. Lastly, we show that double mutants at the subunit interface show significant energetic coupling, especially if located in close proximity. Conclusion and Implications We provide the first structural mapping of the mutational distribution across the human population in a ligand‐gated ion channel and show that the density of missense mutations is constrained between protein domains, indicating evolutionary selection at the domain level. Our data may indicate that, unlike other ligand‐gated ion channels, P2X2 receptors have evolved an intrinsically high threshold for activation, possibly to allow for additional modulation or as a cellular protection mechanism against overstimulation.
Affiliation(s)
- Federica Gasparri
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Debayan Sarkar
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Sarune Bielickaite
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
| | - Mette Homann Poulsen
- Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark
|
61
|
Lazarian G, Cymbalista F, Baran-Marszak F. Impact of Low-Burden TP53 Mutations in the Management of CLL. Front Oncol 2022; 12:841630. [PMID: 35211418 PMCID: PMC8861357 DOI: 10.3389/fonc.2022.841630] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/22/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022] Open
Abstract
In chronic lymphocytic leukemia (CLL), TP53 abnormalities are associated with reduced survival and resistance to chemoimmunotherapy (CIT). The recommended threshold to clinically report TP53 mutations is a matter of debate given that next-generation sequencing technologies can detect mutations with a limit of detection of approximately 1% with high confidence. However, the clinical impact of low-burden TP53 mutations with a variant allele frequency (VAF) of less than 10% remains unclear. Longitudinal analysis before and after fludarabine based on NGS sequencing demonstrated that low-burden TP53 mutations were present before the onset of treatment and expanded at relapse to become the predominant clone. Most studies evaluating the prognostic or predictive impact of low-burden TP53 mutations in untreated patients show that low-burden TP53 mutations have the same unfavorable prognostic impact as clonal defects. Moreover, studies designed to assess the predictive impact of low-burden TP53 mutations showed that TP53 mutations, irrespective of mutation burden, have an inferior impact on overall survival for CIT-treated patients. As low-burden and high-burden TP53 mutations have comparable clinical impacts, redefining the VAF threshold may have important implications for the clinical management of CLL.
Affiliation(s)
- Gregory Lazarian
- Service d'Hématologie Biologique, Hôpital Avicenne, Assistance Publique des Hôpitaux de Paris, Paris, France
| | - Florence Cymbalista
- Service d'Hématologie Biologique, Hôpital Avicenne, Assistance Publique des Hôpitaux de Paris, Paris, France
| | - Fanny Baran-Marszak
- Service d'Hématologie Biologique, Hôpital Avicenne, Assistance Publique des Hôpitaux de Paris, Paris, France
|
62
|
Biezuner T, Brilon Y, Arye AB, Oron B, Kadam A, Danin A, Furer N, Minden MD, Hwan Kim DD, Shapira S, Arber N, Dick J, Thavendiranathan P, Moskovitz Y, Kaushansky N, Chapal-Ilani N, Shlush LI. An improved molecular inversion probe based targeted sequencing approach for low variant allele frequency. NAR Genom Bioinform 2022; 4:lqab125. [PMID: 35156021 PMCID: PMC8826764 DOI: 10.1093/nargab/lqab125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2021] [Revised: 11/25/2021] [Accepted: 01/25/2022] [Indexed: 11/23/2022] Open
Abstract
Deep targeted sequencing technologies are still not widely used in clinical practice due to the complexity of the methods and their cost. Molecular Inversion Probe (MIP) technology is cost-effective and scalable in the number of targets; however, it suffers from low overall performance, especially in GC-rich regions. In order to improve MIP performance, we sequenced a large cohort of healthy individuals (n = 4417) with a panel of 616 MIPs, at high depth and in duplicate. To improve the previous state-of-the-art statistical model for low variant allele frequency, we selected 4635 potentially positive variants and validated them using amplicon sequencing. Using machine learning prediction tools, we significantly improved precision by 10–56.25% (P < 0.0004) for detecting variants with VAF > 0.005. We further developed a biochemically modified MIP protocol and improved its turn-around time to ∼4 h. Our new biochemistry significantly improved uniformity and coverage of GC-rich regions, and enabled 95% on-target reads in a large MIP panel of 8349 genomic targets. Overall, we demonstrate an enhancement of the MIP targeted sequencing approach in both detection of low-frequency variants and other key parameters, paving its way to become an ultrafast, cost-effective research and clinical diagnostic tool.
Affiliation(s)
- Tamir Biezuner
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Yardena Brilon
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Asaf Ben Arye
- Department of Statistics and Operations Research, Tel Aviv University, Ramat Aviv, Israel
| | - Barak Oron
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Aditee Kadam
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Adi Danin
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Nili Furer
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Mark D Minden
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
| | - Dennis Dong Hwan Kim
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Medical Oncology & Hematology, Toronto, ON, Canada
| | - John Dick
- Princess Margaret Cancer Centre, University Health Network (UHN), Department of Molecular Genetics, Toronto, ON, Canada
| | - Paaladinesh Thavendiranathan
- Department of Medicine, Division of Cardiology, Ted Rogers Program in Cardiotoxicity Prevention, Peter Munk Cardiac Center, Toronto General Hospital, University Health Network, University of Toronto, Toronto, ON, Canada
| | - Yoni Moskovitz
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Nathali Kaushansky
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Noa Chapal-Ilani
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
| | - Liran I Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot 761001, Israel
|
63
|
Medeiros JJF, Capo-Chichi JM, Shlush LI, Dick JE, Arruda A, Minden MD, Abelson S. SmMIP-tools: a computational toolset for processing and analysis of single-molecule molecular inversion probes-derived data. Bioinformatics 2022; 38:2088-2095. [PMID: 35150236 PMCID: PMC9004652 DOI: 10.1093/bioinformatics/btac081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/12/2021] [Revised: 01/13/2022] [Accepted: 02/07/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Single-molecule molecular inversion probes (smMIPs) provide an exceptionally cost-effective and modular approach for routine or large-cohort next-generation sequencing. However, processing the derived raw data to generate highly accurate variant calls remains challenging. RESULTS: We introduce SmMIP-tools, a comprehensive computational method that promotes the detection of single nucleotide variants and short insertions and deletions from smMIP-based sequencing. Our approach delivered near-perfect performance when benchmarked against a set of known mutations in controlled experiments involving DNA dilutions and outperformed other commonly used computational methods for mutation detection. Comparison against clinically approved diagnostic testing of leukaemia patients demonstrated the ability to detect both previously reported variants and a set of pathogenic mutations that did not pass detection by clinical testing. Collectively, our results indicate that increased performance can be achieved when tailoring data processing and analysis to its related technology. The feasibility of using our method in research and clinical settings to benefit from low-cost smMIP technology is demonstrated. AVAILABILITY AND IMPLEMENTATION: The source code for SmMIP-tools, its manual and additional scripts aimed to foster large-scale data processing and analysis are all available on GitHub (https://github.com/abelson-lab/smMIP-tools). Raw sequencing data generated in this study have been submitted to the European Genome-Phenome Archive (EGA; https://ega-archive.org) and can be accessed under accession number EGAS00001005359. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jessie J F Medeiros
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada,Ontario Institute for Cancer Research, Toronto, ON, Canada,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Jose-Mario Capo-Chichi
- Genome Diagnostics, Department of Clinical Laboratory Genetics, University Health Network, Toronto, ON, Canada
| | - Liran I Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
| | - John E Dick
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada,Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
| | - Andrea Arruda
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada
| | - Mark D Minden
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada,Department of Hematology and Medical Oncology, University Health Network, Toronto, ON, Canada
|
64
|
Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022; 23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022] Open
Abstract
Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
Affiliation(s)
| | - Saidi Wang
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
| | - Haiyan Hu
- Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA.,Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Xiaoman Li
- Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
|
65
|
Brady SW, Gout AM, Zhang J. Therapeutic and prognostic insights from the analysis of cancer mutational signatures. Trends Genet 2022; 38:194-208. [PMID: 34483003 PMCID: PMC8752466 DOI: 10.1016/j.tig.2021.08.007] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 06/15/2021] [Revised: 08/06/2021] [Accepted: 08/11/2021] [Indexed: 02/08/2023]
Abstract
The somatic mutations in each cancer genome are caused by multiple mutational processes, each of which leaves a characteristic imprint (or 'signature'), potentially caused by specific etiologies or exposures. Deconvolution of these signatures offers a glimpse into the evolutionary history of individual tumors. Recent work has shown that mutational signatures may also yield therapeutic and prognostic insights, including the identification of cell-intrinsic signatures as biomarkers of drug response and prognosis. For example, mutational signatures indicating homologous recombination deficiency are associated with poly(ADP)-ribose polymerase (PARP) inhibitor sensitivity, whereas APOBEC-associated signatures are associated with ataxia telangiectasia and Rad3-related kinase (ATR) inhibitor sensitivity. Furthermore, therapy-induced mutational signatures implicated in cancer progression have also been uncovered, including the identification of thiopurine-induced TP53 mutations in leukemia. In this review, we explore the various ways mutational signatures can reveal new therapeutic and prognostic insights, thus extending their traditional role in identifying disease etiology.
Affiliation(s)
- Samuel W Brady
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
| | - Alexander M Gout
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Jinghui Zhang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA.
|
66
|
Sharma A, Jain P, Mahgoub A, Zhou Z, Mahadik K, Chaterji S. Lerna: transformer architectures for configuring error correction tools for short- and long-read genome sequencing. BMC Bioinformatics 2022; 23:25. [PMID: 34991450 PMCID: PMC8734100 DOI: 10.1186/s12859-021-04547-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/07/2021] [Accepted: 12/20/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Sequencing technologies are prone to errors, making error correction (EC) necessary for downstream applications. EC tools need to be manually configured for optimal performance. We find that the optimal parameters (e.g., k-mer size) are both tool- and dataset-dependent. Moreover, evaluating the performance (i.e., alignment rate or gain) of a given tool usually relies on a reference genome, but quality reference genomes are not always available. We introduce Lerna for the automated configuration of k-mer-based EC tools. Lerna first creates a language model (LM) of the uncorrected genomic reads, and then, based on this LM, calculates a metric called the perplexity metric to evaluate the corrected reads for different parameter choices. Next, it finds the one that produces the highest alignment rate without using a reference genome. The fundamental intuition of our approach is that the perplexity metric is inversely correlated with the quality of the assembly after error correction. Therefore, Lerna leverages the perplexity metric for automated tuning of k-mer sizes without needing a reference genome. RESULTS: First, we show that the best k-mer value can vary for different datasets, even for the same EC tool. This motivates our design that automates k-mer size selection without using a reference genome. Second, we show the gains of our LM using its component attention-based transformers. We show the model's estimation of the perplexity metric before and after error correction. The lower the perplexity after correction, the better the k-mer size. We also show that the alignment rate and assembly quality computed for the corrected reads are strongly negatively correlated with the perplexity, enabling the automated selection of k-mer values for better error correction, and hence, improved assembly quality. We validate our approach on both short and long reads. Additionally, we show that our attention-based models have significant runtime improvement for the entire pipeline (18× faster than previous works) due to parallelizing the attention mechanism and the use of JIT compilation for GPU inferencing. CONCLUSION: Lerna improves de novo genome assembly by optimizing EC tools. Our code is made available in a public repository at: https://github.com/icanforce/lerna-genomics.
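The selection rule described in this abstract (score each candidate k-mer size by the perplexity of its corrected reads and keep the lowest) can be illustrated with a minimal sketch. This is not Lerna's implementation: a small add-one-smoothed character n-gram model stands in for the attention-based transformer, and the function names, the toy reads and the candidate k values (15 and 21) are all hypothetical.

```python
import math
from collections import Counter

def train_ngram(reads, order=3):
    """Count (context, next base) pairs over the reads: a toy stand-in for an LM."""
    ctx_counts, pair_counts = Counter(), Counter()
    for read in reads:
        for i in range(order, len(read)):
            ctx = read[i - order:i]
            ctx_counts[ctx] += 1
            pair_counts[(ctx, read[i])] += 1
    return ctx_counts, pair_counts

def perplexity(reads, model, order=3, alphabet="ACGT"):
    """Perplexity of the reads under the model, with add-one smoothing."""
    ctx_counts, pair_counts = model
    log_sum, n = 0.0, 0
    for read in reads:
        for i in range(order, len(read)):
            ctx = read[i - order:i]
            p = (pair_counts[(ctx, read[i])] + 1) / (ctx_counts[ctx] + len(alphabet))
            log_sum += math.log(p)
            n += 1
    return math.exp(-log_sum / n) if n else float("inf")

def pick_kmer_size(corrected_by_k, model, order=3):
    """Return the candidate k whose corrected reads have the lowest perplexity."""
    scores = {k: perplexity(reads, model, order) for k, reads in corrected_by_k.items()}
    return min(scores, key=scores.get), scores

# Hypothetical data: the model is built from the uncorrected reads, then each
# candidate k-mer size is scored by the corrected reads it produced.
uncorrected = ["ACGTACGTACGT"] * 3 + ["ACGTACGTACGA"]
corrected_by_k = {
    15: ["ACGTACGTACGT", "ACGTACGTACGT"],  # correction consistent with the data
    21: ["ACGTTCGAACGT", "ACGTACGTACGA"],  # correction that left or added errors
}
model = train_ngram(uncorrected)
best_k, scores = pick_kmer_size(corrected_by_k, model)
print(best_k, scores)
```

No reference genome appears anywhere in the sketch, which is the property the abstract emphasizes; whether low perplexity tracks alignment rate for a given dataset is the empirical claim the paper tests.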
Affiliation(s)
| | - Pranjal Jain
- Indian Institute of Technology Bombay, Mumbai, India
|
67
|
AlJanahi AA, Lazzarotto CR, Chen S, Shin TH, Cordes S, Fan X, Jabara I, Zhou Y, Young DJ, Lee BC, Yu KR, Li Y, Toms B, Tunc I, Hong SG, Truitt LL, Klermund J, Andrieux G, Kim MY, Cathomen T, Gill S, Tsai SQ, Dunbar CE. Prediction and validation of hematopoietic stem and progenitor cell off-target editing in transplanted rhesus macaques. Mol Ther 2022; 30:209-222. [PMID: 34174439 PMCID: PMC8753565 DOI: 10.1016/j.ymthe.2021.06.016] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/25/2020] [Revised: 03/18/2021] [Accepted: 06/21/2021] [Indexed: 01/07/2023] Open
Abstract
The programmable nuclease technology CRISPR-Cas9 has revolutionized gene editing in the last decade. Due to the risk of off-target editing, accurate and sensitive methods for off-target characterization are crucial prior to applying CRISPR-Cas9 therapeutically. Here, we utilized a rhesus macaque model to compare the predictive values of CIRCLE-seq, an in vitro off-target prediction method, with in silico prediction (ISP) based solely on genomic sequence comparisons. We used AmpliSeq HD error-corrected sequencing to validate off-target sites predicted by CIRCLE-seq and ISP for a CD33 guide RNA (gRNA) with thousands of off-target sites predicted by ISP and CIRCLE-seq. We found poor correlation between the sites predicted by the two methods. When almost 500 sites predicted by each method were analyzed by error-corrected sequencing of hematopoietic cells following transplantation, 19 off-target sites revealed insertion or deletion mutations. Of these sites, 8 were predicted by both methods, 8 by CIRCLE-seq only, and 3 by ISP only. The levels of cells with these off-target edits exhibited no expansion or abnormal behavior in vivo in animals followed for up to 2 years. In addition, we utilized an unbiased method termed CAST-seq to search for translocations between the on-target site and off-target sites present in animals following transplantation, detecting one specific translocation that persisted in blood cells for at least 1 year following transplantation. In conclusion, neither CIRCLE-seq nor ISP predicted all sites, and a combination of careful gRNA design, followed by screening for predicted off-target sites in target cells by multiple methods, may be required for optimizing safety of clinical development.
Affiliation(s)
- Aisha A AlJanahi
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA; Department of Biochemistry and Molecular Biology, Georgetown University, Washington, DC 20057, USA
| | - Cicera R Lazzarotto
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Shirley Chen
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Tae-Hoon Shin
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Stefan Cordes
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Xing Fan
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Isabel Jabara
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Yifan Zhou
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - David J Young
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Byung-Chul Lee
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Kyung-Rok Yu
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA; Department of Agricultural Biotechnology and Research Institute of Agriculture and Life Sciences, Seoul National University, Seoul 08826, Republic of Korea
| | - Yuesheng Li
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Ilker Tunc
- Bioinformatics and Computational Biology Laboratory, NHLBI, NIH, Bethesda, MD 20892, USA
| | - So Gun Hong
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Lauren L Truitt
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA
| | - Julia Klermund
- Institute for Transfusion Medicine and Gene Therapy, Medical Center - University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Geoffroy Andrieux
- Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; Institute of Medical Bioinformatics and Systems Medicine, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany; German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Partner Site Freiburg, 79106 Freiburg, Germany
| | - Miriam Y Kim
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA; Department of Medicine, Division of Oncology, Washington University School of Medicine, Saint Louis, MO 63110, USA
| | - Toni Cathomen
- Institute for Transfusion Medicine and Gene Therapy, Medical Center - University of Freiburg, 79106 Freiburg, Germany; Faculty of Medicine, University of Freiburg, 79110 Freiburg, Germany
| | - Saar Gill
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shengdar Q Tsai
- Department of Hematology, St. Jude Children's Research Hospital, Memphis, TN 38105, USA
| | - Cynthia E Dunbar
- Translational Stem Cell Biology Branch, NHLBI, NIH, Building 10-CRC, 5E-3332, 9000 Rockville Pike, Bethesda, MD 20892, USA.
|
68
|
Abstract
Motivation: Phage–host associations play important roles in microbial communities. However, in natural communities, where phages are discovered and characterized metagenomically rather than through culture-based lab studies, their hosts are generally not known. Several programs have been developed for predicting which phage infects which host based on various sequence similarity measures or machine learning approaches. These are often based on whole viral and host genomes, but in metagenomics-based studies we rarely have whole genomes and must instead rely on contigs that are sometimes only hundreds of bp long. Therefore, we need programs that predict hosts of phage contigs on the basis of these short contigs. Although most existing programs can be applied to metagenomic datasets for these predictions, their accuracies are generally low. Here, we develop ContigNet, a convolutional neural network-based model capable of predicting phage–host matches based on relatively short contigs, and compare it to the previously published VirHostMatcher (VHM) and WIsH. Results: On the validation set, ContigNet achieves 72–85% area under the receiver operating characteristic curve (AUROC) scores, compared to a maximum of 68% by VHM or WIsH, for contigs of lengths between 200 bp and 50 kbp. We also apply the model to the Metagenomic Gut Virus (MGV) catalogue, a dataset containing a wide range of draft genomes from metagenomic samples, and achieve 60–70% AUROC scores, compared to 52% for VHM and WIsH. Surprisingly, ContigNet can also be used to predict plasmid-host contig associations with high accuracy, indicating a similar genetic exchange between mobile genetic elements and their hosts. Availability and implementation: The source code of ContigNet and related datasets can be downloaded from https://github.com/tianqitang1/ContigNet.
Affiliation(s)
- Tianqi Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
| | - Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
| | - Fengzhu Sun
- To whom correspondence should be addressed. E-mail:
|
69
|
A Retrospective Statistical Validation Approach for Panel of Normal-Based Single-Nucleotide Variant Detection in Tumor Sequencing. J Mol Diagn 2022; 24:41-47. [PMID: 34974877 DOI: 10.1016/j.jmoldx.2021.09.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2021] [Revised: 08/28/2021] [Accepted: 09/28/2021] [Indexed: 11/22/2022] Open
Abstract
An important step of somatic variant calling algorithms for deep sequencing data is quantifying the errors. For targeted sequencing in which hotspot mutations are of interest, site-specific error estimation allows more accurate calling. The site-specific error rates are often estimated from a panel of normal samples, which has limited size and is subject to sampling bias and variance. We propose a novel statistical validation method for single-nucleotide variation (SNV) calling based on historical data. The validation method extracts the high-quality reads from the Binary Alignment/Map (BAM) files, finds the negative samples in the data, and builds a statistical model to call individual samples. It is particularly useful in detecting low-frequency variants that may be missed by traditional panel of normal-based SNV methods. The proposed method makes it possible to launch a simple and parallel validation pipeline for SNV calling and improve the detection limit.
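The idea of site-specific error estimation from negative samples can be made concrete with a small sketch. It is an illustration under stated assumptions, not the statistical model of the paper: the error rate at a site is estimated by pooling reads from negative samples (with a pseudocount), and a test sample is called when a one-sided binomial test rejects the hypothesis that its alternate reads are background errors. The function names, read counts and the `alpha` threshold are hypothetical.

```python
from math import comb

def site_error_rate(negatives):
    """Pooled alt-read fraction at one site across negative samples.

    `negatives` is a list of (alt_reads, depth) pairs; the +1/+2 pseudocount
    keeps the estimate away from exactly zero.
    """
    alt = sum(a for a, _ in negatives)
    depth = sum(d for _, d in negatives)
    return (alt + 1) / (depth + 2)

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1).

    Summing only the first k terms avoids the huge binomial coefficients of
    the upper tail; accuracy is limited to roughly 1e-16 by the subtraction.
    """
    cdf = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def call_snv(alt_reads, depth, error_rate, alpha=1e-6):
    """Call a variant when the alt-read count is unlikely under background error."""
    return binom_tail(alt_reads, depth, error_rate) < alpha

# Hypothetical counts at a single hotspot position.
negatives = [(1, 2000), (0, 1800), (2, 2200), (1, 2000)]
rate = site_error_rate(negatives)
print(rate, call_snv(12, 2000, rate), call_snv(2, 2000, rate))
```

A larger pool of negative samples tightens the error estimate at each site, which is the motivation the abstract gives for drawing on historical data instead of a small panel of normals.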
|
70
|
Yang SA, Salazar JL, Li-Kroeger D, Yamamoto S. Functional Studies of Genetic Variants Associated with Human Diseases in Notch Signaling-Related Genes Using Drosophila. Methods Mol Biol 2022; 2472:235-276. [PMID: 35674905 PMCID: PMC9396741 DOI: 10.1007/978-1-0716-2201-8_19] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
Rare variants in the many genes related to Notch signaling cause diverse Mendelian diseases that affect myriad organ systems. In addition, genome- and exome-wide association studies have linked common and rare variants in Notch-related genes to common diseases and phenotypic traits. Moreover, somatic mutations in these genes have been observed in many types of cancer, some of which are classified as oncogenic and others as tumor suppressive. While functional characterization of some of these variants has been performed through experimental studies, the number of "variants of unknown significance" identified in patients with diverse conditions keeps increasing as high-throughput sequencing technologies become more commonly used in the clinic. Furthermore, as disease gene discovery efforts identify rare variants in human genes that have yet to be linked to a disease, the demand for functional characterization of variants in these "genes of unknown significance" continues to increase. In this chapter, we describe a workflow to functionally characterize a rare variant in a Notch signaling related gene that was found to be associated with late-onset Alzheimer's disease. This pipeline involves informatic analysis of the variant of interest using diverse human and model organism databases, followed by in vivo experiments in the fruit fly Drosophila melanogaster. The protocol described here can be used to study variants that affect amino acids that are not conserved between human and fly. By "humanizing" the almondex gene in Drosophila with mutant alleles and heterologous genomic rescue constructs, a missense variant in TM2D3 (TM2 Domain Containing 3) was shown to be functionally damaging. This, and similar approaches, greatly facilitate functional interpretations of genetic variants in the human genome and propel personalized medicine.
Affiliation(s)
- Sheng-An Yang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA
| | - Jose L Salazar
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA
| | - David Li-Kroeger
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA.
- Department of Neurology, Baylor College of Medicine, Houston, TX, USA.
| | - Shinya Yamamoto
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
- Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, USA.
- Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA.
- Development, Disease Models and Therapeutics Graduate Program, Baylor College of Medicine, Houston, TX, USA.
|
71
|
Domogala DD, Gambin T, Zemet R, Wu CW, Schulze KV, Yang Y, Wilson TA, Machol I, Liu P, Stankiewicz P. Detection of low-level parental somatic mosaicism for clinically relevant SNVs and indels identified in a large exome sequencing dataset. Hum Genomics 2021; 15:72. [PMID: 34930489 PMCID: PMC8686574 DOI: 10.1186/s40246-021-00369-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/27/2021] [Accepted: 11/27/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Due to the limitations of the current routine diagnostic methods, low-level somatic mosaicism with variant allele fraction (VAF) < 10% is often undetected in clinical settings. To date, only a few studies have attempted to analyze tissue distribution of low-level parental mosaicism in a large clinical exome sequencing (ES) cohort. METHODS: Using a customized bioinformatics pipeline, we analyzed apparent de novo single-nucleotide variants or indels identified in the affected probands in ES trio data at Baylor Genetics clinical laboratories. Clinically relevant variants with VAFs between 30 and 70% in probands and lower than 10% in one parent were studied. DNA samples extracted from saliva, buccal cells, redrawn peripheral blood, urine, hair follicles, and nail, representing all three germ layers, were tested using PCR amplicon next-generation sequencing (amplicon NGS) and droplet digital PCR (ddPCR). RESULTS: In a cohort of 592 clinical ES trios, we found 61 trios, each with one parent suspected of low-level mosaicism. In 21 parents, the variants were validated using amplicon NGS and seven of them by ddPCR in peripheral blood DNA samples. The parental VAFs in blood samples varied between 0.08 and 9%. The distribution of VAFs in additional tissues ranged from 0.03% in hair follicles to 9% in re-drawn peripheral blood. CONCLUSIONS: Our study illustrates the importance of analyzing ES data using sensitive computational and molecular methods for low-level parental somatic mosaicism for clinically relevant variants previously diagnosed in routine clinical diagnostics as apparent de novo.
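The VAF selection criteria stated in the methods (30 to 70% in the proband, below 10% in one parent) can be sketched as a simple trio filter. This is only an illustration of the thresholds, not the authors' customized pipeline; the `TrioVariant` record, the variant identifiers and the extra requirement that the other parent shows no alternate reads are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrioVariant:
    variant_id: str      # hypothetical identifier
    proband_vaf: float   # variant allele fractions as proportions (0 to 1)
    mother_vaf: float
    father_vaf: float

def suspected_mosaic_parent(v: TrioVariant,
                            proband_range=(0.30, 0.70),
                            parent_max=0.10) -> Optional[str]:
    """Return 'mother' or 'father' if exactly one parent looks low-level mosaic.

    Assumption for this sketch: the other parent must carry no alt reads.
    """
    low, high = proband_range
    if not (low <= v.proband_vaf <= high):
        return None
    if 0 < v.mother_vaf < parent_max and v.father_vaf == 0:
        return "mother"
    if 0 < v.father_vaf < parent_max and v.mother_vaf == 0:
        return "father"
    return None

# Hypothetical trio records.
variants = [
    TrioVariant("var1", 0.48, 0.03, 0.0),   # mother carries a low-level signal
    TrioVariant("var2", 0.51, 0.0, 0.0),    # apparent true de novo
    TrioVariant("var3", 0.49, 0.0, 0.47),   # inherited from a heterozygous father
    TrioVariant("var4", 0.12, 0.02, 0.0),   # proband itself outside 30-70%
]
calls = {v.variant_id: suspected_mosaic_parent(v) for v in variants}
print(calls)
```

Candidates flagged this way would still need orthogonal confirmation, which the study performed with amplicon NGS and ddPCR.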
Affiliation(s)
- Daniel D Domogala
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Graduate Program in Diagnostic Genetics, School of Health Professions, University of Texas at MD Anderson, Houston, TX, USA
| | - Tomasz Gambin
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
| | - Roni Zemet
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | - Chung Wah Wu
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Baylor Genetics, Houston, TX, USA
| | - Katharina V Schulze
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Baylor Genetics, Houston, TX, USA
| | - Yaping Yang
- AiLife Diagnostics, 1920 Country Place Pkwy Suite 100, Pearland, TX, USA
| | - Theresa A Wilson
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA
| | | | - Pengfei Liu
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA; Baylor Genetics, Houston, TX, USA
| | - Paweł Stankiewicz
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX, 77030, USA.
| |
|
72
|
Lin LH, Chou CH, Cheng HW, Chang KW, Liu CJ. Precise Identification of Recurrent Somatic Mutations in Oral Cancer Through Whole-Exome Sequencing Using Multiple Mutation Calling Pipelines. Front Oncol 2021; 11:741626. [PMID: 34912705 PMCID: PMC8666431 DOI: 10.3389/fonc.2021.741626] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/15/2021] [Accepted: 11/11/2021] [Indexed: 01/18/2023] Open
Abstract
Understanding the genomic alterations in oral carcinogenesis remains crucial for the appropriate diagnosis and treatment of oral squamous cell carcinoma (OSCC). To unveil the mutational spectrum, in this study, we conducted whole-exome sequencing (WES), using six mutation calling pipelines and multiple filtering criteria applied to 50 paired OSCC samples. The tumor mutation burden extracted from the data set of somatic variations was significantly associated with age, tumor staging, and survival. Several genes (MUC16, MUC19, KMT2D, TTN, HERC2) with a high frequency of false positive mutations were identified. Moreover, known (TP53, FAT1, EPHA2, NOTCH1, CASP8, and PIK3CA) and novel (HYDIN, ALPK3, ASXL1, USP9X, SKOR2, CPLANE1, STARD9, and NSD2) genes have been found to be significantly and frequently mutated in OSCC. Further analysis of gene alteration status with clinical parameters revealed that canonical pathways, including clathrin-mediated endocytotic signaling, NFκB signaling, PEDF signaling, and calcium signaling were associated with OSCC prognosis. Defining a catalog of targetable genomic alterations showed that 58% of the tumors carried at least one aberrant event that may potentially be targeted by approved therapeutic agents. We found molecular OSCC subgroups which were correlated with etiology and prognosis while defining the landscape of major altered events in the coding regions of OSCC genomes. These findings provide information that will be helpful in the design of clinical trials on targeted therapies and in the stratification of patients with OSCC according to therapeutic efficacy.
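The abstract does not say how the calls from the six pipelines were combined. One common way to reduce caller-specific false positives is a simple support vote, sketched below under that assumption; the function name, the `min_support` parameter, and the variant key layout `(chrom, pos, ref, alt)` are hypothetical and not taken from the study.

```python
from collections import Counter


def consensus_calls(calls_by_pipeline, min_support=2):
    """Keep variants reported by at least `min_support` pipelines.

    calls_by_pipeline maps a pipeline name to an iterable of variant keys,
    here assumed to be (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for calls in calls_by_pipeline.values():
        # set() ensures a pipeline votes at most once per variant
        support.update(set(calls))
    return {variant for variant, n in support.items() if n >= min_support}
```

Raising `min_support` trades sensitivity for specificity; the threshold that suits a given cohort would need to be checked against validated calls.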
Affiliation(s)
- Li-Han Lin
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
| | - Chung-Hsien Chou
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Hui-Wen Cheng
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
| | - Kuo-Wei Chang
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Stomatology, Taipei Veterans General Hospital, Taipei, Taiwan
| | - Chung-Ji Liu
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan; Department of Oral and Maxillofacial Surgery, Taipei MacKay Memorial Hospital, Taipei, Taiwan
| |
|
73
|
Mittelstrass J, Sperone FG, Horton MW. Using transects to disentangle the environmental drivers of plant-microbiome assembly. PLANT, CELL & ENVIRONMENT 2021; 44:3515-3525. [PMID: 34562029 PMCID: PMC9292149 DOI: 10.1111/pce.14190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/17/2021] [Revised: 09/07/2021] [Accepted: 09/14/2021] [Indexed: 06/13/2023]
Abstract
Environmental heterogeneity is a major driver of plant-microbiome assembly, but the specific climate and soil conditions that are involved remain poorly understood. To better understand plant microbiome formation, we examined the bacteria and fungi that colonize wild strawberry (Fragaria vesca) plants in North American and European populations. Using transects as replicates, we found strong overlap among the environmental conditions that best predict the overall similarity and richness of the plant microbiome, including soil nutrients that replicate across continents. Temperature is also among the main predictors of diversity for both bacteria and fungi in both the leaf and, unexpectedly, the root microbiome. Our results indicate that a small number of environmental factors, and their interactions, consistently contribute to plant microbiome formation, which has implications for predicting the contributions of microbes to plant productivity in ever-changing environments.
Affiliation(s)
- Jana Mittelstrass
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
| | - F. Gianluca Sperone
- Department of Environmental Science and Geology, Wayne State University, Detroit, Michigan, USA
| | - Matthew W. Horton
- Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland
| |
|
74
|
Garushyants SK, Rogozin IB, Koonin EV. Template switching and duplications in SARS-CoV-2 genomes give rise to insertion variants that merit monitoring. Commun Biol 2021; 4:1343. [PMID: 34848826 PMCID: PMC8632935 DOI: 10.1038/s42003-021-02858-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 10/04/2021] [Accepted: 11/01/2021] [Indexed: 12/29/2022] Open
Abstract
The appearance of multiple new SARS-CoV-2 variants during the COVID-19 pandemic is a matter of grave concern. Some of these variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the pandemic. So far, analysis of new SARS-CoV-2 variants focused primarily on nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers although the furin site insert in the Spike (S) protein is thought to be a determinant of SARS-CoV-2 virulence. Here, we identify 346 unique inserts of different lengths in SARS-CoV-2 genomes and present evidence that these inserts reflect actual virus variance rather than sequencing artifacts. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes, polymerase slippage and template switch that might be associated with the synthesis of subgenomic RNAs. At least three inserts in the N-terminal domain of the S protein are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity. Thus, inserts in the S protein can affect its antigenic properties and merit monitoring.
Affiliation(s)
- Sofya K Garushyants
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
| |
|
75
|
Willey JC, Morrison TB, Austermiller B, Crawford EL, Craig DJ, Blomquist TM, Jones WD, Wali A, Lococo JS, Haseley N, Richmond TA, Novoradovskaya N, Kusko R, Chen G, Li QZ, Johann DJ, Deveson IW, Mercer TR, Wu L, Xu J. Advancing NGS quality control to enable measurement of actionable mutations in circulating tumor DNA. CELL REPORTS METHODS 2021; 1:100106. [PMID: 35475002 PMCID: PMC9017191 DOI: 10.1016/j.crmeth.2021.100106] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/12/2021] [Revised: 05/31/2021] [Accepted: 10/11/2021] [Indexed: 11/25/2022]
Abstract
The primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next-generation sequencing (NGS) method that will enable more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. To accomplish this, a synthetic internal standard spike-in was designed for each actionable mutation target, suitable for use in NGS following hybrid capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, internal standards enabled calculation of technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position in each sample. True-positive mutations with variant allele fraction too low for detection by current practice were detected with this method, thereby increasing sensitivity.
Affiliation(s)
- James C. Willey
- College of Medicine and Life Sciences, University of Toledo, Toledo, OH 43614, USA
| | - Tom B. Morrison
- AccuGenomics Inc., The Atrium, Suite 105, 1410 Commonwealth Drive, Wilmington, NC 28403, USA
| | - Bradley Austermiller
- AccuGenomics Inc., The Atrium, Suite 105, 1410 Commonwealth Drive, Wilmington, NC 28403, USA
| | - Erin L. Crawford
- College of Medicine and Life Sciences, University of Toledo, Toledo, OH 43614, USA
| | - Daniel J. Craig
- College of Medicine and Life Sciences, University of Toledo, Toledo, OH 43614, USA
| | - Thomas M. Blomquist
- College of Medicine and Life Sciences, University of Toledo, Toledo, OH 43614, USA
| | | | - Aminah Wali
- Q2 Solutions, EA Genomics, Morrisville, NC 27560, USA
| | | | - Nathan Haseley
- Illumina Inc., 5200 Illumina Way, San Diego, CA 92122, USA
| | | | | | | | - Guangchun Chen
- University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Quan-Zhen Li
- University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
| | - Donald J. Johann
- Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham Street, Little Rock, AR 72205, USA
| | - Ira W. Deveson
- Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- St. Vincent’s Clinical School, University of New South Wales, Sydney, NSW 2010, Australia
| | - Timothy R. Mercer
- Garvan Institute of Medical Research, Sydney, NSW 2010, Australia
- St. Vincent’s Clinical School, University of New South Wales, Sydney, NSW 2010, Australia
| | - Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR 72079, USA
| |
|
76
|
Porath‐Krause A, Strauss AT, Henning JA, Seabloom EW, Borer ET. Pitfalls and pointers: An accessible guide to marker gene amplicon sequencing in ecological applications. Methods Ecol Evol 2021. [DOI: 10.1111/2041-210x.13764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/17/2022]
Affiliation(s)
- Anita Porath‐Krause
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, USA
| | - Alexander T. Strauss
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, USA
| | - Jeremiah A. Henning
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, USA
| | - Eric W. Seabloom
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, USA
| | - Elizabeth T. Borer
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, MN, USA
| |
|
77
|
Cowan RW, Pratt ED, Kang JM, Zhao J, Wilhelm JJ, Abdulla M, Qiao EM, Brennan LP, Ulintz PJ, Bellin MD, Rhim AD. Pancreatic Cancer-Related Mutational Burden Is Not Increased in a Patient Cohort With Clinically Severe Chronic Pancreatitis. Clin Transl Gastroenterol 2021; 12:e00431. [PMID: 34797250 PMCID: PMC8604013 DOI: 10.14309/ctg.0000000000000431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2021] [Accepted: 08/30/2021] [Indexed: 11/30/2022] Open
Abstract
INTRODUCTION: Chronic pancreatitis is associated with an increased risk of developing pancreatic cancer, and patients with inherited forms of pancreatitis are at greatest risk. We investigated whether clinical severity of pancreatitis could also be an indicator of cancer risk independent of etiology by performing targeted DNA sequencing to assess the mutational burden in 55 cancer-associated genes.
METHODS: Using picodroplet digital polymerase chain reaction and next-generation sequencing, we reported the genomic profiles of pancreases from severe clinical cases of chronic pancreatitis that necessitated palliative total pancreatectomy with islet autotransplantation.
RESULTS: We assessed 57 tissue samples from 39 patients with genetic and idiopathic etiologies and found that despite the clinical severity of disease, there was no corresponding increase in mutational burden. The average allele frequency of somatic variants was 1.19% (range 1.00%-5.97%), and distinct regions from the same patient displayed genomic heterogeneity, suggesting that these variants are subclonal. Few oncogenic KRAS mutations were discovered (7% of all samples), although we detected evidence of frequent cancer-related variants in other genes such as TP53, CDKN2A, and SMAD4. Of note, tissue samples with oncogenic KRAS mutations and samples from patients with PRSS1 mutations harbored an increased total number of somatic variants, suggesting that these patients may have increased genomic instability and could be at an increased risk of developing pancreatic cancer.
DISCUSSION: Overall, we showed that even in those patients with chronic pancreatitis severe enough to warrant total pancreatectomy with islet autotransplantation, pancreatic cancer-related mutational burden is not appreciably increased.
Affiliation(s)
- Robert W. Cowan
- Ahmed Cancer Center for Pancreatic Cancer Research, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
- Department of Gastroenterology, Hepatology & Nutrition, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
| | - Erica D. Pratt
- Ahmed Cancer Center for Pancreatic Cancer Research, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
- Department of Gastroenterology, Hepatology & Nutrition, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
| | - Jin Muk Kang
- Ahmed Cancer Center for Pancreatic Cancer Research, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
- Department of Gastroenterology, Hepatology & Nutrition, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
| | - Jun Zhao
- Ahmed Cancer Center for Pancreatic Cancer Research, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
- Department of Translational Molecular Pathology, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
| | - Joshua J. Wilhelm
- Department of Pediatrics, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Department of Surgery, Schulze Diabetes Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Muhamad Abdulla
- Department of Surgery, Schulze Diabetes Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Edmund M. Qiao
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
| | - Luke P. Brennan
- University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Peter J. Ulintz
- Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, USA
- BRCF Bioinformatics Core, University of Michigan, Ann Arbor, Michigan, USA
| | - Melena D. Bellin
- Department of Pediatrics, University of Minnesota Medical School, Minneapolis, Minnesota, USA
- Department of Surgery, Schulze Diabetes Institute, University of Minnesota, Minneapolis, Minnesota, USA
| | - Andrew D. Rhim
- Ahmed Cancer Center for Pancreatic Cancer Research, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
- Department of Gastroenterology, Hepatology & Nutrition, MD Anderson Cancer Center, University of Texas, Houston, Texas, USA
| |
|
78
|
Bieler J, Pozzorini C, Garcia J, Tuck AC, Macheret M, Willig A, Couraud S, Xing X, Menu P, Steinmetz LM, Payen L, Xu Z. High-Throughput Nucleotide Resolution Predictions of Assay Limitations Increase the Reliability and Concordance of Clinical Tests. JCO Clin Cancer Inform 2021; 5:1085-1095. [PMID: 34731027 DOI: 10.1200/cci.21.00057] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022] Open
Abstract
PURPOSE: The ability of next-generation sequencing (NGS) assays to interrogate thousands of genomic loci has revolutionized genetic testing. However, translation to the clinic is impeded by false-negative results that pose a risk to patients. In response, regulatory bodies are calling for reliability measures to be reported alongside NGS results. Existing methods to estimate reliability do not account for sample- and position-specific variability, which can be significant. Here, we report an approach that computes reliability metrics for every genomic position and sample interrogated by an NGS assay.
METHODS: Our approach predicts the limit of detection (LOD), the lowest reliably detectable variant fraction, by taking technical factors into account. We initially explored how LOD is affected by input material amount, library conversion rate, sequencing coverage, and sequencing error rate. This revealed that LOD depends heavily on genomic context and sample properties. Using these insights, we developed a computational approach to predict LOD on the basis of a biophysical model of the NGS workflow. We focused on targeted assays for cell-free DNA, but, in principle, this approach applies to any NGS assay.
RESULTS: We validated our approach by showing that it accurately predicts LOD and distinguishes reliable from unreliable results when screening 580 lung cancer samples for actionable mutations. Compared with a standard variant calling workflow, our approach avoided most false negatives and improved interassay concordance from 94% to 99%.
CONCLUSION: Our approach, which we name LAVA (LOD-aware variant analysis), reports the LOD for every position and sample interrogated by an NGS assay. This enables reliable results to be identified and improves the transparency and safety of genetic tests.
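To make the dependence of LOD on sequencing coverage concrete, the sketch below uses a plain binomial sampling model. This is not the LAVA model: it deliberately ignores input amount, library conversion rate, and sequencing error, all of which the abstract says matter. The function names, the `min_alt_reads` calling rule, and the 95% detection probability are assumptions chosen for illustration.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads):
    """P(at least `min_alt_reads` variant reads) when each of `depth` reads
    independently carries the variant with probability `vaf`."""
    p_below = sum(comb(depth, i) * vaf**i * (1 - vaf)**(depth - i)
                  for i in range(min_alt_reads))
    return 1.0 - p_below


def binomial_lod(depth, min_alt_reads=3, power=0.95, tol=1e-6):
    """Smallest VAF detected with probability >= `power`, found by bisection
    (detection probability increases monotonically with VAF)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if detection_probability(depth, mid, min_alt_reads) >= power:
            hi = mid
        else:
            lo = mid
    return hi
```

Under this toy model, `binomial_lod(1000)` is lower than `binomial_lod(100)`, which illustrates why a single assay-wide LOD can mislead when coverage varies by position and sample.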
Affiliation(s)
| | | | - Jessica Garcia
- Laboratoire de Biochimie et Biologie Moléculaire, Centre Hospitalier Lyon Sud, Hospices Civils de Lyon, Pierre Bénite, France; Institut de Cancérologie des Hospices Civils de Lyon, CIRculating CANcer Program (CIRCAN), Lyon, France
| | - Alex C Tuck
- SOPHiA GENETICS SA, Saint Sulpice, Switzerland
| | | | | | - Sébastien Couraud
- Institut de Cancérologie des Hospices Civils de Lyon, CIRculating CANcer Program (CIRCAN), Lyon, France; Service de Pneumologie aigue spécialisée et cancérologie thoracique, Groupement hospitalier sud, Institut de Cancérologie des Hospices Civils de Lyon, Pierre Bénite, France
| | | | | | - Lars M Steinmetz
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA; Department of Genetics, School of Medicine, Stanford University, Stanford, CA
| | - Léa Payen
- Laboratoire de Biochimie et Biologie Moléculaire, Centre Hospitalier Lyon Sud, Hospices Civils de Lyon, Pierre Bénite, France; Institut de Cancérologie des Hospices Civils de Lyon, CIRculating CANcer Program (CIRCAN), Lyon, France
| | - Zhenyu Xu
- SOPHiA GENETICS SA, Saint Sulpice, Switzerland
| |
|
79
|
Gallbladder Cancer: Current Insights in Genetic Alterations and Their Possible Therapeutic Implications. Cancers (Basel) 2021; 13:cancers13215257. [PMID: 34771420 PMCID: PMC8582530 DOI: 10.3390/cancers13215257] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/21/2021] [Revised: 10/14/2021] [Accepted: 10/18/2021] [Indexed: 12/29/2022] Open
Abstract
Simple Summary: Knowledge of genetic alterations in gallbladder cancer (GBC) continues to increase. This systematic review provides an overview of frequently occurring genetic alterations in GBC and describes their possible therapeutic implications. We detected three frequently (>5%) altered genes (ATM, ERBB2 and PIK3CA) for which targeted therapies are available in other cancer types. For solid cancers with microsatellite instability or a high tumor mutational burden, pembrolizumab is FDA-approved. Altogether, these five biomarkers might be used in future molecular panels to enable precision medicine for patients with GBC. We found only nine clinical trials evaluating targeted therapies in GBC directed at frequently altered genes (ERBB2, ARID1A, ATM and KRAS). This underlines the challenges of performing such clinical trials in this rare, heterogeneous cancer type and emphasizes the need for multicenter clinical trials.
Abstract: Due to the fast progression in molecular technologies such as next-generation sequencing, knowledge of genetic alterations in gallbladder cancer (GBC) is increasing. This systematic review provides an overview of frequently occurring genetic alterations in GBC and their possible therapeutic implications. A literature search was performed utilizing PubMed, EMBASE, Cochrane Library, and Web of Science. Only studies reporting genetic alterations in human GBC were included. In total, data were extracted from 62 articles, describing a total of 3893 GBC samples. Frequently detected genetic alterations (>5% in >5 samples across all studies) in GBC for which targeted therapies are available in other cancer types included mutations in ATM, ERBB2, and PIK3CA, and ERBB2 amplifications. High tumor mutational burden (TMB-H) and microsatellite instability (MSI-H) were infrequently observed in GBC (1.7% and 3.5%, respectively). For solid cancers with TMB-H or MSI-H, pembrolizumab is FDA-approved and shows objective response rates of 50% for TMB-H GBC and 41% for MSI-H biliary tract cancer. Only nine clinical trials evaluated targeted therapies in GBC directed at frequently altered genes (ERBB2, ARID1A, ATM, and KRAS). This underlines the challenges of performing such clinical trials in this rare, heterogeneous cancer type and emphasizes the need for multicenter clinical trials.
|
80
|
Bello JC, Hausbeck MK, Sakalidis ML. Application of Target Enrichment Sequencing for Population Genetic Analyses of the Obligate Plant Pathogens Pseudoperonospora cubensis and P. humuli in Michigan. MOLECULAR PLANT-MICROBE INTERACTIONS : MPMI 2021; 34:1103-1118. [PMID: 34227836 DOI: 10.1094/mpmi-11-20-0329-ta] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/13/2023]
Abstract
Technological advances in genome sequencing have improved our ability to catalog genomic variation and have led to an expansion of the scope and scale of genetic studies over the past decade. Yet, for agronomically important plant pathogens such as the downy mildews (Peronosporaceae), the scale of genetic studies remains limited. This is, in part, due to the difficulties associated with maintaining obligate pathogens and the logistical constraints involved in the genotyping of these species (e.g., obtaining DNA of sufficient quantity and quality). To gain an evolutionary and ecological perspective of downy mildews, adaptable methods for the genotyping of their populations are required. Here, we describe a targeted enrichment (TE) protocol to genotype isolates from two Pseudoperonospora species (P. cubensis and P. humuli), using less than 50 ng of mixed pathogen and plant DNA for library preparation. We were able to enrich 830 target genes across 128 samples and identified 2,514 high-quality single nucleotide polymorphism (SNP) variants. Using these SNPs, we detected significant genetic differentiation (analysis of molecular variance [AMOVA], P = 0.01) between P. cubensis subpopulations from Cucurbita moschata (clade I) and Cucumis sativus (clade II) in the state of Michigan. No evidence of location-based differentiation was detected within the P. cubensis (clade II) subpopulation in Michigan. However, a significant effect of location on the genetic variation of the P. humuli subpopulation was detected in the state (AMOVA, P = 0.01). Mantel tests found evidence that the genetic distance among P. humuli samples was associated with the physical distance of the hop yards from which the samples were collected (P = 0.005). The differences in the distribution of genetic variation of the Michigan P. humuli and P. cubensis subpopulations suggest differences in the dispersal of these two species. 
The TE protocol described here provides an additional tool for genotyping obligate biotrophic plant pathogens and the execution of new genetic studies. Copyright © 2021 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.
Affiliation(s)
- Julian C Bello
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Mary K Hausbeck
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, U.S.A
| | - Monique L Sakalidis
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, U.S.A
- Department of Forestry, Michigan State University, East Lansing, MI 48824, U.S.A
| |
|
81
|
Guenay-Greunke Y, Bohan DA, Traugott M, Wallinger C. Handling of targeted amplicon sequencing data focusing on index hopping and demultiplexing using a nested metabarcoding approach in ecology. Sci Rep 2021; 11:19510. [PMID: 34593851 PMCID: PMC8484467 DOI: 10.1038/s41598-021-98018-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/22/2021] [Accepted: 08/30/2021] [Indexed: 01/23/2023] Open
Abstract
High-throughput sequencing platforms are increasingly being used for targeted amplicon sequencing because they enable cost-effective sequencing of large sample sets. For meaningful interpretation of targeted amplicon sequencing data and comparison between studies, it is critical that bioinformatic analyses do not introduce artefacts and rely on detailed protocols to ensure that all methods are properly performed and documented. The analysis of large sample sets and the use of predefined indexes create challenges, such as adjusting the sequencing depth across samples and taking sequencing errors or index hopping into account. However, the potential biases these factors introduce to high-throughput amplicon sequencing data sets and how they may be overcome have rarely been addressed. Using the example of a nested metabarcoding analysis of 1920 carabid beetle regurgitates to assess plant feeding, we investigated: (i) the variation in sequencing depth of individually tagged samples and the effect of library preparation on the data output; (ii) the influence of sequencing errors within index regions and its consequences for demultiplexing; and (iii) the effect of index hopping. Our results demonstrate that despite library quantification, large variation in read counts and sequencing depth occurred among samples and that the sequencing error rate setting in bioinformatic software is essential for accurate adapter/primer trimming and demultiplexing. Moreover, setting an index hopping threshold to avoid incorrect assignment of samples is highly recommended.
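The abstract recommends an index hopping threshold but does not state the rule or its value. The sketch below shows one widely used form of such a filter, in which a sample's read count for a sequence is discarded when it falls below a fixed fraction of that sequence's total reads in the run. The function name, the nested-dictionary layout, and the default `fraction` of 0.001 are assumptions for illustration, not the authors' settings.

```python
def apply_hopping_threshold(counts, fraction=0.001):
    """Zero out per-sample read counts that are likely index-hopping noise.

    counts maps a sequence (or taxon) name to {sample_name: read_count}.
    A count is kept only if it reaches `fraction` of that sequence's
    total reads across all samples in the run.
    """
    filtered = {}
    for sequence, per_sample in counts.items():
        cutoff = fraction * sum(per_sample.values())
        filtered[sequence] = {sample: (n if n >= cutoff else 0)
                              for sample, n in per_sample.items()}
    return filtered
```

In practice the fraction would be calibrated against negative controls or unused index combinations sequenced in the same run.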
Affiliation(s)
- Yasemin Guenay-Greunke
- Applied Animal Ecology, Department of Zoology, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria; Institute of Interdisciplinary Mountain Research, IGF, Austrian Academy of Sciences, Technikerstraße 21a, 6020, Innsbruck, Austria.
| | - David A Bohan
- Agroécologie, AgroSup Dijon, INRAE, Université Bourgogne Franche-Comté, 21000, Dijon, France
| | - Michael Traugott
- Applied Animal Ecology, Department of Zoology, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria
| | - Corinna Wallinger
- Applied Animal Ecology, Department of Zoology, University of Innsbruck, Technikerstraße 25, 6020, Innsbruck, Austria; Institute of Interdisciplinary Mountain Research, IGF, Austrian Academy of Sciences, Technikerstraße 21a, 6020, Innsbruck, Austria
| |
|
82
|
Hori A, Ogata-Kawata H, Sasaki A, Takahashi K, Taniguchi K, Migita O, Kawashima A, Okamoto A, Sekizawa A, Sago H, Takada F, Nakabayashi K, Hata K. Improved library preparation protocols for amplicon sequencing-based noninvasive fetal genotyping for RHD-positive D antigen-negative alleles. BMC Res Notes 2021; 14:380. [PMID: 34565457 PMCID: PMC8474863 DOI: 10.1186/s13104-021-05793-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2021] [Accepted: 09/17/2021] [Indexed: 11/23/2022] Open
Abstract
Objective: We aimed to simplify our fetal RHD genotyping protocol by switching the attachment of Illumina’s sequencing adaptors to PCR products from a ligation-based method to a PCR-based method, and to improve its reliability and robustness by introducing unique molecular indexes (UMIs), which allow us to count the number of DNA fragments used as PCR templates and to minimize the effects of PCR and sequencing errors.
Results: Both of the newly established protocols reduced time and cost compared with our conventional protocol. Removal of PCR duplicates using UMIs reduced the frequencies of erroneously mapped sequence reads likely generated by PCR and sequencing errors. The modified protocols will facilitate the implementation of fetal RHD genotyping for East Asian populations in clinical practice.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13104-021-05793-4.
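The role of UMIs described above, counting template molecules rather than reads and suppressing PCR and sequencing errors, can be sketched as follows. This is a simplified illustration, not the authors' protocol: reads are assumed to arrive as hypothetical `(umi, allele)` pairs, UMIs are matched exactly with no error correction, and each UMI family is assigned its majority allele.

```python
from collections import Counter, defaultdict


def count_template_molecules(reads):
    """Collapse reads sharing a UMI and count molecules per allele.

    reads: iterable of (umi, allele) pairs. Each UMI is treated as one
    original template; its allele is the majority allele among its reads
    (ties resolve to the allele seen first for that UMI).
    """
    alleles_by_umi = defaultdict(Counter)
    for umi, allele in reads:
        alleles_by_umi[umi][allele] += 1
    molecules = Counter()
    for allele_counts in alleles_by_umi.values():
        consensus, _ = allele_counts.most_common(1)[0]
        molecules[consensus] += 1
    return dict(molecules)
```

With six reads spread over three UMIs, the result reports three molecules, so an allele that appears only as a minority read within one UMI family contributes nothing to the final counts.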
Affiliation(s)
- Asuka Hori
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan; Department of Medical Genetics and Genomics, Kitasato University Graduate School of Medical Sciences, Kanagawa, Japan
| | - Hiroko Ogata-Kawata
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan
| | - Aiko Sasaki
- Center for Maternal-Fetal, Neonatal, and Reproductive Medicine, National Center for Child Health and Development, Tokyo, Japan
| | - Ken Takahashi
- Department of Obstetrics and Gynecology, The Jikei University School of Medicine, Tokyo, Japan
| | - Kosuke Taniguchi
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan
| | - Ohsuke Migita
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan; Faculty of Medicine, University of Tsukuba, Ibaraki, Japan
| | - Akihiro Kawashima
- Department of Obstetrics and Gynecology, Showa University School of Medicine, Tokyo, Japan
| | - Aikou Okamoto
- Department of Obstetrics and Gynecology, The Jikei University School of Medicine, Tokyo, Japan
| | - Akihiko Sekizawa
- Department of Obstetrics and Gynecology, Showa University School of Medicine, Tokyo, Japan
| | - Haruhiko Sago
- Center for Maternal-Fetal, Neonatal, and Reproductive Medicine, National Center for Child Health and Development, Tokyo, Japan
| | - Fumio Takada
- Department of Medical Genetics and Genomics, Kitasato University Graduate School of Medical Sciences, Kanagawa, Japan
| | - Kazuhiko Nakabayashi
- Laboratory of Developmental Genomics, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan.
| | - Kenichiro Hata
- Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, 2-10-1 Okura, Setagaya, Tokyo, 157-8535, Japan.
| |
|
83
|
Rocha Vieira F, Andrew Pecchia J. Fungal community assembly during a high-temperature composting under different pasteurization regimes used to elaborate the Agaricus bisporus substrate. Fungal Biol 2021; 125:826-833. [PMID: 34537178 DOI: 10.1016/j.funbio.2021.05.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2021] [Revised: 04/10/2021] [Accepted: 05/24/2021] [Indexed: 01/04/2023]
Abstract
Agaricus bisporus cultivation is based on a selective substrate prepared by a meticulous composting process where thermophilic and/or thermotolerant fungi might play an important role in straw biomass depolymerization. Since fungi have physiological limitations to survive and grow in high-temperature environments, we set out different pasteurization regimes (57 °C/6 h, 60 °C/2 h, and 68 °C/2 h) to evaluate the impact on the fungal community assembly. The fungal community profile generated by high-throughput sequencing showed shifts in community diversity and composition under different pasteurization regimes. Most of the recovered sequences belong to the Ascomycota phylum. Among 73 species detected, Mycothermus thermophilus, Talaromyces thermophilus, and Thermomyces lanuginosus were the most abundant. In the current study, we outlined that pasteurization regimes can reshape the fungal community in compost which can potentially impact the A. bisporus development.
Affiliation(s)
- Fabricio Rocha Vieira
- Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA, USA.
| | - John Andrew Pecchia
- Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA, USA
| |
|
84
|
Garushyants SK, Rogozin IB, Koonin EV. Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2021:2021.04.23.441209. [PMID: 33907754 PMCID: PMC8077628 DOI: 10.1101/2021.04.23.441209] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 05/25/2023]
Abstract
The appearance of multiple new SARS-CoV-2 variants during the winter of 2020-2021 is a matter of grave concern. Some of these new variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potential dramatic effects on the course of the COVID-19 pandemic. So far, analysis of new SARS-CoV-2 variants has focused primarily on point nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers, although the furin site insert in the spike protein is thought to be a determinant of SARS-CoV-2 virulence and other inserts might have contributed to coronavirus pathogenicity as well. Here, we investigate insertions in SARS-CoV-2 genomes and identify 347 unique inserts of different lengths. We present evidence that these inserts reflect actual virus variance rather than sequencing errors. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes: polymerase slippage and template switch that might be associated with the synthesis of subgenomic RNAs. We show that inserts in the Spike glycoprotein can affect its antigenic properties and thus merit monitoring. At least three inserts in the N-terminal domain of the Spike (ins245IME, ins246DSWG, and ins248SSLT) that were first detected in 2021 are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity.
Affiliation(s)
- Sofya K. Garushyants
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Igor B. Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| |
|
85
|
Wang RJ, Radivojac P, Hahn MW. Distinct error rates for reference and nonreference genotypes estimated by pedigree analysis. Genetics 2021; 217:1-10. [PMID: 33683359 DOI: 10.1093/genetics/iyaa014] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/19/2020] [Accepted: 11/13/2020] [Indexed: 01/06/2023] Open
Abstract
Errors in genotype calling can have perverse effects on genetic analyses, confounding association studies, and obscuring rare variants. Analyses now routinely incorporate error rates to control for spurious findings. However, reliable estimates of the error rate can be difficult to obtain because of their variance between studies. Most studies also report only a single estimate of the error rate even though genotypes can be miscalled in more than one way. Here, we report a method for estimating the rates at which different types of genotyping errors occur at biallelic loci using pedigree information. Our method identifies potential genotyping errors by exploiting instances where the haplotypic phase has not been faithfully transmitted. The expected frequency of inconsistent phase depends on the combination of genotypes in a pedigree and the probability of miscalling each genotype. We develop a model that uses the differences in these frequencies to estimate rates for different types of genotype error. Simulations show that our method accurately estimates these error rates in a variety of scenarios. We apply this method to a dataset from the whole-genome sequencing of owl monkeys (Aotus nancymaae) in three-generation pedigrees. We find significant differences between estimates for different types of genotyping error, with the most common being homozygous reference sites miscalled as heterozygous and vice versa. The approach we describe is applicable to any set of genotypes where haplotypic phase can reliably be called and should prove useful in helping to control for false discoveries.
Affiliation(s)
- Richard J Wang
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
| | - Predrag Radivojac
- Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
| | - Matthew W Hahn
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
| |
|
86
|
Peng X, Dorman KS. AmpliCI: a high-resolution model-based approach for denoising Illumina amplicon data. Bioinformatics 2021; 36:5151-5158. [PMID: 32697845 PMCID: PMC7850112 DOI: 10.1093/bioinformatics/btaa648] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/22/2020] [Revised: 05/14/2020] [Accepted: 07/16/2020] [Indexed: 01/04/2023] Open
Abstract
Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI has better performance than three popular denoising methods, with acceptable computation time and memory usage. Availability and implementation: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary material is available at Bioinformatics online.
Affiliation(s)
- Xiyu Peng
- Department of Statistics, Ames, IA 50011, USA; Interdepartmental Program in Bioinformatics and Computational Biology, Ames, IA 50011, USA
| | - Karin S Dorman
- Department of Statistics, Ames, IA 50011, USA; Interdepartmental Program in Bioinformatics and Computational Biology, Ames, IA 50011, USA; Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011, USA
| |
|
87
|
Kohls M, Saremi B, Muchsin I, Fischer N, Becher P, Jung K. A resampling strategy for studying robustness in virus detection pipelines. Comput Biol Chem 2021; 94:107555. [PMID: 34364046 DOI: 10.1016/j.compbiolchem.2021.107555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2021] [Revised: 07/14/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Next-generation sequencing is regularly used to identify viral sequences in DNA or RNA samples of infected hosts. A major step of most pipelines for virus detection is to map sequence reads against known virus genomes. Due to small differences between the sequences of related viruses, and due to several biological or technical errors, mapping is subject to uncertainty. As a consequence, the resulting list of detected viruses can lack robustness. A new approach for generating artificial sequencing reads, together with a strategy of resampling from the original findings, is proposed that can help to assess the robustness of the originally identified list of viruses. From the original mapping result in the form of a SAM file, a set of statistical distributions is derived. These are used in the resampling pipeline to generate new artificial reads, which are again mapped against the reference genomes. By summarizing the resampling procedure, the analyst receives information about whether the presence of a particular virus in the sample gains or loses evidence, and thus about the robustness of the original mapping list as well as that of individual viruses in this list. To judge robustness, several indicators are derived from the resampling procedure, such as the correlation between original and resampling read counts, or the statistical detection of outliers in the differences of read counts. Additionally, graphical illustrations of read count shifts via Sankey diagrams are provided. To demonstrate the use of the new approach, the resampling approach is applied to three real-world data samples, one of them with laboratory-confirmed Influenza sequences, and to artificially generated data where virus sequences have been spiked into the sequencing data of a host. By applying the resampling pipeline, several viruses drop from the original list while new viruses emerge, showing robustness of those viruses that remain in the list.
The evaluation of the new approach shows that the resampling approach is helpful to analyze the viral content of a biological sample, to rate the robustness of original findings and to better show the overall distribution of findings. The method is also applicable to other virus detection pipelines based on read mapping.
Affiliation(s)
- Moritz Kohls
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
| | - Babak Saremi
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
| | - Ihsan Muchsin
- Institute for Virology and Immunobiology, University of Würzburg, Versbacher Straße 7, 97078 Würzburg, Germany.
| | - Nicole Fischer
- Institute of Medical Microbiology, Virology and Hygiene, University Medical Center Hamburg-Eppendorf (UKE), Martinistraße 52, 20251 Hamburg, Germany.
| | - Paul Becher
- Institute of Virology, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17, 30559 Hannover, Germany.
| | - Klaus Jung
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Foundation, Bünteweg 17p, 30559 Hannover, Germany.
| |
|
88
|
Meissner ME, Julik EJ, Badalamenti JP, Arndt WG, Mills LJ, Mansky LM. Development of a User-Friendly Pipeline for Mutational Analyses of HIV Using Ultra-Accurate Maximum-Depth Sequencing. Viruses 2021; 13:v13071338. [PMID: 34372543 PMCID: PMC8310143 DOI: 10.3390/v13071338] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Revised: 07/06/2021] [Accepted: 07/07/2021] [Indexed: 01/23/2023] Open
Abstract
Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present the development of a user-friendly Galaxy workflow for the bioinformatic analyses of sequencing data generated using the MDS technique, designed to improve replicability and accessibility to molecular virologists. This adapted MDS technique and analysis pipeline were validated by comparisons with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2 and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold over standard Illumina error rates, and 10-fold over traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow for the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.
Affiliation(s)
- Morgan E. Meissner
- Molecular, Cellular, Developmental Biology & Genetics Graduate Program, University of Minnesota, Minneapolis, MN 55455, USA
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota, Minneapolis, MN 55455, USA
- Institute for Molecular Virology, University of Minnesota, Minneapolis, MN 55455, USA
| | - Emily J. Julik
- Institute for Molecular Virology, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Basic Sciences, School of Dentistry, University of Minnesota, Minneapolis, MN 55455, USA
| | - Jonathan P. Badalamenti
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, MN 55455, USA
| | - William G. Arndt
- Institute for Molecular Virology, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Basic Sciences, School of Dentistry, University of Minnesota, Minneapolis, MN 55455, USA
| | - Lauren J. Mills
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota, Minneapolis, MN 55455, USA
- Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Pediatrics, University of Minnesota, Minneapolis, MN 55455, USA
- Correspondence: (L.J.M.); (L.M.M.)
| | - Louis M. Mansky
- Molecular, Cellular, Developmental Biology & Genetics Graduate Program, University of Minnesota, Minneapolis, MN 55455, USA
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota, Minneapolis, MN 55455, USA
- Institute for Molecular Virology, University of Minnesota, Minneapolis, MN 55455, USA
- Division of Basic Sciences, School of Dentistry, University of Minnesota, Minneapolis, MN 55455, USA
- Masonic Cancer Center, University of Minnesota, Minneapolis, MN 55455, USA
- Correspondence: (L.J.M.); (L.M.M.)
| |
|
89
|
Kurkowiak M, Grasso G, Faktor J, Scheiblecker L, Winniczuk M, Mayordomo MY, O'Neill JR, Oster B, Vojtesek B, Al-Saadi A, Marek-Trzonkowska N, Hupp TR. An integrated DNA and RNA variant detector identifies a highly conserved three base exon in the MAP4K5 kinase locus. RNA Biol 2021; 18:2556-2575. [PMID: 34190025 PMCID: PMC8632122 DOI: 10.1080/15476286.2021.1932345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
RNA variants that emerge from editing and alternative splicing form important regulatory stages in protein signalling. In this report, we apply an integrated DNA and RNA variant detection workbench to define the range of RNA variants that deviate from the reference genome in a human melanoma cell model. The RNA variants can be grouped into (i) classic ADAR-like or APOBEC-like RNA editing events and (ii) multiple-nucleotide variants (MNVs) including three and six base pair in-frame non-canonical unmapped exons. We focus on validating representative genes of these classes. First, clustered non-synonymous RNA edits (A-I) in the CDK13 gene were validated by Sanger sequencing to confirm the integrity of the RNA variant detection workbench. Second, a highly conserved RNA variant in the MAP4K5 gene was detected that most likely results from the splicing of a non-canonical three-base exon. The two RNA variants produced from the MAP4K5 locus deviate from the genomic reference sequence and produce V569E or V569del isoform variants. Low doses of splicing inhibitors demonstrated that the MAP4K5-V569E variant emerges from an SF3B1-dependent splicing event. Mass spectrometry of pull-downs of the recombinant SBP-tagged MAP4K5V569E and MAP4K5V569del proteins in transfected cell systems was used to identify the protein-protein interactions of these two MAP4K5 isoforms and propose possible functions. Together these data highlight the utility of this integrated DNA and RNA variant detection platform to detect RNA variants in cancer cells and support future analysis of RNA variant detection in cancer tissue.
Affiliation(s)
- Małgorzata Kurkowiak
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland
| | - Giuseppa Grasso
- University of Edinburgh, Institute of Genetics and Molecular Medicine, Edinburgh Cancer Research Centre, Edinburgh, Scotland, UK
| | - Jakub Faktor
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland; Research Centre for Applied Molecular Oncology, Masaryk Memorial Cancer Institute, Brno, Czech Republic
| | - Lisa Scheiblecker
- Institute of Pharmacology and Toxicology, University of Veterinary Medicine Vienna, 1210 Vienna, Austria
| | - Małgorzata Winniczuk
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland
| | - Marcos Yebenes Mayordomo
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland; University of Edinburgh, Institute of Genetics and Molecular Medicine, Edinburgh Cancer Research Centre, Edinburgh, Scotland, UK
| | - J Robert O'Neill
- Cambridge Oesophagogastric Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - Bodil Oster
- QIAGEN Aarhus, Silkeborgvej 2, 8000 Aarhus, Denmark
| | - Borek Vojtesek
- Research Centre for Applied Molecular Oncology, Masaryk Memorial Cancer Institute, Brno, Czech Republic
| | - Ali Al-Saadi
- University of Edinburgh, Institute of Genetics and Molecular Medicine, Edinburgh Cancer Research Centre, Edinburgh, Scotland, UK
| | - Natalia Marek-Trzonkowska
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland; Laboratory of Immunoregulation and Cellular Therapies, Department of Family Medicine, Medical University of Gdańsk, Gdańsk, Poland
| | - Ted R Hupp
- International Centre for Cancer Vaccine Science (ICCVS), University of Gdańsk, 80-822 Gdańsk, Poland; University of Edinburgh, Institute of Genetics and Molecular Medicine, Edinburgh Cancer Research Centre, Edinburgh, Scotland, UK
| |
|
90
|
Gomez-Escribano JP, Holmes NA, Schlimpert S, Bibb MJ, Chandra G, Wilkinson B, Buttner MJ, Bibb MJ. Streptomyces venezuelae NRRL B-65442: genome sequence of a model strain used to study morphological differentiation in filamentous actinobacteria. J Ind Microbiol Biotechnol 2021; 48:6294913. [PMID: 34100946 PMCID: PMC8788739 DOI: 10.1093/jimb/kuab035] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/08/2021] [Accepted: 06/01/2021] [Indexed: 12/13/2022]
Abstract
For over a decade, Streptomyces venezuelae has been used to study the molecular mechanisms that control morphological development in streptomycetes and it is now a well-established model strain. Its rapid growth and ability to sporulate in a near-synchronised manner in liquid culture, unusual among streptomycetes, greatly facilitate the application of modern molecular techniques such as ChIP-seq and RNA-seq, as well as fluorescence time-lapse imaging of the complete Streptomyces life cycle. Here we describe a high-quality genome sequence of our isolate of the strain (NRRL B-65442) consisting of an 8.2 Mb chromosome and a 158 kb plasmid, pSVJI1, which had not been reported previously. Surprisingly, while NRRL B-65442 yields green spores on MYM agar, the ATCC type strain 10712 (from which NRRL B-65442 was derived) produces grey spores. While comparison of the genome sequences of the two isolates revealed almost total identity, it did reveal a single nucleotide substitution in a gene, vnz_33525, likely to be involved in spore pigment biosynthesis. Replacement of the vnz_33525 allele of ATCC 10712 with that of NRRL B-65442 resulted in green spores, explaining the discrepancy in spore pigmentation. We also applied CRISPR-Cas9 to delete the essential parB of pSVJI1 to cure the plasmid from the strain without obvious phenotypic consequences.
Affiliation(s)
| | - Neil A Holmes
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Susan Schlimpert
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Maureen J Bibb
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Govind Chandra
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Barrie Wilkinson
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Mark J Buttner
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| | - Mervyn J Bibb
- Department of Molecular Microbiology, John Innes Centre, Norwich Research Park, Norwich, NR4 7UH, UK
| |
|
91
|
Wang Y, Xue H, Pourcel C, Du Y, Gautheret D. 2-kupl: mapping-free variant detection from DNA-seq data of matched samples. BMC Bioinformatics 2021; 22:304. [PMID: 34090332 PMCID: PMC8180056 DOI: 10.1186/s12859-021-04185-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2021] [Accepted: 05/11/2021] [Indexed: 11/10/2022] Open
Abstract
Background: The detection of genome variants, including point mutations, indels and structural variants, is a fundamental and challenging computational problem. We address here the problem of variant detection between two deep-sequencing (DNA-seq) samples, such as two human samples from an individual patient, or two samples from distinct bacterial strains. The preferred strategy in such a case is to align each sample to a common reference genome, collect all variants and compare these variants between samples. Such mapping-based protocols have several limitations. DNA sequences with large indels, aggregated mutations and structural variants are hard to map to the reference. Furthermore, DNA sequences cannot be mapped reliably to genomic low complexity regions and repeats. Results: We introduce 2-kupl, a k-mer based, mapping-free protocol to detect variants between two DNA-seq samples. On simulated and actual data, 2-kupl achieves higher accuracy than other mapping-free protocols. Applying 2-kupl to prostate cancer whole exome sequencing data, we identify a number of candidate variants in hard-to-map regions and propose potential novel recurrent variants in this disease. Conclusions: We developed a mapping-free protocol for variant calling between matched DNA-seq samples. Our protocol is suitable for variant detection in unmappable genome regions or in the absence of a reference genome.
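The general idea behind k-mer based, mapping-free comparison of two samples can be sketched in a few lines of Python. This is only an illustration of the principle (k-mers seen in one sample but absent from the matched sample flag candidate variants); it is not the 2-kupl algorithm, and the function names, the `min_count` threshold and the toy reads are assumptions made for the example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def case_specific_kmers(case_reads, control_reads, k, min_count=2):
    """Return k-mers seen at least `min_count` times in the case sample
    and never in the control sample. No reference genome is involved."""
    case = kmer_counts(case_reads, k)
    control = kmer_counts(control_reads, k)
    return {kmer for kmer, n in case.items() if n >= min_count and kmer not in control}


control_reads = ["ACGTACGT", "ACGTACGT"]
case_reads = ["ACGTTCGT", "ACGTTCGT"]  # A>T substitution at the fifth base
specific = case_specific_kmers(case_reads, control_reads, k=4)
print(sorted(specific))
```

Every k-mer overlapping the substituted base is reported, while the shared k-mer ACGT is filtered out. The `min_count` filter is a crude guard against k-mers created by isolated sequencing errors.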
Affiliation(s)
- Yunfeng Wang
- Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
- Annoroad Gene Technology Co., Ltd, Beijing, 100176 China
| | - Haoliang Xue
- Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
| | - Christine Pourcel
- Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
| | - Yang Du
- Annoroad Gene Technology Co., Ltd, Beijing, 100176 China
| | - Daniel Gautheret
- Institute of Integrative Cell Biology (I2BC), Université Paris-Saclay, CNRS, CEA, 1 avenue de la Terrasse, 91190 Gif-sur-Yvette, France
- IHU PRISM, Gustave Roussy, 114 rue Edouard Vaillant, 94800 Villejuif, France
| |
|
92
|
Letiagina AE, Omelina ES, Ivankin AV, Pindyurin AV. MPRAdecoder: Processing of the Raw MPRA Data With a priori Unknown Sequences of the Region of Interest and Associated Barcodes. Front Genet 2021; 12:618189. [PMID: 34046055 PMCID: PMC8148044 DOI: 10.3389/fgene.2021.618189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2020] [Accepted: 03/25/2021] [Indexed: 11/13/2022] Open
Abstract
Massively parallel reporter assays (MPRAs) enable high-throughput functional evaluation of numerous DNA regulatory elements and/or their mutant variants. The assays are based on the construction of reporter plasmid libraries containing two variable parts, a region of interest (ROI) and a barcode (BC), located outside and within the transcription unit, respectively. Importantly, each plasmid molecule in such a highly diverse library is characterized by a unique BC-ROI association. The reporter constructs are delivered to target cells and expression of BCs at the transcript level is assayed by RT-PCR followed by next-generation sequencing (NGS). The obtained values are normalized to the abundance of BCs in the plasmid DNA sample. Altogether, this allows evaluating the regulatory potential of the associated ROI sequences. However, depending on the MPRA library construction design, the BC and ROI sequences as well as their associations can be a priori unknown. In such a case, the BC and ROI sequences, their possible mutant variants, and unambiguous BC-ROI associations have to be identified, whereas all uncertain cases have to be excluded from the analysis. Besides the preparation of additional "mapping" samples for NGS, this also requires specific bioinformatics tools. Here, we present a pipeline for processing raw MPRA data obtained by NGS for reporter construct libraries with a priori unknown sequences of BCs and ROIs. The pipeline robustly identifies unambiguous (so-called genuine) BCs and ROIs associated with them, calculates the normalized expression level for each BC and the averaged values for each ROI, and provides a graphical visualization of the processed data.
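The normalization step described in this abstract (barcode abundance in transcripts relative to barcode abundance in plasmid DNA, then averaging per ROI) can be sketched as follows. This is a minimal illustration under stated assumptions, not the MPRAdecoder implementation: the function name, the use of read-count fractions, and the rule of skipping barcodes absent from the plasmid sample are all choices made for the example.

```python
from collections import defaultdict


def normalized_expression(cdna_counts, plasmid_counts, bc_to_roi):
    """Return (per-barcode, per-ROI) normalized expression.

    cdna_counts / plasmid_counts: dicts of barcode -> read count.
    bc_to_roi: dict of barcode -> ROI identifier (unambiguous associations only).
    """
    cdna_total = sum(cdna_counts.values())
    plasmid_total = sum(plasmid_counts.values())

    per_bc = {}
    for bc in bc_to_roi:
        plasmid = plasmid_counts.get(bc, 0)
        if plasmid == 0:
            continue  # cannot normalize a barcode never seen in plasmid DNA
        cdna_fraction = cdna_counts.get(bc, 0) / cdna_total
        per_bc[bc] = cdna_fraction / (plasmid / plasmid_total)

    # Average the barcode-level values over all barcodes linked to each ROI.
    by_roi = defaultdict(list)
    for bc, value in per_bc.items():
        by_roi[bc_to_roi[bc]].append(value)
    per_roi = {roi: sum(values) / len(values) for roi, values in by_roi.items()}
    return per_bc, per_roi


cdna = {"BC1": 30, "BC2": 10, "BC3": 60}
plasmid = {"BC1": 25, "BC2": 25, "BC3": 50}
associations = {"BC1": "ROI_A", "BC2": "ROI_A", "BC3": "ROI_B"}
per_bc, per_roi = normalized_expression(cdna, plasmid, associations)
print(per_bc)
print(per_roi)
```

Dividing by the plasmid fraction corrects for uneven representation of barcodes in the library, so a value above 1 indicates a barcode that is over-represented among transcripts relative to its input abundance.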
Affiliation(s)
- Anna E Letiagina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia; Faculty of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
| | - Evgeniya S Omelina
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Anton V Ivankin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
| | - Alexey V Pindyurin
- Institute of Molecular and Cellular Biology of the Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
|
93
|
Fitness selection of hyperfusogenic measles virus F proteins associated with neuropathogenic phenotypes. Proc Natl Acad Sci U S A 2021; 118:2026027118. [PMID: 33903248 DOI: 10.1073/pnas.2026027118] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 12/13/2022] Open
Abstract
Measles virus (MeV) is resurgent and caused >200,000 deaths in 2019. MeV infection can establish a chronic latent infection of the brain that can recrudesce months to years after recovery from the primary infection. Recrudescent MeV leads to fatal subacute sclerosing panencephalitis (SSPE) or measles inclusion body encephalitis (MIBE) as the virus spreads across multiple brain regions. Most clinical isolates of SSPE/MIBE strains show mutations in the fusion (F) gene that result in a hyperfusogenic phenotype in vitro and allow for efficient spread in primary human neurons. Wild-type MeV receptor-binding protein is indispensable for manifesting these mutant F phenotypes, even though neurons lack canonical MeV receptors (CD150/SLAMF1 or nectin-4). How such hyperfusogenic F mutants are selected and whether they confer a fitness advantage for efficient neuronal spread is unresolved. To better understand the fitness landscape that allows for the selection of such hyperfusogenic F mutants, we conducted a screen of ≥3.1 × 10^5 MeV-F point mutants in their genomic context. We rescued and amplified our genomic MeV-F mutant libraries in BSR-T7 cells under conditions in which MeV-F-T461I (a known SSPE mutant), but not wild-type MeV, can spread. We recovered known SSPE mutants but also characterized at least 15 hyperfusogenic F mutations with an SSPE phenotype. Structural mapping of these mutants onto the prefusion MeV-F trimer confirms and extends our understanding of the F regulatory domains in MeV-F. Our list of hyperfusogenic F mutants is a valuable resource for future studies into MeV neuropathogenesis and the regulation of paramyxovirus F.
|
94
|
Giles HH, Hegde MR, Lyon E, Stanley CM, Kerr ID, Garlapow ME, Eggington JM. The Science and Art of Clinical Genetic Variant Classification and Its Impact on Test Accuracy. Annu Rev Genomics Hum Genet 2021; 22:285-307. [PMID: 33900788 DOI: 10.1146/annurev-genom-121620-082709] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Clinical genetic variant classification science is a growing subspecialty of clinical genetics and genomics. The field's continued improvement is essential for the success of precision medicine in both germline (hereditary) and somatic (oncology) contexts. This review focuses on variant classification for DNA next-generation sequencing tests. We first summarize current limitations in variant discovery and definition, and then describe the current five- and four-tier classification systems outlined in dominant standards and guideline publications for germline and somatic tests, respectively. We then discuss measures of variant classification discordance and the field's bias for positive results, as well as considerations for panel size and population screening in the context of estimates of positive predictive value that incorporate estimated variant classification imperfections. Finally, we share opinions on the current state of variant classification from some of the authors of the most widely used standards and guideline publications and from other domain experts.
Affiliation(s)
- Hunter H Giles
- Center for Genomic Interpretation, Sandy, Utah 84092, USA
| | - Madhuri R Hegde
- PerkinElmer Genomics, Waltham, Massachusetts 02450, USA
- Department of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
| | - Elaine Lyon
- HudsonAlpha Clinical Services Lab, Huntsville, Alabama 35806, USA
| | - Christine M Stanley
- C2i Genomics, Cambridge, Massachusetts 02139, USA
- Variantyx, Framingham, Massachusetts 01701, USA
|
95
|
Oliva A, Tobler R, Cooper A, Llamas B, Souilmi Y. Systematic benchmark of ancient DNA read mapping. Brief Bioinform 2021; 22:6217726. [PMID: 33834210 DOI: 10.1093/bib/bbab076] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/28/2020] [Revised: 01/05/2021] [Accepted: 02/16/2021] [Indexed: 11/12/2022] Open
Abstract
The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA 'reads') against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30-80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four different read mapping software packages (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.
Affiliation(s)
- Adrien Oliva
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Raymond Tobler
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Alan Cooper
- Australian Research Council Laureate Fellow specializing in ancient DNA, Australia
| | - Bastien Llamas
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
| | - Yassine Souilmi
- Australian Centre for Ancient DNA at the University of Adelaide, Australia
|
96
|
Wei ZG, Zhang XD, Cao M, Liu F, Qian Y, Zhang SW. Comparison of Methods for Picking the Operational Taxonomic Units From Amplicon Sequences. Front Microbiol 2021; 12:644012. [PMID: 33841367 PMCID: PMC8024490 DOI: 10.3389/fmicb.2021.644012] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 12/19/2020] [Accepted: 02/17/2021] [Indexed: 12/31/2022] Open
Abstract
With the advent of next-generation sequencing technology, it has become convenient and cost efficient to thoroughly characterize the microbial diversity and taxonomic composition in various environmental samples. Millions of sequences can be generated, and how to utilize this enormous sequence resource has become a critical concern for microbial ecologists. One particular challenge is picking operational taxonomic units (OTUs) in 16S rRNA sequence analysis. Luckily, this challenge can be directly addressed by sequence clustering, which attempts to group similar sequences. Therefore, numerous clustering methods have been proposed to help cluster 16S rRNA sequences into OTUs. However, each method has its own clustering mechanism, and different methods produce diverse outputs. Even a slight parameter change for the same method can generate distinct results, and how to choose an appropriate method has become a challenge for inexperienced users. A lot of time and resources can be wasted in selecting clustering tools and analyzing the clustering results. In this study, we review recent advances in clustering methods for OTU picking, focusing on three aspects: (i) the principles of existing clustering algorithms, (ii) benchmark dataset construction for OTU picking and evaluation metrics, and (iii) the performance of different methods with various distance thresholds on benchmark datasets. This paper aims to assist biological researchers in selecting reasonable clustering methods for analyzing their collected sequences and to help algorithm developers design more efficient sequence clustering methods.
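To make the role of the distance threshold concrete, here is a small Python sketch of greedy centroid-based clustering, one common family of OTU-picking heuristics. It is an illustration under simplifying assumptions rather than any specific tool from the comparison: distances are plain mismatch fractions between equal-length sequences (real tools typically use alignment-based identity), and the function names are hypothetical.

```python
def mismatch_fraction(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(sequences, threshold=0.03):
    """Assign each sequence to the first centroid within `threshold`.

    Sequences are processed in input order; a sequence that matches no
    existing centroid becomes the centroid of a new cluster (OTU).
    """
    centroids = []
    clusters = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if mismatch_fraction(seq, centroid) <= threshold:
                clusters[index].append(seq)
                break
        else:
            # No centroid was close enough, so start a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters
```

Because assignment depends on both the input order and the threshold, the same reads can yield a different number of OTUs when either changes, which is the sensitivity the abstract warns about.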
Affiliation(s)
- Ze-Gang Wei
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
| | - Xiao-Dan Zhang
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
| | - Ming Cao
- Faculty of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- School of Mathematics and Statistics, Shaanxi Xueqian Normal University, Xi’an, China
| | - Fei Liu
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
| | - Yu Qian
- Institute of Physics and Optoelectronics Technology, Baoji University of Arts and Sciences, Baoji, China
| | - Shao-Wu Zhang
- Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi’an, China
|
97
|
Chrisman BS, Paskov K, Stockham N, Tabatabaei K, Jung JY, Washington P, Varma M, Sun MW, Maleki S, Wall DP. Indels in SARS-CoV-2 occur at template-switching hotspots. BioData Min 2021; 14:20. [PMID: 33743803 PMCID: PMC7980745 DOI: 10.1186/s13040-021-00251-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/16/2020] [Accepted: 02/23/2021] [Indexed: 11/10/2022] Open
Abstract
The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and largely ignored the role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of recombination events between SARS-CoV-2 replicates whereby RNA-dependent RNA polymerase (RdRp) re-associates with a homologous template at a different locus ("imperfect homologous recombination"). We provide several independent pieces of evidence that suggest this. (1) The indels from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5' and 3' breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RNA-dependent RNA polymerase (RdRp) template-switching. (3) Within raw reads, these indel hotspots have cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these indels are both consequences of de novo recombination events within a host and artifacts of previous recombination. We briefly analyze the indels in the context of RNA secondary structure, noting that indels preferentially occur in "arms" and loop structures of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.
Affiliation(s)
| | - Kelley Paskov
- Department of Biomedical Data Science, Stanford University, Stanford, USA
| | - Nate Stockham
- Department of Neuroscience, Stanford University, Stanford, USA
| | - Kevin Tabatabaei
- Faculty of Health Sciences, McMaster University, Hamilton, Canada
| | - Jae-Yoon Jung
- Department of Biomedical Data Science, Stanford University, Stanford, USA
| | - Peter Washington
- Department of Bioengineering, Stanford University, Stanford, USA
| | - Maya Varma
- Department of Computer Science, Stanford University, Stanford, USA
| | - Min Woo Sun
- Department of Biomedical Data Science, Stanford University, Stanford, USA
| | - Sepideh Maleki
- Department of Computer Science, University of Texas Austin, Austin, USA
| | - Dennis P Wall
- Department of Biomedical Data Science, Stanford University, Stanford, USA.
- Department of Pediatrics (Systems Medicine), Stanford University, Stanford, USA.
|
98
|
Seyran M, Hassan SS, Uversky VN, Pal Choudhury P, Uhal BD, Lundstrom K, Attrish D, Rezaei N, Aljabali AAA, Ghosh S, Pizzol D, Adadi P, El-Aziz TMA, Kandimalla R, Tambuwala MM, Lal A, Azad GK, Sherchan SP, Baetas-da-Cruz W, Palù G, Brufsky AM. Urgent Need for Field Surveys of Coronaviruses in Southeast Asia to Understand the SARS-CoV-2 Phylogeny and Risk Assessment for Future Outbreaks. Biomolecules 2021; 11:398. [PMID: 33803118 PMCID: PMC7999587 DOI: 10.3390/biom11030398] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 02/06/2023] Open
Abstract
Phylogenetic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is focused on a single isolate of bat coronaviruses (bat CoVs) which does not adequately represent genetically related coronaviruses (CoVs) [...].
Affiliation(s)
- Murat Seyran
- Doctoral Studies in Natural and Technical Sciences (SPL 44), University of Vienna, Währinger Straße, A-1090 Vienna, Austria
| | - Sk. Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur 721140, West Bengal, India
| | - Vladimir N. Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
| | - Pabitra Pal Choudhury
- Applied Statistics Unit, Indian Statistical Institute, Kolkata 700108, West Bengal, India
| | - Bruce D. Uhal
- Department of Physiology, Michigan State University, East Lansing, MI 48824, USA
| | | | - Diksha Attrish
- Dr. B R Ambedkar Center for Biomedical Research (ACBR), University of Delhi (North Camps), Delhi-110007, India
| | - Nima Rezaei
- Research Center for Immunodeficiencies, Pediatrics Center of Excellence, Children’s Medical Center, Tehran, University of Medical Sciences, Tehran 1419733151, Iran
- Network of Immunity in Infection, Malignancy and Autoimmunity (NIIMA), Universal Scientific Education and Research Network (USERN), Tehran 1419733151, Iran
| | - Alaa A. A. Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University-Faculty of Pharmacy, Irbid 566, Jordan
| | - Shinjini Ghosh
- Department of Biophysics, Molecular Biology and Bioinformatics, University of Calcutta, Kolkata 700009, West Bengal, India
| | - Damiano Pizzol
- Italian Agency for Development Cooperation—Khartoum, Sudan Street 33, Al Amarat 13374, Sudan
| | - Parise Adadi
- Department of Food Science, University of Otago, Dunedin 9054, New Zealand
| | - Tarek Mohamed Abd El-Aziz
- Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX 78229-3900, USA
- Zoology Department, Faculty of Science, Minia University, El-Minia 61519, Egypt
| | - Ramesh Kandimalla
- CSIR-Indian Institute of Chemical Technology Uppal Road, Tarnaka, Hyderabad 500007, Telangana State, India
| | - Murtaza M. Tambuwala
- School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK
| | - Amos Lal
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN 55905, USA
| | | | - Samendra P. Sherchan
- Department of Environmental Health Sciences, Tulane University, New Orleans, LA 70112, USA
| | - Wagner Baetas-da-Cruz
- Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro 21941901, Brazil
| | - Giorgio Palù
- Department of Molecular Medicine, University of Padova, Via Gabelli 63, 35121 Padova, Italy
| | - Adam M. Brufsky
- UPMC Hillman Cancer Center, Department of Medicine, Division of Hematology/Oncology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
|
99
|
Werner S, Galliot A, Pichot F, Kemmer T, Marchand V, Sednev MV, Lence T, Roignant JY, König J, Höbartner C, Motorin Y, Hildebrandt A, Helm M. NOseq: amplicon sequencing evaluation method for RNA m6A sites after chemical deamination. Nucleic Acids Res 2021; 49:e23. [PMID: 33313868 PMCID: PMC7913672 DOI: 10.1093/nar/gkaa1173] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/23/2020] [Revised: 11/13/2020] [Accepted: 11/20/2020] [Indexed: 12/26/2022] Open
Abstract
Methods for the detection of m6A by RNA-Seq technologies are increasingly sought after. We here present NOseq, a method to detect m6A residues in defined amplicons by virtue of their resistance to chemical deamination, effected by nitrous acid. Partial deamination in NOseq affects all exocyclic amino groups present in nucleobases and thus also changes sequence information. The method uses a mapping algorithm specifically adapted to the sequence degeneration caused by deamination events. Thus, m6A sites with partial modification levels of ∼50% were detected in defined amplicons, and this threshold can be lowered to ∼10% by combination with m6A immunoprecipitation. NOseq faithfully detected known m6A sites in human rRNA, and the long non-coding RNA MALAT1, and positively validated several m6A candidate sites, drawn from miCLIP data with an m6A antibody, in the transcriptome of Drosophila melanogaster. Conceptually related to bisulfite sequencing, NOseq presents a novel amplicon-based sequencing approach for the validation of m6A sites in defined sequences.
Affiliation(s)
- Stephan Werner
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, Staudingerweg 5, 55128 Mainz, Germany
| | - Aurellia Galliot
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, Staudingerweg 5, 55128 Mainz, Germany
| | - Florian Pichot
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, Staudingerweg 5, 55128 Mainz, Germany
| | - Thomas Kemmer
- Institute of Computer Science, Johannes Gutenberg-University Mainz, Staudingerweg 9, 55128 Mainz, Germany
| | - Virginie Marchand
- Université de Lorraine, CNRS, INSERM, Epitranscriptomics and Sequencing (EpiRNA-Seq) Core Facility, UMS2008/US40 IBSLor, Biopôle UL, F-54000 Nancy, France
| | - Maksim V Sednev
- Institute of Organic Chemistry, Julius Maximilian University Würzburg, Am Hubland, 97074 Würzburg, Germany
| | - Tina Lence
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
| | - Jean-Yves Roignant
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, Staudingerweg 5, 55128 Mainz, Germany
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
- Génopode - Center for Integrative Genomics, Université de Lausanne, 1015 Lausanne, Switzerland
| | - Julian König
- Institute of Molecular Biology, Ackermannweg 4, 55128 Mainz, Germany
| | - Claudia Höbartner
- Institute of Organic Chemistry, Julius Maximilian University Würzburg, Am Hubland, 97074 Würzburg, Germany
| | - Yuri Motorin
- Université de Lorraine, CNRS, UMR7365 IMoPA, Biopôle UL, F-54000 Nancy, France
| | - Andreas Hildebrandt
- Institute of Computer Science, Johannes Gutenberg-University Mainz, Staudingerweg 9, 55128 Mainz, Germany
| | - Mark Helm
- Institute of Pharmaceutical and Biomedical Sciences, Johannes Gutenberg-University Mainz, Staudingerweg 5, 55128 Mainz, Germany
|
100
|
Stoler N, Nekrutenko A. Sequencing error profiles of Illumina sequencing instruments. NAR Genom Bioinform 2021; 3:lqab019. [PMID: 33817639 PMCID: PMC8002175 DOI: 10.1093/nargab/lqab019] [Citation(s) in RCA: 149] [Impact Index Per Article: 49.7] [Received: 06/05/2020] [Revised: 02/01/2021] [Accepted: 03/16/2021] [Indexed: 12/13/2022] Open
Abstract
Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
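The idea of using read overlaps to estimate error rates can be sketched in Python as follows. This is a simplified illustration, not the authors' method: it assumes each read pair comes with a known overlap length (hypothetical input; in practice the overlap would be found by alignment), that read 2 is sequenced from the opposite strand, and that each disagreement reflects exactly one wrong call among the two bases compared.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_disagreements(read1, read2, overlap_len):
    """Count (mismatches, compared positions) in the overlap of a read pair.

    The last `overlap_len` bases of read1 are compared with the first
    `overlap_len` bases of the reverse-complemented read2. Positions
    where either read has an N are ignored.
    """
    if not 0 < overlap_len <= min(len(read1), len(read2)):
        raise ValueError("overlap_len must fit inside both reads")
    mate = reverse_complement(read2)
    pairs = [
        (a, b)
        for a, b in zip(read1[-overlap_len:], mate[:overlap_len])
        if "N" not in (a, b)
    ]
    mismatches = sum(a != b for a, b in pairs)
    return mismatches, len(pairs)


def estimate_error_rate(read_pairs):
    """Rough per-base error rate from (read1, read2, overlap_len) tuples."""
    mismatches = compared = 0
    for read1, read2, overlap_len in read_pairs:
        m, n = overlap_disagreements(read1, read2, overlap_len)
        mismatches += m
        compared += n
    if compared == 0:
        raise ValueError("no comparable overlap positions")
    # Each compared position contains two base calls, one from each read.
    return mismatches / (2 * compared)
```

Because no reference genome is needed, an estimate of this kind can in principle be computed retroactively on any paired-end dataset whose fragments are short enough for the mates to overlap.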
Affiliation(s)
- Nicholas Stoler
- Graduate Program in Bioinformatics and Genomics, The Huck Institutes for Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
|