1
Martín R, Gaitán N, Jarlier F, Feuerbach L, de Soyres H, Arbonés M, Gutman T, Puiggròs M, Ferriz A, Gonzalez A, Estelles L, Gut I, Capella-Gutierrez S, Stein LD, Brors B, Royo R, Hupé P, Torrents D. ONCOLINER: A new solution for monitoring, improving, and harmonizing somatic variant calling across genomic oncology centers. Cell Genom 2024; 4:100639. [PMID: 39216474 DOI: 10.1016/j.xgen.2024.100639] [Received: 03/19/2024] [Revised: 06/13/2024] [Accepted: 08/07/2024] [Indexed: 09/04/2024]
Abstract
The characterization of somatic genomic variation associated with the biology of tumors is fundamental for cancer research and personalized medicine, as it guides the reliability and impact of cancer studies and genomic-based decisions in clinical oncology. However, the quality and scope of tumor genome analysis across cancer research centers and hospitals are currently highly heterogeneous, limiting the consistency of tumor diagnoses across hospitals and the possibilities of data sharing and data integration across studies. With the aim of providing users with actionable and personalized recommendations for the overall enhancement and harmonization of somatic variant identification across research and clinical environments, we have developed ONCOLINER. Using specifically designed mosaic and tumorized genomes for the analysis of recall and precision across somatic SNVs, insertions or deletions (indels), and structural variants (SVs), we demonstrate that ONCOLINER is capable of improving and harmonizing genome analysis across three state-of-the-art variant discovery pipelines in genomic oncology.
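The recall and precision used to benchmark somatic callers can be illustrated with a minimal sketch. It assumes exact matching of small variants on chromosome, position, reference and alternate allele. The `Variant` type and the `benchmark` function are hypothetical illustrations, not ONCOLINER's code or interface. SV benchmarking in particular normally needs breakpoint tolerance rather than exact matching.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical small-variant key: exact position and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def benchmark(calls: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Exact-match precision, recall and F1 of a call set against a truth set."""
    call_set, truth_set = set(calls), set(truth)
    tp = len(call_set & truth_set)   # called and present in the truth set
    fp = len(call_set - truth_set)   # called but absent from the truth set
    fn = len(truth_set - call_set)   # in the truth set but never called
    precision = tp / (tp + fp) if call_set else 0.0
    recall = tp / (tp + fn) if truth_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Here is an example. Suppose a call set recovers three of four truth variants and adds one spurious call. Then precision, recall and F1 are all 0.75.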
Affiliation(s)
- Rodrigo Martín
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Nicolás Gaitán
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Frédéric Jarlier
- Institut Curie, Paris, France; U900, Paris, France; PSL Research University, Paris, France; Mines Paris Tech, Fontainebleau, France
- Lars Feuerbach
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Henri de Soyres
- Institut Curie, Paris, France; U900, Paris, France; PSL Research University, Paris, France; Mines Paris Tech, Fontainebleau, France
- Marc Arbonés
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Tom Gutman
- Institut Curie, Paris, France; U900, Paris, France; PSL Research University, Paris, France; Mines Paris Tech, Fontainebleau, France
- Montserrat Puiggròs
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Alvaro Ferriz
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Asier Gonzalez
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Ivo Gut
- Centro Nacional de Análisis Genómico, Barcelona, Spain
- Lincoln D Stein
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada
- Benedikt Brors
- Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
- Romina Royo
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Philippe Hupé
- Institut Curie, Paris, France; U900, Paris, France; PSL Research University, Paris, France; Mines Paris Tech, Fontainebleau, France; UMR144, CNRS, Paris, France
- David Torrents
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.
2
Bagheri S, Taghvaei M, Familiar A, Haldar D, Zandifar A, Khalili N, Vossough A, Nabavizadeh A. Statistical plots in oncologic imaging, a primer for neuroradiologists. Neuroradiol J 2024; 37:418-433. [PMID: 37529843 PMCID: PMC11366205 DOI: 10.1177/19714009231193158] [Indexed: 08/03/2023]
Abstract
The simplest approach to convey the results of scientific analysis, which can include complex comparisons, is typically through the use of visual items, including figures and plots. These statistical plots play a critical role in scientific studies, making data more accessible, engaging, and informative. A growing number of visual representations have been utilized recently to graphically display the results of oncologic imaging, including radiomic and radiogenomic studies. Here, we review the applications, distinct properties, benefits, and drawbacks of various statistical plots. Furthermore, we provide neuroradiologists with a comprehensive understanding of how to use these plots to effectively communicate analytical results based on imaging data.
Affiliation(s)
- Sina Bagheri
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Mohammad Taghvaei
- Department of Neurology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Ariana Familiar
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Debanjan Haldar
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Alireza Zandifar
- Department of Radiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Nastaran Khalili
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Arastoo Vossough
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Radiology, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
- Ali Nabavizadeh
- Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Center for Data-Driven Discovery in Biomedicine (D3b), Children’s Hospital of Philadelphia, Philadelphia, PA, USA
3
Atzeni R, Massidda M, Pieroni E, Rallo V, Pisu M, Angius A. A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer. Int J Mol Sci 2024; 25:8044. [PMID: 39125613 PMCID: PMC11311285 DOI: 10.3390/ijms25158044] [Received: 06/10/2024] [Revised: 07/11/2024] [Accepted: 07/22/2024] [Indexed: 08/12/2024]
Abstract
Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This approach lacks accuracy, reproducibility, and portability, limiting clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is based on a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. The core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requiring no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.
Affiliation(s)
- Rossano Atzeni
- Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Matteo Massidda
- Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy
- Enrico Pieroni
- Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Vincenzo Rallo
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
- Massimo Pisu
- Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Andrea Angius
- Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
4
Tan KT, Slevin MK, Leibowitz ML, Garrity-Janger M, Shan J, Li H, Meyerson M. Neotelomeres and telomere-spanning chromosomal arm fusions in cancer genomes revealed by long-read sequencing. Cell Genom 2024; 4:100588. [PMID: 38917803 PMCID: PMC11293586 DOI: 10.1016/j.xgen.2024.100588] [Received: 12/14/2022] [Revised: 11/09/2023] [Accepted: 05/30/2024] [Indexed: 06/27/2024]
Abstract
Alterations in the structure and location of telomeres are pivotal in cancer genome evolution. Here, we applied both long-read and short-read genome sequencing to assess telomere repeat-containing structures in cancers and cancer cell lines. Using long-read genome sequences that span telomeric repeats, we defined four types of telomere repeat variations in cancer cells: neotelomeres where telomere addition heals chromosome breaks, chromosomal arm fusions spanning telomere repeats, fusions of neotelomeres, and peri-centromeric fusions with adjoined telomere and centromere repeats. These results provide a framework for the systematic study of telomeric repeats in cancer genomes, which could serve as a model for understanding the somatic evolution of other repetitive genomic elements.
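At the read level, "spanning telomeric repeats" can be illustrated with a small sketch. It locates runs of the human telomeric repeat unit TTAGGG, or its reverse complement CCCTAA, within a single read. `find_telomeric_tracts` is a hypothetical helper, not the authors' pipeline. It uses exact matching only, so it ignores sequencing errors and degenerate repeat units, and the `min_repeats` threshold is an arbitrary assumption. A telomeric tract that adjoins non-telomeric sequence inside a long read is the kind of junction that structures like those listed above would leave behind.

```python
import re
from typing import List, Tuple

# Human telomeric repeat on the G-rich strand and its reverse complement.
TELOMERE_UNITS = {"TTAGGG": "+", "CCCTAA": "-"}


def find_telomeric_tracts(read: str, min_repeats: int = 4) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for each run of >= min_repeats exact repeat units."""
    seq = read.upper()
    tracts = []
    for unit, strand in TELOMERE_UNITS.items():
        # Builds e.g. "(?:TTAGGG){4,}": at least min_repeats back-to-back copies.
        pattern = re.compile(f"(?:{unit}){{{min_repeats},}}")
        for match in pattern.finditer(seq):
            tracts.append((match.start(), match.end(), strand))
    return sorted(tracts)
```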
Affiliation(s)
- Kar-Tong Tan
- Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Mitchell L Leibowitz
- Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Max Garrity-Janger
- Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA
- Jidong Shan
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
- Heng Li
- Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA.
- Matthew Meyerson
- Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02215, USA.
5
Lai J, Yang Y, Liu Y, Scharpf RB, Karchin R. Assessing the merits: an opinion on the effectiveness of simulation techniques in tumor subclonal reconstruction. Bioinform Adv 2024; 4:vbae094. [PMID: 38948008 PMCID: PMC11213631 DOI: 10.1093/bioadv/vbae094] [Received: 04/12/2024] [Revised: 05/28/2024] [Accepted: 06/15/2024] [Indexed: 07/02/2024]
Abstract
SUMMARY Neoplastic tumors originate from a single cell, and their evolution can be traced through lineages characterized by mutations, copy number alterations, and structural variants. These lineages are reconstructed and mapped onto evolutionary trees with algorithmic approaches. However, without ground-truth benchmark sets, the validity of an algorithm remains uncertain, limiting potential clinical applicability. With a growing number of algorithms available, there is an urgent need for standardized benchmark sets to evaluate their merits. Benchmark sets rely on in silico simulations of tumor sequence, but there are no accepted standards for simulation tools, presenting a major obstacle to progress in this field. AVAILABILITY AND IMPLEMENTATION All analyses in the paper were based on publicly available data from the publication of each tool examined.
Affiliation(s)
- Jiaying Lai
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yi Yang
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Yunzhou Liu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Robert B Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, United States
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21231, United States
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, United States
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, United States
6
Salcedo A, Tarabichi M, Buchanan A, Espiritu SMG, Zhang H, Zhu K, Ou Yang TH, Leshchiner I, Anastassiou D, Guan Y, Jang GH, Mootor MFE, Haase K, Deshwar AG, Zou W, Umar I, Dentro S, Wintersinger JA, Chiotti K, Demeulemeester J, Jolly C, Sycza L, Ko M, Wedge DC, Morris QD, Ellrott K, Van Loo P, Boutros PC. Crowd-sourced benchmarking of single-sample tumor subclonal reconstruction. Nat Biotechnol 2024:10.1038/s41587-024-02250-y. [PMID: 38862616 DOI: 10.1038/s41587-024-02250-y] [Received: 04/11/2022] [Accepted: 04/17/2024] [Indexed: 06/13/2024]
Abstract
Subclonal reconstruction algorithms use bulk DNA sequencing data to quantify parameters of tumor evolution, allowing an assessment of how cancers initiate, progress and respond to selective pressures. We launched the ICGC-TCGA (International Cancer Genome Consortium-The Cancer Genome Atlas) DREAM Somatic Mutation Calling Tumor Heterogeneity and Evolution Challenge to benchmark existing subclonal reconstruction algorithms. This 7-year community effort used cloud computing to benchmark 31 subclonal reconstruction algorithms on 51 simulated tumors. Algorithms were scored on seven independent tasks, leading to 12,061 total runs. Algorithm choice influenced performance substantially more than tumor features but purity-adjusted read depth, copy-number state and read mappability were associated with the performance of most algorithms on most tasks. No single algorithm was a top performer for all seven tasks and existing ensemble strategies were unable to outperform the best individual methods, highlighting a key research need. All containerized methods, evaluation code and datasets are available to support further assessment of the determinants of subclonal reconstruction accuracy and development of improved methods to understand tumor evolution.
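Purity and copy-number state shape the input to these algorithms through a standard mixture relationship, which many subclonal reconstruction methods invert. The expected variant allele fraction (VAF) of a somatic mutation depends on tumor purity p, cancer cell fraction (CCF), the number of mutated copies m and the tumor copy number N_t, with VAF = p·CCF·m / (p·N_t + 2(1 − p)). The sketch below assumes a diploid normal compartment. It is a textbook model, not the challenge's simulation or scoring code, and the function names are hypothetical.

```python
def expected_vaf(purity: float, ccf: float, tumor_cn: int,
                 multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Expected VAF of a somatic mutation in a bulk tumor-normal mixture."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= ccf <= 1.0:
        raise ValueError("ccf must be in [0, 1]")
    if not 1 <= multiplicity <= tumor_cn:
        raise ValueError("multiplicity must be between 1 and tumor_cn")
    mutant_copies = purity * ccf * multiplicity          # mutated alleles per cell, on average
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return mutant_copies / total_copies


def ccf_from_vaf(vaf: float, purity: float, tumor_cn: int,
                 multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Invert the same model: the CCF implied by an observed VAF."""
    total_copies = purity * tumor_cn + (1.0 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)
```

Consider a subclonal mutation at CCF 0.5 in a 60%-pure diploid tumor. Its expected VAF is 0.15, so at an illustrative 60x depth only about 9 reads (60 × 0.15) are expected to support it. This is one way low purity and read depth limit reconstruction accuracy.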
Affiliation(s)
- Adriana Salcedo
- Department of Human Genetics, University of California, Los Angeles, CA, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA.
- Institute for Precision Health, University of California, Los Angeles, CA, USA.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
- Maxime Tarabichi
- The Francis Crick Institute, London, UK.
- Wellcome Sanger Institute, Hinxton, UK.
- Institute for Interdisciplinary Research, Université Libre de Bruxelles, Brussels, Belgium.
- Alex Buchanan
- Oregon Health and Sciences University, Portland, OR, USA
- Hongjiu Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Kaiyi Zhu
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Tai-Hsien Ou Yang
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Dimitris Anastassiou
- Department of Systems Biology, Columbia University, New York, NY, USA
- Center for Cancer Systems Therapeutics, Columbia University, New York, NY, USA
- Department of Electrical Engineering, Columbia University, New York, NY, USA
- Herbert Irving Comprehensive Cancer Center, Columbia University, New York, NY, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Electronic Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA
- Gun Ho Jang
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Mohammed F E Mootor
- Department of Human Genetics, University of California, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA
- Institute for Precision Health, University of California, Los Angeles, CA, USA
- Amit G Deshwar
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- William Zou
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Imaad Umar
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Stefan Dentro
- The Francis Crick Institute, London, UK
- Wellcome Sanger Institute, Hinxton, UK
- Jeff A Wintersinger
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Kami Chiotti
- Oregon Health and Sciences University, Portland, OR, USA
- Jonas Demeulemeester
- The Francis Crick Institute, London, UK
- VIB Center for Cancer Biology, Leuven, Belgium
- Department of Oncology, KU Leuven, Leuven, Belgium
- Lesia Sycza
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Minjeong Ko
- Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- David C Wedge
- Big Data Institute, University of Oxford, Oxford, UK
- Manchester Cancer Research Center, University of Manchester, Manchester, UK
- Quaid D Morris
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Vector Institute, Toronto, Ontario, Canada
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kyle Ellrott
- Oregon Health and Sciences University, Portland, OR, USA.
- Peter Van Loo
- The Francis Crick Institute, London, UK.
- Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Paul C Boutros
- Department of Human Genetics, University of California, Los Angeles, CA, USA.
- Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA.
- Institute for Precision Health, University of California, Los Angeles, CA, USA.
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
- Department of Pharmacology and Toxicology, University of Toronto, Toronto, Ontario, Canada.
- Department of Urology, University of California, Los Angeles, CA, USA.
- Broad Stem Cell Research Center, University of California, Los Angeles, CA, USA.
- California NanoSystems Institute, University of California, Los Angeles, CA, USA.
7
Machaca V, Goyzueta V, Cruz MG, Sejje E, Pilco LM, López J, Túpac Y. Transformers meets neoantigen detection: a systematic literature review. J Integr Bioinform 2024; 21:jib-2023-0043. [PMID: 38960869 PMCID: PMC11377031 DOI: 10.1515/jib-2023-0043] [Received: 10/24/2023] [Accepted: 03/20/2024] [Indexed: 07/05/2024]
Abstract
Cancer immunology offers a new alternative to traditional cancer treatments, such as radiotherapy and chemotherapy. One notable alternative is the development of personalized vaccines based on cancer neoantigens. Moreover, Transformers are considered a revolutionary development in artificial intelligence with a significant impact on natural language processing (NLP) tasks and have been utilized in proteomics studies in recent years. In this context, we conducted a systematic literature review to investigate how Transformers are applied in each stage of the neoantigen detection process. Additionally, we mapped current pipelines and examined the results of clinical trials involving cancer vaccines.
Affiliation(s)
- Erika Sejje
- Universidad Nacional de San Agustín, Arequipa, Perú
- Yván Túpac
- Universidad Católica San Pablo, Arequipa, Perú
8
Sergi A, Beltrame L, Marchini S, Masseroli M. Integrated approach to generate artificial samples with low tumor fraction for somatic variant calling benchmarking. BMC Bioinformatics 2024; 25:180. [PMID: 38720249 PMCID: PMC11077792 DOI: 10.1186/s12859-024-05793-8] [Received: 10/20/2023] [Accepted: 04/19/2024] [Indexed: 05/12/2024]
Abstract
BACKGROUND High-throughput sequencing (HTS) has become the gold standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity; this poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing the sensitivity and detection performance of HTS approaches in such cases is paramount, but time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach specifically designed for the generation of artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants and demonstrating their effectiveness in benchmarking low-fraction variant calling. RESULTS Our approach enables the generation of artificial raw reads that mimic real data without relying on pre-existing data by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. Then, it incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To prove the suitability of the created artificial datasets for low-fraction variant calling benchmarking, we used them as ground truth to evaluate the performance of widely used variant calling algorithms: they allowed us to define tuned parameter values of major variant callers, considerably improving their detection of very low-fraction variants. CONCLUSIONS Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, facilitating rapid prototyping and benchmarking of algorithms for such dataset type, and the important need for advancing low-fraction variant calling techniques.
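Sampling noise alone makes low-fraction variants hard to detect. A minimal sketch can show this, under the simplifying assumption that the number of reads carrying a spiked-in variant follows a Binomial(depth, VAF) distribution, with no sequencing errors. The helpers `prob_detectable` and `sample_alt_reads` are hypothetical. They are neither NEAT's API nor the authors' spike-in procedure. The probability is computed in log space so that ultra-deep depths do not overflow.

```python
import math
import random


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial(n, p) probability mass at k, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def prob_detectable(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Chance that at least min_alt_reads of depth reads carry the variant."""
    if not 0.0 < vaf < 1.0:
        raise ValueError("vaf must be strictly between 0 and 1")
    below = sum(binom_pmf(k, depth, vaf) for k in range(min(min_alt_reads, depth + 1)))
    return max(0.0, 1.0 - below)


def sample_alt_reads(depth: int, vaf: float, rng: random.Random) -> int:
    """Simulate how many of depth reads carry a variant present at fraction vaf."""
    return sum(rng.random() < vaf for _ in range(depth))
```

Take an illustrative depth of 1,000x and a VAF of 0.5%. Only five supporting reads are expected, and `prob_detectable(1000, 0.005, 5)` is about 0.56. A caller that demands five alternate reads would therefore miss roughly 44% of such variants before any error model is considered.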
Affiliation(s)
- Aldo Sergi
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy.
- IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy.
- Luca Beltrame
- IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy
- Sergio Marchini
- IRCCS Humanitas Research Hospital, Via Manzoni 56, 20089, Milan, Rozzano, Italy
- Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
9
Malamon JS, Farrell JJ, Xia LC, Dombroski BA, Das RG, Way J, Kuzma AB, Valladares O, Leung YY, Scanlon AJ, Lopez IAB, Brehony J, Worley KC, Zhang NR, Wang LS, Farrer LA, Schellenberg GD, Lee WP, Vardarajan BN. A comparative study of structural variant calling in WGS from Alzheimer's disease families. Life Sci Alliance 2024; 7:e202302181. [PMID: 38418088 PMCID: PMC10902710 DOI: 10.26508/lsa.202302181] [Received: 05/24/2023] [Revised: 02/07/2024] [Accepted: 02/08/2024] [Indexed: 03/01/2024]
Abstract
Detecting structural variants (SVs) in whole-genome sequencing poses significant challenges. We present a protocol for variant calling, merging, genotyping, sensitivity analysis, and laboratory validation for generating a high-quality SV call set in whole-genome sequencing from the Alzheimer's Disease Sequencing Project comprising 578 individuals from 111 families. Employing two complementary pipelines, Scalpel and Parliament, for SV/indel calling, we assessed sensitivity through sample replicates (N = 9) with in silico variant spike-ins. We developed a novel metric, D-score, to evaluate caller specificity for deletions. The accuracy of deletions was evaluated by Sanger sequencing. We generated a high-quality call set of 152,301 deletions of diverse sizes. Sanger sequencing validated 114 of 146 detected deletions (78.1%). Scalpel excelled in accuracy for deletions ≤100 bp, whereas Parliament was optimal for deletions >900 bp. Overall, 83.0% and 72.5% of calls by Scalpel and Parliament were validated, respectively, including all 11 deletions called by both Parliament and Scalpel between 101 and 900 bp. Our flexible protocol successfully generated a high-quality deletion call set and a truth set of Sanger sequencing-validated deletions with precise breakpoints spanning 1-17,000 bp.
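The abstract reports Sanger validation both overall and by deletion size. A minimal sketch of that tabulation follows, assuming each tested deletion is recorded as a (length in bp, validated) pair. The bin boundaries follow the size ranges named in the abstract (≤100 bp, 101-900 bp, >900 bp). `size_bin` and `validation_rates` are hypothetical helpers, and the toy records in the test are made up rather than taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def size_bin(length_bp: int) -> str:
    """Assign a deletion to one of the size ranges named in the abstract."""
    if length_bp <= 100:
        return "<=100 bp"
    if length_bp <= 900:
        return "101-900 bp"
    return ">900 bp"


def validation_rates(deletions: Iterable[Tuple[int, bool]]) -> Dict[str, Tuple[int, int, float]]:
    """Map each size bin to (validated, tested, percent validated)."""
    counts = defaultdict(lambda: [0, 0])
    for length_bp, validated in deletions:
        tally = counts[size_bin(length_bp)]
        tally[0] += int(validated)   # validated count
        tally[1] += 1                # tested count
    return {b: (v, n, round(100 * v / n, 1)) for b, (v, n) in counts.items()}
```

The overall figure can be checked the same way: 114 validated of 146 tested gives `round(100 * 114 / 146, 1)`, which is 78.1%.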
Affiliation(s)
- John S Malamon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- John J Farrell
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Li Charlie Xia
- Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Rueben G Das
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jessica Way
- Broad Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
- Amanda B Kuzma
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Otto Valladares
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Allison J Scanlon
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Irving Antonio Barrera Lopez
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Jack Brehony
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Kim C Worley
- Human Genome Sequencing Center, and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Nancy R Zhang
- Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
- Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Lindsay A Farrer
- Biomedical Genetics Section, Department of Medicine, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Neurology and Ophthalmology, Boston University School of Medicine, Boston University, Boston, MA, USA
- Departments of Epidemiology and Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Badri N Vardarajan
- Gertrude H. Sergievsky Center and Taub Institute of Aging Brain, Department of Neurology, Columbia University Medical Center, New York, NY, USA
10
Lai J, Liu Y, Scharpf RB, Karchin R. Evaluation of simulation methods for tumor subclonal reconstruction. ArXiv 2024:arXiv:2402.09599v1. [PMID: 38410652 PMCID: PMC10896360] [Indexed: 02/28/2024]
Abstract
Most neoplastic tumors originate from a single cell, and their evolution can be genetically traced through lineages characterized by common alterations such as small somatic mutations (SSMs), copy number alterations (CNAs), structural variants (SVs), and aneuploidies. Due to the complexity of these alterations in most tumors and the errors introduced by sequencing protocols and calling algorithms, tumor subclonal reconstruction algorithms are necessary to recapitulate the DNA sequence composition and tumor evolution in silico. With a growing number of these algorithms available, there is a pressing need for consistent and comprehensive benchmarking, which relies on realistic tumor sequencing generated by simulation tools. Here, we examine the current simulation methods, identifying their strengths and weaknesses, and provide recommendations for their improvement. Our review also explores potential new directions for research in this area. This work aims to serve as a resource for understanding and enhancing tumor genomic simulations, contributing to the advancement of the field.
Affiliation(s)
- Jiaying Lai
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Yunzhou Liu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Robert B. Scharpf
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD
- Department of Oncology, Johns Hopkins Medical Institutions, Baltimore, MD
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD
11
Kong D, Zhang S, Guo M, Li S, Wang Q, Gou J, Wu Y, Chen Y, Yang Y, Dai C, Tian Z, Wee ATS, Liu Y, Wei D. Ultra-Fast Single-Nucleotide-Variation Detection Enabled by Argonaute-Mediated Transistor Platform. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2024; 36:e2307366. [PMID: 37805919 DOI: 10.1002/adma.202307366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2023] [Revised: 10/03/2023] [Indexed: 10/09/2023]
Abstract
"Test-and-go" single-nucleotide variation (SNV) detection within several minutes remains challenging, especially in low-abundance samples, since existing methods face a trade-off between sensitivity and testing speed. Sensitive detection usually relies on complex and time-consuming nucleic acid amplification or sequencing. Here, a graphene field-effect transistor (GFET) platform mediated by Argonaute protein that enables rapid, sensitive, and specific SNV detection is developed. The Argonaute protein provides a nanoscale binding channel to preorganize the DNA probe, accelerating target binding and rapidly recognizing SNVs with single-nucleotide resolution in unamplified tumor-associated microRNA, circulating tumor DNA, virus RNA, and reverse transcribed cDNA when a mismatch occurs in the seed region. An integrated microchip simultaneously detects multiple SNVs in agreement with sequencing results within 5 min, achieving the fastest SNV detection in a "test-and-go" manner without the requirement of nucleic acid extraction, reverse transcription, and amplification.
Affiliation(s)
- Derong Kong
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Shen Zhang
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Mingquan Guo
- Department of Laboratory Medicine, Shanghai Public Health Clinical Center, Fudan University, Shanghai, 200433, P. R. China
- Shenwei Li
- Shanghai International Travel Healthcare Center, Shanghai, 200335, P. R. China
- Qiang Wang
- Shanghai International Travel Healthcare Center, Shanghai, 200335, P. R. China
- Jian Gou
- Department of Physics, National University of Singapore, Singapore, 117542, Singapore
- Yungen Wu
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yiheng Chen
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Yuetong Yang
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Changhao Dai
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Zhengan Tian
- Shanghai International Travel Healthcare Center, Shanghai, 200335, P. R. China
- Andrew Thye Shen Wee
- Department of Physics, National University of Singapore, Singapore, 117542, Singapore
- Yunqi Liu
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
- Institute of Chemistry, Chinese Academy of Sciences, Beijing, 100190, P. R. China
- Dacheng Wei
- State Key Laboratory of Molecular Engineering of Polymers, Department of Macromolecular Science, Fudan University, Shanghai, 200433, P. R. China
- Laboratory of Molecular Materials and Devices, Fudan University, Shanghai, 200433, P. R. China
12
Ostroverkhova D, Tyryshkin K, Beach AK, Moore EA, Masoudi-Sobhanzadeh Y, Barbari SR, Rogozin IB, Shaitan KV, Panchenko AR, Shcherbakova PV. DNA polymerase ε and δ variants drive mutagenesis in polypurine tracts in human tumors. Cell Rep 2024; 43:113655. [PMID: 38219146 PMCID: PMC10830898 DOI: 10.1016/j.celrep.2023.113655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 11/07/2023] [Accepted: 12/19/2023] [Indexed: 01/16/2024]
Abstract
Alterations in the exonuclease domain of DNA polymerase ε cause ultramutated cancers. These cancers accumulate AGA>ATA transversions; however, their genomic features beyond the trinucleotide motifs are obscure. We analyze the extended DNA context of ultramutation using whole-exome sequencing data from 524 endometrial and 395 colorectal tumors. We find that G>T transversions in POLE-mutant tumors predominantly affect sequences containing at least six consecutive purines, with a striking preference for certain positions within polypurine tracts. Using this signature, we develop a machine-learning classifier to identify tumors with hitherto unknown POLE drivers and validate two drivers, POLE-E978G and POLE-S461L, by functional assays in yeast. Unlike other pathogenic variants, the E978G substitution affects the polymerase domain of Pol ε. We further show that tumors with POLD1 drivers share the extended signature of POLE ultramutation. These findings expand the understanding of ultramutation mechanisms and highlight peculiar mutagenic properties of polypurine tracts in the human genome.
Affiliation(s)
- Daria Ostroverkhova
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada
- Kathrin Tyryshkin
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada
- Annette K Beach
- Eppley Institute for Research in Cancer and Allied Diseases, Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, USA
- Elizabeth A Moore
- Eppley Institute for Research in Cancer and Allied Diseases, Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, USA
- Yosef Masoudi-Sobhanzadeh
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada
- Stephanie R Barbari
- Eppley Institute for Research in Cancer and Allied Diseases, Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, USA
- Igor B Rogozin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Anna R Panchenko
- Department of Pathology and Molecular Medicine, School of Medicine, Queen's University, Kingston, ON, Canada.
- Polina V Shcherbakova
- Eppley Institute for Research in Cancer and Allied Diseases, Fred & Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, USA.
13
Zhang T, Jia H, Song T, Lv L, Gulhan DC, Wang H, Guo W, Xi R, Guo H, Shen N. De novo identification of expressed cancer somatic mutations from single-cell RNA sequencing data. Genome Med 2023; 15:115. [PMID: 38111063 PMCID: PMC10726641 DOI: 10.1186/s13073-023-01269-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 12/04/2023] [Indexed: 12/20/2023]
Abstract
Identifying expressed somatic mutations from single-cell RNA sequencing data de novo is challenging but highly valuable. We propose RESA - Recurrently Expressed SNV Analysis, a computational framework to identify expressed somatic mutations from scRNA-seq data. RESA achieves an average precision of 0.77 on three in silico spike-in datasets. In extensive benchmarking against existing methods using 19 datasets, RESA consistently outperforms them. Furthermore, we applied RESA to analyze intratumor mutational heterogeneity in a melanoma drug resistance dataset. By enabling high precision detection of expressed somatic mutations, RESA substantially enhances the reliability of mutational analysis in scRNA-seq. RESA is available at https://github.com/ShenLab-Genomics/RESA.
Affiliation(s)
- Tianyun Zhang
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Hanying Jia
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Kidney Disease Center, the First Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang, 311121, China
- Tairan Song
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Lin Lv
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Doga C Gulhan
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA
- Haishuai Wang
- College of Computer Science, Zhejiang University, Hangzhou, 311121, Zhejiang, China
- Wei Guo
- Zhejiang University-University of Edinburgh Institute, School of Medicine, Zhejiang University, Jiaxing, 314400, China
- Ruibin Xi
- School of Mathematical Sciences and Center for Statistical Science, Peking University, 5 Yiheyuan Road, Beijing, 100871, China
- Hongshan Guo
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China
- Bone Marrow Transplantation Center, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, Zhejiang, China
- Ning Shen
- Department of Hepatobiliary and Pancreatic Surgery of the First Affiliated Hospital & Liangzhu Laboratory, Zhejiang University School of Medicine, Hangzhou, 311121, China.
14
Ha YJ, Kang S, Kim J, Kim J, Jo SY, Kim S. Comprehensive benchmarking and guidelines of mosaic variant calling strategies. Nat Methods 2023; 20:2058-2067. [PMID: 37828153 PMCID: PMC10703685 DOI: 10.1038/s41592-023-02043-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Accepted: 09/12/2023] [Indexed: 10/14/2023]
Abstract
Rapid advances in sequencing and analysis technologies have enabled the accurate detection of diverse forms of genomic variants represented as heterozygous, homozygous and mosaic mutations. However, the best practices for mosaic variant calling remain disorganized owing to the technical and conceptual difficulties faced in evaluation. Here we present our benchmark of 11 feasible mosaic variant detection approaches based on a systematically designed whole-exome-level reference standard that mimics mosaic samples, supported by 354,258 control positive mosaic single-nucleotide variants and insertion-deletion mutations and 33,111,725 control negatives. We identified not only the best practice for mosaic variant detection but also the condition-dependent strengths and weaknesses of the current methods. Furthermore, feature-level evaluation and their combinatorial usage across multiple algorithms direct the way for immediate to prolonged improvements in mosaic variant detection. Our results will guide researchers in selecting suitable calling algorithms and suggest future strategies for developers.
Affiliation(s)
- Yoo-Jin Ha
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Seungseok Kang
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jisoo Kim
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Junhan Kim
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Se-Young Jo
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea
- Sangwoo Kim
- Translational Genome Informatics Laboratory, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Republic of Korea.
- Brain Korea 21 PLUS Project for Medical Science, Yonsei University College of Medicine, Seoul, Republic of Korea.
- POSTECH Biotechnology Center, Pohang University of Science and Technology, Pohang, Republic of Korea.
15
Zhang H, Lundberg M, Tarka M, Hasselquist D, Hansson B. Evidence of Site-Specific and Male-Biased Germline Mutation Rate in a Wild Songbird. Genome Biol Evol 2023; 15:evad180. [PMID: 37793164 PMCID: PMC10627410 DOI: 10.1093/gbe/evad180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/07/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023]
Abstract
Germline mutations are the ultimate source of genetic variation and the raw material for organismal evolution. Despite their significance, the frequency and genomic locations of mutations, as well as potential sex bias, are yet to be widely investigated in most species. To address these gaps, we conducted whole-genome sequencing of 12 great reed warblers (Acrocephalus arundinaceus) in a pedigree spanning 3 generations to identify single-nucleotide de novo mutations (DNMs) and estimate the germline mutation rate. We detected 82 DNMs within the pedigree, primarily enriched at CpG sites but otherwise randomly located along the chromosomes. Furthermore, we observed a pronounced sex bias in DNM occurrence, with male warblers exhibiting three times more mutations than females. After correction for false negatives and adjusting for callable sites, we obtained a mutation rate of 7.16 × 10⁻⁹ mutations per site per generation (m/s/g) for the autosomes and 5.10 × 10⁻⁹ m/s/g for the Z chromosome. To demonstrate the utility of species-specific mutation rates, we applied our autosomal mutation rate in models reconstructing the demographic history of the great reed warbler. We uncovered signs of drastic population size reductions predating the last glacial period (LGP) and reduced gene flow between western and eastern populations during the LGP. In conclusion, our results provide one of the few direct estimates of the mutation rate in wild songbirds and evidence for male-driven mutations in accordance with theoretical expectations.
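The abstract reports a rate obtained "after correction for false negatives and adjusting for callable sites" but does not state the formula. The sketch below assumes the generic direct-estimate form often used in pedigree studies, μ = n / (2 · C · (1 − FNR)), where C is the total number of callable sites across offspring; it is not necessarily the authors' exact computation, and the function name and inputs are hypothetical.

```python
def germline_mutation_rate(n_dnm: int, callable_sites: list[int], fnr: float) -> float:
    """Hypothetical helper: direct per-site, per-generation mutation rate.

    Assumed generic form (not taken from the study):
        mu = n_dnm / (2 * sum(callable_sites) * (1 - fnr))
    n_dnm: de novo mutations detected across all offspring
    callable_sites: callable positions in each offspring's genome
    fnr: estimated false-negative rate of DNM detection
    The factor 2 counts both parentally inherited genome copies.
    """
    if not 0.0 <= fnr < 1.0:
        raise ValueError("fnr must lie in [0, 1)")
    total_callable = sum(callable_sites)
    if total_callable <= 0:
        raise ValueError("need a positive number of callable sites")
    # Scaling by (1 - fnr) shrinks the effective callable genome, which
    # compensates for true DNMs the calling pipeline would have missed.
    return n_dnm / (2 * total_callable * (1.0 - fnr))


# Illustrative inputs only, not data from the study.
rate = germline_mutation_rate(10, [1_000_000_000], fnr=0.0)
print(f"{rate:.2e}")  # 5.00e-09
```

Raising the false-negative rate increases the estimate, because fewer of the true mutations are assumed to have been observed.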
Affiliation(s)
- Hongkai Zhang
- Department of Biology, Lund University, Lund, Sweden
- Max Lundberg
- Department of Biology, Lund University, Lund, Sweden
- Maja Tarka
- Department of Biology, Lund University, Lund, Sweden
- Bengt Hansson
- Department of Biology, Lund University, Lund, Sweden
16
Burda K, Konczal M. Validation of machine learning approach for direct mutation rate estimation. Mol Ecol Resour 2023; 23:1757-1771. [PMID: 37486035 DOI: 10.1111/1755-0998.13841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Revised: 06/16/2023] [Accepted: 07/05/2023] [Indexed: 07/25/2023]
Abstract
Mutations are the primary source of all genetic variation. Knowledge about their rates is critical for any evolutionary genetic analysis, but for a long time, that knowledge has remained elusive and indirectly inferred. In recent years, parent-offspring comparisons have yielded the first direct mutation rate estimates. The analyses are, however, challenging due to a high rate of false positives and no consensus regarding standardized filtering of candidate de novo mutations. Here, we validate the application of a machine learning approach for such a task and estimate the mutation rate for the guppy (Poecilia reticulata), a model species in eco-evolutionary studies. We sequenced 4 parents and 20 offspring and then screened their genomes for de novo mutations. The initial large number of candidate de novo mutations was hard-filtered to remove false-positive results. These results were compared with the mutation rate estimated with a supervised machine learning approach. Both approaches were followed by molecular validation of all candidate de novo mutations and yielded similar results. The ML method uniquely identified three mutations, but overall required more hands-on curation and had higher rates of false positives and false negatives. Both methods concordantly showed no difference in mutation rates between families. The guppy mutation rate estimated here is among the lowest directly estimated mutation rates in vertebrates; however, previous research has also found low estimated rates in other teleost fishes. We discuss potential explanations for such a pattern, as well as future utility and limitations of machine learning approaches.
Affiliation(s)
- Katarzyna Burda
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
- Mateusz Konczal
- Evolutionary Biology Group, Faculty of Biology, Adam Mickiewicz University, Poznań, Poland
17
Menzel M, Ossowski S, Kral S, Metzger P, Horak P, Marienfeld R, Boerries M, Wolter S, Ball M, Neumann O, Armeanu-Ebinger S, Schroeder C, Matysiak U, Goldschmid H, Schipperges V, Fürstberger A, Allgäuer M, Eberhardt T, Niewöhner J, Blaumeiser A, Ploeger C, Haack TB, Tay TKY, Kelemen O, Pauli T, Kirchner M, Kluck K, Ott A, Renner M, Admard J, Gschwind A, Lassmann S, Kestler H, Fend F, Illert AL, Werner M, Möller P, Seufferlein TTW, Malek N, Schirmacher P, Fröhling S, Kazdal D, Budczies J, Stenzinger A. Multicentric pilot study to standardize clinical whole exome sequencing (WES) for cancer patients. NPJ Precis Oncol 2023; 7:106. [PMID: 37864096 PMCID: PMC10589320 DOI: 10.1038/s41698-023-00457-x] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 04/03/2023] [Accepted: 09/26/2023] [Indexed: 10/22/2023]
Abstract
A growing number of druggable targets and national initiatives for precision oncology necessitate broad genomic profiling for many cancer patients. Whole exome sequencing (WES) offers unbiased analysis of the entire coding sequence, segmentation-based detection of copy number alterations (CNAs), and accurate determination of complex biomarkers including tumor mutational burden (TMB), homologous recombination repair deficiency (HRD), and microsatellite instability (MSI). To assess the inter-institution variability of clinical WES, we performed a comparative pilot study between German Centers of Personalized Medicine (ZPMs) from five participating institutions. Tumor and matched normal DNA from 30 patients were analyzed using custom sequencing protocols and bioinformatic pipelines. Calling of somatic variants was highly concordant with a positive percentage agreement (PPA) between 91 and 95% and a positive predictive value (PPV) between 82 and 95% compared with a three-institution consensus and full agreement for 16 of 17 druggable targets. Explanations for deviations included low VAF or coverage, differing annotations, and different filter protocols. CNAs showed overall agreement in 76% for the genomic sequence with high wet-lab variability. Complex biomarkers correlated strongly between institutions (HRD: 0.79-1, TMB: 0.97-0.99) and all institutions agreed on microsatellite instability. This study will contribute to the development of quality control frameworks for comprehensive genomic profiling and shed light on parameters that require stringent standardization.
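PPA and PPV are the clinical-validation names for recall (sensitivity) and precision. The abstract does not describe how calls were matched to the three-institution consensus, so the sketch below assumes the simplest rule, exact (chrom, pos, ref, alt) keys; the helper name and the example variants are hypothetical.

```python
# Assumed variant key: (chrom, pos, ref, alt).
Variant = tuple[str, int, str, str]


def ppa_ppv(calls: set[Variant], consensus: set[Variant]) -> tuple[float, float]:
    """Hypothetical helper returning (PPA, PPV) of one call set vs. a consensus.

    PPA = TP / (TP + FN)  (positive percentage agreement, i.e. recall)
    PPV = TP / (TP + FP)  (positive predictive value, i.e. precision)
    Exact-key matching is a simplification; real comparisons usually
    normalize indel representation before matching.
    """
    tp = len(calls & consensus)  # called and present in the consensus
    fn = len(consensus - calls)  # in the consensus but not called
    fp = len(calls - consensus)  # called but absent from the consensus
    ppa = tp / (tp + fn) if consensus else float("nan")
    ppv = tp / (tp + fp) if calls else float("nan")
    return ppa, ppv


# Illustrative variants only, not data from the study.
consensus = {("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
             ("chr2", 500, "A", "G"), ("chr3", 42, "T", "C")}
calls = {("chr1", 100, "C", "T"), ("chr1", 250, "G", "A"),
         ("chr2", 500, "A", "G"), ("chr5", 7, "G", "T"), ("chr6", 9, "C", "A")}
print(ppa_ppv(calls, consensus))  # (0.75, 0.6)
```

Because both metrics share the same true-positive count, a pipeline can trade one for the other through filtering, which is one way the filter differences mentioned above can move PPA and PPV in opposite directions.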
Affiliation(s)
- Michael Menzel
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Stephan Ossowski
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics (IBMI), University of Tübingen, Tübingen, Germany
- Sebastian Kral
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Patrick Metzger
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Peter Horak
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Ralf Marienfeld
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
- Melanie Boerries
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Comprehensive Cancer Center Freiburg (CCCF), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Steffen Wolter
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Markus Ball
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Olaf Neumann
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Sorin Armeanu-Ebinger
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Christopher Schroeder
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Uta Matysiak
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Hannah Goldschmid
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Vincent Schipperges
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Axel Fürstberger
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
- Institute of Medical Systems Biology, Ulm University, Ulm, Germany
- Michael Allgäuer
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Timo Eberhardt
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
- Jakob Niewöhner
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Andreas Blaumeiser
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Carolin Ploeger
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Tobias Bernd Haack
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Timothy Kwang Yong Tay
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Department of Anatomical Pathology, Singapore General Hospital, Singapore, Singapore
- Olga Kelemen
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Thomas Pauli
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Institute of Medical Bioinformatics and Systems Medicine (IBSM), Medical Center - University of Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Martina Kirchner
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Klaus Kluck
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Alexander Ott
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Marcus Renner
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- Jakob Admard
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Axel Gschwind
- Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Silke Lassmann
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- Hans Kestler
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Center for Personalized Medicine (ZPM), Ulm, Germany
- Falko Fend
- Institute of Pathology and Neuropathology, University Hospital Tübingen, Tübingen, Germany
- Anna Lena Illert
- Department of Medicine I, Medical Center-University of Freiburg, Faculty of Medicine, University of Freiburg, 79085, Freiburg, Germany
- Medical Department for Hematology and Oncology, Klinikum Rechts der Isar, Technische Universität München, 80333, Munich, Germany
- German Cancer Consortium (DKTK) Partner Site Munich, and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Martin Werner
- Institute for Surgical Pathology, Medical Center, University of Freiburg, Freiburg, Germany
- Center for Personalized Medicine (ZPM), Freiburg, Germany
- German Cancer Consortium (DKTK) Partner Site Freiburg, and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Peter Möller
- Institute of Pathology, University Hospital Ulm, Ulm, Germany
- Nisar Malek
- Center for Personalized Medicine (ZPM), Tübingen, Germany
- Department of Internal Medicine I, University Hospital Tübingen, Tübingen, Germany
- Peter Schirmacher
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Stefan Fröhling
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Division of Translational Medical Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Daniel Kazdal
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany
- Center for Personalized Medicine (ZPM), Heidelberg, Germany
- Jan Budczies
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany.
- Center for Personalized Medicine (ZPM), Heidelberg, Germany.
- German Cancer Consortium (DKTK), Heidelberg, Germany.
- Albrecht Stenzinger
- Institute of Pathology, Heidelberg University Hospital, Heidelberg, Germany.
- Center for Personalized Medicine (ZPM), Heidelberg, Germany.
- German Cancer Consortium (DKTK), Heidelberg, Germany.
18
O’Sullivan B, Seoighe C. Comprehensive and realistic simulation of tumour genomic sequencing data. NAR Cancer 2023; 5:zcad051. [PMID: 37746635 PMCID: PMC10516706 DOI: 10.1093/narcan/zcad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 08/25/2023] [Accepted: 09/08/2023] [Indexed: 09/26/2023]
Abstract
Accurate identification of somatic mutations and allele frequencies in cancer has critical research and clinical applications. Several computational tools have been developed for this purpose but, in the absence of comprehensive 'ground truth' data, assessing the accuracy of these methods is challenging. We created a computational framework to simulate tumour and matched normal sequencing data for which the source of all loci that contain non-reference bases is known, based on a phased, personalized genome. Unlike existing methods, we account for sampling errors inherent in the sequencing process. Using this framework, we assess accuracy and biases in inferred mutations and their frequencies in an established somatic mutation calling pipeline. We demonstrate bias in existing methods of mutant allele frequency estimation and show, for the first time, the observed mutation frequency spectrum corresponding to a theoretical model of tumour evolution. We highlight the impact of quality filters on detection sensitivity of clinically actionable variants and provide definitive assessment of false positive and false negative mutation calls. Our simulation framework provides an improved means to assess the accuracy of somatic mutation calling pipelines and a detailed picture of the effects of technical parameters and experimental factors on somatic mutation calling in cancer samples.
Affiliation(s)
- Brian O’Sullivan
- School of Mathematical and Statistical Sciences, University of Galway, University Road, Galway H91 TK33, Ireland
- Cathal Seoighe
- School of Mathematical and Statistical Sciences, University of Galway, University Road, Galway H91 TK33, Ireland
19
Li Y, Dou Y, Da Veiga Leprevost F, Geffen Y, Calinawan AP, Aguet F, Akiyama Y, Anand S, Birger C, Cao S, Chaudhary R, Chilappagari P, Cieslik M, Colaprico A, Zhou DC, Day C, Domagalski MJ, Esai Selvan M, Fenyö D, Foltz SM, Francis A, Gonzalez-Robles T, Gümüş ZH, Heiman D, Holck M, Hong R, Hu Y, Jaehnig EJ, Ji J, Jiang W, Katsnelson L, Ketchum KA, Klein RJ, Lei JT, Liang WW, Liao Y, Lindgren CM, Ma W, Ma L, MacCoss MJ, Martins Rodrigues F, McKerrow W, Nguyen N, Oldroyd R, Pilozzi A, Pugliese P, Reva B, Rudnick P, Ruggles KV, Rykunov D, Savage SR, Schnaubelt M, Schraink T, Shi Z, Singhal D, Song X, Storrs E, Terekhanova NV, Thangudu RR, Thiagarajan M, Wang LB, Wang JM, Wang Y, Wen B, Wu Y, Wyczalkowski MA, Xin Y, Yao L, Yi X, Zhang H, Zhang Q, Zuhl M, Getz G, Ding L, Nesvizhskii AI, Wang P, Robles AI, Zhang B, Payne SH. Proteogenomic data and resources for pan-cancer analysis. Cancer Cell 2023; 41:1397-1406. [PMID: 37582339 PMCID: PMC10506762 DOI: 10.1016/j.ccell.2023.06.009] [Received: 07/08/2022] [Revised: 11/15/2022] [Accepted: 06/27/2023] [Indexed: 08/17/2023]
Abstract
The National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium (CPTAC) investigates tumors from a proteogenomic perspective, creating rich multi-omics datasets connecting genomic aberrations to cancer phenotypes. To facilitate pan-cancer investigations, we have generated harmonized genomic, transcriptomic, proteomic, and clinical data for >1000 tumors in 10 cohorts to create a cohesive and powerful dataset for scientific discovery. We outline efforts by the CPTAC pan-cancer working group in data harmonization, data dissemination, and computational resources for aiding biological discoveries. We also discuss challenges for multi-omics data integration and analysis, specifically the unique challenges of working with both nucleotide sequencing and mass spectrometry proteomics data.
Affiliation(s)
- Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Yongchao Dou
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yifat Geffen
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Anna P Calinawan
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- François Aguet
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Yo Akiyama
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Shankara Anand
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Chet Birger
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Marcin Cieslik
- Department of Computational Medicine & Bioinformatics, Department of Pathology, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Antonio Colaprico
- Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA
- Daniel Cui Zhou
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Corbin Day
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Myvizhi Esai Selvan
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Steven M Foltz
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Tania Gonzalez-Robles
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
- Zeynep H Gümüş
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- David Heiman
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Runyu Hong
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Yingwei Hu
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Eric J Jaehnig
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Jiayi Ji
- Tisch Cancer Institute and Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Wen Jiang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Lizabeth Katsnelson
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Robert J Klein
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Jonathan T Lei
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Wen-Wei Liang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Yuxing Liao
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Caleb M Lindgren
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Weiping Ma
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Lei Ma
- ICF, Rockville, MD 20850, USA
- Michael J MacCoss
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Fernanda Martins Rodrigues
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Wilson McKerrow
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Robert Oldroyd
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
- Pietro Pugliese
- Department of Sciences and Technologies, University of Sannio, Benevento 82100, Italy
- Boris Reva
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Paul Rudnick
- Spectragen Informatics, Bainbridge Island, WA 98110, USA
- Kelly V Ruggles
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
- Dmitry Rykunov
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Sara R Savage
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Michael Schnaubelt
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Tobias Schraink
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
- Zhiao Shi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Xiaoyu Song
- Tisch Cancer Institute and Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Erik Storrs
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Nadezhda V Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Liang-Bo Wang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Joshua M Wang
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Ying Wang
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Bo Wen
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yige Wu
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Yi Xin
- ICF, Rockville, MD 20850, USA
- Lijun Yao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA
- Xinpei Yi
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Hui Zhang
- Department of Pathology, Johns Hopkins University, Baltimore, MD 21231, USA
- Qing Zhang
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA
- Gad Getz
- Broad Institute of MIT and Harvard, Cambridge, MA 02141, USA; Cancer Center and Department of Pathology, Mass. General Hospital, Boston, MA 02114, USA; Harvard Medical School, Boston, MA 02115, USA
- Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63130, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63130, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63130, USA
- Pei Wang
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA.
- Bing Zhang
- Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX 77030, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
- Samuel H Payne
- Department of Biology, Brigham Young University, Provo, UT 84602, USA.
20
Lin Y, Darolti I, van der Bijl W, Morris J, Mank JE. Extensive variation in germline de novo mutations in Poecilia reticulata. Genome Res 2023; 33:1317-1324. [PMID: 37442578 PMCID: PMC10547258 DOI: 10.1101/gr.277936.123] [Received: 03/30/2023] [Accepted: 07/07/2023] [Indexed: 07/15/2023]
Abstract
The rate of germline mutation is fundamental to evolutionary processes, as it generates the variation upon which selection acts. The guppy, Poecilia reticulata, is a model of rapid adaptation; however, the relative contribution of standing genetic variation versus de novo mutation (DNM) to evolution in this species remains unclear. Here, we use pedigree-based approaches to quantify and characterize germline DNMs in three large guppy families. Our results suggest that the germline mutation rate in the guppy varies substantially across individuals and families. Most DNMs are shared across multiple siblings, suggesting they arose during early embryonic development. DNMs are randomly distributed throughout the genome, and male-biased mutation rate is low, as would be expected from the short guppy generation time. Overall, our study shows remarkable variation in germline mutation rate and provides insights into rapid evolution of guppies.
Affiliation(s)
- Yuying Lin
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Iulia Darolti
- Department of Ecology and Evolution, University of Lausanne, CH-1015 Lausanne, Switzerland
- Wouter van der Bijl
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
- Jake Morris
- School of Biological Science, University of Bristol, Bristol BS8 1TQ, United Kingdom
- Judith E Mank
- Department of Zoology and Biodiversity Research Centre, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
21
Srivatsa A, Lei H, Schwartz R. A Clonal Evolution Simulator for Planning Somatic Evolution Studies. J Comput Biol 2023; 30:831-847. [PMID: 37184853 PMCID: PMC10457648 DOI: 10.1089/cmb.2023.0086] [Indexed: 05/16/2023]
Abstract
Somatic evolution plays a key role in development, cell differentiation, and normal aging, but also in diseases such as cancer. Understanding mechanisms of somatic mutability and how they can vary between cell lineages will likely play a crucial role in biological discovery and medical applications. This need has led to a proliferation of new technologies for profiling single-cell variation, each with distinctive capabilities and limitations that can be leveraged alone or in combination with other technologies. The enormous space of options for assaying somatic variation, however, presents unsolved informatics problems with regard to selecting optimal combinations of technologies for designing appropriate studies for any particular scientific question. Versatile simulation tools are needed to explore and optimize potential study designs if researchers are to deploy multiomic technologies most effectively. In this study, we present a simulator allowing for the generation of synthetic data from a wide range of clonal lineages, variant classes, and sequencing technology choices, intended to provide a platform for effective study design in somatic lineage analysis. Users can input various properties of the somatic evolutionary system, mutation classes, and biotechnology options, and then generate samples of synthetic sequence reads and their corresponding ground truth parameters for a given study design. We demonstrate the utility of the simulator for testing and optimizing study designs for various experimental queries.
Affiliation(s)
- Arjun Srivatsa
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Haoyun Lei
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Russell Schwartz
- Department of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
22
Huang L, Wang D, Chen H, Hu J, Dai X, Liu C, Li A, Shen X, Qi C, Sun H, Zhang D, Chen T, Jiang Y. CRISPR-detector: fast and accurate detection, visualization, and annotation of genome-wide mutations induced by genome editing events. J Genet Genomics 2023; 50:563-572. [PMID: 37003351 DOI: 10.1016/j.jgg.2023.03.010] [Received: 10/19/2022] [Revised: 03/05/2023] [Accepted: 03/08/2023] [Indexed: 04/01/2023]
Abstract
The leading-edge CRISPR/CRISPR-associated technology is revolutionizing biotechnologies through genome editing. To track on/off-target events with emerging new editing techniques, improved bioinformatic tools are indispensable. Existing tools suffer from limitations in speed and scalability, especially with whole-genome sequencing (WGS) data analysis. To address these limitations, we have developed a comprehensive tool called CRISPR-detector, a web-based and locally deployable pipeline for genome editing sequence analysis. The core analysis module of CRISPR-detector is based on the Sentieon TNscope pipeline, with additional novel annotation and visualization modules designed to fit CRISPR applications. Co-analysis of the treated and control samples is performed to remove existing background variants prior to genome editing. CRISPR-detector offers optimized scalability, enabling WGS data analysis beyond Browser Extensible Data file-defined regions, with improved accuracy due to haplotype-based variant calling to handle sequencing errors. In addition, the tool also provides integrated structural variation calling and includes functional and clinical annotations of editing-induced mutations appreciated by users. These advantages facilitate rapid and efficient detection of mutations induced by genome editing events, especially for datasets generated from WGS. The web-based version of CRISPR-detector is available at https://db.cngb.org/crispr-detector, and the locally deployable version is available at https://github.com/hlcas/CRISPR-detector.
Affiliation(s)
- Lei Huang
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Dan Wang
- BNU-HKBU United International College, Zhuhai, Guangdong 519087, China.
- Jinnan Hu
- Sentieon Inc, San Jose, CA 94042, USA
- Xuechen Dai
- BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Chuan Liu
- BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Anduo Li
- BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Xuechun Shen
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
- Chen Qi
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Haixi Sun
- BGI-Shenzhen, Shenzhen, Guangdong 518083, China
- Tong Chen
- National Resource Center for Chinese Materia Medica, China Academy of Chinese Medical Sciences, Beijing 100700, China.
- Yuan Jiang
- BGI-Shenzhen, Shenzhen, Guangdong 518083, China.
23
Jeon H, Ahn J, Na B, Hong S, Sael L, Kim S, Yoon S, Baek D. AIVariant: a deep learning-based somatic variant detector for highly contaminated tumor samples. Exp Mol Med 2023; 55:1734-1742. [PMID: 37524869 PMCID: PMC10474289 DOI: 10.1038/s12276-023-01049-2] [Received: 02/28/2023] [Revised: 04/10/2023] [Accepted: 04/24/2023] [Indexed: 08/02/2023]
Abstract
The detection of somatic DNA variants in tumor samples with low tumor purity or sequencing depth remains a daunting challenge despite numerous attempts to address this problem. In this study, we constructed a substantially extended set of actual positive variants originating from a wide range of tumor purities and sequencing depths, as well as actual negative variants derived from sequencer-specific sequencing errors. A deep learning model named AIVariant, trained on this extended dataset, outperforms previously reported methods when tested under various tumor purities and sequencing depths, especially low tumor purity and sequencing depth.
Affiliation(s)
- Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea
- Genome4me Inc., Seoul, 08826, Republic of Korea
- Junhak Ahn
- Genome4me Inc., Seoul, 08826, Republic of Korea
- School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea
- Byunggook Na
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Soona Hong
- AIGENDRUG Co., Ltd., Seoul, 08826, Republic of Korea
- Lee Sael
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, Republic of Korea
- Sun Kim
- Department of Computer Science and Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Sungroh Yoon
- Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, Republic of Korea
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea
- Daehyun Baek
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, Republic of Korea.
- Genome4me Inc., Seoul, 08826, Republic of Korea.
- School of Biological Sciences, Seoul National University, Seoul, 08826, Republic of Korea.
- Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, 08826, Republic of Korea.
24
Lin LH, Chang KW, Cheng HW, Liu CJ. Identification of Somatic Mutations in Plasma Cell-Free DNA from Patients with Metastatic Oral Squamous Cell Carcinoma. Int J Mol Sci 2023; 24:10408. [PMID: 37373553 DOI: 10.3390/ijms241210408] [Received: 05/08/2023] [Revised: 06/01/2023] [Accepted: 06/15/2023] [Indexed: 06/29/2023]
Abstract
The accurate diagnosis and treatment of oral squamous cell carcinoma (OSCC) requires an understanding of its genomic alterations. Liquid biopsies, especially cell-free DNA (cfDNA) analysis, are a minimally invasive technique used for genomic profiling. We conducted comprehensive whole-exome sequencing (WES) of 50 OSCC cell-free plasma samples paired with whole blood samples, using multiple mutation calling pipelines and filtering criteria. Integrative Genomics Viewer (IGV) was used to validate somatic mutations. Mutation burden and mutant genes were correlated to clinico-pathological parameters. The plasma mutation burden of cfDNA was significantly associated with clinical staging and distant metastasis status. The genes TTN, PLEC, SYNE1, and USH2A were most frequently mutated in OSCC, and known driver genes, including KMT2D, LRP1B, TRRAP, and FLNA, were also significantly and frequently mutated. Additionally, the novel mutated genes CCDC168, HMCN2, STARD9, and CRAMP1 were significantly and frequently present in patients with OSCC. The mutated genes most frequently found in patients with metastatic OSCC were RORC, SLC49A3, and NUMBL. Further analysis revealed that branched-chain amino acid (BCAA) catabolism, extracellular matrix-receptor interaction, and the hypoxia-related pathway were associated with OSCC prognosis. Choline metabolism in cancer, O-glycan biosynthesis, and protein processing in the endoplasmic reticulum pathway were associated with distant metastatic status. About 20% of tumors carried at least one aberrant event in BCAA catabolism signaling that could possibly be targeted by an approved therapeutic agent. We identified molecular-level OSCC that were correlated with etiology and prognosis while defining the landscape of major altered events of the OSCC plasma genome. These findings will be useful in the design of clinical trials for targeted therapies and the stratification of patients with OSCC according to therapeutic efficacy.
Affiliation(s)
- Li-Han Lin
- Department of Medical Research, MacKay Memorial Hospital No. 92, Sec. 2, Chung San N. Rd., Taipei 10449, Taiwan
- Kuo-Wei Chang
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
- Department of Stomatology, Taipei Veterans General Hospital, Taipei 11121, Taiwan
- Hui-Wen Cheng
- Department of Medical Research, MacKay Memorial Hospital No. 92, Sec. 2, Chung San N. Rd., Taipei 10449, Taiwan
- Chung-Ji Liu
- Department of Medical Research, MacKay Memorial Hospital No. 92, Sec. 2, Chung San N. Rd., Taipei 10449, Taiwan
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei 11221, Taiwan
- Department of Oral and Maxillofacial Surgery, Taipei MacKay Memorial Hospital, Taipei 10449, Taiwan
25
Yang X, Xu X, Breuss MW, Antaki D, Ball LL, Chung C, Shen J, Li C, George RD, Wang Y, Bae T, Cheng Y, Abyzov A, Wei L, Alexandrov LB, Sebat JL, Gleeson JG. Control-independent mosaic single nucleotide variant detection with DeepMosaic. Nat Biotechnol 2023; 41:870-877. [PMID: 36593400 PMCID: PMC10314968 DOI: 10.1038/s41587-022-01559-w] [Received: 11/13/2020] [Accepted: 10/10/2022] [Indexed: 01/04/2023]
Abstract
Mosaic variants (MVs) reflect mutagenic processes during embryonic development and environmental exposure, accumulate with aging and underlie diseases such as cancer and autism. The detection of noncancer MVs has been computationally challenging due to the sparse representation of nonclonally expanded MVs. Here we present DeepMosaic, combining an image-based visualization module for single nucleotide MVs and a convolutional neural network-based classification module for control-independent MV detection. DeepMosaic was trained on 180,000 simulated or experimentally assessed MVs, and was benchmarked on 619,740 simulated MVs and 530 independent biologically tested MVs from 16 genomes and 181 exomes. DeepMosaic achieved higher accuracy compared with existing methods on biological data, with a sensitivity of 0.78, specificity of 0.83 and positive predictive value of 0.96 on noncancer whole-genome sequencing data, as well as doubling the validation rate over previous best-practice methods on noncancer whole-exome sequencing data (0.43 versus 0.18). DeepMosaic represents an accurate MV classifier for noncancer samples that can be implemented as an alternative or complement to existing methods.
Affiliation(s)
- Xiaoxu Yang
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA.
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.
- Xin Xu
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Martin W Breuss
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Department of Pediatrics, Section of Genetics and Metabolism, University of Colorado School of Medicine, Aurora, CO, USA
- Danny Antaki
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Laurel L Ball
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Changuk Chung
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Jiawei Shen
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Chen Li
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Renee D George
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA
- Yifan Wang
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Taejeong Bae
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Yuhe Cheng
- Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, USA
- Department of Bioengineering, UC San Diego, La Jolla, CA, USA
- Moores Cancer Center, UC San Diego, La Jolla, CA, USA
- Alexej Abyzov
- Department of Quantitative Health Sciences, Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
- Liping Wei
- Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing, China
- Ludmil B Alexandrov
- Department of Cellular and Molecular Medicine, UC San Diego, La Jolla, CA, USA
- Department of Bioengineering, UC San Diego, La Jolla, CA, USA
- Moores Cancer Center, UC San Diego, La Jolla, CA, USA
- Jonathan L Sebat
- Beyster Center for Genomics of Psychiatric Diseases, University of California, San Diego, La Jolla, CA, USA
- Department of Psychiatry, University of California, San Diego, La Jolla, CA, USA
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
- Joseph G Gleeson
- Department of Neurosciences, University of California, San Diego, La Jolla, CA, USA.
- Rady Children's Institute for Genomic Medicine, San Diego, CA, USA.

26
Wang Y, Wang J, Fang W, Xiao X, Wang Q, Zhao J, Liu J, Yang S, Liu Y, Lai X, Song X. TMBserval: a statistical explainable learning model reveals weighted tumor mutation burden better categorizing therapeutic benefits. Front Immunol 2023; 14:1151755. [PMID: 37234148 PMCID: PMC10208409 DOI: 10.3389/fimmu.2023.1151755] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/26/2023] [Accepted: 04/24/2023] [Indexed: 05/27/2023]
Abstract
A high tumor mutation burden (TMB) is known to drive the response to immune checkpoint inhibitors (ICI) and is associated with favorable prognoses. However, as a one-dimensional numerical representation of non-synonymous genetic alterations, TMB faces clinical challenges because it quantifies all mutations equally. Since not all mutations elicit the same antitumor rejection, the effect on immunity of neoantigens encoded by different types or locations of somatic mutations may vary. In addition, other typical genomic features, including complex structural variants, are not captured by the conventional TMB metric. Given the diversity of cancer subtypes and the complexity of treatment regimens, this paper proposes that tumor mutations capable of causing various degrees of immunogenicity should be calculated separately. TMB should therefore be segmented into more exact, higher-dimensional feature vectors to exhaustively measure the foreignness of tumors. We systematically reviewed patients' multifaceted efficacy based on a refined TMB metric, investigated the association between multidimensional mutations and integrative immunotherapy outcomes, and developed a convergent categorical decision-making framework, TMBserval (Statistical Explainable machine learning with Regression-based VALidation). TMBserval integrates a multiple-instance learning concept with statistics to create a statistically interpretable model that addresses the broad interdependencies between multidimensional mutation burdens and decision endpoints. TMBserval is a pan-cancer-oriented many-to-many nonlinear regression model with discrimination and calibration power. Simulations and experimental analyses using data from 137 actual patients both demonstrated that our method could discriminate between patient groups in a high-dimensional feature space, thereby rationally expanding the beneficiary population of immunotherapy.
Affiliation(s)
- Yixuan Wang
- Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Jiayin Wang
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Wenfeng Fang
- State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, China
| | - Xiao Xiao
- Genomics Institute, Geneplus-Shenzhen, Shenzhen, China
| | - Quan Wang
- Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Jian Zhao
- Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Jingjing Liu
- Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
| | - Shuanying Yang
- Department of Respiratory and Critical Care Medicine, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, China
| | - Yuqian Liu
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Xin Lai
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, China
| | - Xiaofeng Song
- Department of Biomedical Engineering, College of Automation Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China

27
Hoskins I, Sun S, Cote A, Roth FP, Cenik C. satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect. Genome Biol 2023; 24:82. [PMID: 37081510 PMCID: PMC10116734 DOI: 10.1186/s13059-023-02922-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2022] [Accepted: 04/04/2023] [Indexed: 04/22/2023]
Abstract
The impact of millions of individual genetic variants on molecular phenotypes in coding sequences remains unknown. Multiplexed assays of variant effect (MAVEs) are scalable methods to annotate relevant variants, but existing software lacks standardization, requires cumbersome configuration, and does not scale to large targets. We present satmut_utils as a flexible solution for simulation and variant quantification. We then benchmark MAVE software using simulated and real MAVE data. We finally determine mRNA abundance for thousands of cystathionine beta-synthase variants using two experimental methods. The satmut_utils package enables high-performance analysis of MAVEs and reveals the capability of variants to alter mRNA abundance.
Affiliation(s)
- Ian Hoskins
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA
- Song Sun
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Atina Cote
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Frederick P Roth
- The Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, ON, Canada
- Can Cenik
- Department of Molecular Biosciences, University of Texas at Austin, Austin, TX, 78712, USA

28
Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet 2023:10.1038/s41576-023-00590-0. [PMID: 37059810 DOI: 10.1038/s41576-023-00590-0] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Accepted: 02/22/2023] [Indexed: 04/16/2023]
Abstract
Genetic variant calling from DNA sequencing has enabled understanding of germline variation in hundreds of thousands of humans. Sequencing technologies and variant-calling methods have advanced rapidly, routinely providing reliable variant calls in most of the human genome. We describe how advances in long reads, deep learning, de novo assembly and pangenomes have expanded access to variant calls in increasingly challenging, repetitive genomic regions, including medically relevant regions, and how new benchmark sets and benchmarking methods illuminate their strengths and limitations. Finally, we explore the possible future of more complete characterization of human genome variation in light of the recent completion of a telomere-to-telomere human genome reference assembly and human pangenomes, and we consider the innovations needed to benchmark their newly accessible repetitive regions and complex variants.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Nathan Dwarshuis
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Karen H Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, USA
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA

29
Honma H, Takahashi N, Arisue N, Sugishita T. Analysis of genome instability and implications for the consequent phenotype in Plasmodium falciparum containing mutated MSH2-1 (P513T). Microb Genom 2023; 9. [PMID: 37083479 DOI: 10.1099/mgen.0.001003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/22/2023]
Abstract
Malarial parasites exhibit extensive genomic plasticity, which induces antigen diversification and the development of antimalarial drug resistance. Only a few studies have examined the genome maintenance mechanisms of parasites. This study aimed to elucidate the impact of a mutation in a DNA mismatch repair gene on genome stability by maintaining the mutant and wild-type parasites through serial in vitro cultures for approximately 400 days and analysing the subsequent spontaneous mutations. A P513T mutant of the DNA mismatch repair protein PfMSH2-1 from Plasmodium falciparum 3D7 was created. The mutation did not influence the base substitution rate but significantly increased the insertion/deletion (indel) mutation rate in short tandem repeats (STRs) and minisatellite loci. STR mutability was affected by allele size, genomic category and certain repeat motifs. In the mutants, significant telomere healing and homologous recombination at chromosomal ends caused extensive gene loss and generation of chimeric genes, resulting in large-scale chromosomal alteration. Additionally, the mutant showed increased tolerance to N-methyl-N'-nitro-N-nitrosoguanidine, suggesting that PfMSH2-1 was involved in recognizing DNA methylation damage. This work provides valuable insights into the role of PfMSH2-1 in genome stability and demonstrates that the genomic destabilization caused by its dysfunction may lead to antigen diversification.
Affiliation(s)
- Hajime Honma
- Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Nobuyuki Takahashi
- Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Nobuko Arisue
- Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Tomohiko Sugishita
- Section of Global Health, Division of Public Health, Department of Hygiene and Public Health, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan
- Department of International Affairs and Tropical Medicine, Tokyo Women's Medical University, 8-1 Kawada-cho, Shinjuku, Tokyo 162-8666, Japan

30
Establishment of four head and neck squamous cell carcinoma cell lines: importance of reference DNA for accurate genomic characterisation. J Laryngol Otol 2023; 137:301-307. [PMID: 35317874 PMCID: PMC9975763 DOI: 10.1017/s0022215122000846] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
OBJECTIVE There is significant interest in developing early passage cell lines with matched normal reference DNA to facilitate a precision medicine approach in assessing drug response. This study aimed to establish early passage cell lines and to perform whole exome sequencing and short tandem repeat profiling on matched normal reference DNA, primary tumour and corresponding cell lines. METHODS A cell culture-based, in vitro study was conducted on samples from patients with primary human papillomavirus positive and human papillomavirus negative tumours. RESULTS Four early passage cell lines were established. Two cell lines were human papillomavirus positive, confirmed by sequencing and p16 immunoblotting. Short tandem repeat profiling confirmed that all cell lines were established from their index tumours. Whole exome sequencing revealed that the matched normal reference DNA was critical for accurate mutational analysis: a high proportion of false positive mutation calls (87.6 per cent) was excluded. CONCLUSION Early passage cell lines were successfully established. Patient-matched reference DNA is important for accurate cell line mutational calls.

31
Performance evaluation of six popular short-read simulators. Heredity (Edinb) 2023; 130:55-63. [PMID: 36496447 PMCID: PMC9905089 DOI: 10.1038/s41437-022-00577-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/06/2022] [Revised: 11/10/2022] [Accepted: 11/11/2022] [Indexed: 12/14/2022]
Abstract
High-throughput sequencing data enables the comprehensive study of genomes and the variation therein. Essential for the interpretation of this genomic data is a thorough understanding of the computational methods used for processing and analysis. Whereas "gold-standard" empirical datasets exist for this purpose in humans, synthetic (i.e., simulated) sequencing data can offer important insights into the capabilities and limitations of computational pipelines for any arbitrary species and/or study design. Yet the ability of read simulator software to emulate genomic characteristics of empirical datasets remains poorly understood. We here compare the performance of six popular short-read simulators (ART, DWGSIM, InSilicoSeq, Mason, NEAT, and wgsim) and discuss important considerations for selecting suitable models for benchmarking.

32
Duncavage EJ, Coleman JF, de Baca ME, Kadri S, Leon A, Routbort M, Roy S, Suarez CJ, Vanderbilt C, Zook JM. Recommendations for the Use of in Silico Approaches for Next-Generation Sequencing Bioinformatic Pipeline Validation: A Joint Report of the Association for Molecular Pathology, Association for Pathology Informatics, and College of American Pathologists. J Mol Diagn 2023; 25:3-16. [PMID: 36244574 DOI: 10.1016/j.jmoldx.2022.09.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/24/2022] [Revised: 09/14/2022] [Accepted: 09/28/2022] [Indexed: 11/21/2022]
Abstract
In silico approaches for next-generation sequencing (NGS) data modeling have utility in the clinical laboratory as a tool for clinical assay validation. In silico NGS data can take a variety of forms, including pure simulated data or manipulated data files in which variants are inserted into existing data files. In silico data enable simulation of a range of variants that may be difficult to obtain from a single physical sample. Such data allow laboratories to more accurately test the performance of clinical bioinformatics pipelines without sequencing additional cases. For example, clinical laboratories may use in silico data to simulate low variant allele fraction variants to test the analytical sensitivity of variant calling software or simulate a range of insertion/deletion sizes to determine the performance of insertion/deletion calling software. In this article, the Working Group reviews the different types of in silico data with their strengths and limitations, methods to generate in silico data, and how data can be used in the clinical molecular diagnostic laboratory. Survey data indicate how in silico NGS data are currently being used. Finally, potential applications for which in silico data may become useful in the future are presented.
Affiliation(s)
- Eric J Duncavage
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri.
- Joshua F Coleman
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, University of Utah, Salt Lake City, Utah
- Monica E de Baca
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Pacific Pathology Partners, Seattle, Washington
- Sabah Kadri
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Anne and Robert H Lurie Children's Hospital of Chicago, Chicago, Illinois
- Annette Leon
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Color Health, Burlingame, California
- Mark Routbort
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Hematopathology, MD Anderson Cancer Center, Houston, Texas
- Somak Roy
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Laboratory Medicine, Cincinnati Children's Hospital, Cincinnati, Ohio
- Carlos J Suarez
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Stanford University, Palo Alto, California
- Chad Vanderbilt
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, New York
- Justin M Zook
- In Silico Pipeline Validation Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Biomarker and Genomic Sciences Group, National Institute of Standards and Technology, Gaithersburg, Maryland

33
Talsania K, Shen TW, Chen X, Jaeger E, Li Z, Chen Z, Chen W, Tran B, Kusko R, Wang L, Pang AWC, Yang Z, Choudhari S, Colgan M, Fang LT, Carroll A, Shetty J, Kriga Y, German O, Smirnova T, Liu T, Li J, Kellman B, Hong K, Hastie AR, Natarajan A, Moshrefi A, Granat A, Truong T, Bombardi R, Mankinen V, Meerzaman D, Mason CE, Collins J, Stahlberg E, Xiao C, Wang C, Xiao W, Zhao Y. Structural variant analysis of a cancer reference cell line sample using multiple sequencing technologies. Genome Biol 2022; 23:255. [PMID: 36514120 PMCID: PMC9746098 DOI: 10.1186/s13059-022-02816-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Accepted: 11/17/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND The cancer genome is commonly altered with thousands of structural rearrangements including insertions, deletions, translocations, inversions, duplications, and copy number variations. Thus, structural variant (SV) characterization plays a paramount role in cancer target identification, oncology diagnostics, and personalized medicine. As part of the SEQC2 Consortium effort, the present study established and evaluated a consensus SV call set using a breast cancer reference cell line and matched normal control derived from the same donor, which were used in our companion benchmarking studies as reference samples. RESULTS We systematically investigated somatic SVs in the reference cancer cell line by comparing it with a matched normal cell line using multiple NGS platforms including Illumina short-read, 10X Genomics linked reads, PacBio long reads, Oxford Nanopore long reads, and high-throughput chromosome conformation capture (Hi-C). We established a consensus call set of 1788 SVs, including 717 deletions, 230 duplications, 551 insertions, 133 inversions, 146 translocations, and 11 breakends, for the reference cancer cell line. To independently evaluate and cross-validate the accuracy of our consensus SV call set, we used orthogonal methods including PCR-based validation, Affymetrix arrays, Bionano optical mapping, and identification of fusion genes detected from RNA-seq. We evaluated the strengths and weaknesses of each NGS technology for SV determination, and our findings provide an actionable guide to improve cancer genome SV detection sensitivity and accuracy. CONCLUSIONS A high-confidence consensus SV call set was established for the reference cancer cell line. A large subset of the variants identified was validated by multiple orthogonal methods.
Affiliation(s)
- Keyur Talsania
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tsai-Wei Shen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Xiongfong Chen
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Zhipan Li
- Sentieon Inc, Mountain View, CA, USA
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Limin Wang
- Laboratory of Human Carcinogenesis, Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Zhaowei Yang
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Sulbha Choudhari
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Michael Colgan
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc, 1301 Shoreway Road, Belmont, CA, 94002, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Oksana German
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tatyana Smirnova
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tiantain Liu
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Jing Li
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
- Karl Hong
- Bionano Genomics, San Diego, CA 92121, USA
- Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Jack Collins
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Eric Stahlberg
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wenming Xiao
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Yongmei Zhao
- Sequencing Facility Bioinformatics Group, Advanced Biomedical and Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Bioinformatics and Computational Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA

34
Krishnamachari K, Lu D, Swift-Scott A, Yeraliyev A, Lee K, Huang W, Leng SN, Skanderup AJ. Accurate somatic variant detection using weakly supervised deep learning. Nat Commun 2022; 13:4248. [PMID: 35869060 PMCID: PMC9307817 DOI: 10.1038/s41467-022-31765-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Accepted: 06/29/2022] [Indexed: 11/09/2022]
Abstract
Identification of somatic mutations in tumor samples is commonly based on statistical methods in combination with heuristic filters. Here we develop VarNet, an end-to-end deep learning approach for identification of somatic variants from aligned tumor and matched normal DNA reads. VarNet is trained using image representations of 4.6 million high-confidence somatic variants annotated in 356 tumor whole genomes. We benchmark VarNet across a range of publicly available datasets, demonstrating performance often exceeding current state-of-the-art methods. Overall, our results demonstrate how a scalable deep learning approach could augment and potentially supplant human engineered features and heuristic filters in somatic variant calling.

35
Jin J, Chen Z, Liu J, Du H, Zhang G. Towards an accurate and robust analysis pipeline for somatic mutation calling. Front Genet 2022; 13:979928. [PMID: 36457740 PMCID: PMC9705725 DOI: 10.3389/fgene.2022.979928] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/16/2022] [Accepted: 11/01/2022] [Indexed: 12/24/2023]
Abstract
Accurate and robust somatic mutation detection is essential for cancer treatment, diagnostics and research. Various analysis pipelines give different results and thus should be systematically evaluated. In this study, we benchmarked five commonly used somatic mutation calling pipelines (VarScan, VarDictJava, Mutect2, Strelka2 and FANSe) for their precision, recall and speed, using standard benchmarking datasets based on a series of real-world whole-exome sequencing datasets. All five pipelines showed very high precision in all cases, and high recall for mutations with frequencies higher than 10%. However, for low-frequency mutations, the pipelines differed substantially. FANSe showed the highest accuracy (especially the sensitivity) in all cases, and VarScan and VarDictJava outperformed Mutect2 and Strelka2 for low-frequency mutations at all sequencing depths. Flaws in filtering were the major cause of the low sensitivity of the four pipelines other than FANSe. In terms of speed, the FANSe pipeline was 8.8 to 19 times faster than the other pipelines. Our benchmarking results demonstrated the performance of these somatic calling pipelines and provided a reference for choosing such pipelines in cancer applications.
Affiliation(s)
- Jingjie Jin
- Key Laboratory of Functional Protein Research, Guangdong Higher Education Institutes, Jinan University, Guangzhou, China
- MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Zixi Chen
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Hongli Du
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China
- Gong Zhang
- Key Laboratory of Functional Protein Research, Guangdong Higher Education Institutes, Jinan University, Guangzhou, China
- MOE Key Laboratory of Tumor Molecular Biology, Institute of Life and Health Engineering, Jinan University, Guangzhou, China
- Chi-Biotech Co. Ltd., Shenzhen, China

36
Valecha M, Posada D. Somatic variant calling from single-cell DNA sequencing data. Comput Struct Biotechnol J 2022; 20:2978-2985. [PMID: 35782734 PMCID: PMC9218383 DOI: 10.1016/j.csbj.2022.06.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/01/2022] [Revised: 06/06/2022] [Accepted: 06/06/2022] [Indexed: 11/03/2022]
Abstract
Single-cell sequencing has gained popularity in recent years. Despite its numerous applications, single-cell DNA sequencing data is highly error-prone due to technical biases arising from uneven sequencing coverage, allelic dropout, and amplification error. With these artifacts, the identification of somatic genomic variants becomes a challenging task, and over the years, several methods have been developed explicitly for this type of data. Single-cell variant callers implement distinct strategies, make different use of the data, and typically result in many discordant calls when applied to real data. Here, we review current approaches for single-cell variant calling, emphasizing single nucleotide variants. We highlight their potential benefits and shortcomings to help users choose a suitable tool for their data at hand.
Key Words
- ADO, allelic dropout
- Allele dropout
- Amplification error
- CNV, copy number variant
- Indel, short insertion or deletion
- LDO, locus dropout
- SNV, single nucleotide variant
- SV, structural variant
- Single-cell genomics
- Somatic variants
- VAF, variant allele frequency
- Variant calling
- hSNP, heterozygous single-nucleotide polymorphism
- scATAC-seq, single-cell sequencing assay for transposase-accessible chromatin
- scDNA-seq, single-cell DNA sequencing
- scHi-C, single-cell Hi-C sequencing
- scMethyl-seq, single-cell methylation sequencing
- scRNA-seq, single-cell RNA sequencing
- scWGA, single-cell whole-genome amplification
Affiliation(s)
- Monica Valecha
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- David Posada
- CINBIO, Universidade de Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), SERGAS-UVIGO, Spain
- Department of Biochemistry, Genetics, and Immunology, Universidade de Vigo, 36310 Vigo, Spain
37
Espejo Valle-Inclan J, Besselink NJ, de Bruijn E, Cameron DL, Ebler J, Kutzera J, van Lieshout S, Marschall T, Nelen M, Priestley P, Renkens I, Roemer MG, van Roosmalen MJ, Wenger AM, Ylstra B, Fijneman RJ, Kloosterman WP, Cuppen E. A multi-platform reference for somatic structural variation detection. CELL GENOMICS 2022; 2:100139. [PMID: 36778136 PMCID: PMC9903816 DOI: 10.1016/j.xgen.2022.100139] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/10/2020] [Revised: 05/06/2021] [Accepted: 05/06/2022] [Indexed: 10/18/2022]
Abstract
Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality, gold-standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines. Here, we performed somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different sequencing technologies. Based on the evidence from multiple technologies combined with extensive experimental validation, we compiled a comprehensive set of carefully curated and validated somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects. The truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
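As a rough illustration of why tumor purity and sequencing depth matter for detection, the sketch below uses a simplified binomial read-sampling model. This is an assumption for illustration, not the analysis performed in this study: the expected variant allele fraction (VAF) of a clonal somatic variant is diluted by contaminating normal cells, and the chance of observing enough supporting reads drops as purity and depth fall. The function names, the diploid-normal assumption and the `min_alt_reads` threshold are all hypothetical; real SV callers weigh split-read and discordant read-pair evidence with many additional filters.

```python
from math import exp, lgamma, log, log1p


def expected_vaf(purity: float, tumor_ploidy: float = 2.0, mutant_copies: float = 1.0) -> float:
    """Expected VAF of a clonal somatic variant in a tumor/normal admixture.

    Assumes contaminating normal cells are diploid and do not carry the variant.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    mutant_alleles = purity * mutant_copies
    total_alleles = purity * tumor_ploidy + (1.0 - purity) * 2.0
    return mutant_alleles / total_alleles


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability mass, computed in log space to avoid overflow at high depth."""
    if p == 0.0:
        return 1.0 if k == 0 else 0.0
    if p == 1.0:
        return 1.0 if k == n else 0.0
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant-supporting reads among `depth` sampled reads)."""
    return sum(binomial_pmf(k, depth, vaf) for k in range(min_alt_reads, depth + 1))


if __name__ == "__main__":
    # Hypothetical grid of purities and depths; the 3-read threshold is illustrative only.
    for purity in (1.0, 0.5, 0.2):
        vaf = expected_vaf(purity)
        for depth in (30, 60, 90):
            p = detection_probability(depth, vaf, min_alt_reads=3)
            print(f"purity={purity:.1f} depth={depth} VAF={vaf:.3f} P(detect)={p:.3f}")
```

Under these assumptions a 20%-pure sample has an expected VAF of only 0.1, so a read-support threshold that is easily met in a pure sample can be missed at modest depth.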
Affiliation(s)
- Nicolle J.M. Besselink
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Daniel L. Cameron
- Hartwig Medical Foundation, Amsterdam, the Netherlands; Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
- Jana Ebler
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Joachim Kutzera
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Tobias Marschall
- Institute for Medical Biometry and Bioinformatics, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Marcel Nelen
- Department of Human Genetics, Radboud UMC, Nijmegen, the Netherlands
- Ivo Renkens
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands
- Margaretha G.M. Roemer
- Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Bauke Ylstra
- Department of Pathology, Amsterdam UMC, Vrije Universiteit Amsterdam, Cancer Center Amsterdam, Amsterdam, the Netherlands
- Remond J.A. Fijneman
- Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Wigard P. Kloosterman
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands (corresponding author)
- Edwin Cuppen
- Center for Molecular Medicine and Oncode Institute, UMC Utrecht, Utrecht, the Netherlands; Hartwig Medical Foundation, Amsterdam, the Netherlands (corresponding author)
38
Olson ND, Wagner J, McDaniel J, Stephens SH, Westreich ST, Prasanna AG, Johanson E, Boja E, Maier EJ, Serang O, Jáspez D, Lorenzo-Salazar JM, Muñoz-Barrera A, Rubio-Rodríguez LA, Flores C, Kyriakidis K, Malousi A, Shafin K, Pesout T, Jain M, Paten B, Chang PC, Kolesnikov A, Nattestad M, Baid G, Goel S, Yang H, Carroll A, Eveleigh R, Bourgey M, Bourque G, Li G, Ma C, Tang L, Du Y, Zhang S, Morata J, Tonda R, Parra G, Trotta JR, Brueffer C, Demirkaya-Budak S, Kabakci-Zorlu D, Turgut D, Kalay Ö, Budak G, Narcı K, Arslan E, Brown R, Johnson IJ, Dolgoborodov A, Semenyuk V, Jain A, Tetikol HS, Jain V, Ruehle M, Lajoie B, Roddey C, Catreux S, Mehio R, Ahsan MU, Liu Q, Wang K, Ebrahim Sahraeian SM, Fang LT, Mohiyuddin M, Hung C, Jain C, Feng H, Li Z, Chen L, Sedlazeck FJ, Zook JM. PrecisionFDA Truth Challenge V2: Calling variants from short and long reads in difficult-to-map regions. CELL GENOMICS 2022; 2:S2666-979X(22)00058-1. [PMID: 35720974 PMCID: PMC9205427 DOI: 10.1016/j.xgen.2022.100129] [Citation(s) in RCA: 54] [Impact Index Per Article: 27.0] [Received: 12/07/2020] [Revised: 11/01/2021] [Accepted: 04/08/2022] [Indexed: 11/19/2022]
Abstract
The precisionFDA Truth Challenge V2 aimed to assess the state of the art of variant calling in challenging genomic regions. Starting with FASTQs, 20 challenge participants applied their variant-calling pipelines and submitted 64 variant call sets for one or more sequencing technologies (Illumina, PacBio HiFi, and Oxford Nanopore Technologies). Submissions were evaluated following best practices for benchmarking small variants with updated Genome in a Bottle benchmark sets and genome stratifications. Challenge submissions included numerous innovative methods, with graph-based and machine learning methods scoring best for short-read and long-read datasets, respectively. With machine learning approaches, combining multiple sequencing technologies performed particularly well. Recent developments in sequencing and variant calling have enabled benchmarking variants in challenging genomic regions, paving the way for the identification of previously unknown clinically relevant variants.
Affiliation(s)
- Nathan D. Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
- Elaine Johanson
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Emily Boja
- Office of Health Informatics, Office of the Chief Scientist, Office of the Commissioner, US Food and Drug Administration, Silver Spring, MD, USA
- Ezekiel J. Maier
- Booz Allen Hamilton, 8283 Greensboro Drive, Mclean, VA 22102, USA
- Omar Serang
- DNAnexus, Inc., 1975 W El Camino Real #204, Mountain View, CA 94040, USA
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
- Research Unit, Hospital Universitario N.S. de Candelaria, Santa Cruz de Tenerife, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 San Cristóbal de La Laguna, Spain
- Konstantinos Kyriakidis
- School of Pharmacy, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Andigoni Malousi
- Genomics and Epigenomics Translational Research (GENeTres), Center for Interdisciplinary Research and Innovation, 570 01 Thessaloniki, Greece
- Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki (AUTH), 541 24 Thessaloniki, Greece
- Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Trevor Pesout
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Miten Jain
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, CA, USA
- Pi-Chuan Chang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Maria Nattestad
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Gunjan Baid
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Sidharth Goel
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Howard Yang
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Andrew Carroll
- Google Inc, 1600 Amphitheater Pkwy, Mountain View, CA 94040, USA
- Robert Eveleigh
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Mathieu Bourgey
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Guillaume Bourque
- The Canadian Center for Computational Genomics (C3G), Montréal, QC, Canada
- Gen Li
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ChouXian Ma
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- LinQi Tang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- YuanPing Du
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- ShaoWei Zhang
- HuXinDao, QingZhuHu TaiYangShan Road, KaiFu, ChangSha, HuNan, China
- Jordi Morata
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Raúl Tonda
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Genís Parra
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Jean-Rémi Trotta
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, 08028 Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Christian Brueffer
- Division of Oncology, Department of Clinical Sciences, Lund University, Lund, Sweden
- Deniz Turgut
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Özem Kalay
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Gungor Budak
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Kübra Narcı
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Elif Arslan
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Amit Jain
- Seven Bridges Genomics, Inc, Charlestown, MA, USA
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Qian Liu
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA 95050, USA
- Chirag Jain
- National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Fritz J. Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Justin M. Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, MS8312, Gaithersburg, MD 20899, USA
39
Abstract
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
40
Wang D, Zhang Y, Li R, Li J, Zhang R. Consistency and reproducibility of large panel next-generation sequencing: Multi-laboratory assessment of somatic mutation detection on reference materials with mismatch repair and proofreading deficiency. J Adv Res 2022; 44:161-172. [PMID: 36725187 PMCID: PMC9937796 DOI: 10.1016/j.jare.2022.03.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Revised: 03/16/2022] [Accepted: 03/27/2022] [Indexed: 02/04/2023]
Abstract
INTRODUCTION Clinical precision oncology increasingly relies on accurate genome-wide profiling using large-panel next-generation sequencing; however, accurate and consistent detection of somatic mutations across individual platforms and pipelines remains an open problem. OBJECTIVES To obtain paired tumor-normal reference materials that can be constructed effectively and are interchangeable with clinical samples, and to use these reference samples to evaluate the performance of 56 panels under routine testing conditions. METHODS Genes involved in mismatch repair and DNA proofreading were knocked down using CRISPR-Cas9 technology to accumulate somatic mutations in a defined GM12878 cell line. The resulting samples were used as reference materials to comprehensively evaluate the reproducibility and accuracy of oncopanel detection results and to explore potential influencing factors. RESULTS In total, 14 paired tumor-normal reference DNA samples from engineered cell lines were prepared, and a reference dataset comprising 168 somatic mutations in a 1.8 Mb high-confidence region was generated. For mutations with an allele frequency (AF) above 5% in the reference samples, the 56 panels collectively reported 1306 errors: 729 false negatives (FNs), 179 false positives (FPs) and 398 reproducibility errors. Performance varied among panels, with precision ranging from 0.773 to 1 and recall from 0.683 to 1. Incorrect and inadequate filtering accounted for a large proportion of detection errors (both FNs and FPs), while low-quality detection, cross-contamination and other sequencing errors during the wet-bench process were additional sources of FNs and FPs. In addition, low AF (<5%) considerably reduced reproducibility and comparability among panels.
CONCLUSIONS This study provided an integrated practice for developing reference standards to assess oncopanels in detecting somatic mutations, and quantitatively revealed the sources of detection errors. It will promote optimization, validation, and quality control among laboratories, with potential applicability in clinical use.
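The precision and recall ranges reported above follow the standard definitions, precision = TP/(TP + FP) and recall = TP/(TP + FN). A minimal sketch of how one call set might be scored against a truth set restricted to high-confidence regions is shown below. It assumes variants are already normalized and matches them only by exact (chrom, pos, ref, alt) identity, which is a simplification of dedicated benchmarking tools; the `Variant` type, the region format and the toy data are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


# High-confidence regions per chromosome as 1-based, inclusive (start, end) intervals.
Regions = Dict[str, List[Tuple[int, int]]]


def in_regions(variant: Variant, regions: Regions) -> bool:
    """True if the variant's position falls inside any high-confidence interval."""
    return any(start <= variant.pos <= end for start, end in regions.get(variant.chrom, []))


def score_calls(calls: Iterable[Variant], truth: Iterable[Variant],
                regions: Optional[Regions] = None) -> Dict[str, float]:
    """Count TP/FP/FN by exact matching and derive precision, recall and F1."""

    def keep(v: Variant) -> bool:
        return regions is None or in_regions(v, regions)

    call_set = {v for v in calls if keep(v)}
    truth_set = {v for v in truth if keep(v)}
    tp = len(call_set & truth_set)
    fp = len(call_set - truth_set)
    fn = len(truth_set - call_set)
    nan = float("nan")  # a metric is undefined when its denominator is zero
    return {
        "TP": tp, "FP": fp, "FN": fn,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        # F1 = 2PR / (P + R), written in counts so it never divides by a zero P or R
        "F1": 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else nan,
    }


if __name__ == "__main__":
    truth = [Variant("chr1", 100, "C", "T"), Variant("chr1", 250, "G", "A"),
             Variant("chr2", 500, "A", "G"), Variant("chr2", 900, "T", "C")]
    calls = [Variant("chr1", 100, "C", "T"), Variant("chr1", 250, "G", "A"),
             Variant("chr2", 500, "A", "G"), Variant("chr2", 700, "G", "T")]
    print(score_calls(calls, truth))  # TP=3, FP=1, FN=1 -> precision = recall = F1 = 0.75
```

Restricting both sets to the same confident intervals before counting keeps calls outside the benchmarked region from being scored as false positives.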
Affiliation(s)
- Duo Wang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Yuanfeng Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
41
PanCancer analysis of somatic mutations in repetitive regions reveals recurrent mutations in snRNA U2. NPJ Genom Med 2022; 7:19. [PMID: 35288589 PMCID: PMC8921233 DOI: 10.1038/s41525-022-00292-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2021] [Accepted: 02/15/2022] [Indexed: 11/27/2022]
Abstract
Current somatic mutation callers are biased against repetitive regions, preventing the identification of potential driver alterations in these loci. We developed a mutation caller for repetitive regions and applied it to study repetitive non-protein-coding genes in more than 2200 whole-genome cases. We identified a recurrent mutation at position c.28 in the gene encoding the snRNA U2. This mutation is present in B-cell-derived tumors, as well as in prostate and pancreatic cancer, suggesting that U2 c.28 is a driver candidate associated with worse prognosis. We showed that the GRCh37 reference genome is incomplete, lacking the U2 cluster on chromosome 17, which prevents the identification of mutations in this gene. Furthermore, the 5′-flanking region of WDR74, previously described as frequently mutated in cancer, constitutes a functional copy of U2. These data reinforce the relevance of non-coding mutations in cancer and highlight current challenges for cancer genomic research in characterizing mutations affecting repetitive genes.
42
Perez-Roman E, Borredá C, López-García Usach A, Talon M. Single-nucleotide mosaicism in citrus: Estimations of somatic mutation rates and total number of variants. THE PLANT GENOME 2022; 15:e20162. [PMID: 34796688 DOI: 10.1002/tpg2.20162] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/24/2021] [Accepted: 09/02/2021] [Indexed: 06/13/2023]
Abstract
Most of the hundreds of citrus varieties are derived from spontaneous mutations. We characterized the dynamics of single-nucleotide mosaicism in a 36-yr-old clementine (Citrus ×clementina hort. ex Tanaka) tree, a commercial citrus whose vegetative behavior is known in detail. Whole-genome sequencing identified 73 reliable somatic mutations, 48% of which were G/C to A/T transitions, suggesting ultraviolet (UV) exposure as the mutagen. The mutations accumulated in sectorized areas of the tree in a nested hierarchy determined by the branching pattern, although some variants detected in the basal parts were also found in the new growth and were fixed in some branches and leaves of much younger age. The estimated mutation rate in our tree was 4.4 × 10⁻¹⁰ bp⁻¹ yr⁻¹, within the range reported for other perennials. Assuming a perfect configuration and taking advantage of previous counts of the total number of leaves on typical clementine trees, these mutation determinations allowed us to estimate, for the first time, the total number of variants present in a standard adult tree (1,500-5,000) and the number of somatic mutations generated in a typical leaf flush (0.92-1.19). From an evolutionary standpoint, the sectoral distribution of somatic mutations and the habit of periodic foliar renewal in long-lived plants appear to increase genetic heterogeneity and, therefore, the adaptive role of somatic mutations in reducing the mutational load and providing fitness benefits.
Affiliation(s)
- Estela Perez-Roman
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Valencia, 46113, Spain
- Carles Borredá
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Valencia, 46113, Spain
- Antoni López-García Usach
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Valencia, 46113, Spain
- Manuel Talon
- Centro de Genómica, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Valencia, 46113, Spain
43
Anzar I, Sverchkova A, Samarakoon P, Ellingsen EB, Gaudernack G, Stratford R, Clancy T. Personalized HLA typing leads to the discovery of novel HLA alleles and tumor-specific HLA variants. HLA 2022; 99:313-327. [PMID: 35073457 PMCID: PMC9546058 DOI: 10.1111/tan.14562] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/21/2021] [Revised: 01/08/2022] [Accepted: 01/21/2022] [Indexed: 11/29/2022]
Abstract
Accurate and full-length typing of the HLA region is important in many clinical and research settings. With the advent of next-generation sequencing (NGS), several HLA typing algorithms have been developed, including many that are applicable to whole-exome sequencing (WES). However, most of these solutions operate by providing the closest-matched HLA allele among the known alleles in the IPD-IMGT/HLA Database. These database-matching approaches have demonstrated very high performance when typing well-characterized HLA alleles. However, because they rely on the completeness of the HLA database, they are not optimal for detecting novel or less well-characterized alleles. Furthermore, database-matching approaches are also inadequate in the context of cancer, where a comprehensive characterization of somatic HLA variation and of the expression patterns of a tumor's HLA locus may guide therapy and clinical outcome, because of the pivotal role HLA alleles play in tumor antigen recognition and immune escape. Here, we describe a personalized HLA typing approach applied to WES data that leverages the strengths of database-matching approaches while also allowing the discovery of novel HLA alleles and tumor-specific HLA variants, through the systematic integration of germline and somatic variant calling. We applied this approach to WES data from 10 metastatic melanoma patients and validated the HLA typing results using HLA-targeted NGS in patients for whom at least one Class I HLA germline candidate was detected. Targeted NGS confirmed 100% performance for the 1st and 2nd fields. In total, five of the six detected HLA germline variants were due to Class I ambiguities at the third or fourth fields, and their detection recovered the correct HLA allele genotype. The sixth germline variant led to the formal discovery of a novel Class I allele.
Finally, we demonstrated substantially improved somatic variant detection accuracy in HLA alleles, with a 91% success rate in simulated experiments. The approach described here may allow the field to genotype more accurately using WES data, leading to the discovery of novel HLA alleles and helping to characterize the relationship between somatic variation in the HLA region and immunosurveillance.
Affiliation(s)
- Irantzu Anzar
- NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Angelina Sverchkova
- NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Pubudu Samarakoon
- NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Gustav Gaudernack
- Ultimovacs ASA, Oslo Cancer Cluster, Ullernchausseen 64/66 Oslo Norway
- Richard Stratford
- NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Trevor Clancy
- NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
44
Sahraeian SME, Fang LT, Karagiannis K, Moos M, Smith S, Santana-Quintero L, Xiao C, Colgan M, Hong H, Mohiyuddin M, Xiao W. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 2022; 23:12. [PMID: 34996510 PMCID: PMC8740374 DOI: 10.1186/s13059-021-02592-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022]
Abstract
BACKGROUND Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated its performance advantages on in silico data. RESULTS In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. CONCLUSIONS The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
Affiliation(s)
- Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA, 95050, USA
- Konstantinos Karagiannis
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Malcolm Moos
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Sean Smith
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Luis Santana-Quintero
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Michael Colgan
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Huixiao Hong
- Bioinformatics branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR, 72079, USA
- Wenming Xiao
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
45
Rashid I, Campos M, Collier T, Crepeau M, Weakley A, Gripkey H, Lee Y, Schmidt H, Lanzaro GC. Spontaneous mutation rate estimates for the principal malaria vectors Anopheles coluzzii and Anopheles stephensi. Sci Rep 2022; 12:226. [PMID: 34996998 PMCID: PMC8742016 DOI: 10.1038/s41598-021-03943-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2021] [Accepted: 12/07/2021] [Indexed: 11/17/2022]
Abstract
Using high-depth whole-genome sequencing of F0 mating pairs and multiple individual F1 offspring, we estimated the nuclear mutation rate per generation in the malaria vectors Anopheles coluzzii and Anopheles stephensi by detecting de novo mutations. A purpose-built computer program was used to separate true mutations from a deep background of superficially similar artifacts resulting from read misalignment. The performance of the filtering parameters was determined using software-simulated mutations, and the resulting false-negative rate estimate was used to correct the final mutation rate estimates. Spontaneous base-substitution mutation rates were estimated at 1.00 × 10⁻⁹ (95% confidence interval, 2.06 × 10⁻¹⁰ to 2.91 × 10⁻⁹) and 1.36 × 10⁻⁹ (95% confidence interval, 4.42 × 10⁻¹⁰ to 3.18 × 10⁻⁹) per site per generation in A. coluzzii and A. stephensi, respectively. Although similar studies have been performed on other insect species, including dipterans, this is the first study to empirically measure mutation rates in the important genus Anopheles, and it thus provides an estimate of µ that will be useful for comparative evolutionary genomics as well as for population genetic analysis of malaria vector mosquito species.
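The false-negative correction described above can be illustrated with a generic per-generation estimator. This is an assumption for illustration; the abstract does not give the study's exact estimator or filtering program. Each diploid offspring inherits two haploid genomes, so the observed de novo count is divided by twice the number of callable sites and by the estimated sensitivity, 1 − FNR, where the FNR comes from the fraction of simulated mutations the filters fail to recover. All function names and input numbers below are hypothetical, and confidence intervals are omitted.

```python
from typing import Sequence


def false_negative_rate(simulated: int, recovered: int) -> float:
    """Fraction of in silico (simulated) mutations missed by the filtering pipeline."""
    if simulated <= 0 or not 0 <= recovered <= simulated:
        raise ValueError("need simulated > 0 and 0 <= recovered <= simulated")
    return 1.0 - recovered / simulated


def mutation_rate(de_novo_counts: Sequence[int], callable_sites: Sequence[int],
                  fnr: float) -> float:
    """FNR-corrected de novo mutation rate per site per generation.

    de_novo_counts[i] and callable_sites[i] refer to offspring i. Each diploid
    offspring carries two transmitted haploid genomes, hence the factor of 2.
    """
    if len(de_novo_counts) != len(callable_sites) or len(de_novo_counts) == 0:
        raise ValueError("need one callable-site count per offspring")
    if not 0.0 <= fnr < 1.0:
        raise ValueError("fnr must lie in [0, 1)")
    haploid_sites = sum(2 * sites for sites in callable_sites)
    # Dividing by the sensitivity (1 - FNR) scales the raw count up to account for missed mutations.
    return sum(de_novo_counts) / (haploid_sites * (1.0 - fnr))


if __name__ == "__main__":
    fnr = false_negative_rate(simulated=1000, recovered=850)  # hypothetical simulation result
    rate = mutation_rate([1, 0, 2], [150_000_000, 148_000_000, 151_000_000], fnr)
    print(f"FNR = {fnr:.2f}, mu = {rate:.2e} per site per generation")
```

Pooling counts and callable sites across offspring before dividing weights each offspring by how much of its genome could actually be assessed.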
Affiliation(s)
- Iliyas Rashid
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA; Section of Cell and Developmental Biology, University of California, San Diego, La Jolla, CA, USA; Tata Institute for Genetics and Society, Center at inStem, Bangalore, Karnataka, 560065, India
- Melina Campos
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Travis Collier
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Marc Crepeau
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Allison Weakley
- Department of ChEM-H Operations, Stanford University, 450 Serra Mall, Stanford, CA, 94305, USA
- Hans Gripkey
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
- Yoosook Lee
- Florida Medical Entomology Laboratory, University of Florida, 200 9th St SE, Vero Beach, FL, 32962, USA
- Hanno Schmidt
- Anthropology, Institute of Organismic and Molecular Evolution (iomE), Johannes Gutenberg University of Mainz, Saarstraße 21, 55122, Mainz, Germany
- Gregory C Lanzaro
- Vector Genetics Laboratory, Department of Pathology, Microbiology and Immunology, UC Davis, 1089 Veterinary Medicine Dr, 4225 VM3B, Davis, CA, 95616, USA
46
Laganà A. Computational Approaches for the Investigation of Intra-tumor Heterogeneity and Clonal Evolution from Bulk Sequencing Data in Precision Oncology Applications. Adv Exp Med Biol 2022; 1361:101-118. [DOI: 10.1007/978-3-030-91836-1_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
47
Ji S, Montierth MD, Wang W. MuSE: A Novel Approach to Mutation Calling with Sample-Specific Error Modeling. Methods Mol Biol 2022; 2493:21-27. [PMID: 35751806 DOI: 10.1007/978-1-0716-2293-3_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Accurate detection of somatic mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We have developed MuSE (Mutation calling using a Markov Substitution model for Evolution), a novel approach that models the evolution of the allelic composition of tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model to capture inter-tumor heterogeneity, which greatly improves overall accuracy. Here, we describe the method and provide a tutorial on installing and applying MuSE.
Affiliation(s)
- Shuangxi Ji
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Matthew D Montierth
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Quantitative Computational Biology, Baylor College of Medicine, Houston, TX, USA
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
48
Wong J, Gruber E, Maher B, Waltham M, Sabouri-Thompson Z, Jong I, Luong Q, Levy S, Kumar B, Brasacchio D, Jia W, So J, Skinner H, Lewis A, Hogg SJ, Vervoort S, DiCorleto C, Uhe M, Gamgee J, Opat S, Gregory GP, Polekhina G, Reynolds J, Hawkes EA, Kailainathan G, Gasiorowski R, Kats LM, Shortt J. Integrated clinical and genomic evaluation of guadecitabine (SGI-110) in peripheral T-cell lymphoma. Leukemia 2022; 36:1654-1665. [PMID: 35459873 PMCID: PMC9162925 DOI: 10.1038/s41375-022-01571-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/23/2021] [Revised: 03/28/2022] [Accepted: 04/04/2022] [Indexed: 01/03/2023]
Abstract
Peripheral T-cell lymphoma (PTCL) is a rare, heterogeneous malignancy with dismal outcomes at relapse. Hypomethylating agents (HMA) have an emerging role in PTCL, supported by mutations shared with myelodysplasia (MDS). Response rates to azacitidine in PTCL of follicular helper cell origin are promising. Guadecitabine is a decitabine analogue with efficacy in MDS. In this phase II, single-arm trial, PTCL patients received guadecitabine on days 1-5 of 28-day cycles. The primary end points were overall response rate (ORR) and safety. Translational sub-studies included cell-free plasma DNA sequencing and functional genomic screening with an epigenetically targeted CRISPR/Cas9 library to identify response predictors. Among 20 predominantly relapsed/refractory patients, the ORR was 40% (10% complete responses). The most frequent grade 3-4 adverse events were neutropenia and thrombocytopenia. At a median follow-up of 10 months, median progression-free survival (PFS) and overall survival (OS) were 2.9 and 10.4 months, respectively. RHOA G17V mutations were associated with improved PFS (median 5.47 vs. 1.35 months; Wilcoxon p = 0.02, log-rank p = 0.06). Four of seven patients with TP53 variants responded. Deletion of the histone methyltransferase SETD2 sensitised to HMA, but TET2 deletion did not. Guadecitabine showed an acceptable ORR and toxicity profile. Decitabine analogues may provide a backbone for future combination regimens that co-target histone methyltransferases.
Affiliation(s)
- Jonathan Wong
- Monash Haematology, Monash Health, Clayton, VIC, Australia; Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Emily Gruber
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Belinda Maher
- Monash Haematology, Monash Health, Clayton, VIC, Australia; Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Mark Waltham
- Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Zahra Sabouri-Thompson
- Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Ian Jong
- Monash Health Imaging, Monash Health, Clayton, VIC, Australia; Department of Imaging, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Quinton Luong
- Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Sidney Levy
- Monash Health Imaging, Monash Health, Clayton, VIC, Australia; Department of Imaging, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Beena Kumar
- Monash Pathology, Monash Health, Clayton, VIC, Australia
- Daniella Brasacchio
- Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Wendy Jia
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Joan So
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Hugh Skinner
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Alexander Lewis
- Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Simon J. Hogg
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Stephin Vervoort
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Carmen DiCorleto
- Monash Haematology, Monash Health, Clayton, VIC, Australia
- Micheleine Uhe
- Monash Haematology, Monash Health, Clayton, VIC, Australia
- Jeanette Gamgee
- Monash Haematology, Monash Health, Clayton, VIC, Australia
- Stephen Opat
- Monash Haematology, Monash Health, Clayton, VIC, Australia; Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Gareth P. Gregory
- Monash Haematology, Monash Health, Clayton, VIC, Australia; Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia
- Galina Polekhina
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- John Reynolds
- Biostatistics Consulting Platform, Monash University and Alfred Health, Prahran, VIC, Australia
- Eliza A. Hawkes
- Olivia Newton-John Cancer Wellness and Research Centre at Austin Health, Heidelberg, VIC, Australia; Transfusion Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Gajan Kailainathan
- Haematology Department, Concord Repatriation General Hospital, Concord, NSW, Australia
- Robin Gasiorowski
- Haematology Department, Concord Repatriation General Hospital, Concord, NSW, Australia; University of Sydney, Sydney, NSW, Australia
- Lev M. Kats
- Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Jake Shortt
- Monash Haematology, Monash Health, Clayton, VIC, Australia; Blood Cancer Therapeutics Laboratory, Department of Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, VIC, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Parkville, VIC, Australia; Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
49
Huang W, Sim NL, Skanderup AJ. Accurate Ensemble Prediction of Somatic Mutations with SMuRF2. Methods Mol Biol 2022; 2493:53-66. [PMID: 35751808 DOI: 10.1007/978-1-0716-2293-3_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Accurate identification of somatic mutations is crucial for discovering and identifying driver mutations in tumors. Here, we describe the updated Somatic Mutation calling method using a Random Forest (SMuRF2). SMuRF2 is an ensemble method that uses supervised machine learning to combine the predictions and auxiliary features of individual mutation callers. It provides an efficient workflow for predicting both somatic point mutations (SNVs) and small insertions/deletions (indels) in cancer genomes and exomes. We describe the latest version of the method and provide a detailed tutorial for running SMuRF2.
Affiliation(s)
- Weitai Huang
- Laboratory of Computational Cancer Genomics, Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore, Singapore
- Ngak Leng Sim
- Laboratory of Computational Cancer Genomics, Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore, Singapore
- Anders J Skanderup
- Laboratory of Computational Cancer Genomics, Genome Institute of Singapore, A*STAR (Agency for Science, Technology and Research), Singapore, Singapore
50
Lin LH, Chou CH, Cheng HW, Chang KW, Liu CJ. Precise Identification of Recurrent Somatic Mutations in Oral Cancer Through Whole-Exome Sequencing Using Multiple Mutation Calling Pipelines. Front Oncol 2021; 11:741626. [PMID: 34912705 PMCID: PMC8666431 DOI: 10.3389/fonc.2021.741626] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/15/2021] [Accepted: 11/11/2021] [Indexed: 01/18/2023]
Abstract
Understanding the genomic alterations in oral carcinogenesis remains crucial for the appropriate diagnosis and treatment of oral squamous cell carcinoma (OSCC). To characterize the mutational spectrum, we performed whole-exome sequencing (WES) on 50 paired OSCC samples and analyzed the data with six mutation calling pipelines and multiple filtering criteria. Tumor mutation burden, derived from the set of somatic variants, was significantly associated with age, tumor stage, and survival. We identified several genes (MUC16, MUC19, KMT2D, TTN, HERC2) with a high frequency of false-positive mutations. We also found known genes (TP53, FAT1, EPHA2, NOTCH1, CASP8, and PIK3CA) and novel genes (HYDIN, ALPK3, ASXL1, USP9X, SKOR2, CPLANE1, STARD9, and NSD2) to be significantly and frequently mutated in OSCC. Further analysis relating gene alteration status to clinical parameters revealed that several canonical pathways were associated with OSCC prognosis, including clathrin-mediated endocytotic signaling, NFκB signaling, PEDF signaling, and calcium signaling. A catalog of targetable genomic alterations showed that 58% of the tumors carried at least one aberrant event that could potentially be targeted by approved therapeutic agents. In defining the landscape of major altered events in the coding regions of OSCC genomes, we also identified molecular OSCC subgroups that correlated with etiology and prognosis. These findings should help in designing clinical trials of targeted therapies and in stratifying patients with OSCC by therapeutic efficacy.
Affiliation(s)
- Li-Han Lin
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- Chung-Hsien Chou
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Hui-Wen Cheng
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan
- Kuo-Wei Chang
- Institute of Oral Biology, School of Dentistry, National Yang Ming Chiao Tung University, Taipei, Taiwan; Department of Stomatology, Taipei Veterans General Hospital, Taipei, Taiwan
- Chung-Ji Liu
- Department of Medical Research, MacKay Memorial Hospital, Taipei, Taiwan; Department of Oral and Maxillofacial Surgery, Taipei MacKay Memorial Hospital, Taipei, Taiwan