1
Furtado LV, Bifulco C, Dolderer D, Hsiao SJ, Kipp BR, Lindeman NI, Ritterhouse LL, Temple-Smolkin RL, Zehir A, Nowak JA. Recommendations for Tumor Mutational Burden Assay Validation and Reporting: A Joint Consensus Recommendation of the Association for Molecular Pathology, College of American Pathologists, and Society for Immunotherapy of Cancer. J Mol Diagn 2024; 26:653-668. [PMID: 38851389 DOI: 10.1016/j.jmoldx.2024.05.002] [Received: 12/01/2023] [Revised: 04/05/2024] [Accepted: 05/07/2024] [Indexed: 06/10/2024]
Abstract
Tumor mutational burden (TMB) has been recognized as a predictive biomarker for immunotherapy response in several tumor types. Several laboratories offer TMB testing, but there is significant variation in how TMB is calculated, reported, and interpreted among laboratories. TMB standardization efforts are underway, but no published guidance for TMB validation and reporting is currently available. Recognizing the current challenges of clinical TMB testing, the Association for Molecular Pathology convened a multidisciplinary collaborative working group with representation from the American Society of Clinical Oncology, the College of American Pathologists, and the Society for Immunotherapy of Cancer to review the laboratory practices surrounding TMB and develop recommendations for the analytical validation and reporting of TMB testing based on survey data, literature review, and expert consensus. These recommendations encompass preanalytical, analytical, and postanalytical factors of TMB analysis, and they emphasize the relevance of comprehensive methodological descriptions to allow comparability between assays.
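One reason reported TMB values can differ between laboratories is that both the numerator (which variant classes are counted) and the denominator (the size of the sequenced region) are assay-specific choices. The following is a minimal Python sketch of that effect, assuming TMB is expressed as mutations per megabase. The variant list, the effect labels, the function name `tmb_per_mb`, and the 1.2 Mb panel size are hypothetical and are not taken from the recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str  # hypothetical labels: "missense", "frameshift", "synonymous"


def tmb_per_mb(variants, counted_effects, region_size_bp):
    """Return mutations per megabase, counting only the chosen variant classes."""
    if region_size_bp <= 0:
        raise ValueError("region_size_bp must be positive")
    counted = sum(1 for v in variants if v.effect in counted_effects)
    return counted / (region_size_bp / 1_000_000)


# Invented somatic calls for one tumor sample (illustration only).
calls = [
    Variant("TP53", "missense"),
    Variant("KRAS", "missense"),
    Variant("PIK3CA", "missense"),
    Variant("APC", "frameshift"),
    Variant("TTN", "synonymous"),
    Variant("MUC16", "synonymous"),
]
panel_bp = 1_200_000  # assumed size of the sequenced coding region

# Same sample, two counting conventions, two different TMB values.
nonsynonymous_only = tmb_per_mb(calls, {"missense", "frameshift"}, panel_bp)
with_synonymous = tmb_per_mb(
    calls, {"missense", "frameshift", "synonymous"}, panel_bp
)
print(f"nonsynonymous only: {nonsynonymous_only:.2f} mut/Mb")
print(f"including synonymous: {with_synonymous:.2f} mut/Mb")
```

The same set of calls yields two different values depending only on the counting convention, which is why a full methodological description is needed before comparing results across assays.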
Affiliation(s)
- Larissa V Furtado
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee
- Carlo Bifulco
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Providence Portland Medical Center, Portland, Oregon
- Daniel Dolderer
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Jupiter Medical Center, Jupiter, Florida
- Susan J Hsiao
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology and Cell Biology, Columbia University Medical Center, New York, New York
- Benjamin R Kipp
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, Minnesota
- Neal I Lindeman
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Weill Cornell Medicine, New York, New York
- Lauren L Ritterhouse
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Massachusetts General Hospital, Boston, Massachusetts
- Ahmet Zehir
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Memorial Sloan Kettering Cancer Center, New York, New York
- Jonathan A Nowak
  - The Tumor Mutational Burden Working Group of the Clinical Practice Committee, Association for Molecular Pathology, Rockville, Maryland; Department of Pathology, Brigham and Women's Hospital, Boston, Massachusetts
2
Atzeni R, Massidda M, Pieroni E, Rallo V, Pisu M, Angius A. A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer. Int J Mol Sci 2024; 25:8044. [PMID: 39125613 PMCID: PMC11311285 DOI: 10.3390/ijms25158044] [Received: 06/10/2024] [Revised: 07/11/2024] [Accepted: 07/22/2024] [Indexed: 08/12/2024]
Abstract
Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This approach lacks accuracy, reproducibility, and portability, limiting clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is based on a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. The core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requiring no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.
Affiliation(s)
- Rossano Atzeni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Matteo Massidda
  - Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy
- Enrico Pieroni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Vincenzo Rallo
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
- Massimo Pisu
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy
- Andrea Angius
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
3
Pastò B, Buzzatti G, Schettino C, Malapelle U, Bergamini A, De Angelis C, Musacchio L, Dieci MV, Kuhn E, Lambertini M, Passarelli A, Toss A, Farolfi A, Roncato R, Capoluongo E, Vida R, Pignata S, Callari M, Baldassarre G, Bartoletti M, Gerratana L, Puglisi F. Unlocking the potential of Molecular Tumor Boards: from cutting-edge data interpretation to innovative clinical pathways. Crit Rev Oncol Hematol 2024; 199:104379. [PMID: 38718940 DOI: 10.1016/j.critrevonc.2024.104379] [Received: 01/30/2024] [Revised: 04/02/2024] [Accepted: 05/01/2024] [Indexed: 05/22/2024]
Abstract
The emerging era of precision medicine is characterized by an increasing availability of targeted anticancer therapies and by the parallel development of techniques to obtain more refined molecular data, whose interpretation may not always be straightforward. Molecular tumor boards bring together various professionals to leverage the analysis of molecular data and provide prognostic and predictive insights for clinicians. In addition to supporting healthcare development, they could also become a tool to promote the dissemination of knowledge and research. A growing body of evidence on the application of molecular tumor boards to clinical practice is forming and positive signals are emerging, although a certain degree of heterogeneity exists. This work analyzes molecular tumor boards' potential workflows, figures involved, data sources, sample matrices and eligible patients, as well as available evidence and learning examples. The emerging concept of multi-institutional, disease-specific molecular tumor boards is also considered by presenting two ongoing nationwide experiences.
Affiliation(s)
- Brenno Pastò
  - Department of Medicine (DMED), University of Udine, Udine 33100, Italy; Department of Medical Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano 33081, Italy
- Giulia Buzzatti
  - Department of Medical Oncology, U.O. Clinica di Oncologia Medica, IRCCS Ospedale Policlinico San Martino, Genova 16132, Italy
- Clorinda Schettino
  - Clinical Trials Unit, Istituto Nazionale Tumori, IRCCS, Fondazione G. Pascale, Napoli 80131, Italy
- Umberto Malapelle
  - Department of Public Health, University of Naples Federico II, Napoli 80131, Italy
- Alice Bergamini
  - Faculty of Medicine and Surgery, Vita-Salute San Raffaele University, Milano 20132, Italy; Unit of Obstetrics and Gynaecology, IRCCS San Raffaele Scientific Institute, Milano 20132, Italy
- Carmine De Angelis
  - Oncology Unit - Department of Clinical Medicine and Surgery, University of Naples Federico II, Napoli 80131, Italy
- Lucia Musacchio
  - Department of Women and Child Health, Division of Gynaecologic Oncology, Fondazione Policlinico Universitario "A. Gemelli" IRCCS, Roma 00168, Italy
- Maria Vittoria Dieci
  - Department of Surgery, Oncology and Gastroenterology, University of Padova, Padova 35122, Italy; Oncology 2, Veneto Institute of Oncology IOV-IRCCS, Padova 35128, Italy
- Elisabetta Kuhn
  - Department of Biomedical, Surgical and Dental Sciences, University of Milan, Milano 20122, Italy; Pathology Unit, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Milano 20122, Italy
- Matteo Lambertini
  - Department of Medical Oncology, U.O. Clinica di Oncologia Medica, IRCCS Ospedale Policlinico San Martino, Genova 16132, Italy; Department of Internal Medicine and Medical Specialties (DiMI), School of Medicine, University of Genova, Genova 16132, Italy
- Anna Passarelli
  - Department of Urology and Gynaecology, Istituto Nazionale Tumori IRCCS "Fondazione G. Pascale", Napoli 80131, Italy
- Angela Toss
  - Department of Oncology and Hematology, Azienda Ospedaliero-Universitaria di Modena, Modena 41124, Italy; Department of Medical and Surgical Sciences, University of Modena and Reggio Emilia, Modena 41124, Italy
- Alberto Farolfi
  - Department of Medical Oncology, IRCCS Istituto Romagnolo per lo Studio dei Tumori (IRST) "Dino Amadori", Meldola 47014, Italy
- Rossana Roncato
  - Department of Medicine (DMED), University of Udine, Udine 33100, Italy; Experimental and Clinical Pharmacology Unit, Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, Aviano 33081, Italy
- Ettore Capoluongo
  - Department of Molecular Medicine and Medical Biotechnologies, University of Naples Federico II, Napoli 80131, Italy; Clinical Pathology Unit, Azienda Ospedaliera San Giovanni Addolorata, Roma 00184, Italy
- Riccardo Vida
  - Department of Medicine (DMED), University of Udine, Udine 33100, Italy; Department of Medical Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano 33081, Italy
- Sandro Pignata
  - Department of Urology and Gynaecology, Istituto Nazionale Tumori IRCCS "Fondazione G. Pascale", Napoli 80131, Italy
- Gustavo Baldassarre
  - Molecular Oncology Unit, Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, Aviano 33081, Italy
- Michele Bartoletti
  - Department of Medical Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano 33081, Italy
- Lorenzo Gerratana
  - Department of Medicine (DMED), University of Udine, Udine 33100, Italy; Department of Medical Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano 33081, Italy
- Fabio Puglisi
  - Department of Medicine (DMED), University of Udine, Udine 33100, Italy; Department of Medical Oncology, Centro di Riferimento Oncologico di Aviano (CRO), IRCCS, Aviano 33081, Italy
4
Tang G, Liu X, Cho M, Li Y, Tran DH, Wang X. Pan-cancer discovery of somatic mutations from RNA sequencing data. Commun Biol 2024; 7:619. [PMID: 38783092 PMCID: PMC11116503 DOI: 10.1038/s42003-024-06326-y] [Received: 09/14/2023] [Accepted: 05/14/2024] [Indexed: 05/25/2024]
Abstract
Identification of somatic mutations (SMs) is essential for characterizing cancer genomes. While DNA-seq is the prevalent method for identifying SMs, RNA-seq provides an alternative strategy to discover tumor mutations in the transcribed genome. Here, we have developed a machine learning-based pipeline to discover SMs based on RNA-seq data (designated as RNA-SMs). Subsequently, we have conducted a pan-cancer analysis to systematically identify RNA-SMs from over 8,000 tumors in The Cancer Genome Atlas (TCGA). In this way, we have identified over 105,000 novel SMs that had not been reported in previous TCGA studies. These novel SMs have significant clinical implications in designing targeted therapy for improved patient outcomes. Further, we have combined the SMs identified by both RNA-seq and DNA-seq analyses to depict an updated mutational landscape across 32 cancer types. This new online SM atlas, OncoDB (https://oncodb.org), offers a more complete view of gene mutations that underlie the development and progression of various cancers.
Affiliation(s)
- Gongyu Tang
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
  - Department of Mechanical Engineering and Materials Science, Washington University in St. Louis, St. Louis, MO, USA
- Xinyi Liu
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Minsu Cho
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Yuanxiang Li
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Dan-Ho Tran
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
- Xiaowei Wang
  - Department of Pharmacology and Regenerative Medicine, University of Illinois at Chicago, Chicago, IL, USA
  - University of Illinois Cancer Center, Chicago, IL, USA
5
Ji S, Zhu T, Sethia A, Wang W. Accelerated somatic mutation calling for whole-genome and whole-exome sequencing data from heterogenous tumor samples. Genome Res 2024; 34:633-641. [PMID: 38589250 PMCID: PMC11146589 DOI: 10.1101/gr.278456.123] [Received: 08/29/2023] [Accepted: 04/03/2024] [Indexed: 04/10/2024]
Abstract
Accurate detection of somatic mutations in DNA sequencing data is a fundamental prerequisite for cancer research. Previous analytical challenges were overcome by consensus mutation calling from four to five popular callers. This, however, increases the already nontrivial computing time from individual callers. Here, we launch MuSE 2, powered by multistep parallelization and efficient memory allocation, to resolve the computing time bottleneck. MuSE 2 is 50 times faster than MuSE 1 and 8 to 80 times faster than other popular callers. Our benchmark study suggests that combining MuSE 2 and the recently accelerated Strelka2 achieves high efficiency and accuracy in analyzing large cancer genomic data sets.
Affiliation(s)
- Shuangxi Ji
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
- Tong Zhu
  - NVIDIA Corporation, Santa Clara, California 95051, USA
- Ankit Sethia
  - NVIDIA Corporation, Santa Clara, California 95051, USA
- Wenyi Wang
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030, USA
6
Simpson JT. Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads. bioRxiv 2024:2024.02.26.582089. [PMID: 38464143 PMCID: PMC10925087 DOI: 10.1101/2024.02.26.582089] [Indexed: 03/12/2024]
Abstract
DNA sequencing of tumours to identify somatic mutations has become a critical tool to guide the type of treatment given to cancer patients. The gold standard for mutation calling is comparing sequencing data from the tumour to a matched normal sample to avoid mis-classifying inherited SNPs as mutations. This procedure works extremely well, but in certain situations only a tumour sample is available. While approaches have been developed to find mutations without a matched normal, they have limited accuracy or require specific types of input data (e.g. ultra-deep sequencing). Here we explore the application of single molecule long read sequencing to calling somatic mutations without matched normal samples. We develop a simple theoretical framework to show how haplotype phasing is an important source of information for determining whether a variant is a somatic mutation. We then use simulations to assess the range of experimental parameters (tumour purity, sequencing depth) where this approach is effective. These ideas are developed into a prototype somatic mutation caller, smrest, and its use is demonstrated on two highly mutated cancer cell lines. Finally, we argue that this approach has potential to measure clinically important biomarkers that are based on the genome-wide distribution of mutations: tumour mutation burden and mutation signatures.
Affiliation(s)
- Jared T. Simpson
  - Ontario Institute for Cancer Research, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Department of Computer Science, University of Toronto, Toronto, Canada
7
Li Z, Lan J, Shi X, Lu T, Hu X, Liu X, Chen Y, He Z. Whole-Genome Sequencing Reveals Rare Off-Target Mutations in MC1R-Edited Pigs Generated by Using CRISPR-Cas9 and Somatic Cell Nuclear Transfer. CRISPR J 2024; 7:29-40. [PMID: 38353621 DOI: 10.1089/crispr.2023.0034] [Indexed: 02/16/2024]
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has been widely used to create animal models for biomedical and agricultural use owing to its low cost and easy handling. However, the occurrence of erroneous cleavage (off-targeting) may raise certain concerns for the practical application of the CRISPR-Cas9 system. In this study, we created a melanocortin 1 receptor (MC1R)-edited pig model through somatic cell nuclear transfer (SCNT) by using porcine kidney cells modified by the CRISPR-Cas9 system. We then carried out whole-genome sequencing of two MC1R-edited pigs and two cloned wild-type siblings, together with the donor cells, to assess the genome-wide presence of single-nucleotide variants and small insertions and deletions (indels) and found only one candidate off-target indel in both MC1R-edited pigs. In summary, our study indicates that the minimal off-targeting effect induced by CRISPR-Cas9 may not be a major concern in gene-edited pigs created by SCNT.
Affiliation(s)
- Zhenyang Li
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Jin Lan
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Xuan Shi
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Tong Lu
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Xiaoli Hu
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Xiaohong Liu
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Yaosheng Chen
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
- Zuyong He
  - State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, People's Republic of China
8
Karimnezhad A, Perkins TJ. Empirical Bayes single nucleotide variant-calling for next-generation sequencing data. Sci Rep 2024; 14:1550. [PMID: 38233494 PMCID: PMC10794290 DOI: 10.1038/s41598-024-51958-z] [Received: 12/29/2022] [Accepted: 01/11/2024] [Indexed: 01/19/2024]
Abstract
One of the fundamental computational problems in cancer genomics is the identification of single nucleotide variants (SNVs) from DNA sequencing data. Many statistical models and software implementations for SNV calling have been developed in the literature, yet they still disagree widely on real datasets. Based on an empirical Bayesian approach, we introduce a local false discovery rate (LFDR) estimator for germline SNV calling. Our approach learns model parameters without prior information, and simultaneously accounts for information across all sites in the genomic regions of interest. We also propose another LFDR-based algorithm that reliably prioritizes a given list of mutations called by any other variant-calling algorithm. We use a suite of gold-standard cell line data to compare our LFDR approach against a collection of widely used, state-of-the-art programs. We find that our LFDR approach approximately matches or exceeds the performance of all of these programs, despite some very large differences among them. Furthermore, when prioritizing other algorithms' calls by our LFDR score, we find that by manipulating the type I-type II tradeoff we can select subsets of variant calls with minimal loss of sensitivity but dramatic increases in precision.
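The prioritization step described in this abstract amounts to ranking calls by a per-call score and keeping only those below a cutoff. The following is a minimal Python sketch of the resulting precision and sensitivity tradeoff, assuming each call already carries an LFDR-like score (lower means more likely to be a true variant) and a truth label from a gold-standard set. The scores, labels, cutoffs, and the helper `precision_recall` are invented for illustration; this is not the authors' estimator.

```python
def precision_recall(calls, cutoff):
    """Keep calls whose score is <= cutoff and report (precision, recall).

    calls: list of (score, is_true) pairs, where is_true is the
    gold-standard label for that call.
    """
    kept = [is_true for score, is_true in calls if score <= cutoff]
    total_true = sum(1 for _, is_true in calls if is_true)
    true_positives = sum(kept)
    # Convention used here: precision is 1.0 when nothing is kept.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / total_true if total_true else 0.0
    return precision, recall


# Toy call set from a hypothetical upstream variant caller (illustration only).
toy_calls = [
    (0.01, True), (0.02, True), (0.05, True), (0.10, True), (0.30, False),
    (0.45, True), (0.60, False), (0.80, False), (0.90, False), (0.95, False),
]

loose = precision_recall(toy_calls, cutoff=1.0)   # keep every call
strict = precision_recall(toy_calls, cutoff=0.2)  # keep confident calls only
print("loose:", loose)
print("strict:", strict)
```

Tightening the cutoff in this toy set discards one true variant while removing all false calls, which is the kind of tradeoff the abstract refers to.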
Affiliation(s)
- Ali Karimnezhad
  - Department of Mathematics and Statistics, University of Ottawa, Ottawa, K1N 9A7, Canada
  - Biostatistics and Risk Modelling Division, Bureau of Food Surveillance and Science Integration, Food Directorate, Health Products and Food Branch, Health Canada, Ottawa, K1A 0K9, Canada
- Theodore J Perkins
  - Regenerative Medicine Program, Ottawa Hospital Research Institute, Ottawa, K1H 8L6, Canada
  - Department of Biochemistry, Microbiology and Immunology, University of Ottawa, Ottawa, K1H 8M5, Canada
9
Dhanushkumar T, M E S, Selvam PK, Rambabu M, Dasegowda KR, Vasudevan K, George Priya Doss C. Advancements and hurdles in the development of a vaccine for triple-negative breast cancer: A comprehensive review of multi-omics and immunomics strategies. Life Sci 2024; 337:122360. [PMID: 38135117 DOI: 10.1016/j.lfs.2023.122360] [Received: 10/12/2023] [Revised: 12/15/2023] [Accepted: 12/15/2023] [Indexed: 12/24/2023]
Abstract
Triple-Negative Breast Cancer (TNBC) presents a significant challenge in oncology due to its aggressive behavior and limited therapeutic options. This review explores the potential of immunotherapy, particularly vaccine-based approaches, in addressing TNBC. It delves into the role of immunoinformatics in creating effective vaccines against TNBC. The review first underscores the distinct attributes of TNBC and the importance of tumor antigens in vaccine development. It then elaborates on antigen detection techniques such as exome sequencing, HLA typing, and RNA sequencing, which are instrumental in identifying TNBC-specific antigens and selecting vaccine candidates. The discussion then shifts to the in-silico vaccine development process, encompassing antigen selection, epitope prediction, and rational vaccine design. This process merges computational simulations with immunological insights. The role of Artificial Intelligence (AI) in expediting the prediction of antigens and epitopes is also emphasized. The review concludes by encapsulating how Immunoinformatics can augment the design of TNBC vaccines, integrating tumor antigens, advanced detection methods, in-silico strategies, and AI-driven insights to advance TNBC immunotherapy. This could potentially pave the way for more targeted and efficacious treatments.
Affiliation(s)
- T Dhanushkumar
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Santhosh M E
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Prasanna Kumar Selvam
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Majji Rambabu
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- K R Dasegowda
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- Karthick Vasudevan
  - Department of Biotechnology, School of Applied Sciences, REVA University, Bengaluru 560064, India
- C George Priya Doss
  - Laboratory of Integrative Genomics, Department of Integrative Biology, School of BioSciences and Technology, Vellore Institute of Technology (VIT), Vellore, India
10
Abdelwahab O, Belzile F, Torkamaneh D. Performance analysis of conventional and AI-based variant callers using short and long reads. BMC Bioinformatics 2023; 24:472. [PMID: 38097928 PMCID: PMC10720095 DOI: 10.1186/s12859-023-05596-3] [Received: 08/30/2023] [Accepted: 12/04/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND: The accurate detection of variants is essential for genomics-based studies. Currently, there are various tools designed to detect genomic variants; however, it has always been a challenge to decide which tool to use, especially when various major genome projects have chosen to use different tools. Thus far, most of the existing tools were mainly developed to work on short-read data (i.e., Illumina); however, other sequencing technologies (e.g., PacBio and Oxford Nanopore) have recently shown that they can also be used for variant calling. In addition, with the emergence of artificial intelligence (AI)-based variant calling tools, there is a pressing need to compare these tools in terms of efficiency, accuracy, computational power, and ease of use. RESULTS: In this study, we evaluated five of the most widely used conventional and AI-based variant calling tools (BCFTools, GATK4, Platypus, DNAscope, and DeepVariant) in terms of accuracy and computational cost using both short-read and long-read data derived from three different sequencing technologies (Illumina, PacBio HiFi, and ONT) for the same set of samples from the Genome In A Bottle project. The analysis showed that AI-based variant calling tools outperform conventional ones for calling SNVs and INDELs using both long and short reads in most aspects. In addition, we demonstrate the advantages and drawbacks of each tool while ranking them in each aspect of these comparisons. CONCLUSION: This study provides best practices for variant calling using AI-based and conventional variant callers with different types of sequencing data.
Affiliation(s)
- Omar Abdelwahab
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada
- François Belzile
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
- Davoud Torkamaneh
  - Département de Phytologie, Université Laval, Québec, Canada
  - Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, Canada
  - Centre de recherche et d'innovation sur les végétaux (CRIV), Université Laval, Québec, Canada
  - Institut intelligence et données (IID), Université Laval, Québec, Canada
11
Cabello-Aguilar S, Vendrell JA, Solassol J. A Bioinformatics Toolkit for Next-Generation Sequencing in Clinical Oncology. Curr Issues Mol Biol 2023; 45:9737-9752. [PMID: 38132454 PMCID: PMC10741970 DOI: 10.3390/cimb45120608] [Received: 11/06/2023] [Revised: 11/28/2023] [Accepted: 12/02/2023] [Indexed: 12/23/2023]
Abstract
Next-generation sequencing (NGS) has taken on major importance in clinical oncology practice. With the advent of targeted therapies capable of effectively targeting specific genomic alterations in cancer patients, the development of bioinformatics processes has become crucial. Thus, bioinformatics pipelines play an essential role not only in the detection and identification of molecular alterations obtained from NGS data but also in the analysis and interpretation of variants, making it possible to transform raw sequencing data into meaningful and clinically useful information. In this review, we aim to examine the multiple steps of a bioinformatics pipeline as used in current clinical practice, and we also provide an updated list of the necessary bioinformatics tools. This resource is intended to assist researchers and clinicians in their genetic data analyses, improving the precision and efficiency of these processes in clinical research and patient care.
Collapse
Affiliation(s)
- Simon Cabello-Aguilar
- Montpellier BioInformatics for Clinical Diagnosis (MOBIDIC), Molecular Medicine and Genomics Platform (PMMG), CHU Montpellier, 34295 Montpellier, France
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France
- Julie A. Vendrell
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France
- Jérôme Solassol
- Laboratoire de Biologie des Tumeurs Solides, Département de Pathologie et Oncobiologie, CHU Montpellier, Université de Montpellier, 34295 Montpellier, France
12
Beeler JS, Bolton KL. How low can you go?: Methodologic considerations in clonal hematopoiesis variant calling. Leuk Res 2023; 135:107419. [PMID: 37956474 DOI: 10.1016/j.leukres.2023.107419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
Clonal hematopoiesis (CH) is defined by the presence of an expanded clonal hematopoietic cell population due to an acquired mutation conferring a selective growth advantage and is known to predispose to hematologic malignancy. In this review, we discuss sequencing methods for CH detection in bulk sequencing data and corresponding bioinformatic approaches for variant calling, filtering, and curation. We detail practical recommendations for CH calling. Finally, we discuss how improvements in CH sequencing and bioinformatic approaches will enable the characterization of CH trajectories, its impact on human health, and therapeutic approaches to mitigate its adverse effects.
Affiliation(s)
- J Scott Beeler
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Kelly L Bolton
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
13
Wang D, Wang S, Zhang Y, Cheng X, Huang X, Han Y, Chen Z, Liu C, Li J, Zhang R. Validation and benchmarking of targeted panel sequencing for cancer genomic profiling. Am J Clin Pathol 2023; 160:507-523. [PMID: 37477357 DOI: 10.1093/ajcp/aqad078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 06/22/2023] [Indexed: 07/22/2023] Open
Abstract
OBJECTIVES To validate a large next-generation sequencing (NGS) panel for comprehensive genomic profiling and improve patient access to more effective precision oncology treatment strategies. METHODS OncoPanScan was designed by targeting 825 cancer-related genes to detect a broad range of genomic alterations. A practical validation strategy was used to evaluate the assay's analytical performance, involving 97 tumor specimens with 25 paired blood specimens, 10 engineered cell lines, and 121 artificial reference DNA samples. RESULTS Overall, 1107 libraries were prepared and the sequencing failure rate was 0.18%. Across alteration classes, sensitivity ranged from 0.938 to more than 0.999, specificity ranged from 0.889 to more than 0.999, positive predictive value ranged from 0.867 to more than 0.999, repeatability ranged from 0.908 to more than 0.999, and reproducibility ranged from 0.832 to more than 0.999. The limit of detection for variants was established based on variant frequency, while for tumor mutation burden and microsatellite instability, it was based on tumor content, resulting in a minimum requirement of 20% tumor content. Benchmarking variant calls against validated NGS assays revealed that variations in the dry-bench processes were the primary cause of discordances. CONCLUSIONS This study presents a detailed validation framework and empirical recommendations for large panel validation and elucidates the sources of discordant alteration calls by comparing with "gold standard measures."
Affiliation(s)
- Duo Wang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Yuanfeng Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Xin Huang
- Genetron Health (Beijing), Beijing, China
- Yanxi Han
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Cong Liu
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
- Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, Beijing, China
- National Center for Clinical Laboratories, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
- Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, China
14
Majidian S, Agustinho DP, Chin CS, Sedlazeck FJ, Mahmoud M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol 2023; 24:221. [PMID: 37798733 PMCID: PMC10552390 DOI: 10.1186/s13059-023-03061-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/28/2022] [Accepted: 09/18/2023] [Indexed: 10/07/2023] Open
Abstract
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Affiliation(s)
- Sina Majidian
- Department of Computational Biology, University of Lausanne, 1015, Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015, Lausanne, Switzerland
- Fritz J Sedlazeck
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Computer Science, Rice University, 6100 Main Street, Houston, TX, 77005, USA.
- Medhat Mahmoud
- Baylor College of Medicine, Human Genome Sequencing Center, Houston, TX, 77030, USA.
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
15
Bzikadze AV, Pevzner PA. UniAligner: a parameter-free framework for fast sequence alignment. Nat Methods 2023; 20:1346-1354. [PMID: 37580559 DOI: 10.1038/s41592-023-01970-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Accepted: 07/05/2023] [Indexed: 08/16/2023]
Abstract
Even though the recent advances in 'complete genomics' revealed the previously inaccessible genomic regions, analysis of variations in centromeres and other extra-long tandem repeats (ETRs) faces an algorithmic challenge since there are currently no tools for accurate sequence comparison of ETRs. Counterintuitively, the classical alignment approaches, such as the Smith-Waterman algorithm, fail to construct biologically adequate alignments of ETRs. We present UniAligner, the parameter-free sequence alignment algorithm with sequence-dependent alignment scoring that automatically changes for any pair of compared sequences. UniAligner prioritizes matches of rare substrings that are more likely to be relevant to the evolutionary relationship between two sequences. We apply UniAligner to estimate the mutation rates in human centromeres, and quantify the extremely high rate of large duplications and deletions in centromeres. This high rate suggests that centromeres may represent some of the most rapidly evolving regions of the human genome with respect to their structural organization.
Affiliation(s)
- Andrey V Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
16
Wang Y, Du H, Dai W, Bao C, Zhang X, Hu Y, Xie Z, Zhao X, Li C, Zhang W, Wu R. Diagnostic Potential of Endometrial Cancer DNA from Pipelle, Pap-Brush, and Swab Sampling. Cancers (Basel) 2023; 15:3522. [PMID: 37444632 DOI: 10.3390/cancers15133522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Revised: 06/29/2023] [Accepted: 07/03/2023] [Indexed: 07/15/2023] Open
Abstract
Endometrial cancer (EC) is a major gynecological malignancy with rising morbidity and mortality worldwide. The aim of this study was to explore a safe and readily available sample and a sensitive and effective detection method and its biomarkers for early diagnosis of EC, which is critical for patient prognosis. This study designed a panel targeting variants for EC-related genes, assessed its technical performance by comparing it with whole-exome sequencing, and explored the diagnostic potential of endometrial biopsies using the Pipelle aspirator, cervical samples using the Pap brush, and vaginal specimens using the swab from 38 EC patients and 208 women with risk factors for EC by applying targeted panel sequencing (TPS). TPS produced high-quality data (Q30 > 85% and mapping ratios > 99.35%) and was found to have strong consistency with whole-exome sequencing (WES) in detecting pathogenic mutations (92.11%), calculating homologous recombination deficiency (HRD) scores (r = 0.65), and assessing the microsatellite instability (MSI) status of EC (100%). The sensitivity of TPS in detection of EC is slightly better than that of WES (86.84% vs. 84.21%). Of the three types of samples detected using TPS, endometrial biopsy using the Pipelle aspirator had the highest sensitivity in detection of pathogenic mutations (81.87%) and the best consistency with surgical tumor specimens in MSI (85.16%). About 84% of EC patients contained pathogenic mutations in PIK3CA, PTEN, TP53, ARID1A, CTNNB1, KRAS, and MTOR, suggesting that this small gene set can achieve an excellent pathogenic mutation detection rate in Chinese EC patients. The custom panel combined with ultra-deep sequencing serves as a sensitive method for detecting genetic lesions from endometrial biopsy using the Pipelle aspirator.
Affiliation(s)
- Yinan Wang
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- School of Medicine, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen 518055, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Hui Du
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Wenkui Dai
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Cuijun Bao
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Xi Zhang
- Department of Clinical Medicine, Xi'an Jiaotong University, Xi'an 710049, China
- Yan Hu
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Zhiyu Xie
- Department of Clinical Medicine, Xi'an Jiaotong University, Xi'an 710049, China
- Xin Zhao
- China National GeneBank, BGI-Shenzhen, Shenzhen 518116, China
- Changzhong Li
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
- Wenyong Zhang
- School of Medicine, Southern University of Science and Technology, 1088 Xueyuan Avenue, Shenzhen 518055, China
- Ruifang Wu
- Department of Obstetrics and Gynecology, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Institute of Obstetrics and Gynecology, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Shenzhen Key Laboratory on Technology for Early Diagnosis of Major Gynecologic Diseases, Shenzhen 518036, China
17
Ji S, Zhu T, Sethia A, Wang W. Accelerated somatic mutation calling for whole-genome and whole-exome sequencing data from heterogenous tumor samples. bioRxiv 2023:2023.07.04.547569. [PMID: 37461467 PMCID: PMC10350007 DOI: 10.1101/2023.07.04.547569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
Accurate detection of somatic mutations in DNA sequencing data is a fundamental prerequisite for cancer research. Previous analytical challenges were overcome by consensus mutation calling from four to five popular callers. This, however, increases the already nontrivial computing time from individual callers. Here, we launch MuSE2.0, powered by multi-step parallelization and efficient memory allocation, to resolve the computing time bottleneck. MuSE2.0 is 50 times faster than MuSE1.0 and 8-80 times faster than other popular callers. Our benchmark study suggests combining MuSE2.0 and the recently expedited Strelka2 can achieve high efficiency and accuracy in analyzing large cancer genomic datasets.
Affiliation(s)
- Shuangxi Ji
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tong Zhu
- NVIDIA Corporation, Santa Clara, CA, USA
- Wenyi Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
18
Trevarton AJ, Chang JT, Symmans WF. Simple combination of multiple somatic variant callers to increase accuracy. Sci Rep 2023; 13:8463. [PMID: 37231022 DOI: 10.1038/s41598-023-34925-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 05/10/2023] [Indexed: 05/27/2023] Open
Abstract
Publications comparing variant caller algorithms present discordant results with contradictory rankings. Caller performances are inconsistent and wide ranging, and dependent upon input data, application, parameter settings, and evaluation metric. With no single variant caller emerging as a superior standard, combinations or ensembles of variant callers have appeared in the literature. In this study, a whole genome somatic reference standard was used to derive principles to guide strategies for combining variant calls. Then, manually annotated variants called from the whole exome sequencing of a tumor were used to corroborate these general principles. Finally, we examined the ability of these principles to reduce noise in targeted sequencing.
Affiliation(s)
- Alexander J Trevarton
- School of Biological Sciences, Faculty of Science, University of Auckland, Auckland, New Zealand.
- Jeffrey T Chang
- Department of Integrative Biology and Pharmacology, The University of Texas Health Sciences Center, Houston, USA
- W Fraser Symmans
- Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, USA
19
Liu Y, Wang S, Wang Y, Li Y, Zhu X, Lai X, Zhang X, Li X, Xiao X, Wang J. What makes TMB an ambivalent biomarker for immunotherapy? A subtle mismatch between the sample-based design of variant callers and real clinical cohort. Front Immunol 2023; 14:1151224. [PMID: 37304296 PMCID: PMC10248171 DOI: 10.3389/fimmu.2023.1151224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2023] [Accepted: 05/15/2023] [Indexed: 06/13/2023] Open
Abstract
Tumor mutation burden (TMB) is a widely recognized biomarker for predicting the efficacy of immunotherapy. However, its use remains highly controversial. In this study, we examine the underlying causes of this controversy based on clinical needs. By tracing the source of the TMB errors and analyzing the design philosophy behind variant callers, we identify the conflict between the incompleteness of biostatistics rules and the variety of clinical samples as the critical issue that renders TMB an ambivalent biomarker. A series of experiments were conducted to illustrate the challenges of mutation detection in clinical practice. We also discuss potential strategies for overcoming these conflicts to enable the application of TMB in guiding decision-making in real clinical settings.
Affiliation(s)
- Yuqian Liu
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shenjie Wang
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Yixuan Wang
- Department of Biomedical Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
- Yifei Li
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Xiaoyan Zhu
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Xin Lai
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Xuanping Zhang
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Xuqi Li
- Department of General Surgery, The First Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Xiao Xiao
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Geneplus Shenzhen, Shenzhen, China
- Jiayin Wang
- School of Computer Science and Technology, Faculty of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, Shaanxi, China
20
Li S, Hu R, Small C, Kang TY, Liu CC, Zhou XJ, Li W. cfSNV: a software tool for the sensitive detection of somatic mutations from cell-free DNA. Nat Protoc 2023; 18:1563-1583. [PMID: 36849599 PMCID: PMC10411976 DOI: 10.1038/s41596-023-00807-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 11/24/2022] [Indexed: 03/01/2023]
Abstract
Cell-free DNA (cfDNA) in blood, viewed as a surrogate for tumor biopsy, has many clinical applications, including diagnosing cancer, guiding cancer treatment and monitoring treatment response. All these applications depend on an indispensable, yet underdeveloped task: detecting somatic mutations from cfDNA. The task is challenging because of the low tumor fraction in cfDNA. Recently, we developed the computational method cfSNV, the first method that comprehensively considers the properties of cfDNA for the sensitive detection of mutations from cfDNA. cfSNV vastly outperformed the conventional methods that were developed primarily for calling mutations from solid tumor tissues. cfSNV can accurately detect mutations in cfDNA even with medium-coverage (e.g., ≥200×) sequencing, which makes whole-exome sequencing (WES) of cfDNA a viable option for various clinical utilities. Here, we present a user-friendly cfSNV package that exhibits fast computation and convenient user options. We also built a Docker image of it, which is designed to enable researchers and clinicians with a limited computational background to easily carry out analyses on both high-performance computing platforms and local computers. Mutation calling from a standard preprocessed WES dataset (~250× and ~70 million base pair target size) can be carried out in 3 h on a server with eight virtual CPUs and 32 GB of random access memory.
Affiliation(s)
- Shuo Li
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
- Ran Hu
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
- Bioinformatics Interdepartmental Graduate Program, University of California at Los Angeles, Los Angeles, CA, USA
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA
- Colin Small
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA
- Chun-Chi Liu
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
- EarlyDiagnostics Inc., Los Angeles, CA, USA
- Xianghong Jasmine Zhou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA.
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, CA, USA.
- EarlyDiagnostics Inc., Los Angeles, CA, USA.
- Wenyuan Li
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA.
- EarlyDiagnostics Inc., Los Angeles, CA, USA.
21
Vaisband M, Schubert M, Gassner FJ, Geisberger R, Greil R, Zaborsky N, Hasenauer J. Validation of genetic variants from NGS data using deep convolutional neural networks. BMC Bioinformatics 2023; 24:158. [PMID: 37081386 PMCID: PMC10116675 DOI: 10.1186/s12859-023-05255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
Accurate somatic variant calling from next-generation sequencing data is one of the most important tasks in personalised cancer therapy. The sophistication of the available technologies is ever-increasing, yet manual candidate refinement is still a necessary step in state-of-the-art processing pipelines. This limits reproducibility and introduces a bottleneck with respect to scalability. We demonstrate that the validation of genetic variants can be improved using a machine learning approach resting on a Convolutional Neural Network, trained using existing human annotation. In contrast to existing approaches, we introduce a way in which contextual data from sequencing tracks can be included in the automated assessment. A rigorous evaluation shows that the resulting model is robust and performs on par with trained researchers following a published standard operating procedure.
Affiliation(s)
- Marc Vaisband
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria.
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany.
- Maria Schubert
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Franz Josef Gassner
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Roland Geisberger
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Richard Greil
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Nadja Zaborsky
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Jan Hasenauer
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
22
Xia H, McMichael J, Becker-Hapak M, Onyeador OC, Buchli R, McClain E, Pence P, Supabphol S, Richters MM, Basu A, Ramirez CA, Puig-Saus C, Cotto KC, Freshour SL, Hundal J, Kiwala S, Goedegebuure SP, Johanns TM, Dunn GP, Ribas A, Miller CA, Gillanders WE, Fehniger TA, Griffith OL, Griffith M. Computational prediction of MHC anchor locations guides neoantigen identification and prioritization. Sci Immunol 2023; 8:eabg2200. [PMID: 37027480 PMCID: PMC10450883 DOI: 10.1126/sciimmunol.abg2200] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 12/17/2020] [Accepted: 03/16/2023] [Indexed: 04/09/2023]
Abstract
Neoantigens are tumor-specific peptide sequences resulting from sources such as somatic DNA mutations. Upon loading onto major histocompatibility complex (MHC) molecules, they can trigger recognition by T cells. Accurate neoantigen identification is thus critical for both designing cancer vaccines and predicting response to immunotherapies. Neoantigen identification and prioritization relies on correctly predicting whether the presenting peptide sequence can successfully induce an immune response. Because most somatic mutations are single-nucleotide variants, changes between wild-type and mutated peptides are typically subtle and require cautious interpretation. A potentially underappreciated variable in neoantigen prediction pipelines is the mutation position within the peptide relative to its anchor positions for the patient's specific MHC molecules. Whereas a subset of peptide positions are presented to the T cell receptor for recognition, others are responsible for anchoring to the MHC, making these positional considerations critical for predicting T cell responses. We computationally predicted anchor positions for different peptide lengths for 328 common HLA alleles and identified unique anchoring patterns among them. Analysis of 923 tumor samples shows that 6 to 38% of neoantigen candidates are potentially misclassified and can be rescued using allele-specific knowledge of anchor positions. A subset of anchor results were orthogonally validated using protein crystallography structures. Representative anchor trends were experimentally validated using peptide-MHC stability assays and competition binding assays. By incorporating our anchor prediction results into neoantigen prediction pipelines, we hope to formalize, streamline, and improve the identification process for relevant clinical studies.
Affiliation(s)
- Huiming Xia
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Joshua McMichael
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Michelle Becker-Hapak
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Onyinyechi C. Onyeador
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Rico Buchli
- Pure Protein LLC, Oklahoma City, OK 73104, USA
- Ethan McClain
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Patrick Pence
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Suangson Supabphol
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- The Center of Excellence in Systems Biology, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Megan M. Richters
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Anamika Basu
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Cody A. Ramirez
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Cristina Puig-Saus
- Division of Hematology/Oncology, Department of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, Los Angeles, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Kelsy C. Cotto
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Sharon L. Freshour
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Jasreet Hundal
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Susanna Kiwala
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- S. Peter Goedegebuure
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Tanner M. Johanns
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Gavin P. Dunn
- Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO, USA
- Antoni Ribas
- Division of Hematology/Oncology, Department of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
- Jonsson Comprehensive Cancer Center, Los Angeles, CA, USA
- Parker Institute for Cancer Immunotherapy, San Francisco, CA, USA
- Christopher A. Miller
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- William E. Gillanders
- Department of Surgery, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Todd A. Fehniger
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Obi L. Griffith
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
- Malachi Griffith
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA
- Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA
23
Morazán-Fernández D, Mora J, Molina-Mora JA. In Silico Pipeline to Identify Tumor-Specific Antigens for Cancer Immunotherapy Using Exome Sequencing Data. Phenomics (Cham, Switzerland) 2023; 3:130-137. [PMID: 37197645 PMCID: PMC10110822 DOI: 10.1007/s43657-022-00084-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 11/09/2022] [Accepted: 11/15/2022] [Indexed: 05/19/2023]
Abstract
Tumor-specific antigens or neoantigens are peptides that are expressed only in cancer cells and not in healthy cells. Some of these molecules can induce an immune response, and therefore, their use in immunotherapeutic strategies based on cancer vaccines has been extensively explored. Studies based on these approaches have been triggered by the current high-throughput DNA sequencing technologies. However, there is no universal or straightforward bioinformatic protocol to discover neoantigens using DNA sequencing data. Thus, we propose a bioinformatic protocol to detect tumor-specific antigens associated with single nucleotide variants (SNVs) or "mutations" in tumoral tissues. For this purpose, we used publicly available data to build our model, including exome sequencing data from colorectal cancer and healthy cells obtained from a single case, as well as frequent human leukocyte antigen (HLA) class I alleles in a specific population. HLA data from the Costa Rican Central Valley population were selected as an example. The strategy included three main steps: (1) pre-processing of sequencing data; (2) variant calling analysis to detect tumor-specific SNVs in comparison with healthy tissue; and (3) prediction and characterization of peptides (protein fragments, the tumor-specific antigens) derived from the variants, in the context of their affinity with frequent alleles of the selected population. In our model data, we found 28 non-silent SNVs, present in 17 genes in chromosome one. The protocol yielded 23 strong-binder peptides derived from the SNVs for frequent HLA class I alleles for the Costa Rican population. Although the analyses were performed as an example to implement the pipeline, to our knowledge, this is the first study of an in silico cancer vaccine using DNA sequencing data in the context of the HLA alleles. It is concluded that the standardized protocol was not only able to identify neoantigens in a specific case but also provides a complete pipeline for the eventual design of cancer vaccines using the best bioinformatic practices. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-022-00084-9.
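Step (3) of the strategy described in this abstract, deriving candidate peptides from a variant, can be sketched as follows. This is a hedged illustration only: the abstract does not say how peptides are enumerated, so the sliding-window approach, the function name `mutant_kmers`, and the toy protein sequence are assumptions. The sketch lists every peptide of length `k` that overlaps a single amino-acid substitution; each such peptide would then be scored against the HLA alleles of interest by a separate binding predictor.

```python
def mutant_kmers(protein, pos, alt, k=9):
    """Return all length-k peptides that contain the substituted residue.

    protein: wild-type amino-acid sequence
    pos:     0-based index of the substituted residue
    alt:     single-letter code of the new amino acid
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")
    if len(alt) != 1:
        raise ValueError("alt must be a single amino-acid letter")

    mutated = protein[:pos] + alt + protein[pos + 1:]

    # A window starting at s covers indices s .. s + k - 1, so it contains
    # pos when pos - k + 1 <= s <= pos; clip to the ends of the sequence.
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(mutated) - k)
    return [mutated[s:s + k] for s in range(first_start, last_start + 1)]


# Toy example (hypothetical sequence): substitute Y -> H at index 4, 3-mers.
print(mutant_kmers("MKTAYIAKQR", pos=4, alt="H", k=3))
# → ['TAH', 'AHI', 'HIA']
```

For a substitution far from either end of the protein, this yields `k` peptides per variant (9 for 9-mers); fewer are produced near the termini.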
Affiliation(s)
- Javier Mora
- Centro de Investigación de Enfermedades Tropicales, Centro de Investigación en Cirugía y Cáncer, and Facultad de Microbiología, Universidad de Costa Rica, San José, 2060 Costa Rica
- Jose Arturo Molina-Mora
- Centro de Investigación de Enfermedades Tropicales, Centro de Investigación en Cirugía y Cáncer, and Facultad de Microbiología, Universidad de Costa Rica, San José, 2060 Costa Rica
24
Evaluation of ctDNA in the Prediction of Response to Neoadjuvant Therapy and Prognosis in Locally Advanced Rectal Cancer Patients: A Prospective Study. Pharmaceuticals (Basel) 2023; 16:ph16030427. [PMID: 36986526 PMCID: PMC10057108 DOI: 10.3390/ph16030427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 02/28/2023] [Accepted: 03/03/2023] [Indexed: 03/18/2023]
Abstract
“Watch and wait” is becoming a common treatment option for patients with locally advanced rectal cancer (LARC) submitted to neoadjuvant treatment. However, currently, no clinical modality has an acceptable accuracy for predicting pathological complete response (pCR). The aim of this study was to assess the clinical utility of circulating tumor DNA (ctDNA) in predicting the response and prognosis in these patients. We prospectively enrolled a cohort from three Iberian centers between January 2020 and December 2021 and analyzed the association of ctDNA with the main response outcomes and disease-free survival (DFS). The rate of pCR in the total sample was 15.3%. A total of 24 plasma samples from 18 patients were analyzed by next-generation sequencing. At baseline, mutations were detected in 38.9%, with the most common being TP53 and KRAS. The combination of positive magnetic resonance imaging (MRI) extramural venous invasion (mrEMVI) and ctDNA increased the risk of poor response (p = 0.021). Also, patients with two mutations vs. those with fewer than two mutations had a worse DFS (p = 0.005). Although these results should be read carefully due to sample size, this study suggests that baseline ctDNA combined with mrEMVI could potentially help to predict the response, and that the number of mutations in baseline ctDNA might allow the discrimination of groups with different DFS. Further studies are needed to clarify the role of ctDNA as an independent tool in the selection and management of LARC patients.
25
Larson NB, Oberg AL, Adjei AA, Wang L. A Clinician's Guide to Bioinformatics for Next-Generation Sequencing. J Thorac Oncol 2023; 18:143-157. [PMID: 36379355 PMCID: PMC9870988 DOI: 10.1016/j.jtho.2022.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 11/03/2021] [Revised: 10/31/2022] [Accepted: 11/05/2022] [Indexed: 11/15/2022]
Abstract
Next-generation sequencing (NGS) technologies are high-throughput methods for DNA sequencing and have become a widely adopted tool in cancer research. The sheer amount and variety of data generated by NGS assays require sophisticated computational methods and bioinformatics expertise. In this review, we provide background details of NGS technology and basic bioinformatics concepts for the clinician investigator interested in cancer research applications, with a focus on DNA-based approaches. We introduce the general principles of presequencing library preparation, postsequencing alignment, and variant calling. We also highlight the common variant annotations and NGS applications for other molecular data types. Finally, we briefly discuss the revealed utility of NGS methods in NSCLC research and study design considerations for research studies that aim to leverage NGS technologies for clinical care.
Affiliation(s)
- Nicholas Bradley Larson
- Division of Clinical Trials and Biostatistics, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota.
- Ann L Oberg
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota
- Alex A Adjei
- Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio
- Liguo Wang
- Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic College of Medicine and Science, Rochester, Minnesota
26
Dhanda SK, Mahajan S, Manoharan M. Neoepitopes prediction strategies: an integration of cancer genomics and immunoinformatics approaches. Brief Funct Genomics 2023; 22:1-8. [PMID: 36398967 DOI: 10.1093/bfgp/elac041] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/28/2022] [Revised: 09/28/2022] [Accepted: 10/14/2022] [Indexed: 11/19/2022]
Abstract
A major near-term medical impact of the genomic technology revolution will be the elucidation of mechanisms of cancer pathogenesis, leading to improvements in the diagnosis of cancer and the selection of cancer treatment. Next-generation sequencing technologies have accelerated the characterization of a tumor, leading to the comprehensive discovery of all the major alterations in a given cancer genome, followed by the translation of this information using computational and immunoinformatics approaches to cancer diagnostics and therapeutic efforts. In the current article, we review various components of cancer immunoinformatics applied to a series of fields of cancer research, including computational tools for cancer mutation detection, cancer mutation and immunological databases, and computational vaccinology.
Affiliation(s)
- Sandeep Kumar Dhanda
- Department of Oncology, St Jude Children's Research Hospital, Memphis, TN 38105, USA
- Swapnil Mahajan
- DeepKnomics Labs Private Limited, 7014 Prestige Garden Bay, IVRI Road, Avalahalli, Behind CRPF Campus, Yelahanka, Bangalore 560064, India
- Malini Manoharan
- DeepKnomics Labs Private Limited, 7014 Prestige Garden Bay, IVRI Road, Avalahalli, Behind CRPF Campus, Yelahanka, Bangalore 560064, India
27
Vilov S, Heinig M. DeepSom: a CNN-based approach to somatic variant calling in WGS samples without a matched normal. Bioinformatics 2023; 39:6986966. [PMID: 36637201 PMCID: PMC9843587 DOI: 10.1093/bioinformatics/btac828] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2022] [Revised: 12/19/2022] [Accepted: 01/12/2023] [Indexed: 01/14/2023]
Abstract
MOTIVATION: Somatic mutations are usually called by analyzing the DNA sequence of a tumor sample in conjunction with a matched normal. However, a matched normal is not always available, for instance, in retrospective analysis or diagnostic settings. For such cases, tumor-only somatic variant calling tools need to be designed. Previously proposed approaches demonstrate inferior performance on whole-genome sequencing (WGS) samples. RESULTS: We present a convolutional neural network-based approach called DeepSom for detecting somatic single nucleotide polymorphism and short insertion and deletion variants in tumor WGS samples without a matched normal. We validate DeepSom by reporting its performance on five different cancer datasets. We also demonstrate that on WGS samples DeepSom outperforms previously proposed methods for tumor-only somatic variant calling. AVAILABILITY AND IMPLEMENTATION: DeepSom is available as a GitHub repository at https://github.com/heiniglab/DeepSom. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sergey Vilov
- Institute of Computational Biology, Computational Health Center, Helmholtz Zentrum München Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), 85764 Neuherberg, Germany
28
Seillier L, Peifer M. Reconstructing Phylogenetic Relationship in Bladder Cancer: A Methodological Overview. Methods Mol Biol 2023; 2684:113-132. [PMID: 37410230 DOI: 10.1007/978-1-0716-3291-8_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/07/2023]
Abstract
Bladder cancer (BC) presents as a highly heterogeneous disease at both the histological and molecular level, often occurring as synchronous or metachronous multifocal disease with a high risk of recurrence and potential to metastasize. Multiple sequencing studies focusing on both non-muscle-invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC) have given insights into the extent of both inter- and intrapatient heterogeneity, but many questions on clonal evolution in BC remain unanswered. In this review article, we provide an overview of the technical and theoretical concepts linked to reconstructing evolutionary trajectories in BC and propose a set of tools and established software for phylogenetic analysis.
Affiliation(s)
- Martin Peifer
- Department of Translational Genomics, University of Cologne, Cologne, Germany
29
Craven KE, Fischer CG, Jiang L, Pallavajjala A, Lin MT, Eshleman JR. Optimizing Insertion and Deletion Detection Using Next-Generation Sequencing in the Clinical Laboratory. J Mol Diagn 2022; 24:1217-1231. [PMID: 36162758 PMCID: PMC9808503 DOI: 10.1016/j.jmoldx.2022.08.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/20/2022] [Revised: 07/18/2022] [Accepted: 08/31/2022] [Indexed: 01/13/2023]
Abstract
Detection of insertions and deletions (InDels) by short-read next-generation sequencing (NGS) technology can be challenging because of frequent misaligned reads. A systematic analysis of short InDels (1 to 30 bases) and fms-related receptor tyrosine kinase 3 (FLT3) internal tandem duplications (ITDs; 6 to 183 bases) from 46 clinical cases of solid or hematologic malignancy processed with a clinical NGS assay identified misaligned reads in every case, ranging from 3% to 100% of reads with the InDel showing mismapped bases. Mismaps also increased with InDel size. As a consequence, the clinical NGS bioinformatics pipeline undercalled the variant allele frequency by 1% to 84%, incorrectly called simultaneous single-base substitutions along with InDels, or did not report an FLT3 ITD that had been detected by capillary electrophoresis. To improve the ability of the pipeline to better detect and quantify InDels, we utilized a software program called Assembly-Based ReAligner (ABRA2) to more accurately remap reads. ABRA2 was able to correct 41% to 100% of the reads with mismapped bases and led to absolute increases in the variant allele frequency from 1% to 61% along with correction of all of the single-base substitutions except for two cases. ABRA2 could also detect multiple FLT3 ITD clones except for one 183-base ITD. Our analysis has found that ABRA2 performs well on short InDels as well as FLT3 ITDs that are <100 bases.
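The link between misaligned reads and an undercalled variant allele frequency (VAF) in this abstract can be made concrete with a small Python sketch. It assumes the standard definition of VAF as variant-supporting reads divided by total reads at the locus, and it further assumes that a misaligned InDel read is simply not counted as supporting the variant; the function names and the read counts in the example are hypothetical and are not taken from the study.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """VAF = reads supporting the variant / all reads covering the locus."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def vaf_before_and_after(supporting, misaligned, recovered, total):
    """Compare the VAF called before and after realignment.

    supporting: reads that truly carry the InDel
    misaligned: supporting reads that the aligner mismapped (not counted)
    recovered:  misaligned reads that realignment maps correctly again
    total:      all reads covering the locus
    """
    if not 0 <= misaligned <= supporting:
        raise ValueError("misaligned must be between 0 and supporting")
    if not 0 <= recovered <= misaligned:
        raise ValueError("recovered must be between 0 and misaligned")
    counted_before = supporting - misaligned
    counted_after = counted_before + recovered
    return (variant_allele_frequency(counted_before, total),
            variant_allele_frequency(counted_after, total))


# Hypothetical locus: 1000 reads, 400 carry the InDel, 150 of those are
# misaligned, and realignment recovers all 150.
before, after = vaf_before_and_after(400, 150, 150, 1000)
print(before, after)
# → 0.25 0.4
```

In this toy case the pipeline would report 25% instead of the true 40% until the mismapped reads are realigned; partial recovery gives a value in between.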
Affiliation(s)
- Kelly E Craven
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Catherine G Fischer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Division of Cancer Prevention, National Cancer Institute, Rockville, Maryland
- LiQun Jiang
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Aparna Pallavajjala
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- Ming-Tseh Lin
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland
- James R Eshleman
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland; The Sol Goldman Pancreatic Cancer Research Center, Johns Hopkins University School of Medicine, Baltimore, Maryland.
30
Muñoz-Barrera A, Rubio-Rodríguez LA, Díaz-de Usera A, Jáspez D, Lorenzo-Salazar JM, González-Montelongo R, García-Olivares V, Flores C. From Samples to Germline and Somatic Sequence Variation: A Focus on Next-Generation Sequencing in Melanoma Research. Life (Basel) 2022; 12:1939. [PMID: 36431075 PMCID: PMC9695713 DOI: 10.3390/life12111939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/12/2022] [Accepted: 11/16/2022] [Indexed: 11/24/2022]
Abstract
Next-generation sequencing (NGS) applications have flourished in the last decade, permitting the identification of cancer driver genes and profoundly expanding the possibilities of genomic studies of cancer, including melanoma. Here we aimed to present a technical review across many of the methodological approaches brought by the use of NGS applications with a focus on assessing germline and somatic sequence variation. We provide cautionary notes and discuss key technical details involved in library preparation, the most common problems with the samples, and guidance to circumvent them. We also provide an overview of the sequence-based methods for cancer genomics, exposing the pros and cons of targeted sequencing vs. exome or whole-genome sequencing (WGS), the fundamentals of the most common commercial platforms, and a comparison of throughputs and key applications. Details of the steps and the main software involved in the bioinformatics processing of the sequencing results, from preprocessing to variant prioritization and filtering, are also provided in the context of the full spectrum of genetic variation (SNVs, indels, CNVs, structural variation, and gene fusions). Finally, we put the emphasis on selected bioinformatic pipelines behind (a) short-read WGS identification of small germline and somatic variants, (b) detection of gene fusions from transcriptomes, and (c) de novo assembly of genomes from long-read WGS data. Overall, we provide comprehensive guidance across the main methodological procedures involved in obtaining sequencing results for the most common short- and long-read NGS platforms, highlighting key applications in melanoma research.
Affiliation(s)
- Adrián Muñoz-Barrera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Luis A. Rubio-Rodríguez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Ana Díaz-de Usera
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
- David Jáspez
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- José M. Lorenzo-Salazar
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Rafaela González-Montelongo
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Víctor García-Olivares
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Carlos Flores
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Research Unit, Hospital Universitario Nuestra Señora de Candelaria, 38010 Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Facultad de Ciencias de la Salud, Universidad Fernando de Pessoa Canarias, 35450 Las Palmas de Gran Canaria, Spain
31
Batlle-Masó L, Garcia-Prat M, Parra-Martínez A, Franco-Jarava C, Aguiló-Cucurull A, Velasco P, Antolín M, Rivière JG, Martín-Nalda A, Soler-Palacín P, Martínez-Gallo M, Colobran R. Detection and evolutionary dynamics of somatic FAS variants in autoimmune lymphoproliferative syndrome: Diagnostic implications. Front Immunol 2022; 13:1014984. [PMID: 36466883 PMCID: PMC9716137 DOI: 10.3389/fimmu.2022.1014984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 10/24/2022] [Indexed: 11/21/2022]
Abstract
Autoimmune lymphoproliferative syndrome (ALPS) is a rare primary immune disorder characterized by impaired apoptotic homeostasis. The clinical characteristics include lymphoproliferation, autoimmunity (mainly cytopenia), and an increased risk of lymphoma. A distinctive biological feature is accumulation (>2.5%) of an abnormal cell subset composed of TCRαβ+ CD4-CD8- T cells (DNTs). The most common genetic causes of ALPS are monoallelic pathogenic variants in the FAS gene followed by somatic FAS variants, mainly restricted to DNTs. Identification of somatic FAS variants has been typically addressed by Sanger sequencing in isolated DNTs. However, this approach can be costly and technically challenging, and may not be successful in patients with normal DNT counts receiving immunosuppressive treatment. In this study, we identified a novel somatic mutation in FAS (c.718_719insGTCG) by Sanger sequencing on purified CD3+ cells. We then followed the evolutionary dynamics of the variant along time with an NGS-based approach involving deep amplicon sequencing (DAS) at high coverage (20,000-30,000x). Over five years of clinical follow-up, we obtained six blood samples for molecular study from the pre-treatment (DNTs>7%) and treatment (DNTs<2%) periods. DAS enabled detection of the somatic variant in all samples, even the one obtained after five years of immunosuppressive treatment (DNTs: 0.89%). The variant allele frequency (VAF) range was 4%-5% in pre-treatment samples and <1.5% in treatment samples, and there was a strong positive correlation between DNT counts and VAF (Pearson’s R: 0.98, p=0.0003). We then explored whether the same approach could be used in a discovery setting. In the last follow-up sample (DNT: 0.89%) we performed somatic variant calling on the FAS exon 9 DAS data from whole blood and purified CD3+ cells using VarScan 2. 
The c.718_719insGTCG variant was identified in both samples and showed the highest VAF (0.67% blood, 1.58% CD3+ cells) among >400 variants called. In summary, our study illustrates the evolutionary dynamics of a somatic FAS mutation before and during immunosuppressive treatment. The results show that pathogenic somatic FAS variants can be identified with the use of DAS in whole blood of ALPS patients regardless of their DNT counts.
Affiliation(s)
- Laura Batlle-Masó
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Marina Garcia-Prat
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Alba Parra-Martínez
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Clara Franco-Jarava
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Translational Immunology Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Immunology Division, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Aina Aguiló-Cucurull
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Translational Immunology Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Immunology Division, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Pablo Velasco
- Pediatric Oncology and Hematology Department, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- María Antolín
- Department of Clinical and Molecular Genetics, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jacques G. Rivière
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Andrea Martín-Nalda
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Pere Soler-Palacín
- Infection in Immunocompromised Pediatric Patients Research Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Pediatric Infectious Diseases and Immunodeficiencies Unit, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Mónica Martínez-Gallo
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Translational Immunology Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Immunology Division, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Department of Cell Biology, Autonomous University of Barcelona (UAB), Physiology and Immunology, Bellaterra, Spain
- Roger Colobran
- Jeffrey Modell Diagnostic and Research Center for Primary Immunodeficiencies, Barcelona, Spain
- Translational Immunology Group, Vall d’Hebron Research Institute (VHIR), Barcelona, Spain
- Immunology Division, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Department of Clinical and Molecular Genetics, Vall d’Hebron University Hospital (HUVH), Barcelona, Spain
- Department of Cell Biology, Autonomous University of Barcelona (UAB), Physiology and Immunology, Bellaterra, Spain
- *Correspondence: Roger Colobran,
32
Genestet C, Refrégier G, Hodille E, Zein-Eddine R, Le Meur A, Hak F, Barbry A, Westeel E, Berland JL, Engelmann A, Verdier I, Lina G, Ader F, Dray S, Jacob L, Massol F, Venner S, Dumitrescu O. Mycobacterium tuberculosis genetic features associated with pulmonary tuberculosis severity. Int J Infect Dis 2022; 125:74-83. [PMID: 36273524 DOI: 10.1016/j.ijid.2022.10.026] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Revised: 10/13/2022] [Accepted: 10/15/2022] [Indexed: 11/06/2022]
Abstract
OBJECTIVES: Mycobacterium tuberculosis (Mtb) infections result in a wide spectrum of clinical presentations but without proven Mtb genetic determinants. Herein, we hypothesized that the genetic features of Mtb clinical isolates, such as specific polymorphisms or microdiversity, may be linked to tuberculosis (TB) severity. METHODS: A total of 234 patients with pulmonary TB (including 193 drug-susceptible and 14 monoresistant cases diagnosed between 2017 and 2020 and 27 multidrug-resistant cases diagnosed between 2010 and 2020) were stratified according to TB disease severity, and Mtb genetic features were explored using whole genome sequencing, including heterologous single-nucleotide polymorphism (SNP) calling to explore microdiversity. Finally, we performed a structural equation modeling analysis to relate TB severity to Mtb genetic features. RESULTS: The clinical isolates from patients with mild TB carried mutations in genes associated with host-pathogen interaction, whereas those from patients with moderate/severe TB carried mutations associated with regulatory mechanisms. A genome-wide association study identified an SNP in the promoter of the gene coding for the virulence regulator espR, statistically associated with moderate/severe disease. Structural equation modeling and model comparisons indicated that TB severity was associated with the detection of Mtb microdiversity within clinical isolates and with the espR SNP. CONCLUSION: Taken together, these results provide a new insight to better understand TB pathophysiology and could provide a new prognosis tool for pulmonary TB severity.
Affiliation(s)
- Charlotte Genestet
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Institut des Agents Infectieux, Laboratoire de bactériologie, Rhône-Alpes, Lyon, France.
- Guislaine Refrégier
- Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique et Evolution, Île-de-France, Orsay, France; Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, Université Paris-Saclay, Île-de-France, Gif-sur-Yvette, France
- Elisabeth Hodille
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Institut des Agents Infectieux, Laboratoire de bactériologie, Rhône-Alpes, Lyon, France
- Rima Zein-Eddine
- Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique et Evolution, Île-de-France, Orsay, France; Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, Université Paris-Saclay, Île-de-France, Gif-sur-Yvette, France; Laboratory of Optics and Biosciences, CNRS-INSERM-Ecole Polytechnique, Île-de-France, Palaiseau, France
- Adrien Le Meur
- Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique et Evolution, Île-de-France, Orsay, France; Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, Université Paris-Saclay, Île-de-France, Gif-sur-Yvette, France
- Fiona Hak
- Université Paris-Saclay, CNRS, AgroParisTech, Ecologie Systématique et Evolution, Île-de-France, Orsay, France; Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris Sud, Université Paris-Saclay, Île-de-France, Gif-sur-Yvette, France
- Alexia Barbry
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Institut des Agents Infectieux, Laboratoire de bactériologie, Rhône-Alpes, Lyon, France
- Emilie Westeel
- Fondation Mérieux, Emerging Pathogens Laboratory, Rhône-Alpes, Lyon, France
- Jean-Luc Berland
- Fondation Mérieux, Emerging Pathogens Laboratory, Rhône-Alpes, Lyon, France
- Astrid Engelmann
- Centre Hospitalier Fleyriat, Rhône-Alpes, Bourg-en-Bresse, France
- Isabelle Verdier
- Centre Hospitalier Fleyriat, Rhône-Alpes, Bourg-en-Bresse, France
- Gérard Lina
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Institut des Agents Infectieux, Laboratoire de bactériologie, Rhône-Alpes, Lyon, France; Université Lyon 1, Facultés de Médecine et de Pharmacie de Lyon, Rhône-Alpes, Lyon, France
- Florence Ader
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Service des Maladies infectieuses et tropicales, Rhône-Alpes, Lyon, France
- Stéphane Dray
- Biometrics and Evolutionary Biology Laboratory, CNRS UMR 5558, Université Lyon 1, Rhône-Alpes, Villeurbanne, France
- Laurent Jacob
- Biometrics and Evolutionary Biology Laboratory, CNRS UMR 5558, Université Lyon 1, Rhône-Alpes, Villeurbanne, France
- François Massol
- UMR 8198 Evo-Eco-Paleo, SPICI Group, University of Lille, Hauts-de-France, Lille, France; CNRS, CHU Lille, Institut Pasteur de Lille, U1019-UMR 9017-CIIL-Center for Infection and Immunity of Lille, University of Lille, Hauts-de-France, Lille, France
- Samuel Venner
- Biometrics and Evolutionary Biology Laboratory, CNRS UMR 5558, Université Lyon 1, Rhône-Alpes, Villeurbanne, France
- Oana Dumitrescu
- CIRI - Centre International de Recherche en Infectiologie, Ecole Normale Supérieure de Lyon, Université Claude Bernard Lyon-1, Rhône-Alpes, Lyon, France; Hospices Civils de Lyon, Institut des Agents Infectieux, Laboratoire de bactériologie, Rhône-Alpes, Lyon, France; Université Lyon 1, Facultés de Médecine et de Pharmacie de Lyon, Rhône-Alpes, Lyon, France
|
33
|
Czech L, Exposito-Alonso M. grenepipe: a flexible, scalable and reproducible pipeline to automate variant calling from sequence reads. Bioinformatics 2022; 38:4809-4811. [PMID: 36053180 PMCID: PMC10424805 DOI: 10.1093/bioinformatics/btac600] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/31/2022] [Revised: 07/27/2022] [Accepted: 09/05/2022] [Indexed: 11/14/2022]
Abstract
SUMMARY We developed grenepipe, an all-in-one Snakemake workflow to streamline the data processing from raw high-throughput sequencing data of individuals or populations to genotype variant calls. Our pipeline offers a range of popular software tools within a single configuration file, automatically installs software dependencies, is highly optimized for scalability in cluster environments and runs with a single command. AVAILABILITY AND IMPLEMENTATION grenepipe is published under the GPLv3 and freely available at github.com/moiexpositoalonsolab/grenepipe.
Affiliation(s)
- Lucas Czech
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Moises Exposito-Alonso
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Department of Global Ecology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Department of Biology, Stanford University, Stanford, CA 94305, USA
|
34
|
Neoantigens in precision cancer immunotherapy: from identification to clinical applications. Chin Med J (Engl) 2022; 135:1285-1298. [PMID: 35838545 PMCID: PMC9433083 DOI: 10.1097/cm9.0000000000002181] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Indexed: 12/20/2022]
Abstract
Immunotherapies targeting cancer neoantigens are safe, effective, and precise. Neoantigens can be identified mainly by genomic techniques such as next-generation sequencing and high-throughput single-cell sequencing; proteomic techniques such as mass spectrometry; and bioinformatics tools based on high-throughput sequencing data, mass spectrometry data, and biological databases. Neoantigen-related therapies are widely used in clinical practice and include neoantigen vaccines, neoantigen-specific CD8+ and CD4+ T cells, and neoantigen-pulsed dendritic cells. In addition, neoantigens can be used as biomarkers to assess immunotherapy response, resistance, and prognosis. Therapies based on neoantigens are an important and promising branch of cancer immunotherapy. Unremitting efforts are needed to unravel the comprehensive role of neoantigens in anti-tumor immunity and to extend their clinical application. This review aimed to summarize the progress in neoantigen research and to discuss its opportunities and challenges in precision cancer immunotherapy.
|
35
|
Long Q, Yuan Y, Li M. RNA-SSNV: A Reliable Somatic Single Nucleotide Variant Identification Framework for Bulk RNA-Seq Data. Front Genet 2022; 13:865313. [PMID: 35846154 PMCID: PMC9279659 DOI: 10.3389/fgene.2022.865313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2022] [Accepted: 05/17/2022] [Indexed: 11/13/2022]
Abstract
The use of expressed somatic mutations may have a unique advantage in identifying active cancer driver mutations. However, accurately calling mutations from RNA-seq data is difficult due to confounding factors such as RNA editing, reverse transcription, and gap alignment. In the present study, we proposed a framework (named RNA-SSNV, https://github.com/pmglab/RNA-SSNV) to call somatic single nucleotide variants (SSNV) from tumor bulk RNA-seq data. Based on a comprehensive multi-filtering strategy and a machine-learning classification model trained with comprehensively curated features, RNA-SSNV achieved the best precision–recall rate (0.880–0.884) in a testing dataset and robustly retained 0.94 AUC for the precision–recall curve in three validation adult-based TCGA (The Cancer Genome Atlas) datasets. We further showed that the somatic mutations called by RNA-SSNV tended to have a higher functional impact and therapeutic power in known driver genes. Furthermore, VAF (variant allele fraction) analysis revealed that subclones harboring expressed mutations had an evolutionary selection advantage and that RNA had higher detection power to rescue DNA-omitted mutations. In sum, RNA-SSNV will be a useful approach to accurately call expressed somatic mutations for a more insightful analysis of cancer driver genes and carcinogenic mechanisms.
Affiliation(s)
- Qihan Long
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Yangyang Yuan
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Miaoxin Li
- Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Precision Medicine, Sun Yat-Sen University, Guangzhou, China
- Center for Disease Genome Research, Sun Yat-Sen University, Guangzhou, China
- Guangdong Provincial Key Laboratory of Biomedical Imaging and Guangdong Provincial Engineering Research Center of Molecular Imaging, The Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
- Key Laboratory of Tropical Disease Control (SYSU), Ministry of Education, Guangzhou, China
- *Correspondence: Miaoxin Li
|
36
|
Zhang L, Zhou X, Sha H, Xie L, Liu B. Recent Progress on Therapeutic Vaccines for Breast Cancer. Front Oncol 2022; 12:905832. [PMID: 35734599 PMCID: PMC9207208 DOI: 10.3389/fonc.2022.905832] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/27/2022] [Accepted: 05/11/2022] [Indexed: 11/13/2022]
Abstract
Breast cancer remains the most frequently diagnosed malignancy worldwide. Advanced breast cancer is still an incurable disease mainly because of its heterogeneity and limited immunogenicity. The great success of cancer immunotherapy is paving the way for a new era in cancer treatment, and therapeutic cancer vaccination is an area of interest. Vaccine targets include tumor-associated antigens and tumor-specific antigens. Immune responses differ among vaccine delivery platforms. Next-generation sequencing technologies and computational analysis have recently made personalized vaccination possible. However, only a few cases benefiting from neoantigen-based treatment have been reported in breast cancer, and more attention has been given to overexpressed antigen-based treatment, especially human epidermal growth factor receptor 2-derived peptide vaccines. Here, we discuss recent advancements in therapeutic vaccines for breast cancer and highlight near-term opportunities for moving forward.
Affiliation(s)
- Lianru Zhang
- The Comprehensive Cancer Centre of Drum Tower Hospital, Medical School of Nanjing University & Clinical Cancer Institute of Nanjing University, Nanjing, China
- Xipeng Zhou
- Department of oncology, Yizheng People's Hospital, Yangzhou, China
- Huizi Sha
- The Comprehensive Cancer Centre of Drum Tower Hospital, Medical School of Nanjing University & Clinical Cancer Institute of Nanjing University, Nanjing, China
- Li Xie
- The Comprehensive Cancer Centre of Drum Tower Hospital, Medical School of Nanjing University & Clinical Cancer Institute of Nanjing University, Nanjing, China
- Baorui Liu
- The Comprehensive Cancer Centre of Drum Tower Hospital, Medical School of Nanjing University & Clinical Cancer Institute of Nanjing University, Nanjing, China
|
37
|
Quazi S. Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol 2022; 39:120. [PMID: 35704152 PMCID: PMC9198206 DOI: 10.1007/s12032-022-01711-1] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 03/06/2022] [Accepted: 03/14/2022] [Indexed: 10/28/2022]
Abstract
The advancement of precision medicine in medical care has left behind the conventional symptom-driven treatment process by allowing early risk prediction of disease through improved diagnostics and customization of more effective treatments. It is necessary to scrutinize overall patient data alongside broad factors to observe and differentiate between ill and relatively healthy people to take the most appropriate path toward precision medicine, resulting in an improved vision of biological indicators that can signal health changes. Precision and genomic medicine combined with artificial intelligence have the potential to improve patient healthcare. Patients with less common therapeutic responses or unique healthcare demands are using genomic medicine technologies. AI provides insights through advanced computation and inference, enabling the system to reason and learn while enhancing physician decision making. Many cell characteristics, including gene up-regulation, proteins binding to nucleic acids, and splicing, can be measured at high throughput and used as training objectives for predictive models. Researchers can create a new era of effective genomic medicine with the improved availability of a broad range of datasets and modern computer techniques such as machine learning. This review article has elucidated the contributions of ML algorithms in precision and genome medicine.
Affiliation(s)
- Sameer Quazi
- GenLab Biosolutions Private Limited, Bangalore, Karnataka, 560043, India.
- Department of Biomedical Sciences, School of Life Sciences, Anglia Ruskin University, Cambridge, UK.
|
39
|
Dodani DD, Nguyen MH, Morin RD, Marra MA, Corbett RD. Combinatorial and Machine Learning Approaches for Improved Somatic Variant Calling From Formalin-Fixed Paraffin-Embedded Genome Sequence Data. Front Genet 2022; 13:834764. [PMID: 35571031 PMCID: PMC9092826 DOI: 10.3389/fgene.2022.834764] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Accepted: 03/18/2022] [Indexed: 11/13/2022]
Abstract
Formalin fixation of paraffin-embedded tissue samples is a well-established method for preserving tissue and is routinely used in clinical settings. Although formalin-fixed, paraffin-embedded (FFPE) tissues are deemed crucial for research and clinical applications, the fixation process results in molecular damage to nucleic acids, thus confounding their use in genome sequence analysis. Methods to improve genomic data quality from FFPE tissues have emerged, but there remains significant room for improvement. Here, we use whole-genome sequencing (WGS) data from matched Fresh Frozen (FF) and FFPE tissue samples to optimize a sensitive and precise FFPE single nucleotide variant (SNV) calling approach. We present methods to reduce the prevalence of false-positive SNVs by applying combinatorial techniques to five publicly available variant callers. We also introduce FFPolish, a novel variant classification method that efficiently classifies FFPE-specific false-positive variants. Our combinatorial and statistical techniques improve precision and F1 scores compared to the results of publicly available tools when tested individually.
Affiliation(s)
- Dollina D Dodani
- The Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- Matthew H Nguyen
- The Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- Ryan D Morin
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada; Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
- Marco A Marra
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada; Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Richard D Corbett
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Research Institute, Provincial Health Services Authority, Vancouver, BC, Canada
|
40
|
Garcia-Prieto CA, Martínez-Jiménez F, Valencia A, Porta-Pardo E. Detection of oncogenic and clinically actionable mutations in cancer genomes critically depends on variant calling tools. Bioinformatics 2022; 38:3181-3191. [PMID: 35512388 PMCID: PMC9191211 DOI: 10.1093/bioinformatics/btac306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 02/09/2022] [Accepted: 05/01/2022] [Indexed: 11/22/2022]
Abstract
Motivation The analysis of cancer genomes provides fundamental information about their etiology, the processes driving cell transformation or potential treatments. While researchers and clinicians are often only interested in the identification of oncogenic mutations, actionable variants or mutational signatures, the first crucial step in the analysis of any tumor genome is the identification of somatic variants in cancer cells (i.e. those that have been acquired during their evolution). For that purpose, a wide range of computational tools have been developed in recent years to detect somatic mutations in sequencing data from tumor samples. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Results Here, we quantify the impact of variant calling decisions by comparing the results obtained in three important analyses of cancer genomics data (identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants) when changing the somatic variant caller (MuSE, MuTect2, SomaticSniper and VarScan2) or the strategy to combine them (Consensus of two, Consensus of three and Union) across all 33 cancer types from The Cancer Genome Atlas. Our results show that variant calling decisions have a significant impact on these analyses, creating important differences that could even impact treatment decisions for some patients. Moreover, the Consensus of three calling strategy to combine the output of multiple variant calling tools, a strategy very widely used by the research community, can lead to the loss of some cancer driver genes and actionable mutations. Overall, our results highlight the limitations of widespread practices within the cancer genomics community and point to important differences in critical analyses of tumor sequencing data depending on variant calling, affecting even the identification of clinically actionable variants. Availability and implementation Code is available at https://github.com/carlosgarciaprieto/VariantCallingClinicalBenchmark. Supplementary information Supplementary data are available at Bioinformatics online.
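The combination strategies named in this abstract (Union, Consensus of two, Consensus of three) can be read as keeping a variant when at least k callers report it. The following is a minimal sketch under that assumption, not the authors' code: the function name `combine_calls`, the tuple representation of a variant, and the toy call sets are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def combine_calls(call_sets, min_callers):
    """Keep variants reported by at least `min_callers` of the call sets."""
    # Count, for each variant, how many callers reported it.
    counts = Counter(v for calls in call_sets.values() for v in set(calls))
    return {v for v, n in counts.items() if n >= min_callers}

# Made-up call sets keyed by caller name; each variant is
# a (chromosome, position, ref, alt) tuple.
calls = {
    "MuSE": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "MuTect2": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"),
                ("chr3", 300, "C", "A")},
    "SomaticSniper": {("chr1", 100, "A", "T")},
    "VarScan2": {("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")},
}

union = combine_calls(calls, 1)        # reported by any caller
consensus2 = combine_calls(calls, 2)   # reported by two or more callers
consensus3 = combine_calls(calls, 3)   # reported by three or more callers
```

In this toy data the chr2 variant is reported by only two callers, so it survives the consensus of two but is dropped by the consensus of three, which is the kind of loss the abstract describes.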
Affiliation(s)
- Carlos A Garcia-Prieto
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Francisco Martínez-Jiménez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Alfonso Valencia
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Eduard Porta-Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain
|
41
|
Abstract
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
|
42
|
Lobon I, Solís-Moruno M, Juan D, Muhaisen A, Abascal F, Esteller-Cucala P, García-Pérez R, Martí MJ, Tolosa E, Ávila J, Rahbari R, Marques-Bonet T, Casals F, Soriano E. Somatic Mutations Detected in Parkinson Disease Could Affect Genes With a Role in Synaptic and Neuronal Processes. Front Aging 2022; 3:851039. [PMID: 35821807 PMCID: PMC9261316 DOI: 10.3389/fragi.2022.851039] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/08/2022] [Accepted: 03/16/2022] [Indexed: 12/17/2022]
Abstract
The role of somatic mutations in complex diseases, including neurodevelopmental and neurodegenerative disorders, is becoming increasingly clear. However, to date, no study has shown their relation to Parkinson disease’s phenotype. To explore the relevance of embryonic somatic mutations in sporadic Parkinson disease, we performed whole-exome sequencing in blood and four brain regions of ten patients. We identified 59 candidate somatic single nucleotide variants (sSNVs) through sensitive calling and a careful filtering strategy (COSMOS). We validated 27 of them with amplicon-based ultra-deep sequencing, with a 70% validation rate for the highest-confidence variants. The identified sSNVs are in genes with synaptic functions that are co-expressed with genes previously associated with Parkinson disease. Most of the sSNVs were only called in blood but were also found in the brain tissues with ultra-deep amplicon sequencing, demonstrating the strength of multi-tissue sampling designs.
Affiliation(s)
- Irene Lobon
- Institute of Evolutionary Biology (UPF-CSIC), Barcelona, Spain
- *Correspondence: Irene Lobon; Eduardo Soriano
- Manuel Solís-Moruno
- Institute of Evolutionary Biology (UPF-CSIC), Barcelona, Spain
- Genomics Core Facility, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- David Juan
- Institute of Evolutionary Biology (UPF-CSIC), Barcelona, Spain
- Ashraf Muhaisen
- Department of Cell Biology, Physiology and Immunology and Institute of Neurosciences, Universitat de Barcelona (UB), Barcelona, Spain
- Centre for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Federico Abascal
- Cancer, Ageing, and Somatic Mutation (CASM), Wellcome Sanger Institute, Cambridge, United Kingdom
- Maria Josep Martí
- Centre for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Department of Neurology, Hospital Clínic de Barcelona, Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), University of Barcelona (UB), Barcelona, Spain
- Eduardo Tolosa
- Centre for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Department of Neurology, Hospital Clínic de Barcelona, Institut d’Investigacions Biomédiques August Pi i Sunyer (IDIBAPS), University of Barcelona (UB), Barcelona, Spain
- Jesús Ávila
- Centre for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Centro de Biología Molecular Severo Ochoa, Madrid, Spain
- Raheleh Rahbari
- Cancer, Ageing, and Somatic Mutation (CASM), Wellcome Sanger Institute, Cambridge, United Kingdom
- Tomas Marques-Bonet
- Institute of Evolutionary Biology (UPF-CSIC), Barcelona, Spain
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
- Ferran Casals
- Genomics Core Facility, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Spain
- Departament de Genètica, Microbiologia i Estadística, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Eduardo Soriano
- Department of Cell Biology, Physiology and Immunology and Institute of Neurosciences, Universitat de Barcelona (UB), Barcelona, Spain
- Centre for Networked Biomedical Research on Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- *Correspondence: Irene Lobon; Eduardo Soriano
|
43
|
Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022; 13:764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023]
Abstract
The early developmental phase is of critical importance for human health and disease later in life. To decipher the molecular mechanisms at play, current biomedical research is increasingly relying on large quantities of diverse omics data. The integration and interpretation of the different datasets pose a critical challenge towards the holistic understanding of the complex biological processes that are involved in early development. In this review, we outline the major transcriptomic and epigenetic processes and the respective datasets that are most relevant for studying the periconceptional period. We cover both basic data processing and analysis steps, as well as more advanced data integration methods. A particular focus is given to network-based methods. Finally, we review the medical applications of such integrative analyses.
|
44
|
Wang D, Zhang Y, Li R, Li J, Zhang R. Consistency and reproducibility of large panel next-generation sequencing: Multi-laboratory assessment of somatic mutation detection on reference materials with mismatch repair and proofreading deficiency. J Adv Res 2022; 44:161-172. [PMID: 36725187 PMCID: PMC9937796 DOI: 10.1016/j.jare.2022.03.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/24/2021] [Revised: 03/16/2022] [Accepted: 03/27/2022] [Indexed: 02/04/2023]
Abstract
INTRODUCTION Clinical precision oncology increasingly relies on accurate genome-wide profiling using large panel next-generation sequencing; however, difficulties in accurate and consistent detection of somatic mutations from individual platforms and pipelines remain unresolved. OBJECTIVES To obtain paired tumor-normal reference materials that can be effectively constructed and are interchangeable with clinical samples, and to evaluate the performance of 56 panels under routine testing conditions based on the reference samples. METHODS Genes involved in mismatch repair and DNA proofreading were knocked down using the CRISPR-Cas9 technology to accumulate somatic mutations in a defined GM12878 cell line. They were used as reference materials to comprehensively evaluate the reproducibility and accuracy of detection results of oncopanels and explore the potential influencing factors. RESULTS In total, 14 paired tumor-normal reference DNA samples from engineered cell lines were prepared, and a reference dataset comprising 168 somatic mutations in a high-confidence region of 1.8 Mb was generated. For mutations with an allele frequency (AF) of more than 5% in reference samples, 56 panels collectively reported 1306 errors, including 729 false negatives (FNs), 179 false positives (FPs) and 398 reproducibility errors. Performance metrics varied among panels, with precision and recall ranging from 0.773 to 1 and 0.683 to 1, respectively. Incorrect and inadequate filtering accounted for a large proportion of false discovery (including FNs and FPs), while low-quality detection, cross-contamination and other sequencing errors during the wet bench process were other sources of FNs and FPs. In addition, low AF (<5%) considerably influenced the reproducibility and comparability among panels. CONCLUSIONS This study provided an integrated practice for developing reference standards to assess oncopanels in detecting somatic mutations and quantitatively revealed the source of detection errors. It will promote optimization, validation, and quality control among laboratories with potential applicability in clinical use.
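The precision and recall figures quoted in this abstract follow the standard definitions: precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP and FN are true positives, false positives and false negatives against the reference set. A minimal sketch of that arithmetic follows; the function name `precision_recall_f1` and the per-panel counts are hypothetical and are not taken from the study.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion counts."""
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical panel scored against a 168-variant reference set:
# 150 reference variants detected, 18 missed, 10 extra calls.
precision, recall, f1 = precision_recall_f1(tp=150, fp=10, fn=18)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because false negatives lower recall and false positives lower precision, the two numbers can diverge widely for the same panel, which is why the abstract reports them as separate ranges.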
Affiliation(s)
- Duo Wang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Yuanfeng Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
- Rui Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China
| | - Jinming Li
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China.
| | - Rui Zhang
- National Center for Clinical Laboratories, Institute of Geriatric Medicine, Chinese Academy of Medical Sciences, Beijing Hospital/National Center of Gerontology, P. R. China; Graduate School of Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, P. R. China; Beijing Engineering Research Center of Laboratory Medicine, Beijing Hospital, Beijing, P. R. China.
| |
Collapse
|
45
|
Early Breast Cancer Evolution by Autosomal Broad Copy Number Alterations. Int J Genomics 2022; 2022:9332922. [PMID: 35252434 PMCID: PMC8896957 DOI: 10.1155/2022/9332922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/26/2021] [Accepted: 02/08/2022] [Indexed: 12/13/2022] Open
Abstract
The availability of comprehensive genomic datasets across patient populations enables the application of novel methods for reconstructing tumor evolution within individual patients. To this end, we propose studying autosomal broad copy number alterations (CNAs) as a framework to better understand early tumor evolution. We compared the broad CNAs and somatic mutations of patients with 1 to 10 autosomal broad CNAs against the full set of patients, using data from The Cancer Genome Atlas breast cancer project. We show that the frequencies with which a chromosome arm acquires a broad CNA and a genome acquires somatic mutations change as autosomal broad CNAs accumulate. Therefore, we propose that the number of autosomal broad CNAs is an important characteristic of breast tumors that should be taken into consideration when studying them. To investigate this idea in more depth, we next studied the frequency with which specific chromosome arms acquire broad CNAs in patients with 1 to 10 broad CNAs. With this process, we identified the broad CNAs that exhibit the fastest rates of accumulation across all patients. This finding suggests a likely order of occurrence of these alterations in patients, which is apparent when we consider a subset of patients with few broad CNAs. Here, we lay the foundation for future studies to build upon our findings and use autosomal broad CNAs as a method to monitor breast tumor progression in vivo, furthering our understanding of how early tumor evolution unfolds.
|
46
|
Kaewprasert O, Tongsima S, Ong RTH, Faksri K. Optimized analysis parameters of variant calling for whole genome-based phylogeny of Mycobacteroides abscessus. Arch Microbiol 2022; 204:190. [PMID: 35194683 DOI: 10.1007/s00203-022-02792-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/23/2021] [Revised: 01/17/2022] [Accepted: 02/04/2022] [Indexed: 02/05/2023]
Abstract
Whole-genome sequence (WGS) analysis provides the best resolution for reconstructing bacterial phylogeny. However, the resulting tree can vary according to the parameters used in the WGS pipeline, making it difficult to compare results across studies. This study compares the effects on phylogenies of applying different parameter stringencies. As the study model for optimizing parameters, we used strains of Mycobacteroides abscessus serially isolated at various intervals, isolates known to represent persistent infection (PI) or re-infection (RI) cases, and isolates from different subspecies. Un-optimized, low-stringency parameters yielded an excessive number of SNPs (823) compared with the optimized setting (3 SNPs) between paired strains isolated 1 day apart from PI cases, as well as discordant tree topology and misclassification of subspecies and of RI instances. We demonstrated that using high-quality variants provides more accuracy in distinguishing serial isolates of the same clone from different clones and in phylogenetic analysis of M. abscessus. Our approach might be used as a model for analyses requiring phylogenetic reconstruction of other bacteria.
Affiliation(s)
- Orawee Kaewprasert
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, 40002, Thailand
- Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand
- Sissades Tongsima
- National Biobank of Thailand, National Science and Technology Development Agency, Khlong Luang, Pathum Thani, Thailand
- National Center for Genetic Engineering and Biotechnology, National Science and Technology Development Agency, Khlong Luang, Pathum Thani, Thailand
- Rick Twee-Hee Ong
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Kiatichai Faksri
- Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, 40002, Thailand.
- Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
|
47
|
Medeiros JJF, Capo-Chichi JM, Shlush LI, Dick JE, Arruda A, Minden MD, Abelson S. SmMIP-tools: a computational toolset for processing and analysis of single-molecule molecular inversion probes-derived data. Bioinformatics 2022; 38:2088-2095. [PMID: 35150236 PMCID: PMC9004652 DOI: 10.1093/bioinformatics/btac081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/12/2021] [Revised: 01/13/2022] [Accepted: 02/07/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Single-molecule molecular inversion probes (smMIPs) provide an exceptionally cost-effective and modular approach for routine or large-cohort next-generation sequencing. However, processing the derived raw data to generate highly accurate variant calls remains challenging. RESULTS We introduce SmMIP-tools, a comprehensive computational method that promotes the detection of single nucleotide variants and short insertions and deletions from smMIP-based sequencing. Our approach delivered near-perfect performance when benchmarked against a set of known mutations in controlled experiments involving DNA dilutions and outperformed other commonly used computational methods for mutation detection. Comparison against clinically approved diagnostic testing of leukaemia patients demonstrated the ability to detect both previously reported variants and a set of pathogenic mutations that were not detected by clinical testing. Collectively, our results indicate that increased performance can be achieved when data processing and analysis are tailored to the underlying technology. The feasibility of using our method in research and clinical settings to benefit from low-cost smMIP technology is demonstrated. AVAILABILITY AND IMPLEMENTATION The source code for SmMIP-tools, its manual, and additional scripts aimed at fostering large-scale data processing and analysis are all available on GitHub (https://github.com/abelson-lab/smMIP-tools). Raw sequencing data generated in this study have been submitted to the European Genome-Phenome Archive (EGA; https://ega-archive.org) and can be accessed under accession number EGAS00001005359. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jessie J F Medeiros
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada; Ontario Institute for Cancer Research, Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Jose-Mario Capo-Chichi
- Genome Diagnostics, Department of Clinical Laboratory Genetics, University Health Network, Toronto, ON, Canada
- Liran I Shlush
- Department of Immunology, Weizmann Institute of Science, Rehovot, Israel
- John E Dick
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Andrea Arruda
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada
- Mark D Minden
- Princess Margaret Cancer Centre, University Health Network (UHN), Toronto, ON, Canada; Department of Hematology and Medical Oncology, University Health Network, Toronto, ON, Canada
|
48
|
Clonal and subclonal TP53 molecular impairment is associated with prognosis and progression in multiple myeloma. Blood Cancer J 2022; 12:15. [PMID: 35082295 PMCID: PMC8791929 DOI: 10.1038/s41408-022-00610-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 05/21/2021] [Revised: 09/14/2021] [Accepted: 11/11/2021] [Indexed: 02/06/2023] Open
Abstract
Aberrations of TP53, either deletions of chromosome 17p (del17p) or mutations, are associated with poor outcome in multiple myeloma (MM), but the conventional detection methods currently in use underestimate their incidence, hindering optimal risk assessment and prognostication of MM patients. We investigated the alteration status of the TP53 gene by SNP array and sequencing techniques in a homogeneous cohort of 143 newly diagnosed MM patients, evaluated both at diagnosis and at first relapse. A single hit on the TP53 gene, either deletion or mutation, detected at both clonal and sub-clonal levels, had a minor effect on outcomes. Conversely, the coexistence of TP53 deletion and mutation, which defines the so-called double-hit patients, was associated with the worst clinical outcome (PFS: HR 3.34 [95% CI: 1.37–8.12] p = 0.008; OS: HR 3.47 [95% CI: 1.18–10.24] p = 0.02). Moreover, the analysis of longitudinal samples indicated that TP53 allelic status might increase during the disease course. Notably, the acquisition of TP53 alterations at relapse dramatically worsened the clinical course of patients. Overall, our analyses showed these techniques to be highly sensitive in identifying TP53 aberrations at the sub-clonal level and emphasized the poor prognosis associated with double-hit MM patients.
|
49
|
Decap D, de Schaetzen van Brienen L, Larmuseau M, Costanza P, Herzeel C, Wuyts R, Marchal K, Fostier J. Halvade somatic: Somatic variant calling with Apache Spark. Gigascience 2022; 11:6505120. [PMID: 35022699 PMCID: PMC8756192 DOI: 10.1093/gigascience/giab094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Revised: 10/27/2021] [Accepted: 12/09/2021] [Indexed: 12/02/2022] Open
Abstract
Background The accurate detection of somatic variants from sequencing data is of key importance for cancer treatment and research. Somatic variant calling requires a high sequencing depth of the tumor sample, especially when the detection of low-frequency variants is also desired. In turn, this leads to large volumes of raw sequencing data to process and hence, large computational requirements. For example, calling the somatic variants according to the GATK best practices guidelines requires days of computing time for a typical whole-genome sequencing sample. Findings We introduce Halvade Somatic, a framework for somatic variant calling from DNA sequencing data that takes advantage of multi-node and/or multi-core compute platforms to reduce runtime. It relies on Apache Spark to provide scalable I/O and to create and manage data streams that are processed on different CPU cores in parallel. Halvade Somatic contains all required steps to process the tumor and matched normal sample according to the GATK best practices recommendations: read alignment (BWA), sorting of reads, preprocessing steps such as marking duplicate reads and base quality score recalibration (GATK), and, finally, calling the somatic variants (Mutect2). Our approach reduces the runtime on a single 36-core node to 19.5 h compared to a runtime of 84.5 h for the original pipeline, a speedup of 4.3 times. Runtime can be further decreased by scaling to multiple nodes, e.g., we observe a runtime of 1.36 h using 16 nodes, an additional speedup of 14.4 times. Halvade Somatic supports variant calling from both whole-genome sequencing and whole-exome sequencing data and also supports Strelka2 as an alternative or complementary variant calling tool. We provide a Docker image to facilitate single-node deployment. Halvade Somatic can be executed on a variety of compute platforms, including Amazon EC2 and Google Cloud. 
Conclusions To our knowledge, Halvade Somatic is the first somatic variant calling pipeline that leverages Big Data processing platforms and provides reliable, scalable performance. Source code is freely available.
Affiliation(s)
- Dries Decap
- IDLab, Ghent University - imec, Technologiepark 126, B-9052 Ghent, Belgium
- Maarten Larmuseau
- IDLab, Ghent University - imec, Technologiepark 126, B-9052 Ghent, Belgium
- Roel Wuyts
- imec, Kapeldreef 75, B-3001 Leuven, Belgium
- Kathleen Marchal
- IDLab, Ghent University - imec, Technologiepark 126, B-9052 Ghent, Belgium
- Jan Fostier
- IDLab, Ghent University - imec, Technologiepark 126, B-9052 Ghent, Belgium
|
50
|
Sahraeian SME, Fang LT, Karagiannis K, Moos M, Smith S, Santana-Quintero L, Xiao C, Colgan M, Hong H, Mohiyuddin M, Xiao W. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 2022; 23:12. [PMID: 34996510 PMCID: PMC8740374 DOI: 10.1186/s13059-021-02592-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data. RESULTS In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. CONCLUSIONS The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
Affiliation(s)
- Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA, 95050, USA
- Konstantinos Karagiannis
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Malcolm Moos
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Sean Smith
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Luis Santana-Quintero
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Michael Colgan
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Huixiao Hong
- Bioinformatics branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR, 72079, USA
- Wenming Xiao
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA.
|