1
Pokorna P, Palova H, Adamcova S, Jugas R, Al Tukmachi D, Kyr M, Knoflickova D, Kozelkova K, Bystry V, Mejstrikova S, Merta T, Trachtova K, Podlipna E, Mudry P, Pavelka Z, Bajciova V, Tinka P, Jarosova M, Ivkovic TC, Madlener S, Pal K, Stepien N, Mayr L, Tichy B, Drabova K, Jezova M, Kozakova S, Vanackova J, Radova L, Steininger K, Haberler C, Gojo J, Sterba J, Slaby O. Real-world performance of integrative clinical genomics in pediatric precision oncology. J Transl Med 2024:102161. [PMID: 39442669 DOI: 10.1016/j.labinv.2024.102161] [Received: 06/26/2024] [Revised: 09/16/2024] [Accepted: 10/15/2024] [Indexed: 10/25/2024] Open
Abstract
Despite significant improvement in the survival of pediatric cancer patients, treatment outcomes for high-risk, relapsed, and refractory cancers remain unsatisfactory. Moreover, prolonged survival is frequently associated with long-term adverse effects due to intensive multimodal treatments. Accelerating the progress of pediatric oncology requires both therapeutic advances and strategies to mitigate the long-term cytotoxic side effects, potentially through targeting specific molecular drivers of pediatric malignancies. In this report, we present the results of integrative genomic and transcriptomic profiling of 230 patients with malignant solid tumors (the "primary cohort") and 18 patients with recurrent or otherwise difficult-to-treat nonmalignant conditions (the "secondary cohort"). The integrative workflow for the primary cohort enabled the identification of clinically significant single-nucleotide variants, small insertions/deletions, and fusion genes, which were found in 55% and 28% of patients, respectively. For 38% of patients, molecularly informed treatment recommendations were made. In the secondary cohort, known or potentially driving alteration was detected in 89% of cases, including a suspected novel causal gene for patients with inclusion body infantile digital fibromatosis. Furthermore, 47% of findings also brought therapeutic implications for subsequent management. Across both cohorts, changes or refinements to the original histopathological diagnoses were achieved in 4% of cases. Our study demonstrates the efficacy of integrating advanced genomic and transcriptomic analyses to identify therapeutic targets, refine diagnoses, and optimize treatment strategies for challenging pediatric and young adult malignancies and underscores the need for broad implementation of precision oncology in clinical settings.
Affiliation(s)
- Petra Pokorna
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic; Center for Precision Medicine, University Hospital Brno, Brno, Czech Republic
- Hana Palova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Sona Adamcova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Robin Jugas
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Dagmar Al Tukmachi
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Department of Biochemistry, Faculty of Science, Masaryk University, Brno, Czech Republic
- Michal Kyr
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Dana Knoflickova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Katerina Kozelkova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Vojtech Bystry
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Sona Mejstrikova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Internal Medicine, Hematology and Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Tomas Merta
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Karolina Trachtova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Eliska Podlipna
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Peter Mudry
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Zdenek Pavelka
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Viera Bajciova
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Pavel Tinka
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Marie Jarosova
  - Department of Internal Medicine, Hematology and Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Tina Catela Ivkovic
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Sibylle Madlener
  - Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Karol Pal
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Natalia Stepien
  - Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Lisa Mayr
  - Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Boris Tichy
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Klara Drabova
  - Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Marta Jezova
  - Department of Pathology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Sarka Kozakova
  - Department of Pharmacology, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Department of Pharmacy, University Hospital Brno, Brno, Czech Republic
- Jitka Vanackova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Lenka Radova
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Karin Steininger
  - Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Christine Haberler
  - Division of Neuropathology and Neurochemistry, Department of Neurology, Medical University of Vienna, Vienna, Austria
- Johannes Gojo
  - Department of Pediatrics and Adolescent Medicine, Comprehensive Center for Pediatrics and Comprehensive Cancer Center, Medical University of Vienna, Vienna, Austria
- Jaroslav Sterba
  - Center for Precision Medicine, University Hospital Brno, Brno, Czech Republic; Department of Pediatric Oncology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
- Ondrej Slaby
  - Central European Institute of Technology, Masaryk University, Brno, Czech Republic; Department of Biology, Faculty of Medicine, Masaryk University, Brno, Czech Republic; Center for Precision Medicine, University Hospital Brno, Brno, Czech Republic; Department of Pathology, University Hospital Brno and Faculty of Medicine, Masaryk University, Brno, Czech Republic
2
Maruzani R, Brierley L, Jorgensen A, Fowler A. Benchmarking UMI-aware and standard variant callers for low frequency ctDNA variant detection. BMC Genomics 2024; 25:827. [PMID: 39227777 PMCID: PMC11370058 DOI: 10.1186/s12864-024-10737-w] [Received: 11/14/2023] [Accepted: 08/22/2024] [Indexed: 09/05/2024] Open
Abstract
BACKGROUND Circulating tumour DNA (ctDNA) is a subset of cell free DNA (cfDNA) released by tumour cells into the bloodstream. Circulating tumour DNA has shown great potential as a biomarker to inform treatment in cancer patients. Collecting ctDNA is minimally invasive and reflects the entire genetic makeup of a patient's cancer. ctDNA variants in NGS data can be difficult to distinguish from sequencing and PCR artefacts due to low abundance, particularly in the early stages of cancer. Unique Molecular Identifiers (UMIs) are short sequences ligated to the sequencing library before amplification. These sequences are useful for filtering out low frequency artefacts. The utility of ctDNA as a cancer biomarker depends on accurate detection of cancer variants. RESULTS In this study, we benchmarked six variant calling tools, including two UMI-aware callers for their ability to call ctDNA variants. The standard variant callers tested included Mutect2, bcftools, LoFreq and FreeBayes. The UMI-aware variant callers benchmarked were UMI-VarCal and UMIErrorCorrect. We used both datasets with known variants spiked in at low frequencies, and datasets containing ctDNA, and generated synthetic UMI sequences for these datasets. Variant callers displayed different preferences for sensitivity and specificity. Mutect2 showed high sensitivity, while returning more privately called variants than any other caller in data without synthetic UMIs - an indicator of false positive variant discovery. In data encoded with synthetic UMIs, UMI-VarCal detected fewer putative false positive variants than all other callers in synthetic datasets. Mutect2 showed a balance between high sensitivity and specificity in data encoded with synthetic UMIs. CONCLUSIONS Our results indicate UMI-aware variant callers have potential to improve sensitivity and specificity in calling low frequency ctDNA variants over standard variant calling tools. 
There is a growing need for further development of UMI-aware variant calling tools if effective early detection methods for cancer using ctDNA samples are to be realised.
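The abstract treats "privately called variants" (calls made by one tool and no other) as an indicator of false positives, alongside sensitivity on spiked-in variants. A minimal sketch of how those two quantities could be computed from per-caller call sets is given below; the caller names, the variant tuples, and the function names are hypothetical and are not taken from the study.

```python
from collections import Counter


def private_calls(calls_by_caller):
    """Return, for each caller, the variants that no other caller reported."""
    # Count how many callers reported each variant.
    counts = Counter(v for calls in calls_by_caller.values() for v in set(calls))
    return {
        caller: {v for v in set(calls) if counts[v] == 1}
        for caller, calls in calls_by_caller.items()
    }


def sensitivity(called, truth):
    """Fraction of known (e.g. spiked-in) variants that a caller recovered."""
    truth = set(truth)
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(set(called) & truth) / len(truth)


# Hypothetical call sets; each variant is (chrom, pos, ref, alt).
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")},
    "caller_b": {("chr1", 100, "A", "T")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")},
}
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}

private = private_calls(calls)
sens_b = sensitivity(calls["caller_b"], truth)
```

In this toy input, `caller_a` is the only tool with a private call, while `caller_b` misses one of the two known variants; a real benchmark would also normalise variant representation before comparing sets.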
Affiliation(s)
- Rugare Maruzani
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Liam Brierley
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
  - MRC-University of Glasgow Centre for Virus Research, University of Glasgow, Garscube Campus, 464 Bearsden Road, Glasgow, G61 1QH, UK
- Andrea Jorgensen
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
- Anna Fowler
  - Department of Health Data Science, Institute of Population Health, University of Liverpool, Waterhouse Building, Block F, Brownlow Street, Liverpool, L69 3GF, UK
3
Atzeni R, Massidda M, Pieroni E, Rallo V, Pisu M, Angius A. A Novel Affordable and Reliable Framework for Accurate Detection and Comprehensive Analysis of Somatic Mutations in Cancer. Int J Mol Sci 2024; 25:8044. [PMID: 39125613 PMCID: PMC11311285 DOI: 10.3390/ijms25158044] [Received: 06/10/2024] [Revised: 07/11/2024] [Accepted: 07/22/2024] [Indexed: 08/12/2024] Open
Abstract
Accurate detection and analysis of somatic variants in cancer involve multiple third-party tools with complex dependencies and configurations, leading to laborious, error-prone, and time-consuming data conversions. This approach lacks accuracy, reproducibility, and portability, limiting clinical application. Musta was developed to address these issues as an end-to-end pipeline for detecting, classifying, and interpreting cancer mutations. Musta is based on a Python command-line tool designed to manage tumor-normal samples for precise somatic mutation analysis. The core is a Snakemake-based workflow that covers all key cancer genomics steps, including variant calling, mutational signature deconvolution, variant annotation, driver gene detection, pathway analysis, and tumor heterogeneity estimation. Musta is easy to install on any system via Docker, with a Makefile handling installation, configuration, and execution, allowing for full or partial pipeline runs. Musta has been validated at the CRS4-NGS Core facility and tested on large datasets from The Cancer Genome Atlas and the Beijing Institute of Genomics. Musta has proven robust and flexible for somatic variant analysis in cancer. It is user-friendly, requiring no specialized programming skills, and enables data processing with a single command line. Its reproducibility ensures consistent results across users following the same protocol.
Affiliation(s)
- Rossano Atzeni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy; (R.A.); (E.P.); (M.P.)
- Matteo Massidda
  - Department of Medical, Surgical and Experimental Sciences, University of Sassari, 07100 Sassari, Italy
- Enrico Pieroni
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy; (R.A.); (E.P.); (M.P.)
- Vincenzo Rallo
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
- Massimo Pisu
  - Center for Advanced Studies, Research and Development in Sardinia (CRS4), 09050 Pula, Italy; (R.A.); (E.P.); (M.P.)
- Andrea Angius
  - Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche (CNR), Cittadella Universitaria di Cagliari, 09042 Monserrato, Italy
4
Khachaturyan M, Santer M, Reusch TBH, Dagan T. Heteroplasmy Is Rare in Plant Mitochondria Compared with Plastids despite Similar Mutation Rates. Mol Biol Evol 2024; 41:msae135. [PMID: 38934796 PMCID: PMC11245704 DOI: 10.1093/molbev/msae135] [Received: 11/17/2023] [Revised: 06/11/2024] [Accepted: 06/20/2024] [Indexed: 06/28/2024] Open
Abstract
Plant cells harbor two membrane-bound organelles containing their own genetic material-plastids and mitochondria. Although the two organelles coexist and coevolve within the same plant cells, they differ in genome copy number, intracellular organization, and mode of segregation. How these attributes affect the time to fixation or, conversely, loss of neutral alleles is currently unresolved. Here, we show that mitochondria and plastids share the same mutation rate, yet plastid alleles remain in a heteroplasmic state significantly longer compared with mitochondrial alleles. By analyzing genetic variants across populations of the marine flowering plant Zostera marina and simulating organelle allele dynamics, we examine the determinants of allele segregation and allele fixation. Our results suggest that the bottlenecks on the cell population, e.g. during branching or seeding, and stratification of the meristematic tissue are important determinants of mitochondrial allele dynamics. Furthermore, we suggest that the prolonged plastid allele dynamics are due to a yet unknown active plastid partition mechanism. The dissimilarity between plastid and mitochondrial novel allele fixation at different levels of organization may manifest in differences in adaptation processes. Our study uncovers fundamental principles of organelle population genetics that are essential for further investigations of long-term evolution and molecular dating of divergence events.
Affiliation(s)
- Marina Khachaturyan
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
  - Institute of General Microbiology, University of Kiel, Kiel, Germany
- Mario Santer
  - Institute of General Microbiology, University of Kiel, Kiel, Germany
- Thorsten B H Reusch
  - Marine Evolutionary Ecology, GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
- Tal Dagan
  - Institute of General Microbiology, University of Kiel, Kiel, Germany
5
Zhuk AS, Stepchenkova EI, Zotova IV, Belopolskaya OB, Pavlov YI, Kostroma II, Gritsaev SV, Aksenova AY. G-Quadruplex Forming DNA Sequence Context Is Enriched around Points of Somatic Mutations in a Subset of Multiple Myeloma Patients. Int J Mol Sci 2024; 25:5269. [PMID: 38791307 PMCID: PMC11121618 DOI: 10.3390/ijms25105269] [Received: 03/22/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
Multiple myeloma (MM) is the second most common hematological malignancy, which remains incurable despite recent advances in treatment strategies. Like other forms of cancer, MM is characterized by genomic instability, caused by defects in DNA repair. Along with mutations in DNA repair genes and genotoxic drugs used to treat MM, non-canonical secondary DNA structures (four-stranded G-quadruplex structures) can affect accumulation of somatic mutations and chromosomal abnormalities in the tumor cells of MM patients. Here, we tested the hypothesis that G-quadruplex structures may influence the distribution of somatic mutations in the tumor cells of MM patients. We sequenced exomes of normal and tumor cells of 11 MM patients and analyzed the data for the presence of G4 context around points of somatic mutations. To identify molecular mechanisms that could affect mutational profile of tumors, we also analyzed mutational signatures in tumor cells as well as germline mutations for the presence of specific SNPs in DNA repair genes or in genes regulating G-quadruplex unwinding. In several patients, we found that sites of somatic mutations are frequently located in regions with G4 context. This pattern correlated with specific germline variants found in these patients. We discuss the possible implications of these variants for mutation accumulation and specificity in MM and propose that the extent of G4 context enrichment around somatic mutation sites may be a novel metric characterizing mutational processes in tumors.
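The abstract does not say how "G4 context" around a mutation site was defined. As an illustration only, the sketch below assumes the widely used canonical motif (four runs of at least three guanines separated by loops of 1 to 7 nucleotides, searched on both strands) and a hypothetical flank size; the function names and the toy sequence are invented for the example and are not the authors' method.

```python
import re

# Assumed canonical G4 motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+.
# The C-rich pattern detects the same motif on the opposite strand.
G4_FWD = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
G4_REV = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def has_g4_context(sequence, position, flank=50):
    """True if a complete G4 motif lies within +/- flank bases of a 0-based position."""
    start = max(0, position - flank)
    window = sequence[start:position + flank + 1].upper()
    return bool(G4_FWD.search(window) or G4_REV.search(window))


def g4_fraction(sequence, positions, flank=50):
    """Fraction of mutation positions whose flanking window contains a G4 motif."""
    if not positions:
        return 0.0
    hits = sum(has_g4_context(sequence, p, flank) for p in positions)
    return hits / len(positions)


# Toy sequence: an 18-nt G4 motif (indices 60-77) embedded in AT repeats.
toy = "AT" * 30 + "GGGTAGGGTAGGGTAGGG" + "AT" * 30
fraction = g4_fraction(toy, [65, 5], flank=20)
```

An enrichment claim would additionally need a background expectation, for example the same fraction computed at randomly sampled positions, which this sketch leaves out.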
Affiliation(s)
- Anna S. Zhuk
  - Laboratory of Amyloid Biology, St. Petersburg State University, 199034 St. Petersburg, Russia; (A.S.Z.); (I.V.Z.)
  - Institute of Applied Computer Science, ITMO University, 197101 St. Petersburg, Russia
- Elena I. Stepchenkova
  - Vavilov Institute of General Genetics, St. Petersburg Branch, Russian Academy of Sciences, 199034 St. Petersburg, Russia
  - Department of Genetics and Biotechnology, St. Petersburg State University, 199034 St. Petersburg, Russia
- Irina V. Zotova
  - Laboratory of Amyloid Biology, St. Petersburg State University, 199034 St. Petersburg, Russia; (A.S.Z.); (I.V.Z.)
  - Vavilov Institute of General Genetics, St. Petersburg Branch, Russian Academy of Sciences, 199034 St. Petersburg, Russia
- Olesya B. Belopolskaya
  - Resource Center “Bio-Bank Center”, Research Park of St. Petersburg State University, 198504 St. Petersburg, Russia
  - The Laboratory of Genogeography, Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991 Moscow, Russia
- Youri I. Pavlov
  - Eppley Institute for Research in Cancer, Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE 68198, USA
  - Departments of Biochemistry and Molecular Biology, Microbiology and Pathology, Genetics Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Ivan I. Kostroma
  - City Hospital No. 15, 198205 St. Petersburg, Russia; (I.I.K.); (S.V.G.)
- Anna Y. Aksenova
  - Laboratory of Amyloid Biology, St. Petersburg State University, 199034 St. Petersburg, Russia; (A.S.Z.); (I.V.Z.)
6
Ramakrishnan S, Cortes-Gomez E, Athans SR, Attwood KM, Rosario SR, Kim SJ, Mager DE, Isenhart EG, Hu Q, Wang J, Woloszynska A. Race-specific coregulatory and transcriptomic profiles associated with DNA methylation and androgen receptor in prostate cancer. Genome Med 2024; 16:52. [PMID: 38566104 PMCID: PMC10988846 DOI: 10.1186/s13073-024-01323-6] [Received: 08/23/2023] [Accepted: 03/22/2024] [Indexed: 04/04/2024] Open
Abstract
BACKGROUND Prostate cancer is a significant health concern, particularly among African American (AA) men who exhibit higher incidence and mortality compared to European American (EA) men. Understanding the molecular mechanisms underlying these disparities is imperative for enhancing clinical management and achieving better outcomes. METHODS Employing a multi-omics approach, we analyzed prostate cancer in both AA and EA men. Using Illumina methylation arrays and RNA sequencing, we investigated DNA methylation and gene expression in tumor and non-tumor prostate tissues. Additionally, Boolean analysis was utilized to unravel complex networks contributing to racial disparities in prostate cancer. RESULTS When comparing tumor and adjacent non-tumor prostate tissues, we found that DNA hypermethylated regions are enriched for PRC2/H3K27me3 pathways and EZH2/SUZ12 cofactors. Olfactory/ribosomal pathways and distinct cofactors, including CTCF and KMT2A, were enriched in DNA hypomethylated regions in prostate tumors from AA men. We identified race-specific inverse associations of DNA methylation with expression of several androgen receptor (AR) associated genes, including the GATA family of transcription factors and TRIM63. This suggests that race-specific dysregulation of the AR signaling pathway exists in prostate cancer. To investigate the effect of AR inhibition on race-specific gene expression changes, we generated in-silico patient-specific prostate cancer Boolean networks. Our simulations revealed prolonged AR inhibition causes significant dysregulation of TGF-β, IDH1, and cell cycle pathways specifically in AA prostate cancer. We further quantified global gene expression changes, which revealed differential expression of genes related to microtubules, immune function, and TMPRSS2-fusion pathways, specifically in prostate tumors of AA men. Enrichment of these pathways significantly correlated with an altered risk of disease progression in a race-specific manner. 
CONCLUSIONS Our study reveals unique signaling networks underlying prostate cancer biology in AA and EA men, offering potential insights for clinical management strategies tailored to specific racial groups. Targeting AR and associated pathways could be particularly beneficial in addressing the disparities observed in prostate cancer outcomes in the context of AA and EA men. Further investigation into these identified pathways may lead to the development of personalized therapeutic approaches to improve outcomes for prostate cancer patients across different racial backgrounds.
Affiliation(s)
- Swathi Ramakrishnan
  - Department of Pharmacology and Therapeutics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Eduardo Cortes-Gomez
  - Department of Bioinformatics and Biostatistics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
  - Department of Biostatistics, SUNY University at Buffalo, Kimball Tower, Buffalo, NY, 14214, USA
- Sarah R Athans
  - Department of Pharmacology and Therapeutics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Kristopher M Attwood
  - Department of Bioinformatics and Biostatistics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Spencer R Rosario
  - Department of Bioinformatics and Biostatistics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Se Jin Kim
  - Department of Pharmaceutical Sciences, SUNY University at Buffalo, Buffalo, NY, 14214, USA
- Donald E Mager
  - Department of Pharmaceutical Sciences, SUNY University at Buffalo, Buffalo, NY, 14214, USA
  - Enhanced Pharmacodynamics, LLC, Buffalo, NY, 14203, USA
- Emily G Isenhart
  - Department of Cancer Genetics and Genomics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Qiang Hu
  - Department of Bioinformatics and Biostatistics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Jianmin Wang
  - Department of Bioinformatics and Biostatistics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
- Anna Woloszynska
  - Department of Pharmacology and Therapeutics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, 14263, USA
7
Srinivasan S, Dhamne C, Patkar N, Chatterjee G, Moulik NR, Chichra A, Pallath A, Tembhare P, Shetty D, Subramanian PG, Narula G, Banavali S. KIT exon 17 mutations are predictive of inferior outcome in pediatric acute myeloid leukemia with RUNX1::RUNX1T1. Pediatr Blood Cancer 2024; 71:e30791. [PMID: 38014874 DOI: 10.1002/pbc.30791] [Received: 09/17/2023] [Revised: 10/24/2023] [Accepted: 11/15/2023] [Indexed: 11/29/2023]
Abstract
BACKGROUND Pediatric core binding factor acute myeloid leukemia (CBF-AML), although considered a favorable risk subtype, exhibits variable outcomes primarily driven by additional genetic abnormalities, such as KIT mutations. PROCEDURE In this study, we examined the prognostic impact of KIT mutations in 130 pediatric patients with CBF-AML, treated uniformly at a single center over 4 years (2017-2021). KIT mutations were detected via next-generation sequencing using a myeloid panel comprising 52 genes for most patients. RESULTS Our findings revealed that KIT mutations were present in 31% of CBF-AML cases. Exon 17 KIT mutation was most commonly (72%) seen with notable occurrences at the D816 and N822 residue in 48% and 39% of cases, respectively. The 3-year cumulative incidence of relapse (CIR) and overall survival (OS) for patients with exon 17 KIT mutation were 36% and 40%, respectively, and was significantly worse in comparison to other site KIT mutations (3-year CIR: 11%; OS: 64%) and without KIT mutation (3-year CIR: 13%; OS:71%). Notably, the prognostic impact of KIT mutations was prominent in patients with RUNX1::RUNX1T1, but not in those with CBFB::MYH11 fusion. Additionally, a high KIT variant-allele frequency (VAF) (>33%) predicted for a higher disease relapse; 3-year CIR of 40% for VAF greater than 33% versus 7% for VAF less than 33%. When adjusted for site of KIT mutation and end-of-induction measurable residual disease, VAF greater than 33% correlated with poor OS (hazard ratio [HR]: 4.4 [95% CI: 1.2-17.2], p = .034). CONCLUSION Exon 17 KIT mutations serve as an important predictor of relapse in RUNX1::RUNX1T1 pediatric AML. In addition, a high KIT VAF may predict poor outcomes in these patients. These results emphasize the need to incorporate KIT mutational analysis into risk stratification for pediatric CBF-AML.
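The abstract stratifies patients by KIT variant-allele frequency at a 33% cutoff. As a rough illustration, VAF is commonly computed as the share of reads supporting the variant allele at a locus; the sketch below assumes that definition, and its function names and read counts are hypothetical. The abstract does not state how a value of exactly 33% was handled, so the sketch arbitrarily assigns it to the low group.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """VAF as the fraction of reads at the locus that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def vaf_group(alt_reads, total_reads, cutoff=0.33):
    """Label a variant 'high' when VAF exceeds the cutoff, otherwise 'low'.

    Assumption: a VAF exactly equal to the cutoff is treated as 'low'.
    """
    vaf = variant_allele_frequency(alt_reads, total_reads)
    return "high" if vaf > cutoff else "low"


# Hypothetical read counts: (variant-supporting reads, total reads at the locus).
examples = [(40, 100), (20, 100), (33, 100)]
groups = [vaf_group(alt, total) for alt, total in examples]
```

Grouping of this kind only reproduces the stratification step; the relapse and survival estimates quoted in the abstract come from the study's own time-to-event analysis.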
Affiliation(s)
- Shyam Srinivasan
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Chetan Dhamne
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Nikhil Patkar
  - Department of Hematopathology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Gaurav Chatterjee
  - Department of Hematopathology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Nirmalya Roy Moulik
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Akanksha Chichra
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Aneeta Pallath
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Prashant Tembhare
  - Department of Hematopathology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Dhanalaxmi Shetty
  - Department of Cancer Cytogenetics, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- P G Subramanian
  - Department of Hematopathology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Gaurav Narula
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
- Shripad Banavali
  - Department of Pediatric Oncology, Tata Memorial Centre, Homi Bhabha National Institute, Mumbai, Maharashtra, India
8
Shah RK, Cygan E, Kozlik T, Colina A, Zamora AE. Utilizing immunogenomic approaches to prioritize targetable neoantigens for personalized cancer immunotherapy. Front Immunol 2023; 14:1301100. [PMID: 38149253 PMCID: PMC10749952 DOI: 10.3389/fimmu.2023.1301100] [Received: 09/24/2023] [Accepted: 11/29/2023] [Indexed: 12/28/2023] Open
Abstract
Advancements in sequencing technologies and bioinformatics algorithms have expanded our ability to identify tumor-specific somatic mutation-derived antigens (neoantigens). While recent studies have shown neoantigens to be compelling targets for cancer immunotherapy due to their foreign nature and high immunogenicity, developing increasingly accurate and cost-effective approaches to rapidly identify neoantigens remains challenging, yet it is essential for successful cancer immunotherapy. Currently, gene expression analysis and algorithms for variant calling can be used to generate lists of mutational profiles across patients, but more care is needed to curate these lists and prioritize the candidate neoantigens most capable of inducing an immune response. A growing body of evidence suggests that only a handful of somatic mutations predicted by mutational profiling approaches act as immunogenic neoantigens. Hence, unbiased screening of all candidate neoantigens predicted by Whole Genome Sequencing/Whole Exome Sequencing may be necessary to more comprehensively access the full spectrum of immunogenic neoepitopes. Once putative cancer neoantigens are identified, one of the largest bottlenecks in translating these neoantigens into actionable targets for cell-based therapies is identifying the cognate T cell receptors (TCRs) capable of recognizing these neoantigens. While many TCR-directed screening and validation assays have utilized bulk samples in the past, there has been a recent surge in the number of single-cell assays that provide a more granular understanding of the factors governing TCR-pMHC interactions. The goal of this review is to provide an overview of existing strategies to identify candidate neoantigens using genomics-based approaches and methods for assessing neoantigen immunogenicity. Additionally, applications, prospects, and limitations of some of the current single-cell technologies will be discussed. Finally, we will briefly summarize some of the recent models that have been used to predict TCR antigen specificity and analyze the TCR repertoire.
Affiliation(s)
- Ravi K. Shah
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Erin Cygan
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
- Tanya Kozlik
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Alfredo Colina
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
- Anthony E. Zamora
- Department of Medicine, Medical College of Wisconsin, Milwaukee, WI, United States
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
|
9
|
London CA, Gardner H, Zhao S, Knapp DW, Utturkar SM, Duval DL, Chambers MR, Ostrander E, Trent JM, Kuffel G. Leading the pack: Best practices in comparative canine cancer genomics to inform human oncology. Vet Comp Oncol 2023; 21:565-577. [PMID: 37778398 DOI: 10.1111/vco.12935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 10/03/2023]
Abstract
Pet dogs develop spontaneous cancers at a rate estimated to be five times higher than that of humans, providing a unique opportunity to study disease biology and evaluate novel therapeutic strategies in a model system that possesses an intact immune system and mirrors key aspects of human cancer biology. Despite decades of interest, effective utilization of pet dog cancers has been hindered by a limited repertoire of necessary cellular and molecular reagents for both in vitro and in vivo studies, as well as a dearth of information regarding the genomic landscape of these cancers. Recently, many of these critical gaps have been addressed through the generation of a highly annotated canine reference genome, the creation of several tools necessary for multi-omic analysis of canine tumours, and the development of a centralized repository for key genomic and associated clinical information from canine cancer patients, the Integrated Canine Data Commons. Together, these advances have catalysed multidisciplinary efforts designed to integrate the study of pet dog cancers more effectively into the translational continuum, with the ultimate goal of improving human outcomes. The current review summarizes this recent progress and provides a guide to resources and tools available for comparative study of pet dog cancers.
Affiliation(s)
- Cheryl A London
- Cummings School of Veterinary Medicine, Tufts University, North Grafton, Massachusetts, USA
- Heather Gardner
- Cummings School of Veterinary Medicine, Tufts University, North Grafton, Massachusetts, USA
- Shaying Zhao
- University of Georgia Cancer Center, University of Georgia, Athens, Georgia, USA
- Deborah W Knapp
- College of Veterinary Medicine, Purdue University, West Lafayette, Indiana, USA
- Sagar M Utturkar
- Purdue Institute for Cancer Research, Purdue University, West Lafayette, Indiana, USA
- Dawn L Duval
- College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, Colorado, USA
- Elaine Ostrander
- Cancer Genetics and Comparative Genomics Branch, National Cancer Institute, Bethesda, Maryland, USA
- Jeffrey M Trent
- Translational Genomics Research Institute, Phoenix, Arizona, USA
- Gina Kuffel
- National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
|
10
|
Beeler JS, Bolton KL. How low can you go?: Methodologic considerations in clonal hematopoiesis variant calling. Leuk Res 2023; 135:107419. [PMID: 37956474 DOI: 10.1016/j.leukres.2023.107419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Revised: 10/25/2023] [Accepted: 10/30/2023] [Indexed: 11/15/2023]
Abstract
Clonal hematopoiesis (CH) is defined by the presence of an expanded clonal hematopoietic cell population due to an acquired mutation conferring a selective growth advantage and is known to predispose to hematologic malignancy. In this review, we discuss sequencing methods for CH detection in bulk sequencing data and corresponding bioinformatic approaches for variant calling, filtering, and curation. We detail practical recommendations for CH calling. Finally, we discuss how improvements in CH sequencing and bioinformatic approaches will enable the characterization of CH trajectories, its impact on human health, and therapeutic approaches to mitigate its adverse effects.
Affiliation(s)
- J Scott Beeler
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Kelly L Bolton
- Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
|
11
|
Xiang X, Lu B, Song D, Li J, Shu K, Pu D. Evaluating the performance of low-frequency variant calling tools for the detection of variants from short-read deep sequencing data. Sci Rep 2023; 13:20444. [PMID: 37993475 PMCID: PMC10665316 DOI: 10.1038/s41598-023-47135-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/09/2023] [Indexed: 11/24/2023] Open
Abstract
Detection of low-frequency variants with high accuracy plays an important role in biomedical research and clinical practice. However, it is challenging to do so with next-generation sequencing (NGS) approaches due to the high error rates of NGS. To accurately distinguish low-level true variants from these errors, many statistical variant-calling tools for low-frequency variants have been proposed, but a systematic performance comparison of these tools has not yet been performed. Here, we evaluated four raw-reads-based variant callers (SiNVICT, outLyzer, Pisces, and LoFreq) and four UMI-based variant callers (DeepSNVMiner, MAGERI, smCounter2, and UMI-VarCal) considering their capability to call single nucleotide variants (SNVs) with allelic frequency as low as 0.025% in deep sequencing data. We analyzed a total of 54 simulated datasets with various sequencing depths and variant allele frequencies (VAFs), two reference datasets, and Horizon Tru-Q sample data. The results showed that the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers regarding detection limit. Sequencing depth had almost no effect on the UMI-based callers but significantly influenced the raw-reads-based callers. Regardless of the sequencing depth, MAGERI showed the fastest analysis, while smCounter2 consistently took the longest to finish the variant calling process. Overall, DeepSNVMiner and UMI-VarCal performed the best, with sensitivity and precision of 88% and 100% (DeepSNVMiner) and 84% and 100% (UMI-VarCal), respectively. In conclusion, the UMI-based callers, except smCounter2, outperformed the raw-reads-based callers in terms of sensitivity and precision. We recommend using DeepSNVMiner and UMI-VarCal for low-frequency variant detection. The results provide important information regarding future directions for reliable low-frequency variant detection and algorithm development, which is critical in genetics-based medical research and clinical applications.
Affiliation(s)
- Xudong Xiang
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Bowen Lu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dongyang Song
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Jie Li
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Kunxian Shu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
- Dan Pu
- Chongqing Key Laboratory of Big Data for Bio Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
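The quantities in the Xiang et al. abstract (a VAF as low as 0.025%, sequencing depth, sensitivity, precision) can be made concrete with a little arithmetic. The sketch below is not taken from the paper: the depths and the true/false positive counts are hypothetical, and it assumes independent reads and ignores sequencing error. It only illustrates why so few supporting reads are available at such a low VAF, and how sensitivity and precision are conventionally defined.

```python
from math import comb


def expected_alt_reads(depth: int, vaf: float) -> float:
    """Mean number of reads carrying the variant at one site."""
    return depth * vaf


def prob_at_least_k_alt_reads(depth: int, vaf: float, k: int = 1) -> float:
    """P(X >= k) for X ~ Binomial(depth, vaf), assuming independent reads."""
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1.0 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of calls that are true variants: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    vaf = 0.00025  # 0.025%, the lowest allelic frequency named in the abstract
    for depth in (2_000, 10_000, 50_000):  # hypothetical depths
        print(
            depth,
            expected_alt_reads(depth, vaf),
            round(prob_at_least_k_alt_reads(depth, vaf, k=3), 3),
        )
    # Hypothetical counts chosen only to show the two definitions in use.
    print(sensitivity(tp=88, fn=12), precision(tp=88, fp=0))
```

Requiring several supporting reads (here `k=3`, an arbitrary choice) rather than one is a common way to guard against sequencing errors, which is part of why depth matters for callers that work on raw reads.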
|
12
|
Vaisband M, Schubert M, Gassner FJ, Geisberger R, Greil R, Zaborsky N, Hasenauer J. Validation of genetic variants from NGS data using deep convolutional neural networks. BMC Bioinformatics 2023; 24:158. [PMID: 37081386 PMCID: PMC10116675 DOI: 10.1186/s12859-023-05255-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 03/27/2023] [Indexed: 04/22/2023] Open
Abstract
Accurate somatic variant calling from next-generation sequencing data is one of the most important tasks in personalised cancer therapy. The sophistication of the available technologies is ever-increasing, yet manual candidate refinement is still a necessary step in state-of-the-art processing pipelines. This limits reproducibility and introduces a bottleneck with respect to scalability. We demonstrate that the validation of genetic variants can be improved using a machine learning approach resting on a Convolutional Neural Network, trained using existing human annotation. In contrast to existing approaches, we introduce a way in which contextual data from sequencing tracks can be included in the automated assessment. A rigorous evaluation shows that the resulting model is robust and performs on par with trained researchers following a published standard operating procedure.
Affiliation(s)
- Marc Vaisband
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
- Maria Schubert
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Franz Josef Gassner
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Roland Geisberger
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Richard Greil
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Nadja Zaborsky
- Department of Internal Medicine III with Haematology, Medical Oncology, Haemostaseology, Infectiology and Rheumatology, Oncologic Center; Salzburg Cancer Research Institute - Laboratory for Immunological and Molecular Cancer Research (SCRI-LIMCR); Cancer Cluster Salzburg, Paracelsus Medical University, Salzburg, Austria
- Jan Hasenauer
- Life and Medical Sciences Institute, University of Bonn, Bonn, Germany
|
13
|
Starrett GJ, Yu K, Golubeva Y, Lenz P, Piaskowski ML, Petersen D, Dean M, Israni A, Hernandez BY, Tucker TC, Cheng I, Gonsalves L, Morris CR, Hussain SK, Lynch CF, Harris RS, Prokunina-Olsson L, Meltzer PS, Buck CB, Engels EA. Evidence for virus-mediated oncogenesis in bladder cancers arising in solid organ transplant recipients. eLife 2023; 12:e82690. [PMID: 36961501 PMCID: PMC10446826 DOI: 10.7554/elife.82690] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/13/2022] [Accepted: 03/22/2023] [Indexed: 03/25/2023] Open
Abstract
A small percentage of bladder cancers in the general population have been found to harbor DNA viruses. In contrast, up to 25% of tumors of solid organ transplant recipients, who are at an increased risk of developing bladder cancer and have overall poorer outcomes, harbor BK polyomavirus (BKPyV). To better understand the biology of the tumors and the mechanisms of carcinogenesis from potential oncoviruses, we performed whole genome and transcriptome sequencing on bladder cancer specimens from 43 transplant patients. Nearly half of the tumors from this patient population contained viral sequences. The most common were from BKPyV (N=9, 21%), JC polyomavirus (N=7, 16%), carcinogenic human papillomaviruses (N=3, 7%), and torque teno viruses (N=5, 12%). Immunohistochemistry revealed variable Large T antigen expression in BKPyV-positive tumors, ranging from 100% positive staining of tumor tissue to less than 1%. In most cases of BKPyV-positive tumors, the viral genome appeared to be clonally integrated into the host chromosome, consistent with microhomology-mediated end joining, and coincided with focal amplifications of the tumor genome similar to other virus-mediated cancers. Significant changes in host gene expression consistent with the functions of BKPyV Large T antigen were also observed in these tumors. Lastly, we identified four mutation signatures in our cases, with those attributable to APOBEC3 and SBS5 being the most abundant. Mutation signatures associated with an antiviral drug, ganciclovir, and aristolochic acid, a nephrotoxic compound found in some herbal medicines, were also observed. The results suggest multiple pathways to carcinogenesis in solid organ transplant recipients, with a large fraction being virus-associated.
Affiliation(s)
- Kelly Yu
- DCEG, NCI, NIH, Rockville, United States
- Petra Lenz
- Leidos Biomedical Research Inc, Frederick, United States
- Ajay Israni
- Department of Medicine, Nephrology Division, Hennepin Healthcare System, University of Minnesota, Minneapolis, United States
- Thomas C Tucker
- The Kentucky Cancer Registry, University of Kentucky, Lexington, United States
- Iona Cheng
- Department of Epidemiology and Biostatistics, and Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, Fremont, United States
- Lou Gonsalves
- Connecticut Tumor Registry, Connecticut Department of Public Health, Hartford, United States
- Cyllene R Morris
- California Cancer Reporting and Epidemiologic Surveillance Program, University of California, Davis, Davis, United States
- Shehnaz K Hussain
- Cedars-Sinai Cancer and Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, United States
- Charles F Lynch
- The Iowa Cancer Registry, University of Iowa, Iowa City, United States
- Reuben S Harris
- Howard Hughes Medical Institute, University of Minnesota, Minneapolis, United States
|
14
|
Zheng T. DETexT: An SNV detection enhancement for low read depth by integrating mutational signatures into TextCNN. Front Genet 2022; 13:943972. [PMID: 36246660 PMCID: PMC9554618 DOI: 10.3389/fgene.2022.943972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Accepted: 09/06/2022] [Indexed: 12/01/2022] Open
Abstract
Detecting SNVs at very low read depths helps to reduce sequencing requirements, lowers sequencing costs, and aids in the early screening, diagnosis, and treatment of cancer. However, the accuracy of SNV detection is significantly reduced at read depths below 34× due to the lack of a sufficient number of read pairs to help filter out false positives. Many recent studies have revealed the potential of mutational signatures (MS) in detecting true SNVs, understanding the mutational processes that lead to the development of human cancers, and analyzing the endogenous and exogenous causes. Here, we present DETexT, an SNV detection method better suited to low read depths, which classifies false positive variants by combining MS with deep learning algorithms to mine correlation information around bases in individual reads without relying on the support of duplicate read pairs. We have validated the effectiveness of DETexT on simulated and real datasets and conducted comparative experiments. The source code has been uploaded to https://github.com/TrinaZ/extra-lowRD for academic use only.
Affiliation(s)
- Tian Zheng
- Department of Computer Science and Technology, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China
- Institute of Data Science and Information Quality, Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- *Correspondence: Tian Zheng,
|
15
|
Mahmood MS, Afzal M, Batool H, Saif A, Aqdas T, Ashraf NM, Saleem M. Screening of Pathogenic Missense Single Nucleotide Variants From LHPP Gene Associated With the Hepatocellular Carcinoma: An In silico Approach. Bioinform Biol Insights 2022; 16:11779322221115547. [PMID: 35966807 PMCID: PMC9373111 DOI: 10.1177/11779322221115547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2022] [Accepted: 06/11/2022] [Indexed: 11/15/2022] Open
Abstract
The LHPP gene encodes a phospholysine phosphohistidine inorganic pyrophosphate phosphatase, which functions as a tumor-suppressor protein. The tumor suppression by this protein has been confirmed in various cancers, including hepatocellular carcinoma (HCC). LHPP downregulation promotes cell growth and proliferation by modulating the PI3K/AKT signaling pathway. This study identifies potentially deleterious missense single nucleotide variants (SNVs) associated with the LHPP gene using multiple computational tools based on different algorithms. A total of 4 destabilizing mutants, L22P, I212T, G227R, and G236R, are identified in the conserved region of the phosphatase. The 3-dimensional (3D) modeling and structural comparison of variants with the native protein reveals significant structural and conformational variations after mutations, suggesting disruption in the function of phospholysine phosphohistidine inorganic pyrophosphate phosphatase. The identified mutations might, therefore, contribute to the development of HCC.
Affiliation(s)
- Malik Siddique Mahmood
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan; Department of Biochemistry, NUR International University, Lahore, Pakistan
- Maryam Afzal
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
- Hina Batool
- Department of Life Sciences, University of Management and Technology, Lahore, Pakistan
- Amara Saif
- Department of Life Sciences, University of Management and Technology, Lahore, Pakistan
- Tahreem Aqdas
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
- Naeem Mahmood Ashraf
- Department of Biochemistry & Biotechnology, University of Gujrat, Gujrat, Pakistan
- Mahjabeen Saleem
- School of Biochemistry & Biotechnology, University of the Punjab, Lahore, Pakistan
|
16
|
Xiong KX, Zhou HL, Lin C, Yin JH, Kristiansen K, Yang HM, Li GB. Chord: an ensemble machine learning algorithm to identify doublets in single-cell RNA sequencing data. Commun Biol 2022; 5:510. [PMID: 35637301 PMCID: PMC9151659 DOI: 10.1038/s42003-022-03476-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/27/2021] [Accepted: 05/11/2022] [Indexed: 12/16/2022] Open
Abstract
High-throughput single-cell RNA sequencing (scRNA-seq) is a popular method, but it is accompanied by doublet rate problems that disturb the downstream analysis. Several computational approaches have been developed to detect doublets. However, most of these methods may yield satisfactory performance in some datasets but lack stability in others; thus, it is difficult to regard a single method as the gold standard which can be applied to all types of scenarios. It is a difficult and time-consuming task for researchers to choose the most appropriate software. We here propose Chord which implements a machine learning algorithm that integrates multiple doublet detection methods to address these issues. Chord had higher accuracy and stability than the individual approaches on different datasets containing real and synthetic data. Moreover, Chord was designed with a modular architecture port, which has high flexibility and adaptability to the incorporation of any new tools. Chord is a general solution to the doublet detection problem. For the unmet need to choose the suitable doublet detection method, an ensemble machine learning algorithm called Chord was developed, which integrates multiple methods and achieves higher accuracy and stability on different scRNA-seq datasets.
Affiliation(s)
- Ke-Xu Xiong
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China; BGI-Shenzhen, Shenzhen, 518083, China
- Han-Lin Zhou
- BGI-Shenzhen, Shenzhen, 518083, China; BGI College & Henan Institute of Medical and Pharmaceutical Science, Zhengzhou University, Zhengzhou, China; BGI-Henan, BGI-Shenzhen, Xinxiang, 453000, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shenzhen, 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, DK-2100, Denmark
- Cong Lin
- BGI-Shenzhen, Shenzhen, 518083, China; BGI-Henan, BGI-Shenzhen, Xinxiang, 453000, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shenzhen, 518083, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen, 518083, China
- Jian-Hua Yin
- BGI-Shenzhen, Shenzhen, 518083, China; BGI-Henan, BGI-Shenzhen, Xinxiang, 453000, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shenzhen, 518083, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen, 518083, China
- Karsten Kristiansen
- BGI-Shenzhen, Shenzhen, 518083, China; Laboratory of Genomics and Molecular Biomedicine, Department of Biology, University of Copenhagen, Copenhagen, DK-2100, Denmark
- Huan-Ming Yang
- BGI-Shenzhen, Shenzhen, 518083, China; James D. Watson Institute of Genome Science, 310008, Hangzhou, China
- Gui-Bo Li
- BGI-Shenzhen, Shenzhen, 518083, China; BGI College & Henan Institute of Medical and Pharmaceutical Science, Zhengzhou University, Zhengzhou, China; BGI-Henan, BGI-Shenzhen, Xinxiang, 453000, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shenzhen, 518083, China; Shenzhen Key Laboratory of Single-Cell Omics, BGI-Shenzhen, Shenzhen, 518083, China
|
17
|
Niu YN, Roberts EG, Denisko D, Hoffman MM. Assessing and assuring interoperability of a genomics file format. Bioinformatics 2022; 38:3327-3336. [PMID: 35575355 PMCID: PMC9237710 DOI: 10.1093/bioinformatics/btac327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Revised: 03/30/2022] [Accepted: 05/11/2022] [Indexed: 12/01/2022] Open
Abstract
Motivation: Bioinformatics software tools operate largely through the use of specialized genomics file formats. Often these formats lack formal specification, making it difficult or impossible for the creators of these tools to robustly test them for correct handling of input and output. This causes problems in interoperability between different tools that, at best, wastes time and frustrates users. At worst, interoperability issues could lead to undetected errors in scientific results. Results: We developed a new verification system, Acidbio, which tests for correct behavior in bioinformatics software packages. We crafted tests to unify correct behavior when tools encounter various edge cases: potentially unexpected inputs that exemplify the limits of the format. To analyze the performance of existing software, we tested the input validation of 80 Bioconda packages that parsed the Browser Extensible Data (BED) format. We also used a fuzzing approach to automatically perform additional testing. Of 80 software packages examined, 75 achieved less than 70% correctness on our test suite. We categorized multiple root causes for the poor performance of different types of software. Fuzzing detected other errors that the manually designed test suite could not. We also created a badge system that developers can use to indicate more precisely which BED variants their software accepts and to advertise the software’s performance on the test suite. Availability and implementation: Acidbio is available at https://github.com/hoffmangroup/acidbio. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yi Nian Niu
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada
- Eric G Roberts
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada
- Danielle Denisko
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Michael M Hoffman
- Princess Margaret Cancer Centre University Health Network, Toronto, ON, M5G 2C1, Canada; Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada; Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada; Vector Institute, Toronto, ON, M5G 1M1, Canada
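To give a feel for the kind of edge-case input validation the Niu et al. abstract describes for BED parsers, here is a minimal sketch of a checker for the three required BED fields. It is not Acidbio and does not reproduce its test suite; the function name `validate_bed3_line` is hypothetical, and the rules are deliberately simplified assumptions: tab-delimited fields only, non-negative integer coordinates, and `chromStart` no greater than `chromEnd`.

```python
def validate_bed3_line(line: str) -> list[str]:
    """Return a list of problems found in one BED line (empty list = passes).

    Simplified assumptions for illustration: fields are tab-separated and
    only the first three columns (chrom, chromStart, chromEnd) are checked.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated fields"]

    chrom, start_s, end_s = fields[:3]
    problems: list[str] = []

    if not chrom or any(ch.isspace() for ch in chrom):
        problems.append("chrom is empty or contains whitespace")

    def is_nonneg_int(text: str) -> bool:
        # isascii() guards against Unicode digits that isdigit() accepts.
        return text.isascii() and text.isdigit()

    start_ok = is_nonneg_int(start_s)
    end_ok = is_nonneg_int(end_s)
    if not start_ok:
        problems.append("chromStart is not a non-negative integer")
    if not end_ok:
        problems.append("chromEnd is not a non-negative integer")
    if start_ok and end_ok and int(start_s) > int(end_s):
        problems.append("chromStart is greater than chromEnd")

    return problems


if __name__ == "__main__":
    for example in ("chr1\t0\t100", "chr1\t10\t5", "chr1\t-1\t5", "chr1 0 100"):
        print(repr(example), validate_bed3_line(example))
```

A real parser has more decisions to make (optional columns, header lines, alternative delimiters), and those ambiguous cases are where tools are most likely to disagree.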
|
18
|
Abstract
Distilling biologically meaningful information from cancer genome sequencing data requires comprehensive identification of somatic alterations using rigorous computational methods. As the amount and complexity of sequencing data have increased, so has the number of tools for analysing them. Here, we describe the main steps involved in the bioinformatic analysis of cancer genomes, review key algorithmic developments and highlight popular tools and emerging technologies. These tools include those that identify point mutations, copy number alterations, structural variations and mutational signatures in cancer genomes. We also discuss issues in experimental design, the strengths and limitations of sequencing modalities and methodological challenges for the future.
|
19
|
Sahraeian SME, Fang LT, Karagiannis K, Moos M, Smith S, Santana-Quintero L, Xiao C, Colgan M, Hong H, Mohiyuddin M, Xiao W. Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample. Genome Biol 2022; 23:12. [PMID: 34996510 PMCID: PMC8740374 DOI: 10.1186/s13059-021-02592-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/01/2021] [Accepted: 12/28/2021] [Indexed: 12/13/2022] Open
Abstract
Background: Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data. Results: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. Conclusions: The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.
Affiliation(s)
- Li Tai Fang
- Roche Sequencing Solutions, Santa Clara, CA, 95050, USA
- Konstantinos Karagiannis
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Malcolm Moos
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Sean Smith
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Luis Santana-Quintero
- The Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Michael Colgan
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
- Huixiao Hong
- Bioinformatics branch, Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR, 72079, USA
- Wenming Xiao
- Office of Oncological Diseases, Office of New Drug, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, MD, 20993, USA
|
20
|
Chang TC, Xu K, Cheng Z, Wu G. Somatic and Germline Variant Calling from Next-Generation Sequencing Data. Adv Exp Med Biol 2022; 1361:37-54. [DOI: 10.1007/978-3-030-91836-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
|
21
Farswan A, Jena L, Kaur G, Gupta A, Gupta R, Rani L, Sharma A, Kumar L. Branching clonal evolution patterns predominate mutational landscape in multiple myeloma. Am J Cancer Res 2021; 11:5659-5679. [PMID: 34873486 PMCID: PMC8640818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 09/27/2021] [Indexed: 06/13/2023] Open
Abstract
Multiple myeloma (MM) arises from malignant transformation and deregulated proliferation of clonal plasma cells (PCs) harbouring heterogeneous molecular anomalies. The effect of evolving mutations on clone fitness and their cellular prevalence shapes the progressing myeloma genome and impacts clinical outcomes. Although clonal heterogeneity in MM is well established, which subclonal mutations emerge, persist, or perish with progression, and which of these can be targeted therapeutically, remain open questions. To address this, we sequenced pairwise whole exomes of 62 MM patients collected at two time points, i.e., at diagnosis and on progression. Somatic variants were called using a novel ensemble approach in which a consensus was deduced from four variant callers (Illumina's Dragen, Strelka2, SomaticSniper and SpeedSeq), and actionable/druggable gene targets were identified. Marked intraclonal heterogeneity was observed. Branching evolution was observed among 72.58% of patients, of whom 64.51% had low TMBs (<10) and 61.29% had 2 or more founder clones. The hypermutator patients (with high TMB levels, ≥10 to ≤100) showed a significant decrease in their TMBs from diagnosis (median TMB 77.11) to progression (median TMB 31.22). A distinct temporal fall in subclonal driver mutations was identified recurrently from diagnosis to progression, e.g., in the PABPC1, BRAF, KRAS, CR1, DIS3 and ATM genes in 3 or more patients, suggesting that such patients could be treated early with target-specific drugs like Vemurafenib/Cobimetinib. An analogous rise in driver mutations was observed in KMT2C, FOXD4L1, SP140, NRAS and other genes. A few drivers such as FAT4, IGLL5 and CDKN1A retained consistent distribution patterns at the two time points. These findings are clinically relevant and point to the value of evaluating multi-time-point subclonal mutational landscapes for designing better risk stratification strategies and tailoring risk-adapted combination therapies over time.
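The ensemble calling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the consensus was deduced, so the rule below (keep a variant reported by at least two of the four callers) is an assumption, as are the function name `consensus_calls`, the variant key layout, and the toy call sets.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_support: int = 2) -> set[Variant]:
    """Return variants reported by at least `min_support` callers.

    ASSUMPTION: simple vote counting; the abstract does not state the rule used.
    """
    # Each caller contributes a set, so a variant is counted once per caller.
    support = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}


# Toy, made-up call sets keyed by the four callers named in the abstract.
example_calls = {
    "Dragen": {("chr7", 1000, "A", "T"), ("chr12", 2000, "C", "T")},
    "Strelka2": {("chr7", 1000, "A", "T"), ("chr1", 3000, "G", "A")},
    "SomaticSniper": {("chr7", 1000, "A", "T")},
    "SpeedSeq": {("chr12", 2000, "C", "T")},
}

consensus = consensus_calls(example_calls, min_support=2)
print(sorted(consensus))
```

Raising `min_support` trades sensitivity for precision; requiring all four callers to agree would keep only the first toy variant.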
Affiliation(s)
- Akanksha Farswan
- SBILab, Department of Electronics and Communication Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-D), Delhi 110020, India
- Lingaraja Jena
- Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Gurvinder Kaur
- Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Anubha Gupta
- SBILab, Department of Electronics and Communication Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-D), Delhi 110020, India
- Ritu Gupta
- Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Lata Rani
- Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Atul Sharma
- Department of Medical Oncology, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Lalit Kumar
- Department of Medical Oncology, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
22
Yuan X, Ma C, Zhao H, Yang L, Wang S, Xi J. STIC: Predicting Single Nucleotide Variants and Tumor Purity in Cancer Genome. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2692-2701. [PMID: 32086221 DOI: 10.1109/tcbb.2020.2975181] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Single nucleotide variants (SNVs) play an important role in cellular proliferation and tumorigenesis in various types of human cancer. Next-generation sequencing (NGS) has provided high-throughput data at an unprecedented resolution to predict SNVs. Currently, there exist many computational methods for either germline or somatic SNV discovery from NGS data, but very few of them are versatile enough to adapt to every situation. In the absence of matched normal samples, the prediction of somatic SNVs from single-tumor samples becomes considerably challenging, especially when the tumor purity is unknown. Here, we propose a new approach, STIC, to predict somatic SNVs and estimate tumor purity from NGS data without matched normal samples. The main features of STIC include: (1) extracting a set of SNV-relevant features at each site and training a back-propagation (BP) neural network on the features to predict SNVs; (2) creating an iterative process to distinguish somatic SNVs from germline ones by disturbing allele frequency; and (3) establishing a reasonable relationship between tumor purity and allele frequencies of somatic SNVs to accurately estimate the purity. We quantitatively evaluate the performance of STIC on both simulated and real sequencing datasets; the results indicate that STIC outperforms competing methods.
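To make the link between tumor purity and somatic allele frequency concrete, here is a minimal sketch of the textbook simplification, not STIC's actual model (which the abstract does not spell out). It assumes clonal, heterozygous somatic SNVs in copy-neutral diploid regions, where the expected variant allele frequency is half the purity, so purity is roughly twice the median VAF. The function name `estimate_purity` and the example values are hypothetical.

```python
from statistics import median


def estimate_purity(somatic_vafs: list[float]) -> float:
    """Estimate tumor purity as 2 x median VAF of somatic SNVs.

    ASSUMPTION: clonal heterozygous SNVs in diploid, copy-neutral regions,
    so expected VAF = purity / 2. Not the relationship used by STIC.
    """
    if not somatic_vafs:
        raise ValueError("at least one somatic VAF is required")
    # Cap at 1.0 because sampling noise can push 2 x VAF above full purity.
    return min(1.0, 2.0 * median(somatic_vafs))


# Made-up allele frequencies for illustration only.
purity = estimate_purity([0.28, 0.30, 0.32])
print(f"estimated purity: {purity:.2f}")
```

Copy-number changes and subclonal variants break this assumption, which is one reason dedicated methods model the relationship more carefully.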
23
Salvo M, González-Feliú E, Toro J, Gallegos I, Maureira I, Miranda-González N, Barajas O, Bustamante E, Ahumada M, Colombo A, Armisén R, Villamán C, Ibañez C, Bravo ML, Sanhueza V, Spencer ML, de Toro G, Morales E, Bizama C, García P, Carrasco AM, Gutiérrez L, Bermejo JL, Verdugo RA, Marcelain K. Validation of an NGS Panel Designed for Detection of Actionable Mutations in Tumors Common in Latin America. J Pers Med 2021; 11:jpm11090899. [PMID: 34575676 PMCID: PMC8472524 DOI: 10.3390/jpm11090899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Revised: 08/29/2021] [Accepted: 09/03/2021] [Indexed: 12/24/2022] Open
Abstract
Next-generation sequencing (NGS) is progressively being used in clinical practice. However, several barriers preclude using this technology for precision oncology in most Latin American countries. To overcome some of these barriers, we have designed a 25-gene panel that contains predictive biomarkers for most current and near-future available therapies in Chile and Latin America. Library preparation was optimized to account for low DNA integrity observed in formalin-fixed paraffin-embedded tissue. The workflow includes an automated bioinformatic pipeline that accounts for the underrepresentation of Latin Americans in genome databases. The panel detected small insertions, deletions, and single nucleotide variants down to allelic frequencies of 0.05 with high sensitivity, specificity, and reproducibility. The workflow was validated in 272 clinical samples from several solid tumor types, including gallbladder cancer (GBC). More than 50 biomarkers were detected in these samples, mainly in the BRCA1/2, KRAS, and PIK3CA genes. In GBC, biomarkers for PARP, EGFR, PIK3CA, mTOR, and Hedgehog signaling inhibitors were found. Thus, this small NGS panel is an accurate and sensitive method that may constitute a more cost-efficient alternative to multiple non-NGS assays and costly, large NGS panels. This kind of streamlined assay with automated bioinformatics analysis may facilitate the implementation of precision medicine in Latin America.
Affiliation(s)
- Mauricio Salvo
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Evelin González-Feliú
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Jessica Toro
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Iván Gallegos
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Department of Pathology, Hospital Clínico de la Universidad de Chile, Santiago 8380456, Chile
- Ignacio Maureira
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Department of Medical Technology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile
- Nicolás Miranda-González
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Olga Barajas
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Department of Internal Medicine, Hospital Clínico Universidad de Chile, Santiago 8380456, Chile
- Fundación Arturo López Pérez, Santiago 7500921, Chile; (E.B.); (A.M.C.)
- Eva Bustamante
- Fundación Arturo López Pérez, Santiago 7500921, Chile; (E.B.); (A.M.C.)
- Mónica Ahumada
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Department of Internal Medicine, Hospital Clínico Universidad de Chile, Santiago 8380456, Chile
- Alicia Colombo
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Department of Pathology, Hospital Clínico de la Universidad de Chile, Santiago 8380456, Chile
- Ricardo Armisén
- Center for Genetics and Genomics, Instituto de Ciencias e Innovación en Medicina, Facultad de Medicina Clínica Alemana, Universidad del Desarrollo, Santiago 8320000, Chile
- Camilo Villamán
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Carolina Ibañez
- Department of Hematology & Oncology, Faculty of Medicine, Pontificia Universidad Católica de Chile (PUC), Santiago 3580000, Chile; (C.I.); (M.L.B.)
- María Loreto Bravo
- Department of Hematology & Oncology, Faculty of Medicine, Pontificia Universidad Católica de Chile (PUC), Santiago 3580000, Chile; (C.I.); (M.L.B.)
- Verónica Sanhueza
- Department of Pathology, Hospital Padre Hurtado, Santiago 8710022, Chile
- M. Loreto Spencer
- Department of Pathology, Hospital Clínico Regional Guillermo Grant Benavente, Concepción 4070038, Chile
- Gonzalo de Toro
- School of Medical Technology, Universidad Austral de Chile at Puerto Montt, Puerto Montt 5110566, Chile
- Erik Morales
- Department of Pathology, Hospital Regional de Talca, Talca 3460000, Chile
- Department of Preclinical Sciences, Faculty of Medicine, Universidad Católica del Maule, Talca 3460000, Chile
- Carolina Bizama
- Department of Pathology, Faculty of Medicine, Pontificia Universidad Católica de Chile, Santiago 3580000, Chile; (C.B.); (P.G.)
- Patricia García
- Department of Pathology, Faculty of Medicine, Pontificia Universidad Católica de Chile, Santiago 3580000, Chile; (C.B.); (P.G.)
- Lorena Gutiérrez
- Department of Pathology, Hospital San Juan de Dios, Santiago 8320000, Chile
- Ricardo A. Verdugo
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Human Genetics Program, ICBM, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile
- Correspondence: (R.A.V.); (K.M.); Tel.: +56-22978-9527 (R.A.V.); +56-22978-9562 (K.M.)
- Katherine Marcelain
- Department of Basic and Clinical Oncology, Faculty of Medicine, Universidad de Chile, Santiago 8330015, Chile; (M.S.); (E.G.-F.); (J.T.); (I.G.); (I.M.); (N.M.-G.); (O.B.); (M.A.); (A.C.); (C.V.)
- Correspondence: (R.A.V.); (K.M.); Tel.: +56-22978-9527 (R.A.V.); +56-22978-9562 (K.M.)
24
Fang LT, Zhu B, Zhao Y, Chen W, Yang Z, Kerrigan L, Langenbach K, de Mars M, Lu C, Idler K, Jacob H, Zheng Y, Ren L, Yu Y, Jaeger E, Schroth GP, Abaan OD, Talsania K, Lack J, Shen TW, Chen Z, Stanbouly S, Tran B, Shetty J, Kriga Y, Meerzaman D, Nguyen C, Petitjean V, Sultan M, Cam M, Mehta M, Hung T, Peters E, Kalamegham R, Sahraeian SME, Mohiyuddin M, Guo Y, Yao L, Song L, Lam HYK, Drabek J, Vojta P, Maestro R, Gasparotto D, Kõks S, Reimann E, Scherer A, Nordlund J, Liljedahl U, Jensen RV, Pirooznia M, Li Z, Xiao C, Sherry ST, Kusko R, Moos M, Donaldson E, Tezak Z, Ning B, Tong W, Li J, Duerken-Hughes P, Catalanotti C, Maheshwari S, Shuga J, Liang WS, Keats J, Adkins J, Tassone E, Zismann V, McDaniel T, Trent J, Foox J, Butler D, Mason CE, Hong H, Shi L, Wang C, Xiao W. Establishing community reference samples, data and call sets for benchmarking cancer mutation detection using whole-genome sequencing. Nat Biotechnol 2021; 39:1151-1160. [PMID: 34504347 PMCID: PMC8532138 DOI: 10.1038/s41587-021-00993-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 05/02/2019] [Accepted: 06/18/2021] [Indexed: 02/08/2023]
Abstract
The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.
Affiliation(s)
- Li Tai Fang
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
- Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Yongmei Zhao
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Wanqiu Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Zhaowei Yang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Liz Kerrigan
- ATCC (American Type Culture Collection), Manassas, VA, USA
- Charles Lu
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
- Kenneth Idler
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
- Howard Jacob
- Computational Genomics, Genomics Research Center (GRC), AbbVie, North Chicago, IL, USA
- Yuanting Zheng
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Luyao Ren
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Ying Yu
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Keyur Talsania
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Justin Lack
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tsai-Wei Shen
- Advanced Biomedical and Computational Sciences, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Zhong Chen
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Seta Stanbouly
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Bao Tran
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Jyoti Shetty
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Yuliya Kriga
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Cu Nguyen
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA
- Virginie Petitjean
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
- Marc Sultan
- Biomarker Development, Novartis Institutes for Biomedical Research, Basel, Switzerland
- Margaret Cam
- CCR Collaborative Bioinformatics Resource (CCBR), Office of Science and Technology Resources, Center for Cancer Research, Bethesda, MD, USA
- Monika Mehta
- Sequencing Facility, Cancer Research Technology Program, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
- Tiffany Hung
- Genentech, a member of the Roche group, South San Francisco, CA, USA
- Eric Peters
- Genentech, a member of the Roche group, South San Francisco, CA, USA
- Rasika Kalamegham
- Genentech, a member of the Roche group, South San Francisco, CA, USA
- Marghoob Mohiyuddin
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
- Yunfei Guo
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
- Lijing Yao
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
- Lei Song
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Hugo Y K Lam
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., Belmont, CA, USA
- Jiri Drabek
- IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Petr Vojta
- IMTM, Faculty of Medicine and Dentistry, Palacky University, Olomouc, Czech Republic
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Roberta Maestro
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
- Daniela Gasparotto
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Centro di Riferimento Oncologico di Aviano (CRO) IRCCS, National Cancer Institute, Unit of Oncogenetics and Functional Oncogenomics, Aviano, Italy
- Sulev Kõks
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Perron Institute for Neurological and Translational Science, Nedlands, Western Australia, Australia
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Ene Reimann
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
- Andreas Scherer
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Jessica Nordlund
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Ulrika Liljedahl
- European Infrastructure for Translational Medicine, Amsterdam, the Netherlands
- Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden
- Roderick V Jensen
- Department of Biological Sciences, Virginia Tech, Blacksburg, VA, USA
- Mehdi Pirooznia
- Bioinformatics and Computational Biology Core, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- Zhipan Li
- Sentieon Inc., Mountain View, CA, USA
- Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Stephen T Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Malcolm Moos
- Center for Biologics Evaluation and Research, FDA, Silver Spring, MD, USA
- Eric Donaldson
- Center for Drug Evaluation and Research, FDA, Silver Spring, MD, USA
- Zivana Tezak
- Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA
- Baitang Ning
- National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Weida Tong
- National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Jing Li
- Department of Allergy and Clinical Immunology, State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Winnie S Liang
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Jonathan Keats
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Erica Tassone
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Jeffrey Trent
- Translational Genomics Research Institute, Phoenix, AZ, USA
- Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Daniel Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA
- Huixiao Hong
- National Center for Toxicological Research, FDA, Jefferson, AR, USA
- Leming Shi
- State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China
- Charles Wang
- Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Department of Basic Science, Loma Linda University School of Medicine, Loma Linda, CA, USA
- Wenming Xiao
- Center for Devices and Radiological Health, FDA, Silver Spring, MD, USA
25
Xu Y, Su GH, Ma D, Xiao Y, Shao ZM, Jiang YZ. Technological advances in cancer immunity: from immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct Target Ther 2021; 6:312. [PMID: 34417437 PMCID: PMC8377461 DOI: 10.1038/s41392-021-00729-7] [Citation(s) in RCA: 48] [Impact Index Per Article: 16.0] [Received: 02/25/2021] [Revised: 07/06/2021] [Accepted: 07/18/2021] [Indexed: 02/07/2023] Open
Abstract
Immunotherapies play critical roles in cancer treatment. However, given that only a few patients respond to immune checkpoint blockades and other immunotherapeutic strategies, more novel technologies are needed to decipher the complicated interplay between tumor cells and the components of the tumor immune microenvironment (TIME). Tumor immunomics refers to the integrated study of the TIME using immunogenomics, immunoproteomics, immune-bioinformatics, and other multi-omics data reflecting the immune states of tumors, which has relied on the rapid development of next-generation sequencing. High-throughput genomic and transcriptomic data may be utilized for calculating the abundance of immune cells and predicting tumor antigens, referring to immunogenomics. However, as bulk sequencing represents the average characteristics of a heterogeneous cell population, it fails to distinguish distinct cell subtypes. Single-cell-based technologies enable better dissection of the TIME through precise immune cell subpopulation and spatial architecture investigations. In addition, radiomics and digital pathology-based deep learning models largely contribute to research on cancer immunity. These artificial intelligence technologies have performed well in predicting response to immunotherapy, with profound significance in cancer therapy. In this review, we briefly summarize conventional and state-of-the-art technologies in the field of immunogenomics, single-cell and artificial intelligence, and present prospects for future research.
Affiliation(s)
- Ying Xu
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Guan-Hua Su
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Ding Ma
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Yi Xiao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Zhi-Ming Shao
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Yi-Zhou Jiang
- Key Laboratory of Breast Cancer in Shanghai, Department of Breast Surgery, Fudan University Shanghai Cancer Center, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
26
Alsaihati BA, Ho KL, Watson J, Feng Y, Wang T, Dobbin KK, Zhao S. Canine tumor mutational burden is correlated with TP53 mutation across tumor types and breeds. Nat Commun 2021; 12:4670. [PMID: 34344882 PMCID: PMC8333103 DOI: 10.1038/s41467-021-24836-9] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 10/28/2020] [Accepted: 07/08/2021] [Indexed: 02/07/2023] Open
Abstract
Spontaneous canine cancers are valuable but relatively understudied and underutilized models. To enhance their usage, we reanalyze whole exome and genome sequencing data published for 684 cases of >7 common tumor types and >35 breeds, with rigorous quality control and breed validation. Our results indicate that the canine tumor alteration landscape is tumor type-dependent, but likely breed-independent. Each tumor type harbors major pathway alterations also found in its human counterpart (e.g., PI3K in mammary tumor and p53 in osteosarcoma). Mammary tumor and glioma have lower tumor mutational burden (TMB) (median < 0.5 mutations per Mb), whereas oral melanoma, osteosarcoma and hemangiosarcoma have higher TMB (median ≥ 1 mutation per Mb). Across tumor types and breeds, TMB is associated with mutation of TP53 but not PIK3CA, the most mutated genes. Golden Retrievers harbor a TMB-associated and osteosarcoma-enriched mutation signature. Here, we provide a snapshot of canine mutations across major tumor types and breeds.
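The TMB unit quoted in this abstract (mutations per Mb) corresponds to a simple ratio, sketched below. The function name `tmb_per_mb`, the choice to count all somatic mutations over a callable region expressed in bases, and the example counts are assumptions for illustration; the abstract does not describe which mutation classes or region sizes the study used.

```python
from statistics import median


def tmb_per_mb(n_somatic_mutations: int, callable_bases: int) -> float:
    """Tumor mutational burden as somatic mutations per megabase.

    ASSUMPTION: all counted mutations fall inside the callable region.
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1_000_000)


# Made-up cases: (mutation count, callable bases) for one hypothetical tumor type.
cases = [(45, 30_000_000), (12, 30_000_000), (90, 30_000_000)]
per_case = [tmb_per_mb(n, size) for n, size in cases]
print(per_case, "median:", median(per_case))
```

Comparing the per-type median against the cut-offs quoted in the abstract (below 0.5 or at least 1 mutation per Mb) would reproduce the lower/higher grouping it describes.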
Affiliation(s)
- Burair A Alsaihati
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- National Center for Genomics Technology, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
- Kun-Lin Ho
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Joshua Watson
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Yuan Feng
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Tianfang Wang
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
- Kevin K Dobbin
- Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
- Shaying Zhao
- Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, University of Georgia, Athens, GA, USA
27
Apostolides M, Jiang Y, Husić M, Siddaway R, Hawkins C, Turinsky AL, Brudno M, Ramani AK. MetaFusion: A high-confidence metacaller for filtering and prioritizing RNA-seq gene fusion candidates. Bioinformatics 2021; 37:3144-3151. [PMID: 33944895 DOI: 10.1093/bioinformatics/btab249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/17/2020] [Revised: 03/04/2021] [Accepted: 05/03/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Current fusion detection tools use diverse calling approaches and provide varying results, making selection of the appropriate tool challenging. Ensemble fusion calling techniques appear promising; however, current options have limited accessibility and function. RESULTS MetaFusion is a flexible meta-calling tool that amalgamates outputs from any number of fusion callers. Individual caller results are standardized by conversion into the new file type Common Fusion Format (CFF). Calls are annotated, merged using graph clustering, filtered, and ranked to provide a final output of high confidence candidates. MetaFusion consistently achieves higher precision and recall than individual callers on real and simulated datasets, and reaches up to 100% precision, indicating that ensemble calling is imperative for high confidence results. MetaFusion uses FusionAnnotator to annotate calls with information from cancer fusion databases, and is provided with a benchmarking toolkit to calibrate new callers. AVAILABILITY MetaFusion is freely available at https://github.com/ccmbioinfo/MetaFusion. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
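The idea of merging fusion calls by graph clustering can be sketched as follows. This is not MetaFusion's implementation or its CFF schema: the `FusionCall` fields, the matching rule (same gene pair and both breakpoints within a tolerance), the 100 bp default, the caller names, and the coordinates are all hypothetical. Calls are treated as graph nodes, matching calls are linked, and each connected component becomes one merged candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    caller: str   # hypothetical caller label
    gene5: str    # 5' partner gene
    gene3: str    # 3' partner gene
    pos5: int     # 5' breakpoint (toy coordinate)
    pos3: int     # 3' breakpoint (toy coordinate)


def same_event(a: FusionCall, b: FusionCall, tol: int = 100) -> bool:
    """ASSUMED edge rule: same gene pair, both breakpoints within `tol` bp."""
    return (a.gene5 == b.gene5 and a.gene3 == b.gene3
            and abs(a.pos5 - b.pos5) <= tol and abs(a.pos3 - b.pos3) <= tol)


def cluster_calls(calls: list[FusionCall], tol: int = 100) -> list[list[FusionCall]]:
    """Group calls into connected components using union-find."""
    parent = list(range(len(calls)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if same_event(calls[i], calls[j], tol):
                parent[find(i)] = find(j)

    clusters: dict[int, list[FusionCall]] = {}
    for i, call in enumerate(calls):
        clusters.setdefault(find(i), []).append(call)
    # Rank clusters by the number of distinct supporting callers.
    return sorted(clusters.values(),
                  key=lambda c: len({x.caller for x in c}), reverse=True)


calls = [
    FusionCall("callerA", "BCR", "ABL1", 1000, 5000),
    FusionCall("callerB", "BCR", "ABL1", 1010, 4990),
    FusionCall("callerC", "EWSR1", "FLI1", 2000, 7000),
]
merged = cluster_calls(calls)
print([len(c) for c in merged])
```

The pairwise comparison is quadratic in the number of calls, which is acceptable for a sketch; a real tool would likely index calls by gene pair or position first.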
Affiliation(s)
- Michael Apostolides
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada
| | - Yue Jiang
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada
| | - Mia Husić
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada
| | - Robert Siddaway
- The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, ON, Canada
| | - Cynthia Hawkins
- The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, ON, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Division of Pathology, The Hospital for Sick Children, Toronto, ON, Canada
| | - Andrei L Turinsky
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada
| | - Michael Brudno
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada; Genetics and Genome Biology Program, The Hospital for Sick Children, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada; University Health Network, Toronto, ON, Canada
| | - Arun K Ramani
- Centre for Computational Medicine, The Hospital For Sick Children, Toronto, ON, Canada
|
28
|
Jones W, Gong B, Novoradovskaya N, Li D, Kusko R, Richmond TA, Johann DJ, Bisgin H, Sahraeian SME, Bushel PR, Pirooznia M, Wilkins K, Chierici M, Bao W, Basehore LS, Lucas AB, Burgess D, Butler DJ, Cawley S, Chang CJ, Chen G, Chen T, Chen YC, Craig DJ, Del Pozo A, Foox J, Francescatto M, Fu Y, Furlanello C, Giorda K, Grist KP, Guan M, Hao Y, Happe S, Hariani G, Haseley N, Jasper J, Jurman G, Kreil DP, Łabaj P, Lai K, Li J, Li QZ, Li Y, Li Z, Liu Z, López MS, Miclaus K, Miller R, Mittal VK, Mohiyuddin M, Pabón-Peña C, Parsons BL, Qiu F, Scherer A, Shi T, Stiegelmeyer S, Suo C, Tom N, Wang D, Wen Z, Wu L, Xiao W, Xu C, Yu Y, Zhang J, Zhang Y, Zhang Z, Zheng Y, Mason CE, Willey JC, Tong W, Shi L, Xu J. A verified genomic reference sample for assessing performance of cancer panels detecting small variants of low allele frequency. Genome Biol 2021; 22:111. [PMID: 33863366 PMCID: PMC8051128 DOI: 10.1186/s13059-021-02316-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 07/27/2020] [Accepted: 03/18/2021] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Oncopanel genomic testing, which identifies important somatic variants, is increasingly common in medical practice and especially in clinical trials. Currently, there is a paucity of reliable genomic reference samples having a suitably large number of pre-identified variants for properly assessing oncopanel assay analytical quality and performance. The FDA-led Sequencing and Quality Control Phase 2 (SEQC2) consortium analyzes ten diverse cancer cell lines individually and their pool, termed Sample A, to develop a reference sample with suitably large numbers of coding positions with known (variant) positives and negatives for properly evaluating oncopanel analytical performance. RESULTS: In reference Sample A, we identify more than 40,000 variants down to 1% allele frequency with more than 25,000 variants having less than 20% allele frequency with 1653 variants in COSMIC-related genes. This is 5-100× more than existing commercially available samples. We also identify an unprecedented number of negative positions in coding regions, allowing statistical rigor in assessing limit-of-detection, sensitivity, and precision. Over 300 loci are randomly selected and independently verified via droplet digital PCR with 100% concordance. Agilent normal reference Sample B can be admixed with Sample A to create new samples with a similar number of known variants at much lower allele frequency than what exists in Sample A natively, including known variants having allele frequency of 0.02%, a range suitable for assessing liquid biopsy panels. CONCLUSION: These new reference samples and their admixtures provide superior capability for performing oncopanel quality control, analytical accuracy, and validation for small to large oncopanels and liquid biopsy assays.
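The admixture idea in this abstract can be illustrated with simple dilution arithmetic. The sketch below is not taken from the paper: it assumes a variant is entirely absent from Sample B and that both samples contribute DNA in proportion to their mixing fractions (ignoring ploidy and copy-number differences). The function name `admixed_vaf` and the 1:49 mixing ratio are hypothetical, chosen only for illustration.

```python
def admixed_vaf(vaf_in_a: float, fraction_a: float) -> float:
    """Expected variant allele frequency (VAF) after mixing Sample A into Sample B.

    Assumptions (illustrative, not from the paper):
    - the variant is absent from Sample B;
    - allele contributions scale linearly with the mixing fraction.
    """
    if not 0.0 <= vaf_in_a <= 1.0:
        raise ValueError("vaf_in_a must be between 0 and 1")
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return vaf_in_a * fraction_a


# Hypothetical example: a variant at 1% VAF in Sample A, mixed 1 part A to 49 parts B.
fraction_a = 1 / (1 + 49)          # Sample A makes up 2% of the mixture
vaf = admixed_vaf(0.01, fraction_a)
print(f"{vaf:.4%}")                # prints "0.0200%"
```

Under these assumptions, a 1% variant diluted fifty-fold lands at about 0.02%, the order of magnitude the abstract mentions for liquid biopsy panels.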
Affiliation(s)
- Wendell Jones
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA.
| | - Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | | | - Dan Li
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Rebecca Kusko
- Immuneering Corporation, One Broadway, 14th Floor, Cambridge, MA, 02142, USA
| | - Todd A Richmond
- Market & Application Development Bioinformatics, Roche Sequencing Solutions Inc., 4300 Hacienda Dr., Pleasanton, CA, 94588, USA
| | - Donald J Johann
- Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham St., Little Rock, AR, 72205, USA
| | - Halil Bisgin
- Department of Computer Science, Engineering and Physics, University of Michigan-Flint, Flint, MI, 48502, USA
| | - Sayed Mohammad Ebrahim Sahraeian
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Rd., Suite 7 #300, Belmont, CA, 94002, USA
| | - Pierre R Bushel
- National Institute of Environmental Health Sciences, Research Triangle Park, Durham, NC, 27709, USA
| | - Mehdi Pirooznia
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Katherine Wilkins
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | | | - Wenjun Bao
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Lee Scott Basehore
- Agilent Technologies, 11011 N Torrey Pines Rd., La Jolla, CA, 92037, USA
| | | | - Daniel Burgess
- (formerly) Research and Development, Roche Sequencing Solutions Inc., 500 South Rosa Rd., Madison, WI, 53719, USA
| | - Daniel J Butler
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | - Simon Cawley
- (formerly) Clinical Sequencing Division, Thermo Fisher Scientific, 180 Oyster Point Blvd., South San Francisco, CA, 94080, USA
| | - Chia-Jung Chang
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
| | - Guangchun Chen
- Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hine Blvd., Dallas, TX, 75390, USA
| | - Tao Chen
- University of Texas Southwestern Medical Center, 2330 Inwood Rd., Dallas, TX, 75390, USA
| | - Yun-Ching Chen
- Bioinformatics and Computational Biology Laboratory, National Heart Lung and Blood Institute, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Daniel J Craig
- Department of Medicine, College of Medicine and Life Sciences, The University of Toledo, Toledo, OH, 43614, USA
| | - Angela Del Pozo
- Institute of Medical and Molecular Genetics (INGEMM), Hospital Universitario La Paz, CIBERER Instituto de Salud Carlos III, 28046, Madrid, Spain
| | - Jonathan Foox
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | | | - Yutao Fu
- Thermo Fisher Scientific, 110 Miller Ave., Ann Arbor, MI, 48104, USA
| | | | - Kristina Giorda
- Marketing, Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA, 52241, USA
| | - Kira P Grist
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | - Meijian Guan
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Yingyi Hao
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
| | - Scott Happe
- Agilent Technologies, 1834 State Hwy 71 West, Cedar Creek, TX, 78612, USA
| | - Gunjan Hariani
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | - Nathan Haseley
- Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA
| | - Jeff Jasper
- Q2 Solutions - EA Genomics, 5927 S Miami Blvd., Morrisville, NC, 27560, USA
| | | | - David Philip Kreil
- Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria
| | - Paweł Łabaj
- Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
- Department of Biotechnology, Boku University, Vienna, Austria
| | - Kevin Lai
- Bioinformatics, Integrated DNA Technologies, Inc., 1710 Commercial Park, Coralville, IA, 52241, USA
| | - Jianying Li
- Kelly Government Solutions, Inc., Research Triangle Park, NC, 27709, USA
| | - Quan-Zhen Li
- Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hine Blvd., Dallas, TX, 75390, USA
| | - Yulong Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
| | - Zhiguang Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, Liaoning, China
| | - Zhichao Liu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Mario Solís López
- Institute of Medical and Molecular Genetics (INGEMM), Hospital Universitario La Paz, CIBERER Instituto de Salud Carlos III, 28046, Madrid, Spain
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
| | - Kelci Miclaus
- JMP Life Sciences, SAS Institute Inc., Cary, NC, 27519, USA
| | - Raymond Miller
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | - Vinay K Mittal
- Thermo Fisher Scientific, 110 Miller Ave., Ann Arbor, MI, 48104, USA
| | - Marghoob Mohiyuddin
- Bioinformatics Research & Early Development, Roche Sequencing Solutions Inc., 1301 Shoreway Rd., Suite 7 #300, Belmont, CA, 94002, USA
| | - Carlos Pabón-Peña
- Agilent Technologies, 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
| | - Barbara L Parsons
- Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Fujun Qiu
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
| | - Andreas Scherer
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
- Institute for Molecular Medicine Finland (FIMM), Nordic EMBL Partnership for Molecular Medicine, HiLIFE Unit, Biomedicum Helsinki 2U (D302b), FI-00014 University of Helsinki, P.O. Box 20 (Tukholmankatu 8), Helsinki, Finland
| | - Tieliu Shi
- Center for Bioinformatics and Computational Biology, and the Institute of Biomedical Sciences, School of Life Sciences, East China Normal University, 500 Dongchuan Rd, Shanghai, 200241, China
| | - Suzy Stiegelmeyer
- University of North Carolina Health, 101 Manning Drive, Chapel Hill, NC, 27514, USA
| | - Chen Suo
- Department of Epidemiology, School of Public Health, Fudan University, Shanghai, China
| | - Nikola Tom
- EATRIS ERIC- European Infrastructure for Translational Medicine, De Boelelaan 1118, 1081, HZ, Amsterdam, The Netherlands
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Kamenice 5, 625 00, Brno, Czech Republic
| | - Dong Wang
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Zhining Wen
- College of Chemistry, Sichuan University, Chengdu, 610064, Sichuan, China
| | - Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Wenzhong Xiao
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
- Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
| | - Chang Xu
- Research and Development, QIAGEN Sciences Inc., Frederick, MD, 21703, USA
| | - Ying Yu
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Jiyang Zhang
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Yifan Zhang
- University of Arkansas at Little Rock, Little Rock, AR, 72204, USA
| | - Zhihong Zhang
- Research and Development, Burning Rock Biotech, Shanghai, 201114, China
| | - Yuanting Zheng
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
| | - James C Willey
- Departments of Medicine, Pathology, and Cancer Biology, College of Medicine and Life Sciences, University of Toledo Health Sciences Campus, 3000 Arlington Ave, Toledo, OH, 43614, USA
| | - Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
| | - Leming Shi
- State Key Laboratory of Genetic Engineering, School of Life Sciences and Shanghai Cancer Hospital/Cancer Institute, Fudan University, Shanghai, 200438, China
- Human Phenome Institute, Fudan University, Shanghai, 201203, China
- Fudan-Gospel Joint Research Center for Precision Medicine, Fudan University, Shanghai, 200438, China
| | - Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA.
|
29
|
Li Z, Fang S, Zhang R, Yu L, Zhang J, Bu D, Sun L, Zhao Y, Li J. VarBen. J Mol Diagn 2021; 23:285-299. [DOI: 10.1016/j.jmoldx.2020.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/20/2019] [Revised: 10/06/2020] [Accepted: 11/17/2020] [Indexed: 02/08/2023]
|
30
|
Bai J, Shi J, Li C, Wang S, Zhang T, Hua X, Zhu B, Koka H, Wu HH, Song L, Wang D, Wang M, Zhou W, Ballew BJ, Zhu B, Hicks B, Mirabello L, Parry DM, Zhai Y, Li M, Du J, Wang J, Zhang S, Liu Q, Zhao P, Gui S, Goldstein AM, Zhang Y, Yang XR. Whole genome sequencing of skull-base chordoma reveals genomic alterations associated with recurrence and chordoma-specific survival. Nat Commun 2021; 12:757. [PMID: 33536423 PMCID: PMC7859411 DOI: 10.1038/s41467-021-21026-5] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 04/06/2020] [Accepted: 01/06/2021] [Indexed: 02/06/2023]
Abstract
Chordoma is a rare bone tumor with an unknown etiology and high recurrence rate. Here we conduct whole genome sequencing of 80 skull-base chordomas and identify PBRM1, a SWI/SNF (SWItch/Sucrose Non-Fermentable) complex subunit gene, as a significantly mutated driver gene. Genomic alterations in PBRM1 (12.5%) and homozygous deletions of the CDKN2A/2B locus are the most prevalent events. The combination of PBRM1 alterations and the chromosome 22q deletion, which involves another SWI/SNF gene (SMARCB1), shows strong associations with poor chordoma-specific survival (Hazard ratio [HR] = 10.55, 95% confidence interval [CI] = 2.81-39.64, p = 0.001) and recurrence-free survival (HR = 4.30, 95% CI = 2.34-7.91, p = 2.77 × 10⁻⁶). Despite the low mutation rate, extensive somatic copy number alterations frequently occur, most of which are clonal and showed highly concordant profiles between paired primary and recurrence/metastasis samples, indicating their importance in chordoma initiation. In this work, our findings provide important biological and clinical insights into skull-base chordoma.
Affiliation(s)
- Jiwei Bai
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Chuzhong Li
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Shuai Wang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
| | - Tongwu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Xing Hua
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Hela Koka
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Ho-Hsiang Wu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Lei Song
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Difei Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Mingyi Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Weiyin Zhou
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bari J Ballew
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Bin Zhu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Belynda Hicks
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
- Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA
| | - Lisa Mirabello
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Dilys M Parry
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Yixuan Zhai
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Mingxuan Li
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
| | - Jiang Du
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Junmei Wang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China
| | - Shuheng Zhang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
- Department of Neurosurgery, Anshan Central Hospital, Anshan, China
| | - Qian Liu
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China
| | - Peng Zhao
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Songbai Gui
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Alisa M Goldstein
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
| | - Yazhuo Zhang
- Beijing Neurosurgical Institute, Capital Medical University, Beijing, China.
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.
- China National Clinical Research Center for Neurological Diseases, Beijing, China.
- Brain Tumor Center, Beijing Institute for Brain Disorders, Beijing, China.
| | - Xiaohong R Yang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD, USA
|
31
|
Sherafat E, Force J, Măndoiu II. Semi-supervised learning for somatic variant calling and peptide identification in personalized cancer immunotherapy. BMC Bioinformatics 2020; 21:498. [PMID: 33375939 PMCID: PMC7772914 DOI: 10.1186/s12859-020-03813-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 02/03/2023]
Abstract
BACKGROUND: Personalized cancer vaccines are emerging as one of the most promising approaches to immunotherapy of advanced cancers. However, only a small proportion of the neoepitopes generated by somatic DNA mutations in cancer cells lead to tumor rejection. Since it is impractical to experimentally assess all candidate neoepitopes prior to vaccination, developing accurate methods for predicting tumor-rejection mediating neoepitopes (TRMNs) is critical for enabling routine clinical use of cancer vaccines. RESULTS: In this paper we introduce Positive-unlabeled Learning using AuTOml (PLATO), a general semi-supervised approach to improving accuracy of model-based classifiers. PLATO generates a set of high confidence positive calls by applying a stringent filter to model-based predictions, then rescores remaining candidates by using positive-unlabeled learning. To achieve robust performance on clinical samples with large patient-to-patient variation, PLATO further integrates AutoML hyper-parameter tuning, classification threshold selection based on spies, and support for bootstrapping. CONCLUSIONS: Experimental results on real datasets demonstrate that PLATO has improved performance compared to model-based approaches for two key steps in TRMN prediction, namely somatic variant calling from exome sequencing data and peptide identification from MS/MS data.
Affiliation(s)
- Elham Sherafat
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, 06269, USA
| | - Jordan Force
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, 06269, USA
| | - Ion I Măndoiu
- Computer Science and Engineering Department, University of Connecticut, Storrs, CT, 06269, USA.
|
32
|
Meng J, Victor B, He Z, Liu H, Jiang T. DeepSSV: detecting somatic small variants in paired tumor and normal sequencing data with convolutional neural network. Brief Bioinform 2020; 22:5960414. [PMID: 33164053 DOI: 10.1093/bib/bbaa272] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/24/2020] [Revised: 09/05/2020] [Accepted: 09/19/2020] [Indexed: 01/16/2023]
Abstract
It is of considerable interest to detect somatic mutations in paired tumor and normal sequencing data. A number of callers that are based on statistical or machine learning approaches have been developed to detect somatic small variants. However, they take into consideration only limited information about the reference and potential variant allele in both tumor and normal samples at a candidate somatic site. Also, they differ in how biological and technological noise is addressed. Hence, they are expected to produce divergent outputs. To overcome the drawbacks of existing somatic callers, we develop a deep learning-based tool called DeepSSV, which employs a convolutional neural network (CNN) model to learn increasingly abstract feature representations from the raw data in higher feature layers. DeepSSV creates a spatially oriented representation of read alignments around the candidate somatic sites adapted for the convolutional architecture, which enables it to expand to effectively gather scattered evidence. Moreover, DeepSSV incorporates the mapping information of both reference allele-supporting and variant allele-supporting reads in the tumor and normal samples at a genomic site that are readily available in the pileup format file. Together, the CNN model can process the whole alignment information. Such representational richness allows the model to capture the dependencies in the sequence and identify context-based sequencing artifacts. We fitted the model on ground truth somatic mutations and ran benchmarking experiments on simulated and real tumors. The benchmarking results demonstrate that DeepSSV outperforms its state-of-the-art competitors in overall F1 score.
Affiliation(s)
- Jing Meng
- Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, Jiangsu, China
| | | | - Zhen He
- La Trobe University, Melbourne, Victoria, Australia
| | | | - Taijiao Jiang
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
|
33
|
Koboldt DC. Best practices for variant calling in clinical sequencing. Genome Med 2020; 12:91. [PMID: 33106175 PMCID: PMC7586657 DOI: 10.1186/s13073-020-00791-w] [Citation(s) in RCA: 149] [Impact Index Per Article: 37.3] [Received: 10/30/2019] [Accepted: 10/08/2020] [Indexed: 02/08/2023]
Abstract
Next-generation sequencing technologies have enabled a dramatic expansion of clinical genetic testing both for inherited conditions and diseases such as cancer. Accurate variant calling in NGS data is a critical step upon which virtually all downstream analysis and interpretation processes rely. Just as NGS technologies have evolved considerably over the past 10 years, so too have the software tools and approaches for detecting sequence variants in clinical samples. In this review, I discuss the current best practices for variant calling in clinical sequencing studies, with a particular emphasis on trio sequencing for inherited disorders and somatic mutation detection in cancer patients. I describe the relative strengths and weaknesses of panel, exome, and whole-genome sequencing for variant detection. Recommended tools and strategies for calling variants of different classes are also provided, along with guidance on variant review, validation, and benchmarking to ensure optimal performance. Although NGS technologies are continually evolving, and new capabilities (such as long-read single-molecule sequencing) are emerging, the “best practice” principles in this review should be relevant to clinical variant calling in the long term.
Affiliation(s)
- Daniel C Koboldt
- Steve and Cindy Rasmussen Institute for Genomic Medicine at Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University, Columbus, OH, USA.
|
34
|
Bhuyan MSI, Pe'er I, Rahman MS. SICaRiO: short indel call filtering with boosting. Brief Bioinform 2020; 22:5917082. [PMID: 33003198 DOI: 10.1093/bib/bbaa238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2020] [Revised: 08/26/2020] [Accepted: 08/27/2020] [Indexed: 11/14/2022]
Abstract
Despite impressive improvement in next-generation sequencing technology, reliable detection of indels is still a difficult endeavour. Recognition of true indels is of prime importance in many applications, such as personalized health care, disease genomics and population genetics. Recently, advanced machine learning techniques have been successfully applied to classification problems with large-scale data. In this paper, we present SICaRiO, a gradient boosting classifier for the reliable detection of true indels, trained with the gold-standard dataset from the 'Genome in a Bottle' (GIAB) consortium. Our filtering scheme significantly improves the performance of each variant calling pipeline used in GIAB and beyond. SICaRiO uses genomic features that can be computed from publicly available resources, i.e. it does not require sequencing pipeline-specific information (e.g. read depth). This study also sheds light on prior genomic contexts responsible for the erroneous calling of indels made by sequencing pipelines. We have compared prediction difficulty for three categories of indels over different sequencing pipelines. We have also ranked genomic features according to their predictivity in determining false positives.
Affiliation(s)
- Md Shariful Islam Bhuyan
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
| | - Itsik Pe'er
- Department of Computer Science, Fu Foundation School of Engineering, and the Chair at the Center for Health Analytics, Data Science Institute, Columbia University, New York, USA
| | - M Sohel Rahman
- Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
|
35
|
Wang M, Luo W, Jones K, Bian X, Williams R, Higson H, Wu D, Hicks B, Yeager M, Zhu B. SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach. Sci Rep 2020; 10:12898. [PMID: 32732891 PMCID: PMC7393490 DOI: 10.1038/s41598-020-69772-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/08/2020] [Accepted: 07/16/2020] [Indexed: 02/06/2023]
Abstract
It is challenging to identify somatic variants from high-throughput sequence reads due to tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The test results showed that a simple consensus approach can significantly improve performance even with a limited number of callers and is more robust and stable than machine learning based ensemble approaches. To fully exploit the multi-callers, we also developed a software package, SomaticCombiner, that can combine multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which can maintain sensitive detection for variants with low VAFs.
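The "simple consensus approach" mentioned in this abstract can be sketched as plain majority voting across callers. This is a minimal illustration under stated assumptions, not SomaticCombiner's implementation: it does not reproduce the VAF-adaptive voting the authors describe, and the variant key `(chrom, pos, ref, alt)`, the function name `consensus_calls`, and the caller names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def consensus_calls(
    calls_by_caller: Dict[str, Set[Variant]],
    min_votes: Optional[int] = None,
) -> Set[Variant]:
    """Keep variants reported by at least `min_votes` callers.

    By default `min_votes` is a strict majority of the callers supplied.
    """
    if min_votes is None:
        min_votes = len(calls_by_caller) // 2 + 1
    # Each caller contributes at most one vote per variant because its calls are a set.
    votes = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {variant for variant, n in votes.items() if n >= min_votes}


# Toy input with three hypothetical callers.
calls = {
    "caller_a": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "caller_b": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "caller_c": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
}
print(sorted(consensus_calls(calls)))
# [('chr1', 100, 'A', 'T'), ('chr2', 200, 'G', 'C')]
```

A fixed vote threshold tends to drop low-VAF variants that only the most sensitive callers report, which is presumably the motivation for an adaptive threshold such as the one the abstract describes.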
Affiliation(s)
- Mingyi Wang
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA.
| | - Wen Luo
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Kristine Jones
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Xiaopeng Bian
- Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, 20850, USA
| | - Russell Williams
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Herbert Higson
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Dongjing Wu
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Belynda Hicks
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Meredith Yeager
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
| | - Bin Zhu
- Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA.
| |
|
36
|
Wu C, Zhao X, Welsh M, Costello K, Cao K, Abou Tayoun A, Li M, Sarmady M. Using Machine Learning to Identify True Somatic Variants from Next-Generation Sequencing. Clin Chem 2020; 66:239-246. [PMID: 31672855 DOI: 10.1373/clinchem.2019.308213] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/06/2019] [Accepted: 08/19/2019] [Indexed: 12/25/2022]
Abstract
BACKGROUND Molecular profiling has become essential for tumor risk stratification and treatment selection. However, cancer genome complexity and technical artifacts make identification of real variants a challenge. Currently, clinical laboratories rely on manual screening, which is costly, subjective, and not scalable. We present a machine learning-based method to distinguish artifacts from bona fide single-nucleotide variants (SNVs) detected by next-generation sequencing from nonformalin-fixed paraffin-embedded tumor specimens. METHODS A cohort of 11278 SNVs identified through clinical sequencing of tumor specimens was collected and divided into training, validation, and test sets. Each SNV was manually inspected and labeled as either real or artifact as part of clinical laboratory workflow. A 3-class (real, artifact, and uncertain) model was developed on the training set, fine-tuned with the validation set, and then evaluated on the test set. Prediction intervals reflecting the certainty of the classifications were derived during the process to label "uncertain" variants. RESULTS The optimized classifier demonstrated 100% specificity and 97% sensitivity over 5587 SNVs of the test set. Overall, 1252 of 1341 true-positive variants were identified as real, 4143 of 4246 false-positive calls were deemed artifacts, whereas only 192 (3.4%) SNVs were labeled as "uncertain," with zero misclassification between the true positives and artifacts in the test set. CONCLUSIONS We presented a computational classifier to identify variant artifacts detected from tumor sequencing. Overall, 96.6% of the SNVs received definitive labels and thus were exempt from manual review. This framework could improve quality and efficiency of the variant review process in clinical laboratories.
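The test-set counts reported in this abstract can be cross-checked with a few lines of Python. The sketch below uses only the numbers quoted above; the variable names are illustrative, and it does not attempt to reproduce the reported sensitivity and specificity, whose exact definitions are not given in the abstract.

```python
# Counts quoted in the abstract (test set only).
test_set_size = 5587
true_positives, called_real = 1341, 1252
false_positives, called_artifact = 4246, 4143
labeled_uncertain = 192

# Variants that did not receive a definitive label in each group.
uncertain_true_positives = true_positives - called_real
uncertain_false_positives = false_positives - called_artifact

definitive = called_real + called_artifact
definitive_pct = round(100 * definitive / test_set_size, 1)
uncertain_pct = round(100 * labeled_uncertain / test_set_size, 1)

print(f"uncertain among true positives:  {uncertain_true_positives}")
print(f"uncertain among false positives: {uncertain_false_positives}")
print(f"definitive labels: {definitive} ({definitive_pct}%)")
print(f"uncertain labels:  {labeled_uncertain} ({uncertain_pct}%)")
```

The two groups sum to the test-set size, the unlabeled remainders (89 and 103) add up to the 192 "uncertain" SNVs, and the definitive fraction rounds to the 96.6% stated in the conclusions.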
Affiliation(s)
- Chao Wu
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Xiaonan Zhao
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Mark Welsh
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | | | - Kajia Cao
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA
| | - Ahmad Abou Tayoun
- Department of Genetics, Al Jalila Children's Specialty Hospital, Dubai, UAE
| | - Marilyn Li
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA.,Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA
| | - Mahdi Sarmady
- Division of Genomic Diagnostics, The Children's Hospital of Philadelphia, Philadelphia, PA.,Center for Data-Driven Discovery in Biomedicine, Children's Hospital of Philadelphia, Philadelphia, PA.,Department of Pathology & Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA
| |
|
37
|
Huang W, Guo YA, Muthukumar K, Baruah P, Chang MM, Jacobsen Skanderup A. SMuRF: portable and accurate ensemble prediction of somatic mutations. Bioinformatics 2020; 35:3157-3159. [PMID: 30649191 PMCID: PMC6735703 DOI: 10.1093/bioinformatics/btz018] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 08/01/2018] [Revised: 11/26/2018] [Accepted: 01/07/2019] [Indexed: 12/22/2022] Open
Abstract
Summary Somatic Mutation calling method using a Random Forest (SMuRF) integrates predictions and auxiliary features from multiple somatic mutation callers using a supervised machine learning approach. SMuRF is trained on community-curated matched tumor and normal whole genome sequencing data. SMuRF predicts both SNVs and indels with high accuracy in genome or exome-level sequencing data. Furthermore, the method is robust across multiple tested cancer types and predicts low allele frequency variants with high accuracy. In contrast to existing ensemble-based somatic mutation calling approaches, SMuRF works out-of-the-box and is orders of magnitude faster. Availability and implementation The method is implemented in R and available at https://github.com/skandlab/SMuRF. SMuRF operates as an add-on to the community-developed bcbio-nextgen somatic variant calling pipeline. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Weitai Huang
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore.,Graduate School of Integrative Sciences and Engineering, National University of Singapore, Singapore, Singapore
| | - Yu Amanda Guo
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore
| | - Karthik Muthukumar
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore
| | - Probhonjon Baruah
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore
| | - Mei Mei Chang
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore
| | - Anders Jacobsen Skanderup
- Department of Computational and Systems Biology, Agency for Science Technology and Research, Genome Institute of Singapore, Singapore, Singapore
| |
|
38
|
Cao C, Mak L, Jin G, Gordon P, Ye K, Long Q. PRESM: personalized reference editor for somatic mutation discovery in cancer genomics. Bioinformatics 2020; 35:1445-1452. [PMID: 30247633 DOI: 10.1093/bioinformatics/bty812] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/22/2018] [Revised: 08/27/2018] [Accepted: 09/19/2018] [Indexed: 12/16/2022] Open
Abstract
MOTIVATION Accurate detection of somatic mutations is a crucial step toward understanding cancer. Various tools have been developed to detect somatic mutations from cancer genome sequencing data by mapping reads to a universal reference genome and inferring likelihoods from complex statistical models. However, read mapping is frequently obstructed by mismatches between germline and somatic mutations on a read and the reference genome. Previous attempts to develop personalized genome tools are not compatible with downstream statistical models for somatic mutation detection. RESULTS We present PRESM, a tool that builds personalized reference genomes by integrating germline mutations into the reference genome. The aforementioned obstacle is circumvented by using a two-step germline substitution procedure, maintaining positional fidelity using an innovative workaround. Reads derived from tumor tissue can be positioned more accurately along a personalized reference than a universal reference due to the reduced genetic distance between the subject (tumor genome) and the target (the personalized genome). Application of PRESM's personalized genome reduced false-positive (FP) somatic mutation calls by as much as 55.5%, and facilitated the discovery of a novel somatic point mutation on a germline insertion in PDE1A, a phosphodiesterase associated with melanoma. Moreover, all improvements in calling accuracy were achieved without parameter optimization, as PRESM itself is parameter-free. Hence, similar increases in read mapping and decreases in the FP rate will persist when PRESM-built genomes are applied to any user-provided dataset. AVAILABILITY AND IMPLEMENTATION The software is available at https://github.com/precisionomics/PRESM. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chen Cao
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Lauren Mak
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Guangxu Jin
- Department of Cancer Biology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Paul Gordon
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| | - Kai Ye
- Department of Bioinformatics, Electronic and Information Engineering School, Xi'an Jiaotong University, Xi'an, China
| | - Quan Long
- Departments of Biochemistry & Molecular Biology and Medical Genetics, Alberta Children's Hospital Research Institute, University of Calgary, Calgary, Canada
| |
|
39
|
KaramiNejadRanjbar M, Sharifzadeh S, Wietek NC, Artibani M, El-Sahhar S, Sauka-Spengler T, Yau C, Tresp V, Ahmed AA. A highly accurate platform for clone-specific mutation discovery enables the study of active mutational processes. eLife 2020; 9:55207. [PMID: 32255426 PMCID: PMC7228773 DOI: 10.7554/elife.55207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/16/2020] [Accepted: 04/07/2020] [Indexed: 12/14/2022] Open
Abstract
Bulk whole genome sequencing (WGS) enables the analysis of tumor evolution but, because of depth limitations, can only identify old mutational events. The discovery of current mutational processes for predicting the tumor’s evolutionary trajectory requires dense sequencing of individual clones or single cells. Such studies, however, are inherently problematic because of the discovery of excessive false positive (FP) mutations when sequencing picogram quantities of DNA. Data pooling to increase confidence in the discovered mutations moves the discovery back in time to a common ancestor. Here we report a robust WGS and analysis pipeline (DigiPico/MutLX) that virtually eliminates all FP results while retaining an excellent proportion of true positives. Using our method, we identified, for the first time, a hyper-mutation (kataegis) event in a group of ∼30 cancer cells from a recurrent ovarian carcinoma. This was unidentifiable from the bulk WGS data. Overall, we propose the DigiPico/MutLX method as a powerful framework for the identification of clone-specific variants at an unprecedented accuracy.
Affiliation(s)
- Mohammad KaramiNejadRanjbar
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | | | - Nina C Wietek
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Mara Artibani
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Salma El-Sahhar
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, United Kingdom
| | - Tatjana Sauka-Spengler
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
| | - Christopher Yau
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
| | - Volker Tresp
- Ludwig Maximilian University of Munich, Munich, Germany.,Siemens AG, Corporate Technology, Munich, Germany
| | - Ahmed A Ahmed
- Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, United Kingdom.,Nuffield Department of Women's & Reproductive Health, University of Oxford, Oxford, United Kingdom
| |
|
40
|
Wu L, Deng Q, Xu Z, Zhou S, Li C, Li YX. A novel virtual barcode strategy for accurate panel-wide variant calling in circulating tumor DNA. BMC Bioinformatics 2020; 21:127. [PMID: 32245364 PMCID: PMC7118954 DOI: 10.1186/s12859-020-3412-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/18/2019] [Accepted: 02/12/2020] [Indexed: 01/19/2023] Open
Abstract
Background Hybrid capture-based next-generation sequencing of DNA has been widely applied in the detection of circulating tumor DNA (ctDNA). Various methods have been proposed for ctDNA detection, but low-allelic-fraction (AF) variants are still a great challenge. In addition, no panel-wide calling algorithm is available, which hinders the full use of ctDNA-based ‘liquid biopsy’. Thus, we developed the VBCALAVD (Virtual Barcode-based Calling Algorithm for Low Allelic Variant Detection) in silico to overcome these limitations. Results Based on the understanding of the nature of ctDNA fragmentation, a novel platform-independent virtual barcode strategy was established to eliminate random sequencing errors by clustering sequencing reads into virtual families. Stereotypical mutant-family-level background artifacts were polished by constructing AF distributions. Three additional robust fine-tuning filters were obtained to eliminate stochastic mutant-family-level noise. The performance of our algorithm was validated using cell-free DNA reference standard samples (cfDNA RSDs) and normal healthy cfDNA samples (cfDNA controls). For the RSDs with AFs of 0.1, 0.2, 0.5, 1 and 5%, the mean F1 scores were 0.43 (0.25~0.56), 0.77, 0.92, 0.926 (0.86~1.0) and 0.89 (0.75~1.0), respectively, which indicates that the proposed approach significantly outperforms the published algorithms. Among controls, no false positives were detected. Meanwhile, characteristics of mutant-family-level noise and quantitative determinants of divergence between mutant-family-level noise from controls and RSDs were clearly depicted. Conclusions Due to its good performance in the detection of low-AF variants, our algorithm will greatly facilitate the noninvasive panel-wide detection of ctDNA in research and clinical settings. The whole pipeline is available at https://github.com/zhaodalv/VBCALAVD.
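The idea of clustering reads into "virtual families" to suppress random sequencing errors can be sketched in Python as follows. The abstract does not state how families are keyed, so this sketch assumes that reads sharing the same fragment start and end coordinates derive from one original molecule. The `Read` type, the family key, and the `min_size` and `min_agreement` thresholds are all hypothetical and are not taken from VBCALAVD.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Read(NamedTuple):
    chrom: str
    frag_start: int    # leftmost coordinate of the sequenced fragment
    frag_end: int      # rightmost coordinate of the sequenced fragment
    base_at_site: str  # base observed at the candidate variant position


FamilyKey = Tuple[str, int, int]


def group_into_families(reads: List[Read]) -> Dict[FamilyKey, List[Read]]:
    """Assumed virtual barcode: identical fragment endpoints form one family."""
    families: Dict[FamilyKey, List[Read]] = defaultdict(list)
    for read in reads:
        families[(read.chrom, read.frag_start, read.frag_end)].append(read)
    return dict(families)


def family_consensus(
    family: List[Read], min_size: int = 2, min_agreement: float = 0.8
) -> Optional[str]:
    """Return the consensus base, or None if the family is too small or discordant."""
    if len(family) < min_size:
        return None
    base, count = Counter(r.base_at_site for r in family).most_common(1)[0]
    return base if count / len(family) >= min_agreement else None


reads = [
    Read("chr7", 1000, 1166, "T"),
    Read("chr7", 1000, 1166, "T"),
    Read("chr7", 1000, 1166, "T"),
    Read("chr7", 1010, 1170, "A"),
    Read("chr7", 1010, 1170, "T"),  # disagreement within the family
    Read("chr7", 1020, 1185, "T"),  # singleton family
]
families = group_into_families(reads)
consensus = {key: family_consensus(fam) for key, fam in families.items()}
```

In this toy input only the first family yields a consensus base. A mismatch seen in one read of a family is treated as a likely sequencing error rather than as support for a variant, which is the error-suppression step the abstract attributes to virtual families.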
Affiliation(s)
- Leilei Wu
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Qinfang Deng
- Department of Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200433, China
| | - Ze Xu
- Smartquerier Biomedicine, Shanghai, 201203, China
| | - Songwen Zhou
- Department of Oncology, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai, 200433, China.
| | - Chao Li
- Smartquerier Biomedicine, Shanghai, 201203, China.,Shanghai Center for Bioinformation Technology, Shanghai, 201203, China
| | - Yi-Xue Li
- School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China.,Shanghai Center for Bioinformation Technology, Shanghai, 201203, China.,CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
| |
|
41
|
David R. The promise of toxicogenomics for genetic toxicology: past, present and future. Mutagenesis 2020; 35:153-159. [PMID: 32087008 DOI: 10.1093/mutage/geaa007] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/21/2019] [Accepted: 02/10/2020] [Indexed: 01/10/2023] Open
Abstract
Toxicogenomics, the application of genomics to toxicology, was described as 'a new era' for toxicology. Standard toxicity tests typically involve a number of short-term bioassays that are costly, time consuming, require large numbers of animals and generally focus on a single end point. Toxicogenomics was heralded as a way to improve the efficiency of toxicity testing by assessing gene regulation across the genome, allowing rapid classification of compounds based on characteristic expression profiles. Gene expression microarrays could measure and characterise genome-wide gene expression changes in a single study and, while transcriptomic profiles that can discriminate between genotoxic and non-genotoxic carcinogens have been identified, challenges with the approach limited its application. As such, toxicogenomics did not transform the field of genetic toxicology in the way it was predicted. More recently, next generation sequencing (NGS) technologies have revolutionised genomics owing to the fact that hundreds of billions of base pairs can be sequenced simultaneously, more cheaply and quickly than with traditional Sanger methods. In relation to genetic toxicology, thousands of cancer genomes have been sequenced with single-base substitution mutational signatures identified, and mutation signatures have been identified following treatment of cells with known or suspected environmental carcinogens. RNAseq has been applied to detect transcriptional changes following treatment with genotoxins; modified RNAseq protocols have been developed to identify adducts in the genome and Duplex sequencing is an example of a technique that has recently been developed to accurately detect mutation. Machine learning, including MutationSeq and SomaticSeq, has also been applied to somatic mutation detection and improvements in automation and/or the application of machine learning algorithms may allow high-throughput mutation sequencing in the future. This review will discuss the initial promise of transcriptomics for genetic toxicology, and how the development of NGS technologies and new machine learning algorithms may finally realise that promise.
Affiliation(s)
- Rhiannon David
- Clinical Pharmacology and Safety Sciences, R&D, AstraZeneca, Cambridge, UK
| |
|
42
|
Mannakee BK, Gutenkunst RN. BATCAVE: calling somatic mutations with a tumor- and site-specific prior. NAR Genom Bioinform 2020; 2:lqaa004. [PMID: 32051931 PMCID: PMC7003682 DOI: 10.1093/nargab/lqaa004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/08/2019] [Revised: 01/13/2020] [Accepted: 01/23/2020] [Indexed: 02/06/2023] Open
Abstract
Detecting somatic mutations within tumors is key to understanding treatment resistance, patient prognosis and tumor evolution. Mutations at low allelic frequency, those present in only a small portion of tumor cells, are particularly difficult to detect. Many algorithms have been developed to detect such mutations, but none models a key aspect of tumor biology. Namely, every tumor has its own profile of mutation types that it tends to generate. We present BATCAVE (Bayesian Analysis Tools for Context-Aware Variant Evaluation), an algorithm that first learns the individual tumor mutational profile and mutation rate then uses them in a prior for evaluating potential mutations. We also present an R implementation of the algorithm, built on the popular caller MuTect. Using simulations, we show that adding the BATCAVE algorithm to MuTect improves variant detection. It also improves the calibration of posterior probabilities, enabling more principled tradeoff between precision and recall. We also show that BATCAVE performs well on real data. Our implementation is computationally inexpensive and straightforward to incorporate into existing MuTect pipelines. More broadly, the algorithm can be added to other variant callers, and it can be extended to include additional biological features that affect mutation generation.
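How a tumor- and site-specific prior changes a variant call can be illustrated with a small Bayesian update in Python. This is a simplified sketch and not the BATCAVE model: the form of `site_prior`, the use of 96 substitution contexts, and the example numbers are all assumptions. The only standard ingredient is Bayes' rule in odds form, where posterior odds equal the likelihood ratio multiplied by the prior odds.

```python
def site_prior(mutation_rate: float, context_fraction: float, n_contexts: int = 96) -> float:
    """Hypothetical site-specific prior.

    Scales a per-site mutation rate by how over- or under-represented the
    site's substitution context is relative to a uniform profile
    (context_fraction == 1 / n_contexts leaves the rate unchanged).
    """
    prior = mutation_rate * context_fraction * n_contexts
    return min(prior, 1.0 - 1e-12)  # keep the prior strictly below 1


def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


# Same read evidence (likelihood ratio) at two sites of one hypothetical tumor:
# one in a context this tumor mutates often, one in a context it rarely mutates.
mutation_rate = 1e-6
evidence = 1e5
common_context = posterior_probability(site_prior(mutation_rate, 0.05), evidence)
rare_context = posterior_probability(site_prior(mutation_rate, 0.005), evidence)
```

Identical read evidence yields a higher posterior at the site whose context matches the tumor's learned profile. This shows how a tumor-specific prior can shift the balance between precision and recall for low-frequency calls.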
Affiliation(s)
- Brian K Mannakee
- Mel and Enid Zuckerman College of Public Health, University of Arizona, Tucson, AZ 85721, USA
| | - Ryan N Gutenkunst
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ 85721, USA
| |
|
43
|
Amin SB, Anderson KJ, Boudreau CE, Martinez-Ledesma E, Kocakavuk E, Johnson KC, Barthel FP, Varn FS, Kassab C, Ling X, Kim H, Barter M, Lau CC, Ngan CY, Chapman M, Koehler JW, Long JP, Miller AD, Miller CR, Porter BF, Rissi DR, Mazcko C, LeBlanc AK, Dickinson PJ, Packer RA, Taylor AR, Rossmeisl JH, Woolard KD, Heimberger AB, Levine JM, Verhaak RGW. Comparative Molecular Life History of Spontaneous Canine and Human Gliomas. Cancer Cell 2020; 37:243-257.e7. [PMID: 32049048 PMCID: PMC7132629 DOI: 10.1016/j.ccell.2020.01.004] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 06/14/2019] [Revised: 11/15/2019] [Accepted: 01/10/2020] [Indexed: 02/08/2023]
Abstract
Sporadic gliomas in companion dogs provide a window on the interaction between tumorigenic mechanisms and host environment. We compared the molecular profiles of canine gliomas with those of human pediatric and adult gliomas to characterize evolutionarily conserved mammalian mutational processes in gliomagenesis. Employing whole-genome, exome, transcriptome, and methylation sequencing of 83 canine gliomas, we found alterations shared between canine and human gliomas such as the receptor tyrosine kinases, TP53 and cell-cycle pathways, and IDH1 R132. Canine gliomas showed high similarity with human pediatric gliomas per robust aneuploidy, mutational rates, relative timing of mutations, and DNA-methylation patterns. Our cross-species comparative genomic analysis provides unique insights into glioma etiology and the chronology of glioma-causing somatic alterations.
Affiliation(s)
- Samirkumar B Amin
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Kevin J Anderson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - C Elizabeth Boudreau
- Department of Small Animal Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA
| | - Emmanuel Martinez-Ledesma
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Avenue Morones Prieto 3000, Monterrey, Nuevo Leon 64710, Mexico; Department of Neuro-Oncology, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Emre Kocakavuk
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; DKFZ Division of Translational Neurooncology at the West German Cancer Center (WTZ), German Cancer Consortium (DKTK) Partner Site & Department of Neurosurgery, University Hospital Essen, Essen, Germany
| | - Kevin C Johnson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Floris P Barthel
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Frederick S Varn
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Cynthia Kassab
- Department of Neurosurgery, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Xiaoyang Ling
- Department of Neurosurgery, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Hoon Kim
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Mary Barter
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
| | - Ching C Lau
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA; Connecticut Children's Medical Center, Hartford, CT 06106, USA; University of Connecticut School of Medicine, Farmington, CT 06032, USA
| | - Chew Yee Ngan
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Margaret Chapman
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
| | - Jennifer W Koehler
- Department of Pathobiology, College of Veterinary Medicine, Auburn University, Auburn, AL, USA
| | - James P Long
- Department of Neurosurgery, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA; Department of Biostatistics, the University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Andrew D Miller
- Department of Biomedical Sciences, Section of Anatomic Pathology, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA
| | - C Ryan Miller
- Departments of Pathology and Laboratory Medicine, Neurology, and Pharmacology, Lineberger Comprehensive Cancer Center and Neuroscience Center, University of North Carolina School of Medicine, Chapel Hill, NC, USA
| | - Brian F Porter
- Department of Veterinary Pathobiology, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX, USA
| | - Daniel R Rissi
- Department of Pathology and Athens Veterinary Diagnostic Laboratory, College of Veterinary Medicine, University of Georgia, Athens, GA, USA
| | - Christina Mazcko
- Comparative Oncology Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Amy K LeBlanc
- Comparative Oncology Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Peter J Dickinson
- Department of Surgical and Radiological Sciences, UC Davis School of Veterinary Medicine, Davis, CA, USA
| | - Rebecca A Packer
- Department of Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO, USA
| | - Amanda R Taylor
- Auburn University College of Veterinary Medicine, Auburn, AL, USA
| | | | - Kevin D Woolard
- Department of Surgical and Radiological Sciences, UC Davis School of Veterinary Medicine, Davis, CA, USA
| | - Amy B Heimberger
- Department of Neurosurgery, the University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Jonathan M Levine
- Department of Small Animal Clinical Sciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843, USA
| | - Roel G W Verhaak
- The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA.
| |
|
44
|
Bush SJ, Foster D, Eyre DW, Clark EL, De Maio N, Shaw LP, Stoesser N, Peto TEA, Crook DW, Walker AS. Genomic diversity affects the accuracy of bacterial single-nucleotide polymorphism-calling pipelines. Gigascience 2020; 9:giaa007. [PMID: 32025702 PMCID: PMC7002876 DOI: 10.1093/gigascience/giaa007] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 05/28/2019] [Revised: 12/02/2019] [Accepted: 01/15/2020] [Indexed: 02/06/2023] Open
Abstract
BACKGROUND Accurately identifying single-nucleotide polymorphisms (SNPs) from bacterial sequencing data is an essential requirement for using genomics to track transmission and predict important phenotypes such as antimicrobial resistance. However, most previous performance evaluations of SNP calling have been restricted to eukaryotic (human) data. Additionally, bacterial SNP calling requires choosing an appropriate reference genome to align reads to, which, together with the bioinformatic pipeline, affects the accuracy and completeness of a set of SNP calls obtained. This study evaluates the performance of 209 SNP-calling pipelines using a combination of simulated data from 254 strains of 10 clinically common bacteria and real data from environmentally sourced and genomically diverse isolates within the genera Citrobacter, Enterobacter, Escherichia, and Klebsiella. RESULTS We evaluated the performance of 209 SNP-calling pipelines, aligning reads to genomes of the same or a divergent strain. Irrespective of pipeline, a principal determinant of reliable SNP calling was reference genome selection. Across multiple taxa, there was a strong inverse relationship between pipeline sensitivity and precision, and the Mash distance (a proxy for average nucleotide divergence) between reads and reference genome. The effect was especially pronounced for diverse, recombinogenic bacteria such as Escherichia coli but less dominant for clonal species such as Mycobacterium tuberculosis. CONCLUSIONS The accuracy of SNP calling for a given species is compromised by increasing intra-species diversity. When reads were aligned to the same genome from which they were sequenced, among the highest-performing pipelines was Novoalign/GATK. By contrast, when reads were aligned to particularly divergent genomes, the highest-performing pipelines often used the aligners NextGenMap or SMALT, and/or the variant callers LoFreq, mpileup, or Strelka.
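The Mash distance used in this abstract as a proxy for nucleotide divergence can be computed from a k-mer Jaccard index `j` as `D = -(1/k) * ln(2j / (1 + j))`. The Python sketch below is for illustration only: it computes the Jaccard index from exact k-mer sets rather than from MinHash sketches as the Mash tool does, and the helper names and toy sequences are assumptions.

```python
import math
from typing import Set


def kmers(sequence: str, k: int) -> Set[str]:
    """All distinct substrings of length k (exact set, no MinHash sketching)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two k-mer sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)), capped to the range [0, 1]."""
    if not 0.0 <= j <= 1.0:
        raise ValueError("Jaccard index must lie in [0, 1]")
    if j == 0.0:
        return 1.0  # no shared k-mers: maximal distance
    return min(1.0, max(0.0, -math.log(2.0 * j / (1.0 + j)) / k))


# Toy example with a short k so the sequences share some k-mers.
reference = "ACGTACGTTAGCCGATAGGCTA"
sample = "ACGTACGTTAGCCGTTAGGCTA"  # one substitution relative to the reference
k = 5
d_same = mash_distance(jaccard(kmers(reference, k), kmers(reference, k)), k)
d_diff = mash_distance(jaccard(kmers(reference, k), kmers(sample, k)), k)
```

Identical sequences give a distance of zero, and each substitution removes up to `k` shared k-mers, so the distance grows as reads and reference diverge. This is the quantity the study relates to falling sensitivity and precision.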
Affiliation(s)
- Stephen J Bush
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
| | - Dona Foster
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- David W Eyre
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Nicola De Maio
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SH, UK
- Liam P Shaw
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Nicole Stoesser
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Tim E A Peto
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- Derrick W Crook
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- A Sarah Walker
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Health Research Protection Unit in Healthcare Associated Infections and Antimicrobial Resistance at University of Oxford in partnership with Public Health England, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
- National Institute for Health Research Oxford Biomedical Research Centre, Oxford, John Radcliffe Hospital, Headington, Oxford, OX3 9DU, UK
45
Oota S. Somatic mutations - Evolution within the individual. Methods 2019; 176:91-98. [PMID: 31711929 DOI: 10.1016/j.ymeth.2019.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/31/2018] [Revised: 10/31/2019] [Accepted: 11/07/2019] [Indexed: 02/08/2023] Open
Abstract
With the rapid advancement of sequencing technologies over the last two decades, it is becoming feasible to detect rare variants from somatic tissue samples. Studying such somatic mutations can provide deep insights into various senescence-related diseases, including cancer, inflammation, and sporadic psychiatric disorders. While it is still a difficult task to identify true somatic mutations, relentless efforts to combine experimental and computational methods have made it possible to obtain reliable data. Furthermore, state-of-the-art machine learning approaches have drastically improved the efficiency and sensitivity of these methods. Meanwhile, we can regard somatic mutations as a counterpart of germline mutations, and it is possible to apply well-formulated mathematical frameworks developed for population genetics and molecular evolution to analyze this 'somatic evolution'. For example, retrospective cell lineage tracing is a promising technique to elucidate the mechanism of pre-diseases using single-cell RNA-sequencing (scRNA-seq) data.
Affiliation(s)
- Satoshi Oota
- Image Processing Research Team, Center for Advanced Photonics, RIKEN, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
46
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022] Open
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
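The read-depth idea mentioned in this abstract (comparing the number of reads aligned to a segment) can be sketched roughly as follows. This is a deliberately simplified illustration: the function names, the pseudocount, the ±0.4 log2 thresholds, and the toy counts are all assumptions, and real CNV tools additionally correct for GC content, capture efficiency, and tumor purity.

```python
import math
from typing import Dict


def log2_ratios(tumor: Dict[str, int], normal: Dict[str, int],
                pseudocount: float = 0.5) -> Dict[str, float]:
    """Library-size-normalised log2(tumor/normal) read-depth ratio per segment."""
    tumor_total = sum(tumor.values())
    normal_total = sum(normal.values())
    ratios = {}
    for segment, tumor_reads in tumor.items():
        t = (tumor_reads + pseudocount) / tumor_total
        n = (normal[segment] + pseudocount) / normal_total
        ratios[segment] = math.log2(t / n)
    return ratios


def call_segments(ratios: Dict[str, float], gain_threshold: float = 0.4,
                  loss_threshold: float = -0.4) -> Dict[str, str]:
    """Label each segment using hypothetical log2-ratio cut-offs."""
    calls = {}
    for segment, ratio in ratios.items():
        if ratio >= gain_threshold:
            calls[segment] = "gain"
        elif ratio <= loss_threshold:
            calls[segment] = "loss"
        else:
            calls[segment] = "neutral"
    return calls


# Toy read counts per exonic segment (hypothetical values).
tumor_counts = {"seg1": 1000, "seg2": 2000, "seg3": 500, "seg4": 1000}
normal_counts = {"seg1": 1000, "seg2": 1000, "seg3": 1000, "seg4": 1000}

ratios = log2_ratios(tumor_counts, normal_counts)
calls = call_segments(ratios)
print(calls)
```

Normalising by total library size is itself skewed by large copy-number changes, which is one reason the abstract lists CNV detection among the limitations of WES.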
Affiliation(s)
- Áron Bartha
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
- Balázs Győrffy
- Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary.
- TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary.
47
Wood DE, White JR, Georgiadis A, Van Emburgh B, Parpart-Li S, Mitchell J, Anagnostou V, Niknafs N, Karchin R, Papp E, McCord C, LoVerso P, Riley D, Diaz LA, Jones S, Sausen M, Velculescu VE, Angiuoli SV. A machine learning approach for somatic mutation discovery. Sci Transl Med 2018; 10(457):eaar7939. [PMID: 30185652 DOI: 10.1126/scitranslmed.aar7939] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 12/18/2017] [Revised: 05/26/2018] [Accepted: 08/16/2018] [Indexed: 12/19/2022]
Abstract
Variability in the accuracy of somatic mutation detection may affect the discovery of alterations and the therapeutic management of cancer patients. To address this issue, we developed a somatic mutation discovery approach based on machine learning that outperformed existing methods in identifying experimentally validated tumor alterations (sensitivity of 97% versus 90 to 99%; positive predictive value of 98% versus 34 to 92%). Analysis of paired tumor-normal exome data from 1368 TCGA (The Cancer Genome Atlas) samples using this method revealed concordance for 74% of mutation calls but also identified likely false-positive and false-negative changes in TCGA data, including in clinically actionable genes. Determination of high-quality somatic mutation calls improved tumor mutation load-based predictions of clinical outcome for melanoma and lung cancer patients previously treated with immune checkpoint inhibitors. Integration of high-quality machine learning mutation detection in clinical next-generation sequencing (NGS) analyses increased the accuracy of test results compared to other clinical sequencing analyses. These analyses provide an approach for improved identification of tumor-specific mutations and have important implications for research and clinical management of cancer patients.
Affiliation(s)
- James R White
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- Valsamo Anagnostou
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Noushin Niknafs
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Rachel Karchin
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Eniko Papp
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- Peter LoVerso
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- David Riley
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- Luis A Diaz
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Siân Jones
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- Mark Sausen
- Personal Genome Diagnostics, Baltimore, MD 21224, USA
- Victor E Velculescu
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
48
Peng M, Mo Y, Wang Y, Wu P, Zhang Y, Xiong F, Guo C, Wu X, Li Y, Li X, Li G, Xiong W, Zeng Z. Neoantigen vaccine: an emerging tumor immunotherapy. Mol Cancer 2019; 18:128. [PMID: 31443694 PMCID: PMC6708248 DOI: 10.1186/s12943-019-1055-6] [Citation(s) in RCA: 390] [Impact Index Per Article: 78.0] [Received: 04/19/2019] [Accepted: 08/14/2019] [Indexed: 12/24/2022] Open
Abstract
Genetic instability of tumor cells often leads to the occurrence of a large number of mutations, and expression of non-synonymous mutations can produce tumor-specific antigens called neoantigens. Neoantigens are highly immunogenic as they are not expressed in normal tissues. They can activate CD4+ and CD8+ T cells to generate immune response and have the potential to become new targets of tumor immunotherapy. The development of bioinformatics technology has accelerated the identification of neoantigens. The combination of different algorithms to identify and predict the affinity of neoantigens to major histocompatibility complexes (MHCs) or the immunogenicity of neoantigens is mainly based on the whole-exome sequencing technology. Tumor vaccines targeting neoantigens mainly include nucleic acid, dendritic cell (DC)-based, tumor cell, and synthetic long peptide (SLP) vaccines. The combination with immune checkpoint inhibition therapy or radiotherapy and chemotherapy might achieve better therapeutic effects. Currently, several clinical trials have demonstrated the safety and efficacy of these vaccines. Further development of sequencing technologies and bioinformatics algorithms, as well as an improvement in our understanding of the mechanisms underlying tumor development, will expand the application of neoantigen vaccines in the future.
Affiliation(s)
- Miao Peng
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Yongzhen Mo
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Yian Wang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Pan Wu
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Yijie Zhang
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Fang Xiong
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Can Guo
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Xu Wu
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Yong Li
- Department of Medicine, Comprehensive Cancer Center, Baylor College of Medicine, Alkek Building, RM N720, Houston, Texas, USA
- Xiaoling Li
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Guiyuan Li
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Wei Xiong
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
- Zhaoyang Zeng
- NHC Key Laboratory of Carcinogenesis and Hunan Key Laboratory of Translational Radiation Oncology, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
- Key Laboratory of Carcinogenesis and Cancer Invasion of the Chinese Ministry of Education, Cancer Research Institute, Central South University, Changsha, Hunan, China
- Hunan Key Laboratory of Nonresolving Inflammation and Cancer, Disease Genome Research Center, the Third Xiangya Hospital, Central South University, Changsha, Hunan, China
49
Anzar I, Sverchkova A, Stratford R, Clancy T. NeoMutate: an ensemble machine learning framework for the prediction of somatic mutations in cancer. BMC Med Genomics 2019; 12:63. [PMID: 31096972 PMCID: PMC6524241 DOI: 10.1186/s12920-019-0508-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 08/28/2018] [Accepted: 04/22/2019] [Indexed: 12/30/2022] Open
Abstract
Background: The accurate screening of tumor genomic landscapes for somatic mutations using high-throughput sequencing involves a crucial step in precise clinical diagnosis and targeted therapy. However, the complex inherent features of cancer tissue, especially, tumor genetic intra-heterogeneity coupled with the problem of sequencing and alignment artifacts, makes somatic variant calling a challenging task. Current variant filtering strategies, such as rule-based filtering and consensus voting of different algorithms, have previously helped to increase specificity, although comes at the cost of sensitivity.
Methods: In light of this, we have developed the NeoMutate framework which incorporates 7 supervised machine learning (ML) algorithms to exploit the strengths of multiple variant callers, using a non-redundant set of biological and sequence features. We benchmarked NeoMutate by simulating more than 10,000 bona fide cancer-related mutations into three well-characterized Genome in a Bottle (GIAB) reference samples.
Results: A robust and exhaustive evaluation of NeoMutate’s performance based on 5-fold cross validation experiments, in addition to 3 independent tests, demonstrated a substantially improved variant detection accuracy compared to any of its individual composite variant callers and consensus calling of multiple tools.
Conclusions: We show here that integrating multiple tools in an ensemble ML layer optimizes somatic variant detection rates, leading to a potentially improved variant selection framework for the diagnosis and treatment of cancer.
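The "consensus voting" baseline that this abstract contrasts with its ensemble approach can be illustrated with a short Python sketch. This is not NeoMutate itself, which trains supervised classifiers on caller outputs and sequence features. The caller names, the variant tuple layout, and the `min_votes` parameter are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def consensus_calls(calls_by_caller: Dict[str, Set[Variant]],
                    min_votes: int) -> Set[Variant]:
    """Keep variants reported by at least `min_votes` of the callers."""
    votes = Counter(
        variant
        for calls in calls_by_caller.values()
        for variant in calls
    )
    return {variant for variant, n in votes.items() if n >= min_votes}


v1 = ("chr1", 12345, "A", "T")
v2 = ("chr2", 67890, "G", "C")
v3 = ("chr3", 11111, "C", "T")
v4 = ("chr4", 22222, "T", "G")

# Placeholder caller names, not real tools.
calls = {
    "caller_a": {v1, v2, v3},
    "caller_b": {v1, v2},
    "caller_c": {v1, v4},
}

union_set = consensus_calls(calls, min_votes=1)     # any caller
majority_set = consensus_calls(calls, min_votes=2)  # at least two of three
unanimous_set = consensus_calls(calls, min_votes=3) # all three
print(len(union_set), len(majority_set), len(unanimous_set))
```

Raising `min_votes` shrinks the retained set, which shows the trade-off the abstract describes: stricter consensus tends to remove false positives but also discards true variants seen by only one caller.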
Affiliation(s)
- Irantzu Anzar
- OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379, Oslo, Norway
- Angelina Sverchkova
- OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379, Oslo, Norway
- Richard Stratford
- OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379, Oslo, Norway
- Trevor Clancy
- OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379, Oslo, Norway.
50
Calling Variants in the Clinic: Informed Variant Calling Decisions Based on Biological, Clinical, and Laboratory Variables. Comput Struct Biotechnol J 2019; 17:561-569. [PMID: 31049166 PMCID: PMC6482431 DOI: 10.1016/j.csbj.2019.04.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 11/21/2018] [Revised: 03/12/2019] [Accepted: 04/03/2019] [Indexed: 01/10/2023] Open
Abstract
Deep sequencing genomic analysis is becoming increasingly common in clinical research and practice, enabling accurate identification of diagnostic, prognostic, and predictive determinants. Variant calling, distinguishing between true mutations and experimental errors, is a central task of genomic analysis and often requires sophisticated statistical, computational, and/or heuristic techniques. Although variant callers seek to overcome noise inherent in biological experiments, variant calling can be significantly affected by outside factors including those used to prepare, store, and analyze samples. The goal of this review is to discuss known experimental features, such as sample preparation, library preparation, and sequencing, alongside diverse biological and clinical variables, and evaluate their effect on variant caller selection and optimization.