1
Ergun MA, Cinal O, Bakışlı B, Emül AA, Baysan M. COSAP: Comparative Sequencing Analysis Platform. BMC Bioinformatics 2024; 25:130. [PMID: 38532317 DOI: 10.1186/s12859-024-05756-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 03/20/2024] [Indexed: 03/28/2024] Open
Abstract
BACKGROUND Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. The size and complexity of the data make computational analysis a necessity, and this need has produced many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations, without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm unreliable. Given the growth of sequencing technologies, user-friendly open-source software tools that offer comparative analysis are an important need. RESULTS Here, we propose the Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel and structural variant calling, copy number variation, microsatellite instability and fusion analysis, together with their annotations. COSAP is packaged with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the backend and frontend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are also packaged as Docker containers. Pipelines that combine algorithms can be customized, and new algorithms can be added with minimal coding thanks to the modular structure. CONCLUSIONS COSAP simplifies and speeds up DNA sequencing analyses by providing commonly used algorithms for SNV, indel and structural variant calling, copy number variation, microsatellite instability and fusion analysis, as well as their annotations. COSAP is packaged with a fully functional, user-friendly web interface and a backend server, which allows fully independent deployment at both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial for establishing the reproducibility of sequencing analyses.
Affiliation(s)
- Mehmet Arif Ergun
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Omer Cinal
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Berkant Bakışlı
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Abdullah Asım Emül
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
- Mehmet Baysan
  - Department of Computer Engineering, Istanbul Technical University, 34469, Istanbul, Turkey
2
Choon YW, Choon YF, Nasarudin NA, Al Jasmi F, Remli MA, Alkayali MH, Mohamad MS. Artificial intelligence and database for NGS-based diagnosis in rare disease. Front Genet 2024; 14:1258083. [PMID: 38371307 PMCID: PMC10870236 DOI: 10.3389/fgene.2023.1258083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 11/24/2023] [Indexed: 02/20/2024] Open
Abstract
Rare diseases (RDs) are complex genetic diseases that, by a conservative estimate, affect 300 million people worldwide. Recent Next-Generation Sequencing (NGS) studies are unraveling the underlying genetic heterogeneity of this group of diseases. NGS-based methods used in RD studies have improved the diagnosis and management of RDs. Concomitantly, a suite of bioinformatics tools has been developed to sort through the big data generated by NGS and to understand RDs better. However, there are concerns about the lack of consistency among different methods, primarily linked to factors such as the lack of uniformity in input and output formats, the absence of a standardized measure of predictive accuracy, and the regularity of updates to the annotation databases. Today, artificial intelligence (AI), particularly deep learning, is widely used in a variety of biological contexts and is changing the healthcare system. AI has demonstrated promising capabilities in boosting variant calling precision, refining variant prediction, and enhancing the user-friendliness of electronic health record (EHR) systems in NGS-based diagnostics. This paper reviews the state of the art of AI in NGS-based genetics, along with its future directions and challenges. It also compares several rare disease databases.
Affiliation(s)
- Yee Wen Choon
  - Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
  - Faculty of Data Science and Informatics, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Yee Fan Choon
  - Faculty of Dentistry, Lincoln University College, Petaling Jaya, Selangor, Malaysia
- Nurul Athirah Nasarudin
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Fatma Al Jasmi
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
- Muhamad Akmal Remli
  - Institute for Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
  - Faculty of Data Science and Informatics, Universiti Malaysia Kelantan, Kota Bharu, Kelantan, Malaysia
- Mohd Saberi Mohamad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medicine and Health Sciences, United Arab Emirates University, Al Ain, United Arab Emirates
3
Anzar I, Malone B, Samarakoon P, Vardaxis I, Simovski B, Fontenelle H, Meza-Zepeda LA, Stratford R, Keung EZ, Burgess M, Tawbi HA, Myklebost O, Clancy T. The interplay between neoantigens and immune cells in sarcomas treated with checkpoint inhibition. Front Immunol 2023; 14:1226445. [PMID: 37799721 PMCID: PMC10548483 DOI: 10.3389/fimmu.2023.1226445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2023] [Accepted: 07/10/2023] [Indexed: 10/07/2023] Open
Abstract
Introduction Sarcomas comprise diverse bone and connective tissue tumors with few effective therapeutic options for locally advanced unresectable and/or metastatic disease. Recent advances in immunotherapy, in particular immune checkpoint inhibition (ICI), have shown promising outcomes in several cancer indications. Unfortunately, ICI therapy has provided only modest clinical responses and seems moderately effective in only a subset of the diverse subtypes. Methods To explore the immune parameters governing ICI therapy resistance or immune escape, we performed whole exome sequencing (WES) on tumors and their matched normal blood, in addition to RNA-seq from tumors, in 31 sarcoma patients treated with pembrolizumab. We used advanced computational methods to investigate key immune properties, such as neoantigens and immune cell composition in the tumor microenvironment (TME). Results A multifactorial analysis suggested that expression of high-quality neoantigens in the context of specific immune cells in the TME is a key prognostic marker of progression-free survival (PFS). The presence of several types of immune cells in the TME, including T cells, B cells and macrophages, was associated with improved PFS. Importantly, we also found that the presence of both CD8+ T cells and neoantigens together was associated with improved survival compared to the presence of CD8+ T cells or neoantigens alone. Interestingly, this trend was not identified with the combined presence of CD8+ T cells and TMB, suggesting that a combined CD8+ T cell and neoantigen effect on PFS was important. Discussion The outcome of this study may inform future trials that could lead to improved outcomes for sarcoma patients treated with ICI.
Affiliation(s)
- Irantzu Anzar
  - Oslo Cancer Cluster, NEC OncoImmunity AS, Oslo, Norway
  - Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Leonardo A. Meza-Zepeda
  - Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Genomics Core Facility, Department of Core Facilities, Oslo University Hospital, Oslo, Norway
- Emily Z. Keung
  - Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Melissa Burgess
  - Department of Medical Oncology, University of Pittsburgh Medical Center, Pittsburgh, PA, United States
- Hussein A. Tawbi
  - Department of Melanoma Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, United States
- Ola Myklebost
  - Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Trevor Clancy
  - Oslo Cancer Cluster, NEC OncoImmunity AS, Oslo, Norway
4
Ellingsen EB, O'Day S, Mezheyeuski A, Gromadka A, Clancy T, Kristedja TS, Milhem M, Zakharia Y. Clinical Activity of Combined Telomerase Vaccination and Pembrolizumab in Advanced Melanoma: Results from a Phase I Trial. Clin Cancer Res 2023; 29:3026-3036. [PMID: 37378632 PMCID: PMC10425723 DOI: 10.1158/1078-0432.ccr-23-0416] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/10/2023] [Revised: 03/27/2023] [Accepted: 06/20/2023] [Indexed: 06/29/2023]
Abstract
PURPOSE Cancer vaccines represent a novel treatment modality with a complementary mode of action addressing a crucial bottleneck for checkpoint inhibitor (CPI) efficacy. CPIs are expected to release brakes in T-cell responses elicited by vaccination, leading to more robust immune responses. Increased antitumor T-cell responses may confer increased antitumor activity in patients with less immunogenic tumors, a subgroup expected to achieve reduced benefit from CPIs alone. In this trial, a telomerase-based vaccine was combined with pembrolizumab to assess the safety and clinical activity in patients with melanoma. PATIENTS AND METHODS Thirty treatment-naïve patients with advanced melanoma were enrolled. Patients received intradermal injections of UV1 with adjuvant GM-CSF at two dose levels, and pembrolizumab according to the label. Blood samples were assessed for vaccine-induced T-cell responses, and tumor tissues were collected for translational analyses. The primary endpoint was safety, with secondary objectives including progression-free survival (PFS), overall survival (OS), and objective response rate (ORR). RESULTS The combination was considered safe and well-tolerated. Grade 3 adverse events were observed in 20% of patients, with no grade 4 or 5 adverse events reported. Vaccination-related adverse events were mostly mild injection site reactions. The median PFS was 18.9 months, and the 1- and 2-year OS rates were 86.7% and 73.3%, respectively. The ORR was 56.7%, with 33.3% achieving complete responses. Vaccine-induced immune responses were observed in evaluable patients, and inflammatory changes were detected in posttreatment biopsies. CONCLUSIONS Encouraging safety and preliminary efficacy were observed. Randomized phase II trials are currently ongoing.
Affiliation(s)
- Espen B. Ellingsen
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Faculty of Medicine, University of Oslo, Oslo, Norway
  - Ultimovacs ASA, Oslo, Norway
- Steven O'Day
  - Providence Saint John's Cancer Institute, Santa Monica, California
- Yousef Zakharia
  - University of Iowa and Holden Comprehensive Cancer Center, Iowa City, Iowa
5
Trevarton AJ, Chang JT, Symmans WF. Simple combination of multiple somatic variant callers to increase accuracy. Sci Rep 2023; 13:8463. [PMID: 37231022 DOI: 10.1038/s41598-023-34925-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 05/10/2023] [Indexed: 05/27/2023] Open
Abstract
Publications comparing variant caller algorithms present discordant results with contradictory rankings. Caller performances are inconsistent and wide ranging, and dependent upon input data, application, parameter settings, and evaluation metric. With no single variant caller emerging as a superior standard, combinations or ensembles of variant callers have appeared in the literature. In this study, a whole genome somatic reference standard was used to derive principles to guide strategies for combining variant calls. Then, manually annotated variants called from the whole exome sequencing of a tumor were used to corroborate these general principles. Finally, we examined the ability of these principles to reduce noise in targeted sequencing.
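The abstract does not spell out its evaluation procedure, but scoring a caller or a combination of callers against a reference standard usually comes down to precision, recall and F1 over sets of variants. The following is a minimal sketch under that assumption; the variant keys, the toy call sets and the `score_calls` helper are all hypothetical and are not taken from the study.

```python
from typing import NamedTuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def score_calls(calls: set[Variant], truth: set[Variant]) -> Metrics:
    """Score a call set against a reference standard (truth set)."""
    true_positives = len(calls & truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Metrics(precision, recall, f1)


# Toy data, invented for illustration only.
truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr2", 50, "C", "T")}
caller_a = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"), ("chr3", 7, "T", "G")}
caller_b = {("chr1", 100, "A", "T"), ("chr2", 50, "C", "T"), ("chr4", 9, "G", "A")}

candidates = {
    "caller_a": caller_a,
    "caller_b": caller_b,
    "intersection": caller_a & caller_b,
    "union": caller_a | caller_b,
}
for name, calls in candidates.items():
    print(name, score_calls(calls, truth))
```

In this toy example the intersection trades recall for precision and the union does the opposite, which is the kind of trade-off a combination strategy has to balance.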
Affiliation(s)
- Alexander J Trevarton
  - School of Biological Sciences, Faculty of Science, University of Auckland, Auckland, New Zealand
- Jeffrey T Chang
  - Department of Integrative Biology and Pharmacology, The University of Texas Health Sciences Center, Houston, USA
- W Fraser Symmans
  - Department of Translational Molecular Pathology, The University of Texas MD Anderson Cancer Center, Houston, USA
6
A Multimodal Ensemble Driven by Multiobjective Optimisation to Predict Overall Survival in Non-Small-Cell Lung Cancer. J Imaging 2022; 8:jimaging8110298. [DOI: 10.3390/jimaging8110298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 10/28/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022] Open
Abstract
Lung cancer accounts for more deaths worldwide than any other cancer disease. In order to provide patients with the most effective treatment for these aggressive tumours, multimodal learning is emerging as a new and promising field of research that aims to extract complementary information from the data of different modalities for prognostic and predictive purposes. This knowledge could be used to optimise current treatments and maximise their effectiveness. To predict overall survival, in this work, we investigate the use of multimodal learning on the CLARO dataset, which includes CT images and clinical data collected from a cohort of non-small-cell lung cancer patients. Our method allows the identification of the optimal set of classifiers to be included in the ensemble in a late fusion approach. Specifically, after training unimodal models on each modality, it selects the best ensemble by solving a multiobjective optimisation problem that maximises both the recognition performance and the diversity of the predictions. In the ensemble, the labels of each sample are assigned using the majority voting rule. As further validation, we show that the proposed ensemble outperforms the models learning a single modality, obtaining state-of-the-art results on the task at hand.
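The late-fusion step described above, in which each sample's label is assigned by majority vote over the selected classifiers, can be sketched as follows. This is an illustration only: the `majority_vote` function and its tie-breaking rule (smallest label wins) are assumptions, and the multiobjective selection of the ensemble members is not shown.

```python
from collections import Counter


def majority_vote(predictions: list[list[int]]) -> list[int]:
    """Fuse per-model label predictions by majority voting.

    predictions[m][i] is the label that model m assigns to sample i.
    Ties are broken toward the smallest label (an assumption of this sketch).
    """
    fused = []
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        top = max(counts.values())
        fused.append(min(label for label, n in counts.items() if n == top))
    return fused


# Three hypothetical unimodal models voting on three samples.
model_outputs = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
]
print(majority_vote(model_outputs))
```

Using an odd number of binary classifiers avoids ties altogether, so the tie-breaking rule only matters for even-sized or multiclass ensembles.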
7
Xing M, Zhang Y, Yu H, Yang Z, Li X, Li Q, Zhao Y, Zhao Z, Luo Y. Predict DLBCL patients' recurrence within two years with Gaussian mixture model cluster oversampling and multi-kernel learning. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022; 226:107103. [PMID: 36088813 DOI: 10.1016/j.cmpb.2022.107103] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/14/2021] [Revised: 08/05/2022] [Accepted: 08/30/2022] [Indexed: 06/15/2023]
Abstract
BACKGROUND AND OBJECTIVE Diffuse large B-cell lymphoma (DLBCL) is a common form of non-Hodgkin's lymphoma in adults. Relapse mainly occurs within two years after diagnosis and has a poor prognosis. Relapse after two years is less frequent and has a better prognosis. In this work, we constructed a model to predict relapse within two years in DLBCL patients, aiming to provide a reference for clinicians implementing individualized treatment. METHOD We propose a secondary-level class imbalance method based on Gaussian mixture model (GMM) clustering resampling to balance the data. We then use a multi-kernel support vector machine (SVM) to describe the heterogeneous clinical data. Finally, we combine the two to identify patients who relapse within two years. RESULTS Among all the class imbalance methods in this work, inverse weighted-GMM+SMOTEENN performs best. Compared with NO-GMM (directly using SMOTEENN without the GMM clustering process), its area under the ROC curve (AUC) increases by 8.75%, and its expected calibration error (ECE) and Brier score decrease by 2.07% and 3.09%, respectively. Among the four classification algorithms in this work, multiple kernel learning (MKL) has the lowest Brier score and ECE and the highest AUC, accuracy, recall, precision and F1, giving it the best discrimination and calibration. CONCLUSION Our inverse weighted-GMM+SMOTEENN+MKL (GMM-SENN-MKL) method handles class imbalance and heterogeneous clinical data well and can be used to predict recurrence in DLBCL patients.
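The two calibration metrics named in the results, the Brier score and the expected calibration error (ECE), can be computed as in the sketch below. The abstract does not give the exact definitions used, so this assumes the common binary forms: mean squared error of the predicted probabilities for the Brier score, and equal-width probability bins comparing mean predicted probability with the observed positive rate for ECE. The function names, the bin count and the toy data are hypothetical.

```python
def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: list[int], y_prob: list[float], n_bins: int = 10
) -> float:
    """ECE over equal-width probability bins (binary-outcome variant)."""
    bins: list[list[tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / len(y_true) * abs(observed - predicted)
    return ece


# Toy relapse labels (1 = relapse within two years) and predicted probabilities.
labels = [1, 0, 1, 0]
probabilities = [0.95, 0.15, 0.85, 0.35]
print(brier_score(labels, probabilities))
print(expected_calibration_error(labels, probabilities))
```

Lower values are better for both metrics, which is why the abstract reports decreases in ECE and Brier score as improvements.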
Affiliation(s)
- Meng Xing
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Yanbo Zhang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Hongmei Yu
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Zhenhuan Yang
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Xueling Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Qiong Li
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Yanlin Zhao
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
- Zhiqiang Zhao
  - Department of Hematology, Shanxi Cancer Hospital, Taiyuan, China
- Yanhong Luo
  - Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China; Shanxi Provincial Key Laboratory of Major Diseases Risk Assessment, Taiyuan, China
8
Garcia-Prieto CA, Martínez-Jiménez F, Valencia A, Porta-Pardo E. Detection of oncogenic and clinically actionable mutations in cancer genomes critically depends on variant calling tools. Bioinformatics 2022; 38:3181-3191. [PMID: 35512388 PMCID: PMC9191211 DOI: 10.1093/bioinformatics/btac306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 02/09/2022] [Accepted: 05/01/2022] [Indexed: 11/22/2022] Open
Abstract
Motivation The analysis of cancer genomes provides fundamental information about its etiology, the processes driving cell transformation or potential treatments. While researchers and clinicians are often only interested in the identification of oncogenic mutations, actionable variants or mutational signatures, the first crucial step in the analysis of any tumor genome is the identification of somatic variants in cancer cells (i.e. those that have been acquired during their evolution). For that purpose, a wide range of computational tools have been developed in recent years to detect somatic mutations in sequencing data from tumor samples. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Results Here, we quantify the impact of variant calling decisions by comparing the results obtained in three important analyses of cancer genomics data (identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants) when changing the somatic variant caller (MuSE, MuTect2, SomaticSniper and VarScan2) or the strategy to combine them (Consensus of two, Consensus of three and Union) across all 33 cancer types from The Cancer Genome Atlas. Our results show that variant calling decisions have a significant impact on these analyses, creating important differences that could even impact treatment decisions for some patients. Moreover, the Consensus of three calling strategy to combine the output of multiple variant calling tools, a very widely used strategy by the research community, can lead to the loss of some cancer driver genes and actionable mutations. 
Overall, our results highlight the limitations of widespread practices within the cancer genomics community and point to important differences in critical analyses of tumor sequencing data depending on variant calling, affecting even the identification of clinically actionable variants. Availability and implementation Code is available at https://github.com/carlosgarciaprieto/VariantCallingClinicalBenchmark. Supplementary information Supplementary data are available at Bioinformatics online.
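The three combination strategies compared above (Union, Consensus of two, Consensus of three) can be expressed as one thresholded count over the per-caller call sets. The sketch below assumes that "consensus of k" means a variant reported by at least k of the callers; that reading, the `combine_calls` helper and the toy variants are illustrative assumptions and are not taken from the paper's code repository.

```python
from collections import Counter


def combine_calls(call_sets: dict[str, set[str]], min_callers: int) -> set[str]:
    """Keep variants reported by at least `min_callers` of the callers.

    min_callers=1 gives the union; 2 and 3 give the assumed
    "consensus of two" and "consensus of three" strategies.
    """
    counts = Counter(v for calls in call_sets.values() for v in calls)
    return {variant for variant, n in counts.items() if n >= min_callers}


# Toy call sets keyed by the four callers named in the abstract;
# the variant identifiers are invented.
calls = {
    "MuSE": {"v1", "v2"},
    "MuTect2": {"v1", "v2", "v3"},
    "SomaticSniper": {"v1", "v4"},
    "VarScan2": {"v1", "v2", "v3"},
}

union = combine_calls(calls, min_callers=1)
consensus_two = combine_calls(calls, min_callers=2)
consensus_three = combine_calls(calls, min_callers=3)
print(sorted(union), sorted(consensus_two), sorted(consensus_three))
```

In the toy data, `v3` is found by two callers and therefore survives the consensus of two but is dropped by the consensus of three, which mirrors how a stricter threshold can discard a real driver or actionable mutation.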
Affiliation(s)
- Carlos A Garcia-Prieto
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Francisco Martínez-Jiménez
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Alfonso Valencia
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Eduard Porta-Pardo
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
9
Anzar I, Sverchkova A, Samarakoon P, Ellingsen EB, Gaudernack G, Stratford R, Clancy T. Personalized HLA typing leads to the discovery of novel HLA alleles and tumor‐specific HLA variants. HLA 2022; 99:313-327. [PMID: 35073457 PMCID: PMC9546058 DOI: 10.1111/tan.14562] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/21/2021] [Revised: 01/08/2022] [Accepted: 01/21/2022] [Indexed: 11/29/2022]
Abstract
Accurate and full‐length typing of the HLA region is important in many clinical and research settings. With the advent of next generation sequencing (NGS), several HLA typing algorithms have been developed, including many that are applicable to whole exome sequencing (WES). However, most of these solutions operate by providing the closest‐matched HLA allele among the known alleles in the IPD‐IMGT/HLA Database. These database‐matching approaches have demonstrated very high performance when typing well characterized HLA alleles. However, as they rely on the completeness of the HLA database, they are not optimal for detecting novel or less well characterized alleles. Furthermore, the database‐matching approaches are also not adequate in the context of cancer, where a comprehensive characterization of somatic HLA variation and expression patterns of a tumor's HLA locus may guide therapy and clinical outcome, because of the pivotal role HLA alleles play in tumor antigen recognition and immune escape. Here, we describe a personalized HLA typing approach applied to WES data that leverages the strengths of database‐matching approaches while simultaneously allowing for the discovery of novel HLA alleles and tumor‐specific HLA variants, through the systematic integration of germline and somatic variant calling. We applied this approach to WES from 10 metastatic melanoma patients and validated the HLA typing results using HLA targeted NGS sequencing from patients where at least one HLA germline candidate was detected on Class I HLA. Targeted NGS sequencing confirmed 100% performance for the 1st and 2nd fields. In total, five out of the six detected HLA germline variants were because of Class I ambiguities at the third or fourth fields, and their detection recovered the correct HLA allele genotype. The sixth germline variant led to the formal discovery of a novel Class I allele. 
Finally, we demonstrated substantially improved somatic variant detection accuracy in HLA alleles, with a 91% success rate in simulated experiments. The approach described here may allow the field to genotype more accurately using WES data, leading to the discovery of novel HLA alleles, and may help characterize the relationship between somatic variation in the HLA region and immunosurveillance.
Affiliation(s)
- Irantzu Anzar
  - NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Angelina Sverchkova
  - NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Pubudu Samarakoon
  - NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Gustav Gaudernack
  - Ultimovacs ASA, Oslo Cancer Cluster, Ullernchausseen 64/66 Oslo Norway
- Richard Stratford
  - NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
- Trevor Clancy
  - NEC OncoImmunity AS, Oslo Cancer Cluster, Ullernchausseen 64/66, 0379 Oslo Norway
10
Tarabichi M, Demeulemeester J, Verfaillie A, Flanagan AM, Van Loo P, Konopka T. A pan-cancer landscape of somatic mutations in non-unique regions of the human genome. Nat Biotechnol 2021; 39:1589-1596. [PMID: 34282324 PMCID: PMC7612106 DOI: 10.1038/s41587-021-00971-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/16/2020] [Accepted: 06/02/2021] [Indexed: 12/27/2022]
Abstract
A substantial fraction of the human genome displays high sequence similarity with at least one other genomic sequence, posing a challenge for the identification of somatic mutations from short-read sequencing data. Here we annotate genomic variants in 2,658 cancers from the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort with links to similar sites across the human genome. We train a machine learning model to use signals distributed over multiple genomic sites to call somatic events in non-unique regions and validate the data against linked-read sequencing in an independent dataset. Using this approach, we uncover previously hidden mutations in ~1,700 coding sequences and in thousands of regulatory elements, including in known cancer genes, immunoglobulins and highly mutated gene families. Mutations in non-unique regions are consistent with mutations in unique regions in terms of mutation burden and substitution profiles. The analysis provides a systematic summary of the mutation events in non-unique regions at a genome-wide scale across multiple human cancers.
Affiliation(s)
- Maxime Tarabichi
  - The Francis Crick Institute, London, UK
  - Institute for Interdisciplinary Research, Université Libre de Bruxelles, Brussels, Belgium
- Jonas Demeulemeester
  - The Francis Crick Institute, London, UK
  - Department of Human Genetics, KU Leuven, Leuven, Belgium
- Adrienne M Flanagan
  - Research Department of Pathology, Cancer Institute, University College London, London, UK
  - Department of Cellular and Molecular Pathology, Royal National Orthopaedic Hospital NHS Trust, Stanmore, UK
- Tomasz Konopka
  - The Francis Crick Institute, London, UK
  - William Harvey Research Institute, Queen Mary University of London, London, UK
11
Farswan A, Jena L, Kaur G, Gupta A, Gupta R, Rani L, Sharma A, Kumar L. Branching clonal evolution patterns predominate mutational landscape in multiple myeloma. Am J Cancer Res 2021; 11:5659-5679. [PMID: 34873486 PMCID: PMC8640818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2021] [Accepted: 09/27/2021] [Indexed: 06/13/2023] Open
Abstract
Multiple Myeloma (MM) arises from malignant transformation and deregulated proliferation of clonal plasma cells (PCs) harbouring heterogeneous molecular anomalies. The effect of evolving mutations on clone fitness and their cellular prevalence shapes the progressing myeloma genome and impacts clinical outcomes. Although clonal heterogeneity in MM is well established, which subclonal mutations emerge, persist or perish with progression, and which of these can be targeted therapeutically, remain open questions. To address this, we sequenced paired whole exomes of 62 MM patients collected at two time points, i.e., at diagnosis and on progression. Somatic variants were called using a novel ensemble approach in which a consensus was deduced from four variant callers (Illumina's Dragen, Strelka2, SomaticSniper and SpeedSeq), and actionable/druggable gene targets were identified. Marked intraclonal heterogeneity was observed. Branching evolution was observed in 72.58% of patients, of whom 64.51% had low TMBs (<10) and 61.29% had 2 or more founder clones. The hypermutator patients (with high TMB levels, ≥10 to ≤100) showed a significant decrease in their TMBs from diagnosis (median TMB 77.11) to progression (median TMB 31.22). A distinct temporal fall in subclonal driver mutations from diagnosis to progression was identified recurrently, e.g., in the PABPC1, BRAF, KRAS, CR1, DIS3 and ATM genes in 3 or more patients, suggesting that such patients could be treated early with target-specific drugs like Vemurafenib/Cobimetinib. An analogous rise in driver mutations was observed in KMT2C, FOXD4L1, SP140, NRAS and other genes. A few drivers such as FAT4, IGLL5 and CDKN1A retained consistent distribution patterns at the two time points. These findings are clinically relevant and support evaluating multi-time-point subclonal mutational landscapes to design better risk stratification strategies and to tailor risk-adapted combination therapies over time.
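The abstract does not say how the consensus across the four callers was deduced. As a minimal sketch of one common way to do it, the Python below counts how many callers report each variant and keeps those that reach a support threshold. The `min_support` value, the variant key layout and the toy call sets are all assumptions made for illustration, not details from the study.

```python
from collections import Counter

# A variant is keyed by (chromosome, position, ref allele, alt allele).
# This key layout is an assumption for illustration.
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]], min_support: int) -> set[Variant]:
    """Return the variants reported by at least `min_support` callers."""
    support = Counter(v for calls in calls_by_caller.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}


# Toy call sets with made-up coordinates (hypothetical, not study data).
calls = {
    "Dragen": {("chr7", 100, "A", "T"), ("chr12", 200, "C", "T")},
    "Strelka2": {("chr7", 100, "A", "T"), ("chr12", 200, "C", "T")},
    "SomaticSniper": {("chr7", 100, "A", "T"), ("chr1", 300, "T", "C")},
    "SpeedSeq": {("chr7", 100, "A", "T")},
}

# Keep variants seen by at least 3 of the 4 callers (threshold is an assumption).
kept = consensus_calls(calls, min_support=3)
print(sorted(kept))
```

Raising `min_support` trades sensitivity for precision; a threshold equal to the number of callers reduces to a strict intersection.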
Affiliation(s)
- Akanksha Farswan
  - SBILab, Department of Electronics and Communication Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-D), Delhi 110020, India
- Lingaraja Jena
  - Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Gurvinder Kaur
  - Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Anubha Gupta
  - SBILab, Department of Electronics and Communication Engineering, Indraprastha Institute of Information Technology-Delhi (IIIT-D), Delhi 110020, India
- Ritu Gupta
  - Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Lata Rani
  - Laboratory Oncology Unit, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Atul Sharma
  - Department of Medical Oncology, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
- Lalit Kumar
  - Department of Medical Oncology, Dr. B.R.A. IRCH, All India Institute of Medical Sciences (AIIMS), New Delhi 110029, India
12
Benchmarking pipelines for subclonal deconvolution of bulk tumour sequencing data. Nat Commun 2021; 12:6396. [PMID: 34737285 PMCID: PMC8569188 DOI: 10.1038/s41467-021-26698-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/06/2021] [Accepted: 10/20/2021] [Indexed: 11/09/2022] Open
Abstract
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.
13
14
Nagy M, Radakovich N, Nazha A. Machine Learning in Oncology: What Should Clinicians Know? JCO Clin Cancer Inform 2021; 4:799-810. [PMID: 32926637 DOI: 10.1200/cci.20.00049] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/16/2022] Open
Abstract
The volume and complexity of scientific and clinical data in oncology have grown markedly over recent years, including but not limited to the realms of electronic health data, radiographic and histologic data, and genomics. This growth holds promise for a deeper understanding of malignancy and, accordingly, more personalized and effective oncologic care. Such goals require, however, the development of new methods to fully make use of the wealth of available data. Improvements in computer processing power and algorithm development have positioned machine learning, a branch of artificial intelligence, to play a prominent role in oncology research and practice. This review provides an overview of the basics of machine learning and highlights current progress and challenges in applying this technology to cancer diagnosis, prognosis, and treatment recommendations, including a discussion of current takeaways for clinicians.
Affiliation(s)
- Matthew Nagy
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Nathan Radakovich
  - Cleveland Clinic Lerner College of Medicine of Case Western Reserve University, Cleveland, OH
- Aziz Nazha
  - Center for Clinical Artificial Intelligence, Cleveland Clinic, Cleveland, OH
  - Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, OH
15
Khanna A, Larson DE, Srivatsan SN, Mosior M, Abbott TE, Kiwala S, Ley TJ, Duncavage EJ, Walter MJ, Walker JR, Griffith OL, Griffith M, Miller CA. Bam-readcount - rapid generation of basepair-resolution sequence metrics. ARXIV 2021:arXiv:2107.12817v1. [PMID: 34341766 PMCID: PMC8328062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
Bam-readcount is a utility for generating low-level information about sequencing data at specific nucleotide positions. Originally designed to help filter genomic mutation calls, the metrics it outputs are useful as input for variant detection tools and for resolving ambiguity between variant callers [1,2]. In addition, it has found broad applicability in diverse fields including tumor evolution, single-cell genomics, climate change ecology, and tracking community spread of SARS-CoV-2 [3-6].
Affiliation(s)
- Ajay Khanna
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
- David E. Larson
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
  - Current Affiliation: Benson Hill, Inc., St. Louis, MO
- Matthew Mosior
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - Current Affiliation: Moffitt Cancer Center, Tampa, FL
- Travis E. Abbott
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
  - Current Affiliation: Google, Inc., Mountain View, CA
- Susanna Kiwala
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
- Timothy J. Ley
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO
- Eric J. Duncavage
  - Department of Pathology, Washington University School of Medicine, St. Louis, MO
- Matthew J. Walter
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO
- Jason R. Walker
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
- Obi L. Griffith
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO
- Malachi Griffith
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO
- Christopher A. Miller
  - Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO
  - Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO
16
Iqbal MJ, Javed Z, Sadia H, Qureshi IA, Irshad A, Ahmed R, Malik K, Raza S, Abbas A, Pezzani R, Sharifi-Rad J. Clinical applications of artificial intelligence and machine learning in cancer diagnosis: looking into the future. Cancer Cell Int 2021; 21:270. [PMID: 34020642 PMCID: PMC8139146 DOI: 10.1186/s12935-021-01981-1] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 01/23/2021] [Accepted: 05/13/2021] [Indexed: 02/06/2023] Open
Abstract
Artificial intelligence (AI) is the use of mathematical algorithms to mimic human cognitive abilities and to address difficult healthcare challenges, including complex biological abnormalities like cancer. The exponential growth of AI in the last decade has made it a potential platform for optimal decision-making by super-intelligence, where the human mind is limited in its capacity to process huge amounts of data in a narrow time range. Cancer is a complex and multifaceted disorder with thousands of genetic and epigenetic variations. AI-based algorithms hold great promise for identifying these genetic mutations and aberrant protein interactions at a very early stage. Modern biomedical research is also focused on bringing AI technology to the clinic safely and ethically. AI-based assistance to pathologists and physicians could be a great leap forward in the prediction of disease risk, diagnosis, prognosis, and treatment. Clinical applications of AI and machine learning (ML) in cancer diagnosis and treatment are the future of medical guidance towards faster mapping of a new treatment for every individual. By using an AI-based systems approach, researchers can collaborate in real time and share knowledge digitally to potentially heal millions. In this review, we present this game-changing technology of the future in the clinic by connecting biology with artificial intelligence, and explain how AI-based assistance helps oncologists deliver precise treatment.
Affiliation(s)
- Muhammad Javed Iqbal
  - Department of Biotechnology, Faculty of Sciences, University of Sialkot, Sialkot, Pakistan
- Zeeshan Javed
  - Office for Research Innovation and Commercialization (ORIC), Lahore Garrison University, Sector-C, DHA Phase-VI, Lahore, Pakistan
- Haleema Sadia
  - Department of Biotechnology, Balochistan University of Information Technology Engineering and Management Sciences (BUITEMS), Quetta, Pakistan
- Asma Irshad
  - Department of Life Sciences, University of Management Sciences and Technology, Lahore, Pakistan
- Rais Ahmed
  - Department of Microbiology, Cholistan University of Veterinary and Animal Sciences, Bahawalpur, Pakistan
- Kausar Malik
  - Center for Excellence in Molecular Biology, University of the Punjab, Lahore, Pakistan
- Shahid Raza
  - Office for Research Innovation and Commercialization (ORIC), Lahore Garrison University, Sector-C, DHA Phase-VI, Lahore, Pakistan
- Asif Abbas
  - Department of Biotechnology, Faculty of Sciences, University of Sialkot, Sialkot, Pakistan
- Raffaele Pezzani
  - Dept. Medicine (DIMED), OU Endocrinology, University of Padova, via Ospedale 105, 35128 Padova, Italy
  - AIROB, Associazione Italiana Per La Ricerca Oncologica Di Base, Padova, Italy
- Javad Sharifi-Rad
  - Phytochemistry Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Facultad de Medicina, Universidad del Azuay, Cuenca, Ecuador
17
Liu S, Tang H, Liu H, Wang J. Multi-label Learning for the Diagnosis of Cancer and Identification of Novel Biomarkers with High-throughput Omics. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200623130416] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/22/2022]
Abstract
Background:
The advancement of bioinformatics and machine learning has facilitated the diagnosis of cancer and the discovery of omics-based biomarkers.
Objective:
Our study employed a novel data-driven approach to classify normal samples and different types of gastrointestinal cancer samples, in order to find potential biomarkers for effective diagnosis and prognosis assessment of gastrointestinal cancer patients.
Methods:
Different feature selection methods were used, and the diagnostic performance of the proposed biosignatures was benchmarked using support vector machine (SVM) and random forest (RF) models.
Results:
All models showed satisfactory performance, of which Multilabel-RF appeared to be the best. The accuracy of the Multilabel-RF based model was 83.12%, with precision, recall, F1, and Hamming loss of 79.70%, 68.31%, 0.7357 and 0.1688, respectively. Moreover, the proposed biomarker signatures were highly associated with multifaceted hallmarks of cancer. Functional enrichment analysis and the impact of the biomarker candidates on patient prognosis were also examined.
Conclusion:
We successfully introduced a solid workflow based on multi-label learning with high-throughput omics for the diagnosis of cancer and identification of novel biomarkers. Novel transcriptome biosignatures that may improve diagnostic accuracy in gastrointestinal cancer are introduced for further validation in various clinical settings.
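The reported metrics can be cross-checked with a little arithmetic. The sketch below recomputes F1 from the stated precision and recall using the standard harmonic-mean definition, and shows how Hamming loss is computed on a toy multi-label prediction. The toy label matrices are hypothetical, and reading the reported accuracy as a per-label accuracy (so that it sums to 1 with the Hamming loss) is an assumption, not something the abstract states.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def hamming_loss(y_true: list[list[int]], y_pred: list[list[int]]) -> float:
    """Fraction of individual labels that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


# Values taken from the abstract.
precision, recall = 0.7970, 0.6831
f1 = f1_score(precision, recall)
print(round(f1, 4))  # matches the reported F1 of 0.7357

# Under the per-label-accuracy assumption, accuracy and Hamming loss sum to 1.
accuracy, reported_hamming = 0.8312, 0.1688
print(round(accuracy + reported_hamming, 4))

# Toy multi-label example (hypothetical data): 2 samples, 3 labels each.
toy_true = [[1, 0, 1], [0, 1, 0]]
toy_pred = [[1, 1, 1], [0, 1, 1]]
toy_loss = hamming_loss(toy_true, toy_pred)  # 2 wrong labels out of 6
```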
Affiliation(s)
- Shicai Liu
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hailin Tang
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Hongde Liu
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
- Jinke Wang
  - State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China
18
Adjuvant and Neoadjuvant Treatment of Triple-Negative Breast Cancer With Chemotherapy. Cancer J 2021; 27:41-49. [PMID: 33475292 DOI: 10.1097/ppo.0000000000000498] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
Triple-negative breast cancer (TNBC) accounts for 15% to 20% of all invasive breast carcinomas and is defined by the lack of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2. Although TNBC is characterized by high rates of disease recurrence and worse survival, it is significantly more sensitive to chemotherapy than other breast cancer subtypes. Accordingly, despite great efforts in the genomic characterization of TNBC, chemotherapy still represents the cornerstone of treatment. For the majority of patients with early-stage TNBC, sequential anthracycline- and taxane-based neoadjuvant chemotherapy (NACT) represents the standard therapeutic approach, with pathological complete response strongly correlating with long-term survival outcomes. However, some issues regarding the optimal neoadjuvant regimen, as well as the effective role of chemotherapy in patients with residual disease after NACT, are still debated. Herein, we review the current evidence that guides the use of (neo)adjuvant chemotherapy in patients with early-stage TNBC. Furthermore, we discuss current controversies, including the incorporation of platinum compounds into the neoadjuvant backbone and the optimal treatment for patients with residual disease after NACT. Lastly, we outline potential future directions that can guide treatment escalation and de-escalation, as well as the development of new therapies. In our view, the application of multi-omics technologies, liquid biopsy assays, and machine learning algorithms is strongly warranted to pave the way toward personalized anticancer treatment for early-stage TNBC.
19
Nachmanson D, Steward J, Yao H, Officer A, Jeong E, O'Keefe TJ, Hasteh F, Jepsen K, Hirst GL, Esserman LJ, Borowsky AD, Harismendy O. Mutational profiling of micro-dissected pre-malignant lesions from archived specimens. BMC Med Genomics 2020; 13:173. [PMID: 33208147 PMCID: PMC7672910 DOI: 10.1186/s12920-020-00820-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/13/2020] [Accepted: 11/09/2020] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Systematic cancer screening has led to the increased detection of pre-malignant lesions (PMLs). The absence of reliable prognostic markers has led mostly to overtreatment, resulting in potentially unnecessary stress, or to insufficient treatment and avoidable progression. Importantly, most mutational profiling studies have relied on PMLs synchronous to invasive cancer, or were performed in patients without outcome information, hence limiting their utility for biomarker discovery. The limitations in comprehensive mutational profiling of PMLs are in large part due to significant technical and methodological challenges: most PML specimens are small, formalin-fixed and paraffin-embedded (FFPE), and lack matching normal DNA. METHODS Using test DNA from a highly degraded FFPE specimen, multiple targeted sequencing approaches were evaluated, varying DNA input amount (3-200 ng), library preparation strategy (BE: Blunt-End, SS: Single-Strand, AT: A-Tailing) and target size (whole exome vs. cancer gene panel). Variants in high-input DNA from FFPE and mirrored frozen specimens were used for PML-specific variant calling training and testing, respectively. The resulting approach was applied to profile and compare multiple micro-dissected regions (mean area 5 mm²) from 3 breast ductal carcinomas in situ (DCIS). RESULTS Using low-input FFPE DNA, BE and SS libraries resulted in 4.9- and 3.7-fold increases over AT libraries in the fraction of whole exome covered at 20x (BE:87%, SS:63%, AT:17%). Compared to high-confidence somatic mutations from frozen specimens, PML-specific variant filtering increased recall (BE:85%, SS:80%, AT:75%) and precision (BE:93%, SS:91%, AT:84%) to levels expected from sampling variation. Copy number alterations were consistent across all tested approaches and only impacted by the design of the capture probe-set. Applied to DNA extracted from 9 micro-dissected regions (8 PML, 1 normal epithelium), the approach achieved comparable performance and illustrated the adequacy of the data for identifying candidate driver events (GATA3 mutations, ERBB2 or FGFR1 gains, TP53 loss) and measuring intra-lesion genetic heterogeneity. CONCLUSION Alternate experimental and analytical strategies increased the accuracy of DNA sequencing from archived micro-dissected PML regions, supporting the deeper molecular characterization of early cancer lesions and achieving a critical milestone in the development of biology-informed prognostic markers and precision chemo-prevention strategies.
Affiliation(s)
- Daniela Nachmanson
  - Bioinformatics and Systems Biology Graduate Program - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Joseph Steward
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Huazhen Yao
  - Institute for Genomic Medicine - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Adam Officer
  - Bioinformatics and Systems Biology Graduate Program - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
  - Division of Biomedical Informatics, Department of Medicine - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Eliza Jeong
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Thomas J O'Keefe
  - Division of Breast Surgery and The Comprehensive Breast Health Center - UC San Diego School of Medicine, 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
- Farnaz Hasteh
  - Department of Pathology - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Kristen Jepsen
  - Institute for Genomic Medicine - UC San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, USA
- Gillian L Hirst
  - Helen Diller Family Comprehensive Cancer Center - UC San Francisco School of Medicine, 1450 3rd St, San Francisco, CA, 94158, USA
- Laura J Esserman
  - Helen Diller Family Comprehensive Cancer Center - UC San Francisco School of Medicine, 1450 3rd St, San Francisco, CA, 94158, USA
- Alexander D Borowsky
  - Department of Pathology and Laboratory Medicine - UC Davis Comprehensive Cancer Center, UC Davis School of Medicine, 2279 45th Street, Sacramento, CA, 95817, USA
- Olivier Harismendy
  - Moores Cancer Center - UC San Diego Health - 3855 Health Sciences Dr., La Jolla, CA, 92093, USA
  - Division of Biomedical Informatics, Department of Medicine - UC San Diego School of Medicine, 9500 Gilman Dr., La Jolla, CA, 92093, USA
20
Meng J, Victor B, He Z, Liu H, Jiang T. DeepSSV: detecting somatic small variants in paired tumor and normal sequencing data with convolutional neural network. Brief Bioinform 2020; 22:5960414. [PMID: 33164053 DOI: 10.1093/bib/bbaa272] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/24/2020] [Revised: 09/05/2020] [Accepted: 09/19/2020] [Indexed: 01/16/2023] Open
Abstract
It is of considerable interest to detect somatic mutations in paired tumor and normal sequencing data. A number of callers based on statistical or machine learning approaches have been developed to detect somatic small variants. However, they take into consideration only limited information about the reference and potential variant alleles in the tumor and normal samples at a candidate somatic site. They also differ in how biological and technological noise is addressed. Hence, they are expected to produce divergent outputs. To overcome the drawbacks of existing somatic callers, we develop a deep learning-based tool called DeepSSV, which employs a convolutional neural network (CNN) model to learn increasingly abstract feature representations from the raw data in higher feature layers. DeepSSV creates a spatially oriented representation of read alignments around the candidate somatic sites, adapted for the convolutional architecture, which enables it to effectively gather scattered evidence. Moreover, DeepSSV incorporates the mapping information of both reference allele-supporting and variant allele-supporting reads in the tumor and normal samples at a genomic site, which is readily available in the pileup format file. Together, these allow the CNN model to process the whole alignment information. Such representational richness allows the model to capture the dependencies in the sequence and identify context-based sequencing artifacts. We fitted the model on ground truth somatic mutations and ran benchmarking experiments on simulated and real tumors. The benchmarking results demonstrate that DeepSSV outperforms its state-of-the-art competitors in overall F1 score.
Affiliation(s)
- Jing Meng
  - Suzhou Institute of Systems Medicine, Center for Systems Medicine, Chinese Academy of Medical Sciences & Peking Union Medical College, Suzhou, Jiangsu, China
- Zhen He
  - La Trobe University, Melbourne, Victoria, Australia
- Taijiao Jiang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
21
Wang M, Luo W, Jones K, Bian X, Williams R, Higson H, Wu D, Hicks B, Yeager M, Zhu B. SomaticCombiner: improving the performance of somatic variant calling based on evaluation tests and a consensus approach. Sci Rep 2020; 10:12898. [PMID: 32732891 PMCID: PMC7393490 DOI: 10.1038/s41598-020-69772-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/08/2020] [Accepted: 07/16/2020] [Indexed: 02/06/2023] Open
Abstract
It is challenging to identify somatic variants from high-throughput sequence reads due to tumor heterogeneity, sub-clonality, and sequencing artifacts. In this study, we evaluated the performance of eight primary somatic variant callers and multiple ensemble methods using both real and synthetic whole-genome sequencing, whole-exome sequencing, and deep targeted sequencing datasets with the NA12878 cell line. The test results showed that a simple consensus approach can significantly improve performance even with a limited number of callers and is more robust and stable than machine learning based ensemble approaches. To fully exploit the multi-callers, we also developed a software package, SomaticCombiner, that can combine multiple callers and integrates a new variant allelic frequency (VAF) adaptive majority voting approach, which can maintain sensitive detection for variants with low VAFs.
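The abstract names a VAF-adaptive majority voting approach without giving its rule. The sketch below illustrates only the general idea: a variant normally needs a strict majority of callers, but the requirement is relaxed by one vote when its variant allelic frequency is low, so that low-VAF variants are not voted out. The function names, the 0.05 cutoff and the one-vote relaxation are hypothetical choices for illustration and are not SomaticCombiner's actual algorithm or API.

```python
def required_votes(n_callers: int, vaf: float, low_vaf_cutoff: float = 0.05) -> int:
    """Votes needed to accept a variant under a hypothetical VAF-adaptive rule."""
    majority = n_callers // 2 + 1  # strict majority of the callers
    if vaf < low_vaf_cutoff:
        # Relax the threshold by one vote for low-VAF variants (assumed rule).
        return max(1, majority - 1)
    return majority


def accept(votes: int, n_callers: int, vaf: float) -> bool:
    """Accept a variant when enough callers support it for its VAF."""
    return votes >= required_votes(n_callers, vaf)


# With 5 callers, a high-VAF variant needs 3 votes and a low-VAF variant needs 2.
print(accept(votes=2, n_callers=5, vaf=0.30))
print(accept(votes=2, n_callers=5, vaf=0.03))
```

A fixed majority threshold would reject the second variant; the adaptive rule keeps it at the cost of admitting more low-frequency false positives.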
Affiliation(s)
- Mingyi Wang
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Wen Luo
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Kristine Jones
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Xiaopeng Bian
  - Center for Biomedical Informatics and Information Technology, National Cancer Institute, Rockville, MD, 20850, USA
- Russell Williams
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Herbert Higson
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Dongjing Wu
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Belynda Hicks
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Meredith Yeager
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
- Bin Zhu
  - Cancer Genomics Research Laboratory, Division of Cancer Epidemiology and Genetics, Frederick National Laboratory for Cancer Research, Frederick, MD, 20877, USA
22
Binatti A, Bresolin S, Bortoluzzi S, Coppe A. iWhale: a computational pipeline based on Docker and SCons for detection and annotation of somatic variants in cancer WES data. Brief Bioinform 2020; 22:5840042. [PMID: 32436933 PMCID: PMC8557746 DOI: 10.1093/bib/bbaa065] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/27/2019] [Revised: 03/27/2020] [Accepted: 03/30/2020] [Indexed: 12/11/2022] Open
Abstract
Whole exome sequencing (WES) is a powerful approach for discovering sequence variants in cancer cells, but its time effectiveness is limited by the complexity and issues of WES data analysis. Here we present iWhale, a customizable pipeline based on Docker and SCons that reliably detects somatic variants using three complementary callers (MuTect2, Strelka2 and VarScan2). The results are combined to obtain a single variant call format file for each sample, and variants are annotated by integrating a wide range of information extracted from several reference databases, ultimately allowing variant and gene prioritization according to different criteria. iWhale allows users to conduct a complex series of WES analyses with a powerful yet customizable and easy-to-use tool, running on most operating systems (macOS, GNU/Linux and Windows). iWhale code is freely available at https://github.com/alexcoppe/iWhale and the docker image is downloadable from https://hub.docker.com/r/alexcoppe/iwhale.
Affiliation(s)
- Stefania Bortoluzzi
  - Corresponding author. Department of Molecular Medicine, University of Padova, Padova, Italy
- Alessandro Coppe
  - Corresponding author. Department of Women's and Children's Health, Department of Biology, University of Padova and Department of Biology, Padova, Italy. Tel.: +39 049 8276502
23
Bartha Á, Győrffy B. Comprehensive Outline of Whole Exome Sequencing Data Analysis Tools Available in Clinical Oncology. Cancers (Basel) 2019; 11:E1725. [PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 09/30/2019] [Revised: 10/31/2019] [Accepted: 11/01/2019] [Indexed: 12/17/2022] Open
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
Affiliation(s)
- Áron Bartha
  - Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
  - TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary
- Balázs Győrffy
  - Semmelweis University, Department of Bioinformatics and 2nd Department of Pediatrics, H-1094 Budapest, Hungary
  - TTK Cancer Biomarker Research Group, Institute of Enzymology, Magyar tudósok körútja 2., H-1117 Budapest, Hungary