1
Shukla K, Idanwekhai K, Naradikian M, Ting S, Schoenberger SP, Brunk E. Machine Learning of Three-Dimensional Protein Structures to Predict the Functional Impacts of Genome Variation. J Chem Inf Model 2024; 64:5328-5343. [PMID: 38635316 DOI: 10.1021/acs.jcim.3c01967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Research in the human genome sciences generates a substantial amount of genetic data for hundreds of thousands of individuals, which concomitantly increases the number of variants of unknown significance (VUS). Bioinformatic analyses can successfully reveal rare variants and variants with clear associations with disease-related phenotypes. These studies have had a significant impact on how clinical genetic screens are interpreted and how patients are stratified for treatment. There are few, if any, computational methods for variants comparable to biological activity predictions. To address this gap, we developed a machine learning method that uses protein three-dimensional structures from AlphaFold to predict how a variant will influence changes to a gene's downstream biological pathways. We trained state-of-the-art machine learning classifiers to predict which protein regions will most likely impact transcriptional activities of two proto-oncogenes, nuclear factor erythroid 2 (NFE2L2)-related factor 2 (NRF2) and c-Myc. We have identified classifiers that attain accuracies higher than 80%, which have allowed us to identify a set of key protein regions that lead to significant perturbations in c-Myc or NRF2 transcriptional pathway activities.
Affiliation(s)
- Kriti Shukla
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Kelvin Idanwekhai
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Martin Naradikian
  - La Jolla Institute for Immunology, San Diego, California 92093, United States
- Stephanie Ting
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
- Elizabeth Brunk
  - Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Integrative Program for Biological and Genome Sciences (IBGS), University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
  - Computational Medicine Program, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27516, United States
2
Cao X, Huber S, Ahari AJ, Traube FR, Seifert M, Oakes CC, Secheyko P, Vilov S, Scheller IF, Wagner N, Yépez VA, Blombery P, Haferlach T, Heinig M, Wachutka L, Hutter S, Gagneur J. Analysis of 3760 hematologic malignancies reveals rare transcriptomic aberrations of driver genes. Genome Med 2024; 16:70. [PMID: 38769532 PMCID: PMC11103968 DOI: 10.1186/s13073-024-01331-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 04/04/2024] [Indexed: 05/22/2024] Open
Abstract
BACKGROUND Rare oncogenic driver events, particularly affecting the expression or splicing of driver genes, are suspected to substantially contribute to the large heterogeneity of hematologic malignancies. However, their identification remains challenging.
METHODS To address this issue, we generated the largest dataset to date of matched whole genome sequencing and total RNA sequencing of hematologic malignancies from 3760 patients spanning 24 disease entities. Taking advantage of our dataset size, we focused on discovering rare regulatory aberrations. Therefore, we called expression and splicing outliers using an extension of the workflow DROP (Detection of RNA Outliers Pipeline) and AbSplice, a variant effect predictor that identifies genetic variants causing aberrant splicing. We next trained a machine learning model integrating these results to prioritize new candidate disease-specific driver genes.
RESULTS We found a median of seven expression outlier genes, two splicing outlier genes, and two rare splice-affecting variants per sample. Each category showed significant enrichment for already well-characterized driver genes, with odds ratios exceeding three among genes called in more than five samples. On held-out data, our integrative modeling significantly outperformed modeling based solely on genomic data and revealed promising novel candidate driver genes. Remarkably, we found a truncated form of the low density lipoprotein receptor LRP1B transcript to be aberrantly overexpressed in about half of hairy cell leukemia variant (HCL-V) samples and, to a lesser extent, in closely related B-cell neoplasms. This observation, which was confirmed in an independent cohort, suggests LRP1B as a novel marker for a HCL-V subclass and a yet unreported functional role of LRP1B within these rare entities.
CONCLUSIONS Altogether, our census of expression and splicing outliers for 24 hematologic malignancy entities and the companion computational workflow constitute unique resources to deepen our understanding of rare oncogenic events in hematologic cancers.
Affiliation(s)
- Xueqi Cao
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Graduate School of Quantitative Biosciences (QBM), Munich, Germany
- Sandra Huber
  - Munich Leukemia Laboratory (MLL), Munich, Germany
- Ata Jadid Ahari
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Franziska R Traube
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Stuttgart, Germany
- Marc Seifert
  - Department of Haematology, Oncology and Clinical Immunology, University Hospital Düsseldorf, Düsseldorf, Germany
- Christopher C Oakes
  - Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, OH, USA
- Polina Secheyko
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Faculty of Biology, Ludwig-Maximilians-University Munich, Munich, Germany
- Sergey Vilov
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Ines F Scheller
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Nils Wagner
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany
- Vicente A Yépez
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Piers Blombery
  - Peter MacCallum Cancer Centre, Melbourne, Australia
  - University of Melbourne, Melbourne, Australia
  - Torsten Haferlach Leukämiediagnostik Stiftung, Munich, Germany
- Matthias Heinig
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
- Leonhard Wachutka
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
- Julien Gagneur
  - School of Computation, Information and Technology, Technical University of Munich, Garching, Germany
  - Graduate School of Quantitative Biosciences (QBM), Munich, Germany
  - Computational Health Center, Helmholtz Center Munich, Neuherberg, Germany
  - Institute of Human Genetics, School of Medicine and Health, Technical University of Munich, Munich, Germany
3
Echeverría-Garcés G, Ramos-Medina MJ, Vargas R, Cabrera-Andrade A, Altamirano-Colina A, Freire MP, Montalvo-Guerrero J, Rivera-Orellana S, Echeverría-Espinoza P, Quiñones LA, López-Cortés A. Gastric cancer actionable genomic alterations across diverse populations worldwide and pharmacogenomics strategies based on precision oncology. Front Pharmacol 2024; 15:1373007. [PMID: 38756376 PMCID: PMC11096557 DOI: 10.3389/fphar.2024.1373007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 04/10/2024] [Indexed: 05/18/2024] Open
Abstract
Introduction: Gastric cancer is one of the most prevalent types of cancer worldwide. The World Health Organization (WHO), the International Agency for Research on Cancer (IARC), and the Global Cancer Statistics (GLOBOCAN) reported an age standardized global incidence rate of 9.2 per 100,000 individuals for gastric cancer in 2022, with a mortality rate of 6.1. Despite considerable progress in precision oncology through the efforts of international consortia, understanding the genomic features and their influence on the effectiveness of anti-cancer treatments across diverse ethnic groups remains essential.
Methods: Our study aimed to address this need by conducting integrated in silico analyses to identify actionable genomic alterations in gastric cancer driver genes, assess their impact using deleteriousness scores, and determine allele frequencies across nine global populations: European Finnish, European non-Finnish, Latino, East Asian, South Asian, African, Middle Eastern, Ashkenazi Jewish, and Amish. Furthermore, our goal was to prioritize targeted therapeutic strategies based on pharmacogenomics clinical guidelines, in silico drug prescriptions, and clinical trial data.
Results: Our comprehensive analysis examined 275,634 variants within 60 gastric cancer driver genes from 730,947 exome sequences and 76,215 whole-genome sequences from unrelated individuals, identifying 13,542 annotated and predicted oncogenic variants. We prioritized the most prevalent and deleterious oncogenic variants for subsequent pharmacogenomics testing. Additionally, we discovered actionable genomic alterations in the ARID1A, ATM, BCOR, ERBB2, ERBB3, CDKN2A, KIT, PIK3CA, PTEN, NTRK3, TP53, and CDKN2A genes that could enhance the efficacy of anti-cancer therapies, as suggested by in silico drug prescription analyses, reviews of current pharmacogenomics clinical guidelines, and evaluations of phase III and IV clinical trials targeting gastric cancer driver proteins.
Discussion: These findings underline the urgency of consolidating efforts to devise effective prevention measures, invest in genomic profiling for underrepresented populations, and ensure the inclusion of ethnic minorities in future clinical trials and cancer research in developed countries.
Affiliation(s)
- Gabriela Echeverría-Garcés
  - Centro de Referencia Nacional de Genómica, Secuenciación y Bioinformática, Instituto Nacional de Investigación en Salud Pública “Leopoldo Izquieta Pérez”, Quito, Ecuador
  - Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Santiago, Chile
- María José Ramos-Medina
  - German Cancer Research Center (DKFZ), Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
- Rodrigo Vargas
  - Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Santiago, Chile
  - Department of Molecular Biology, Galileo University, Guatemala City, Guatemala
- Alejandro Cabrera-Andrade
  - Escuela de Enfermería, Facultad de Ciencias de La Salud, Universidad de Las Américas, Quito, Ecuador
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- María Paula Freire
  - Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
- Luis A. Quiñones
  - Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Santiago, Chile
  - Laboratory of Chemical Carcinogenesis and Pharmacogenetics, Department of Basic-Clinical Oncology (DOBC), Faculty of Medicine, University of Chile, Santiago, Chile
  - Department of Pharmaceutical Sciences and Technology, Faculty of Chemical and Pharmaceutical Sciences, University of Chile, Santiago, Chile
- Andrés López-Cortés
  - Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
4
Ng JK, Chen Y, Akinwe TM, Heins HB, Mehinovic E, Chang Y, Payne ZL, Manuel JG, Karchin R, Turner TN. Proteome-Wide Assessment of Clustering of Missense Variants in Neurodevelopmental Disorders Versus Cancer. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.02.02.24302238. [PMID: 38352539 PMCID: PMC10863034 DOI: 10.1101/2024.02.02.24302238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Missense de novo variants (DNVs) and missense somatic variants contribute to neurodevelopmental disorders (NDDs) and cancer, respectively. Proteins with statistical enrichment based on analyses of these variants exhibit convergence in the differing NDD and cancer phenotypes. Herein, the question of why some of the same proteins are identified in both phenotypes is examined through investigation of clustering of missense variation at the protein level. Our hypothesis is that missense variation is present in different protein locations in the two phenotypes leading to the distinct phenotypic outcomes. We tested this hypothesis in 1D protein space using our software CLUMP. Furthermore, we newly developed 3D-CLUMP that uses 3D protein structures to spatially test clustering of missense variation for proteome-wide significance. We examined missense DNVs in 39,883 parent-child sequenced trios with NDDs and missense somatic variants from 10,543 sequenced tumors covering five TCGA cancer types and two COSMIC pan-cancer aggregates of tissue types. There were 57 proteins with proteome-wide significant missense variation clustering in NDDs when compared to cancers and 79 proteins with proteome-wide significant missense clustering in cancers compared to NDDs. While our main objective was to identify differences in patterns of missense variation, we also identified a novel NDD protein BLTP2. Overall, our study is innovative, provides new insights into differential missense variation in NDDs and cancer at the protein-level, and contributes necessary information toward building a framework for thinking about prognostic and therapeutic aspects of these proteins.
Affiliation(s)
- Jeffrey K. Ng
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yilin Chen
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Titilope M. Akinwe
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Molecular Genetics & Genomics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hillary B. Heins
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Elvisa Mehinovic
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yoonhoo Chang
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Human & Statistical Genetics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Zachary L. Payne
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Molecular Genetics & Genomics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Juana G. Manuel
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Rachel Karchin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - The Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
- Tychele N. Turner
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St. Louis, MO, USA
5
Balasooriya ER, Madhusanka D, López-Palacios TP, Eastmond RJ, Jayatunge D, Owen JJ, Gashler JS, Egbert CM, Bulathsinghalage C, Liu L, Piccolo SR, Andersen JL. Integrating Clinical Cancer and PTM Proteomics Data Identifies a Mechanism of ACK1 Kinase Activation. Mol Cancer Res 2024; 22:137-151. [PMID: 37847650 PMCID: PMC10831333 DOI: 10.1158/1541-7786.mcr-23-0153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 08/17/2023] [Accepted: 10/13/2023] [Indexed: 10/19/2023]
Abstract
Beyond the most common oncogenes activated by mutation (mut-drivers), there likely exists a variety of low-frequency mut-drivers, each of which is a possible frontier for targeted therapy. To identify new and understudied mut-drivers, we developed a machine learning (ML) model that integrates curated clinical cancer data and posttranslational modification (PTM) proteomics databases. We applied the approach to 62,746 patient cancers spanning 84 cancer types and predicted 3,964 oncogenic mutations across 1,148 genes, many of which disrupt PTMs of known and unknown function. The list of putative mut-drivers includes established drivers and others with poorly understood roles in cancer. This ML model is available as a web application. As a case study, we focused the approach on nonreceptor tyrosine kinases (NRTK) and found a recurrent mutation in activated CDC42 kinase-1 (ACK1) that disrupts the Mig6 homology region (MHR) and ubiquitin-association (UBA) domains on the ACK1 C-terminus. By studying these domains in cultured cells, we found that disruption of the MHR domain helps activate the kinase while disruption of the UBA increases kinase stability by blocking its lysosomal degradation. This ACK1 mutation is analogous to lymphoma-associated mutations in its sister kinase, TNK1, which also disrupt a C-terminal inhibitory motif and UBA domain. This study establishes a mut-driver discovery tool for the research community and identifies a mechanism of ACK1 hyperactivation shared among ACK family kinases.
IMPLICATIONS This research identifies a potentially targetable activating mutation in ACK1 and other possible oncogenic mutations, including PTM-disrupting mutations, for further study.
Affiliation(s)
- Eranga R. Balasooriya
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
  - Center for Cancer Research, Massachusetts General Hospital, Boston, Massachusetts
  - Dept. of Medicine, Harvard Medical School, Boston, Massachusetts
- Deshan Madhusanka
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
  - Department of Oncological Sciences and Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah
- Tania P. López-Palacios
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
  - Department of Oncological Sciences and Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah
- Riley J. Eastmond
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
- Dasun Jayatunge
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
  - Department of Oncological Sciences and Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah
- Jake J. Owen
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
- Jack S. Gashler
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
- Christina M. Egbert
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
- Lu Liu
  - Department of Computer Science, North Dakota State University, Fargo, North Dakota
- Joshua L. Andersen
  - The Fritz B. Burns Cancer Research Laboratory, Department of Chemistry and Biochemistry, Brigham Young University, Provo, Utah
  - Department of Oncological Sciences and Huntsman Cancer Institute, University of Utah School of Medicine, Salt Lake City, Utah
6
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a golden standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk on a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid silo-research, moving towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
7
Demajo S, Ramis-Zaldivar JE, Muiños F, Grau ML, Andrianova M, López-Bigas N, González-Pérez A. Identification of Clonal Hematopoiesis Driver Mutations through In Silico Saturation Mutagenesis. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.12.13.23299893. [PMID: 38168256 PMCID: PMC10760256 DOI: 10.1101/2023.12.13.23299893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
Clonal hematopoiesis (CH) is a phenomenon of clonal expansion of hematopoietic stem cells driven by somatic mutations affecting certain genes. Recently, CH has been linked to the development of a number of hematologic malignancies, cardiovascular diseases and other conditions. Although the most frequently mutated CH driver genes have been identified, a systematic landscape of the mutations capable of initiating this phenomenon is still lacking. Here, we train high-quality machine-learning models for 12 of the most recurrent CH driver genes to identify their driver mutations. These models outperform an experimental base-editing approach and expert-curated rules based on prior knowledge of the function of these genes. Moreover, their application to identify CH driver mutations across almost half a million donors of the UK Biobank reproduces known associations between CH driver mutations and age, and the prevalence of several diseases and conditions. We thus propose that these models support the accurate identification of CH across healthy individuals.
Affiliation(s)
- Santiago Demajo
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
  - Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
- Joan Enric Ramis-Zaldivar
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Ferran Muiños
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
  - Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
- Miguel L Grau
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Maria Andrianova
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
- Núria López-Bigas
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
  - Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Abel González-Pérez
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Baldiri Reixac, 10, 08028 Barcelona, Spain
  - Centro de Investigación Biomédica en Red en Cáncer (CIBERONC), Instituto de Salud Carlos III, Madrid, Spain
8
Hatano N, Kamada M, Kojima R, Okuno Y. Network-based prediction approach for cancer-specific driver missense mutations using a graph neural network. BMC Bioinformatics 2023; 24:383. [PMID: 37817080 PMCID: PMC10565986 DOI: 10.1186/s12859-023-05507-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 10/02/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND In cancer genomic medicine, finding driver mutations involved in cancer development and tumor growth is crucial. Machine-learning methods to predict driver missense mutations have been developed because variants are frequently detected by genomic sequencing. However, even though the abnormalities in molecular networks are associated with cancer, many of these methods focus on individual variants and do not consider molecular networks. Here we propose a new network-based method, Net-DMPred, to predict driver missense mutations considering molecular networks. Net-DMPred consists of the graph part and the prediction part. In the graph part, molecular networks are learned by a graph neural network (GNN). The prediction part learns whether variants are driver variants using features of individual variants combined with the graph features learned in the graph part.
RESULTS Net-DMPred, which considers molecular networks, performed better than conventional methods. Furthermore, the prediction performance differed by the molecular network structure used in learning, suggesting that it is important to consider not only the local network related to cancer but also the large-scale network in living organisms.
CONCLUSIONS We propose a network-based machine learning method, Net-DMPred, for predicting cancer driver missense mutations. Our method enables us to consider the entire graph architecture representing the molecular network because it uses GNN. Net-DMPred is expected to detect driver mutations from a lot of missense mutations that are not known to be associated with cancer.
Affiliation(s)
- Narumi Hatano
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Mayumi Kamada
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ryosuke Kojima
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yasushi Okuno
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science (R-CCS), Kobe, Japan
9
Yang H, Liu Y, Yang Y, Li D, Wang Z. InDEP: an interpretable machine learning approach to predict cancer driver genes from multi-omics data. Brief Bioinform 2023; 24:bbad318. [PMID: 37649392 DOI: 10.1093/bib/bbad318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Revised: 06/14/2023] [Accepted: 08/16/2023] [Indexed: 09/01/2023] Open
Abstract
Cancer driver genes are critical in driving tumor cell growth, and precisely identifying these genes is crucial in advancing our understanding of cancer pathogenesis and developing targeted cancer drugs. Despite the current methods for discovering cancer driver genes that mainly rely on integrating multi-omics data, many existing models are overly complex, and it is difficult to interpret the results accurately. This study aims to address this issue by introducing InDEP, an interpretable machine learning framework based on cascade forests. InDEP is designed with easy-to-interpret features, cascade forests based on decision trees and a KernelSHAP module that enables fine-grained post-hoc interpretation. Integrating multi-omics data, InDEP can identify essential features of classified driver genes at both the gene and cancer-type levels. The framework accurately identifies driver genes, discovers new patterns that make genes as driver genes and refines the cancer driver gene catalog. In comparison with state-of-the-art methods, InDEP proved to be more accurate on the test set and identified reliable candidate driver genes. Mutational features were the primary drivers for InDEP's identifying driver genes, with other omics features also contributing. At the gene level, the framework concluded that substitution-type mutations were the main reason most genes were identified as driver genes. InDEP's ability to identify reliable candidate driver genes opens up new avenues for precision oncology and discovering new biomedical knowledge. This framework can help advance cancer research by providing an interpretable method for identifying cancer driver genes and their contribution to cancer pathogenesis, facilitating the development of targeted cancer drugs.
Affiliation(s)
- Hai Yang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Yawen Liu
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Yijing Yang
- Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, Illinois, United States of America
| | - Dongdong Li
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
| | - Zhe Wang
- Department of Computer Science and Engineering, East China University of Science and Technology, 200237, Shanghai, PR China
|
10
|
Li Y, Porta-Pardo E, Tokheim C, Bailey MH, Yaron TM, Stathias V, Geffen Y, Imbach KJ, Cao S, Anand S, Akiyama Y, Liu W, Wyczalkowski MA, Song Y, Storrs EP, Wendl MC, Zhang W, Sibai M, Ruiz-Serra V, Liang WW, Terekhanova NV, Rodrigues FM, Clauser KR, Heiman DI, Zhang Q, Aguet F, Calinawan AP, Dhanasekaran SM, Birger C, Satpathy S, Zhou DC, Wang LB, Baral J, Johnson JL, Huntsman EM, Pugliese P, Colaprico A, Iavarone A, Chheda MG, Ricketts CJ, Fenyö D, Payne SH, Rodriguez H, Robles AI, Gillette MA, Kumar-Sinha C, Lazar AJ, Cantley LC, Getz G, Ding L. Pan-cancer proteogenomics connects oncogenic drivers to functional states. Cell 2023; 186:3921-3944.e25. [PMID: 37582357 DOI: 10.1016/j.cell.2023.07.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/05/2022] [Revised: 12/30/2022] [Accepted: 07/10/2023] [Indexed: 08/17/2023]
Abstract
Cancer driver events refer to key genetic aberrations that drive oncogenesis; however, their exact molecular mechanisms remain insufficiently understood. Here, our multi-omics pan-cancer analysis uncovers insights into the impacts of cancer drivers by identifying their significant cis-effects and distal trans-effects quantified at the RNA, protein, and phosphoprotein levels. Salient observations include the association of point mutations and copy-number alterations with the rewiring of protein interaction networks, and notably, most cancer genes converge toward similar molecular states denoted by sequence-based kinase activity profiles. A correlation between predicted neoantigen burden and measured T cell infiltration suggests potential vulnerabilities for immunotherapies. Patterns of cancer hallmarks vary by polygenic protein abundance ranging from uniform to heterogeneous. Overall, our work demonstrates the value of comprehensive proteogenomics in understanding the functional states of oncogenic drivers and their links to cancer development, surpassing the limitations of studying individual cancer types.
Affiliation(s)
- Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Eduard Porta-Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Collin Tokheim
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Matthew H Bailey
- Department of Biology and Simmons Center for Cancer Research, Brigham Young University, Provo, UT 84602, USA
| | - Tomer M Yaron
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA; Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Vasileios Stathias
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Molecular and Cellular Pharmacology, University of Miami Miller School of Medicine, Miami, FL 33136, USA
| | - Yifat Geffen
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02115, USA
| | - Kathleen J Imbach
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Song Cao
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Shankara Anand
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Yo Akiyama
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Wenke Liu
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Yizhe Song
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Erik P Storrs
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Michael C Wendl
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Mathematics, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Wubing Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Mustafa Sibai
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Victoria Ruiz-Serra
- Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain; Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Wen-Wei Liang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Nadezhda V Terekhanova
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Fernanda Martins Rodrigues
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Karl R Clauser
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - David I Heiman
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Qing Zhang
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Francois Aguet
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Anna P Calinawan
- Department of Genetic and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Saravana M Dhanasekaran
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chet Birger
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Shankha Satpathy
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA
| | - Daniel Cui Zhou
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Liang-Bo Wang
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Jessika Baral
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
| | - Jared L Johnson
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Emily M Huntsman
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA
| | - Pietro Pugliese
- Department of Science and Technology, University of Sannio, 82100 Benevento, Italy
| | - Antonio Colaprico
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Public Health Sciences, University of Miami Miller School of Medicine, Miami, FL 33136, USA
| | - Antonio Iavarone
- Sylvester Comprehensive Cancer Center, University of Miami Miller School of Medicine, Miami, FL 33136, USA; Department of Neurological Surgery, Department of Biochemistry and Molecular Biology, University of Miami Miller School of Medicine, Miami, FL 33136, USA
| | - Milan G Chheda
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63130, USA; Department of Neurology, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Christopher J Ricketts
- Urologic Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
| | - David Fenyö
- Institute for Systems Genetics, NYU Grossman School of Medicine, New York, NY 10016, USA; Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Samuel H Payne
- Department of Biology, Brigham Young University, Provo, UT 84602, USA
| | - Henry Rodriguez
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
| | - Ana I Robles
- Office of Cancer Clinical Proteomics Research, National Cancer Institute, Rockville, MD 20850, USA
| | - Michael A Gillette
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Boston, MA 02115, USA
| | - Chandan Kumar-Sinha
- Michigan Center for Translational Pathology, Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Alexander J Lazar
- Departments of Pathology & Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Lewis C Cantley
- Meyer Cancer Center, Weill Cornell Medicine, New York, NY 10021, USA; Department of Medicine, Weill Cornell Medicine, New York, NY 10021, USA.
| | - Gad Getz
- Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA 02142, USA; Cancer Center and Department of Pathology, Massachusetts General Hospital, Boston, MA 02115, USA; Harvard Medical School, Boston, MA 02115, USA.
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO 63110, USA; McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA; Department of Genetics, Washington University in St. Louis, St. Louis, MO 63130, USA; Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO 63130, USA.
|
11
|
Johnson A, Ng PKS, Kahle M, Castillo J, Amador B, Wang Y, Zeng J, Holla V, Vu T, Su F, Kim SH, Conway T, Jiang X, Chen K, Shaw KRM, Yap TA, Rodon J, Mills GB, Meric-Bernstam F. Actionability classification of variants of unknown significance correlates with functional effect. NPJ Precis Oncol 2023; 7:67. [PMID: 37454202 DOI: 10.1038/s41698-023-00420-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 07/03/2023] [Indexed: 07/18/2023] Open
Abstract
Genomically-informed therapy requires consideration of the functional impact of genomic alterations on protein expression and/or function. However, a substantial number of variants are of unknown significance (VUS). The MD Anderson Precision Oncology Decision Support (PODS) team developed an actionability classification scheme that categorizes VUS as either "Unknown" or "Potentially" actionable based on their location within functional domains and/or proximity to known oncogenic variants. We then compared PODS VUS actionability classification with results from a functional genomics platform consisting of mutant generation and cell viability assays. 106 (24%) of 438 VUS in 20 actionable genes were classified as oncogenic in functional assays. Variants categorized by PODS as Potentially actionable (N = 204) were more likely to be oncogenic than those categorized as Unknown (N = 230) (37% vs 13%, p = 4.08e-09). Our results demonstrate that rule-based actionability classification of VUS can identify patients more likely to have actionable variants for consideration with genomically-matched therapy.
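The rule described in this abstract (a VUS is "Potentially" actionable when it falls within a functional domain and/or lies close to a known oncogenic variant, and "Unknown" otherwise) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PODS team's implementation: the `Gene` record, the `classify_vus` function, the residue-based coordinates, and the `window` of 5 residues are all hypothetical, since the abstract does not specify how proximity is measured.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gene:
    # Inclusive (start, end) residue ranges of functional domains (hypothetical data model)
    domains: Tuple[Tuple[int, int], ...]
    # Residue positions of known oncogenic variants
    known_oncogenic: Tuple[int, ...]


def classify_vus(position: int, gene: Gene, window: int = 5) -> str:
    """Return "Potentially" or "Unknown" for a variant at a residue position.

    window is an assumed proximity threshold, in residues.
    """
    in_domain = any(start <= position <= end for start, end in gene.domains)
    near_known = any(abs(position - p) <= window for p in gene.known_oncogenic)
    return "Potentially" if (in_domain or near_known) else "Unknown"


# Hypothetical gene: one domain spanning residues 100-200, one known hotspot at 300.
example_gene = Gene(domains=((100, 200),), known_oncogenic=(300,))
labels = [classify_vus(pos, example_gene) for pos in (150, 303, 250)]
```

In this toy case the first variant qualifies through the domain rule, the second through the proximity rule, and the third through neither.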
Affiliation(s)
- Amber Johnson
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Patrick Kwok-Shing Ng
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Michael Kahle
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Julia Castillo
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Bianca Amador
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Yujia Wang
- Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jia Zeng
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Vijaykumar Holla
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Thuy Vu
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Fei Su
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Sun-Hee Kim
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Tara Conway
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Xianli Jiang
- Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Ken Chen
- Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Kenna R Mills Shaw
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Timothy A Yap
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Investigational Cancer Therapeutics (Phase I Program), The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Jordi Rodon
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Investigational Cancer Therapeutics (Phase I Program), The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - Gordon B Mills
- Division of Oncological Sciences, Knight Cancer Institute, Oregon Health and Science University, Portland, OR, USA
| | - Funda Meric-Bernstam
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
- Investigational Cancer Therapeutics (Phase I Program), The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
|
12
|
Sierk M, Ratnayake S, Wagle MM, Chen B, Park B, Wang J, Youkharibache P, Meerzaman D. 3DVizSNP: a tool for rapidly visualizing missense mutations identified in high throughput experiments in iCn3D. BMC Bioinformatics 2023; 24:244. [PMID: 37296383 DOI: 10.1186/s12859-023-05370-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 05/30/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND: High-throughput experiments in cancer and other areas of genomic research identify large numbers of sequence variants that need to be evaluated for phenotypic impact. While many tools exist to score the likely impact of single nucleotide polymorphisms (SNPs) based on sequence alone, the three-dimensional structural environment is essential for understanding the biological impact of a nonsynonymous mutation. RESULTS: We present a program, 3DVizSNP, that enables the rapid visualization of nonsynonymous missense mutations extracted from a Variant Call Format (VCF) file using the web-based iCn3D visualization platform. The program, written in Python, leverages REST APIs and can be run locally without installing any other software or databases, or from a webserver hosted by the National Cancer Institute. It automatically selects the appropriate experimental structure from the Protein Data Bank, if available, or the predicted structure from the AlphaFold database, enabling users to rapidly screen SNPs based on their local structural environment. 3DVizSNP leverages iCn3D annotations and its structural analysis functions to assess changes in structural contacts associated with mutations. CONCLUSIONS: This tool enables researchers to make efficient use of 3D structural information to prioritize mutations for further computational and experimental impact assessment. The program is available as a webserver at https://analysistools.cancer.gov/3dvizsnp or as a standalone Python program at https://github.com/CBIIT-CGBB/3DVizSNP.
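The structure-selection fallback described in this abstract (prefer an experimental Protein Data Bank structure when one is available, otherwise use the AlphaFold prediction) might look roughly like the sketch below. It is not 3DVizSNP's code and makes no network calls. The `PdbEntry` record, the `select_structure` function, the requirement that the experimental structure cover the mutated residue, and the tie-break on resolution are all assumptions made for illustration, and the identifiers in the example are made up.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class PdbEntry(NamedTuple):
    pdb_id: str
    start: int         # first residue covered by the structure
    end: int           # last residue covered by the structure
    resolution: float  # in angstroms; lower is better


def select_structure(
    position: int,
    pdb_entries: Sequence[PdbEntry],
    alphafold_id: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Pick a structure for a mutated residue position.

    Assumed policy: use the best-resolution experimental structure that
    covers the position; otherwise fall back to a predicted model.
    """
    covering = [e for e in pdb_entries if e.start <= position <= e.end]
    if covering:
        best = min(covering, key=lambda e: e.resolution)
        return ("PDB", best.pdb_id)
    if alphafold_id is not None:
        return ("AlphaFold", alphafold_id)
    return None  # no structure of either kind


# Hypothetical entries for one protein.
entries = [PdbEntry("1ABC", 1, 120, 2.8), PdbEntry("2XYZ", 50, 300, 1.9)]
choice_covered = select_structure(75, entries, alphafold_id="P00000")
choice_fallback = select_structure(450, entries, alphafold_id="P00000")
```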
Affiliation(s)
- Michael Sierk
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA.
| | - Shashikala Ratnayake
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
| | - Manoj M Wagle
- Faculty of Pharmacy, University of Grenoble Alpes, Grenoble, France
- Department of Bioinformatics, Manipal School of Life Sciences, Manipal Academy of Higher Education, Manipal, 576104, India
- School of Mathematics and Statistics, Faculty of Science, and Computational Systems Biology Group, Children's Medical Research Institute, University of Sydney, Camperdown, NSW, Australia
| | - Ben Chen
- Digital Services and Solutions Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
| | - Brian Park
- Digital Services and Solutions Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
| | - Jiyao Wang
- National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA
| | - Philippe Youkharibache
- Cancer Data Science Laboratory, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA
| | - Daoud Meerzaman
- Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology, National Cancer Institute, NIH, Rockville, MD, 20852, USA
|
13
|
Pandey M, Gromiha MM. MutBLESS: A tool to identify disease-prone sites in cancer using deep learning. Biochim Biophys Acta Mol Basis Dis 2023; 1869:166721. [PMID: 37105446 DOI: 10.1016/j.bbadis.2023.166721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/07/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Understanding the molecular basis and impact of mutations at different stages of cancer is a long-standing challenge in cancer biology. Identifying driver mutations experimentally is expensive and time-intensive. In the present study, we collected data for experimentally known driver mutations in 22 different cancer types and classified them into six categories: breast cancer (BRCA), acute myeloid leukaemia (LAML), endometrial carcinoma (EC), stomach cancer (STAD), skin cancer (SKCM), and other cancer types. The dataset contains 5747 disease-prone and 5514 neutral sites in 516 proteins. Analysis of the amino acid distribution around mutant sites revealed that the motifs AAA and LR are preferred in disease-prone sites, whereas QPP and QF are dominant in neutral sites. Further, we developed a method using deep neural networks to predict disease-prone sites with amino acid sequence-based features such as physicochemical properties, secondary structure, tri-peptide motifs and conservation scores. We obtained an average AUC of 0.97 in the five cancer types BRCA, LAML, EC, STAD and SKCM in a test dataset, and 0.72 in all other cancer types together. Our method showed excellent performance for identifying cancer-specific mutations, with an average sensitivity, specificity, and accuracy of 96.56%, 97.39%, and 97.64%, respectively. We developed a web server for identifying cancer-prone sites, which is available at https://web.iitm.ac.in/bioinfo2/MutBLESS/index.html. We suggest that our method can serve as an effective way to identify disease-prone sites and assist in developing therapeutic strategies.
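One of the sequence-based feature types named in this abstract, tri-peptide motifs around a mutant site, can be illustrated with a short counting routine. This is only a sketch under assumptions: the function name `tripeptide_counts`, the 0-based site index, the symmetric `flank` of 3 residues, and the example sequence are all hypothetical, and the abstract does not say how MutBLESS defines its window or encodes the counts.

```python
from collections import Counter


def tripeptide_counts(sequence: str, site: int, flank: int = 3) -> Counter:
    """Count overlapping tri-peptides in a window centred on a mutant site.

    site is a 0-based index into sequence; the window is clipped at the
    sequence ends. flank is an assumed window half-width.
    """
    lo = max(0, site - flank)
    hi = min(len(sequence), site + flank + 1)
    window = sequence[lo:hi]
    return Counter(window[i:i + 3] for i in range(len(window) - 2))


# Made-up sequence; the window around index 4 is "KAAAALR".
counts = tripeptide_counts("MKAAAALRQ", site=4, flank=3)
```

A feature vector for a classifier could then be built by reading off the counts for a fixed list of motifs, such as the AAA motif that the abstract reports as enriched in disease-prone sites.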
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
| | - M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India.
|
14
|
Quan C, Liu F, Qi L, Tie Y. LRT-CLUSTER: A New Clustering Algorithm Based on Likelihood Ratio Test to Identify Driving Genes. Interdiscip Sci 2023; 15:217-230. [PMID: 36848004 DOI: 10.1007/s12539-023-00554-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 01/31/2023] [Accepted: 02/01/2023] [Indexed: 03/01/2023]
Abstract
Somatic mutations often occur at highly recurrent sites in protein sequences, which indicates that the positional clustering of somatic missense mutations can be used to identify driver genes. However, traditional clustering algorithms suffer from over-fitting of the background signal, are poorly suited to mutation data, and need better performance in identifying genes with low-frequency mutations. In this paper, we propose a linear clustering algorithm based on the likelihood ratio test to identify driver genes. First, the polynucleotide mutation rate is calculated based on the likelihood ratio test. Then, a simulated data set is obtained through the background mutation rate model. Finally, an unsupervised peak clustering algorithm is used to evaluate the somatic mutation data and the simulated data separately in order to identify the driver genes. The experimental results show that our method achieves a better balance of precision and sensitivity. It can also identify driver genes missed by other methods, making it an effective supplement to them. We also discover some potential linkages between genes, and between genes and mutation sites, which is of great value to targeted drug therapy research. Method framework: Our proposed model framework is as follows.
a. The mutation sites and the number of mutations in tumor gene elements are counted.
b. The nucleotide context mutation frequency is counted based on the likelihood ratio test, and the background mutation rate model is obtained.
c. Using Monte Carlo simulation, data sets with the same number of mutations as the gene elements are randomly sampled to obtain simulated mutation data; the sampling frequency of each mutation site is related to the polynucleotide mutation rate.
d. The original mutation data and the simulated mutation data after random reconstruction are each clustered by peak density, and the corresponding clustering scores are obtained.
e. Through step d, the clustering information statistics and the score of each gene segment are obtained from the original single nucleotide mutation data.
f. From the observed score and the simulated clustering scores, the p-value of the corresponding gene fragment is calculated.
g. Through step d, the clustering information statistics and the score of each gene segment are obtained from the simulated single nucleotide mutation data.
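Steps c and f of the framework above (sampling simulated mutation positions in proportion to a background rate, then comparing the observed clustering score with the simulated ones) can be sketched as a Monte Carlo p-value. This is a hedged illustration rather than the LRT-CLUSTER code: the function names, the uniform background rates, and the example positions are hypothetical, and `toy_cluster_score` (the largest pile-up at a single position) is a deliberately crude stand-in for the density peak clustering score, which the abstract does not define. The add-one form of the empirical p-value is a common convention and is assumed here.

```python
import random
from collections import Counter


def simulate_positions(rates, n_mutations, rng):
    """Sample mutation positions with probability proportional to background rates."""
    positions = list(range(len(rates)))
    return rng.choices(positions, weights=rates, k=n_mutations)


def toy_cluster_score(positions):
    """Stand-in clustering score: the largest number of mutations at one position."""
    return max(Counter(positions).values())


def empirical_p_value(observed_score, simulated_scores):
    """Add-one Monte Carlo p-value: share of simulations scoring at least as high."""
    exceed = sum(1 for s in simulated_scores if s >= observed_score)
    return (exceed + 1) / (len(simulated_scores) + 1)


rng = random.Random(0)                      # fixed seed for reproducibility
rates = [1.0] * 50                          # hypothetical uniform background over 50 sites
observed = [10] * 6 + [3, 27, 41, 45]       # made-up data: six mutations pile up at site 10
obs_score = toy_cluster_score(observed)
sim_scores = [
    toy_cluster_score(simulate_positions(rates, len(observed), rng))
    for _ in range(999)
]
p_value = empirical_p_value(obs_score, sim_scores)
```

Because each simulated data set has the same number of mutations as the observed one, a small p-value indicates that the observed pile-up is unlikely under the background model alone.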
Affiliation(s)
- Chenxu Quan
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China; Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Fenghui Liu
- Department of Respiratory and Sleep Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Lin Qi
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China
| | - Yun Tie
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, China.
|
15
|
Viehweger A. Faltwerk: a library for spatial exploratory data analysis of protein structures. Bioinform Adv 2023; 3:vbad007. [PMID: 36908399 PMCID: PMC9998081 DOI: 10.1093/bioadv/vbad007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 01/10/2023] [Accepted: 01/20/2023] [Indexed: 01/25/2023]
Abstract
Summary: Proteins are fundamental building blocks of life and are investigated in a broad range of scientific fields, especially in the context of recent progress using in silico structure prediction models and the surge of resulting protein structures in public databases. However, exploratory data analysis of these proteins can be slow because it requires several methods, ranging from geometric and spatial analysis to visualization. The Python library faltwerk provides an integrated toolkit for exploratory work with rapid feedback. This toolkit includes support for protein complexes, spatial analysis (point density or spatial autocorrelation), ligand binding site prediction and an intuitive visualization interface based on the grammar of graphics. Availability and implementation: faltwerk is distributed under the permissive BSD-3 open source license. Source code and documentation, including an extensive common-use case tutorial, can be found at github.com/phiweger/faltwerk; binaries are available from the PyPI repository.
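The "point density" style of spatial analysis mentioned in this abstract can be illustrated without any structural biology library: for each residue, count how many other residues lie within a fixed radius. The sketch below uses only the Python standard library and does not use or reproduce the faltwerk API; the function name `point_density`, the radius, and the coordinates are hypothetical.

```python
from math import dist


def point_density(coords, radius):
    """For each 3D point, count the other points within the given radius."""
    return [
        sum(1 for j, q in enumerate(coords) if j != i and dist(p, q) <= radius)
        for i, p in enumerate(coords)
    ]


# Made-up coordinates: three points close together and one far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (10.0, 10.0, 10.0)]
density = point_density(coords, radius=1.5)
```

This brute-force version compares every pair of points, which is fine for a single protein chain; a spatial index would be the usual choice for larger complexes.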
Affiliation(s)
- Adrian Viehweger
- Institute of Medical Microbiology and Virology, University of Leipzig Medical Center, Leipzig 04103, Germany; Institute of Human Genetics, University of Leipzig Medical Center, Leipzig 04103, Germany
|
16
|
Abstract
Mutations in genes that confer a selective advantage to hematopoietic stem cells (HSCs) drive clonal hematopoiesis (CH). While some CH drivers have been identified, the compendium of all genes able to drive CH upon mutation in HSCs remains incomplete. Exploiting signals of positive selection in blood somatic mutations may be an effective way to identify CH driver genes, as is done for cancer driver genes. Using the tumor sample in blood/tumor pairs as reference, we identify blood somatic mutations across more than 12,000 donors from two large cancer genomics cohorts. Applying IntOGen, a driver discovery pipeline, to both cohorts and to more than 24,000 targeted sequenced samples yields a list of close to 70 genes with signals of positive selection in CH, available at http://www.intogen.org/ch. This approach recovers known CH genes and discovers other candidates. Identifying the genetic drivers of clonal hematopoiesis (CH) has been challenging due to their low frequencies and a lack of adequate tools. Here, the authors use reverse calling to detect blood somatic mutations and the IntOGen pipeline to identify CH drivers in large cancer genomics data sets based on signals of positive selection.
|
17
|
Zhang W, Roy Burman SS, Chen J, Donovan KA, Cao Y, Shu C, Zhang B, Zeng Z, Gu S, Zhang Y, Li D, Fischer ES, Tokheim C, Liu XS. Machine Learning Modeling of Protein-intrinsic Features Predicts Tractability of Targeted Protein Degradation. Genomics Proteomics Bioinformatics 2022; 20:882-898. [PMID: 36494034 PMCID: PMC10025769 DOI: 10.1016/j.gpb.2022.11.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/15/2022] [Revised: 10/25/2022] [Accepted: 11/04/2022] [Indexed: 12/12/2022]
Abstract
Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins for targeting by TPD approaches, termed "degradability", is largely unknown. Here, we developed a machine learning model, model-free analysis of protein degradability (MAPD), to predict degradability from features intrinsic to protein targets. MAPD shows accurate performance in predicting kinases that are degradable by TPD compounds [with an area under the precision-recall curve (AUPRC) of 0.759 and an area under the receiver operating characteristic curve (AUROC) of 0.775] and is likely generalizable to independent non-kinase proteins. We found five features with statistical significance to achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome to find 964 disease-causing proteins (including proteins encoded by 278 cancer genes) that may be tractable to TPD drug development.
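The AUROC figure reported in this abstract has a simple probabilistic reading: it is the chance that a randomly chosen positive example (here, a degradable kinase) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition in plain Python. The function name `auroc`, the labels, and the scores are made up for illustration and are unrelated to the MAPD data or code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels are 1 (positive) or 0 (negative); ties in score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: one positive is ranked below one negative.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.6, 0.1]
example_auroc = auroc(example_labels, example_scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the reported 0.775.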
Collapse
Affiliation(s)
- Wubing Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Shourya S Roy Burman
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Jiaye Chen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
| | - Katherine A Donovan
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
| | - Yang Cao
- Center of Growth, Metabolism, and Aging, Key Laboratory of Bio-resource and Eco-environment, Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610064, China
| | - Chelsea Shu
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Research Scholar Initiative, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
| | - Boning Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Zexian Zeng
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Shengqing Gu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Yi Zhang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Dian Li
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Eric S Fischer
- Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA.
| | - Collin Tokheim
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
| | - X Shirley Liu
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
|
18
|
Belikov AV, Vyatkin AD, Leonov SV. Novel Driver Strength Index highlights important cancer genes in TCGA PanCanAtlas patients. PeerJ 2022; 10:e13860. [PMID: 35975235 PMCID: PMC9375969 DOI: 10.7717/peerj.13860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2022] [Accepted: 07/18/2022] [Indexed: 01/18/2023] Open
Abstract
Background: Cancer driver genes are usually ranked by mutation frequency, which does not necessarily reflect their driver strength. We hypothesize that driver strength is higher for genes preferentially mutated in patients with few driver mutations overall, because these few mutations should be strong enough to initiate cancer. Methods: We propose formulas for the Driver Strength Index (DSI) and the Normalized Driver Strength Index (NDSI), the latter independent of gene mutation frequency. We validate them using TCGA PanCanAtlas datasets, established driver prediction algorithms and custom computational pipelines integrating SNA, CNA and aneuploidy driver contributions at patient-level resolution. Results: DSI and especially NDSI provide substantially different gene rankings compared to the frequency approach. For example, NDSI prioritized members of specific protein families, including G proteins GNAQ, GNA11 and GNAS, isocitrate dehydrogenases IDH1 and IDH2, and fibroblast growth factor receptors FGFR2 and FGFR3. KEGG analysis shows that top NDSI-ranked genes comprise the EGFR/FGFR2/GNAQ/GNA11-NRAS/HRAS/KRAS-BRAF pathway, the AKT1-MTOR pathway, and the TCEB1-VHL-HIF1A pathway. Conclusion: Our indices are able to select for driver gene attributes not selected by frequency sorting, potentially for driver strength. Genes and pathways prioritized are likely the strongest contributors to cancer initiation and progression and should become future therapeutic targets.
|
19
|
Li B, Roden DM, Capra JA. The 3D mutational constraint on amino acid sites in the human proteome. Nat Commun 2022; 13:3273. [PMID: 35672414 PMCID: PMC9174330 DOI: 10.1038/s41467-022-30936-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/05/2021] [Accepted: 05/19/2022] [Indexed: 12/16/2022] Open
Abstract
Quantification of the tolerance of protein sites to genetic variation has become a cornerstone of variant interpretation. We hypothesize that the constraint on missense variation at individual amino acid sites is largely shaped by direct interactions with 3D neighboring sites. To quantify this constraint, we introduce a framework called COntact Set MISsense tolerance (or COSMIS) and comprehensively map the landscape of 3D mutational constraint on 6.1 million amino acid sites covering 16,533 human proteins. We show that 3D mutational constraint is pervasive and that the level of constraint is strongly associated with disease relevance both at the site and the protein level. We demonstrate that COSMIS performs significantly better at variant interpretation tasks than other population-based constraint metrics while also providing structural insight into the functional roles of constrained sites. We anticipate that COSMIS will facilitate the interpretation of protein-coding variation in evolution and prioritization of sites for mechanistic investigation.
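The idea of grouping each site with its 3D neighbours can be illustrated with a minimal sketch. This is not the COSMIS implementation: the abstract does not give the contact definition or the scoring model, so the 8 Å cutoff, the use of one coordinate per residue, and the function name `contact_sets` are all assumptions made for illustration.

```python
import math

def contact_sets(coords, cutoff=8.0):
    """Map each residue index to the set of other residues within `cutoff`.

    `coords` holds one (x, y, z) point per residue. The cutoff value and the
    one-point-per-residue representation are illustrative assumptions.
    """
    sets = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                sets[i].add(j)
                sets[j].add(i)
    return sets

# Hypothetical toy coordinates (angstroms): three nearby residues, one distant.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_sets = contact_sets(toy_coords)
print(toy_sets)
```

A constraint metric of this kind would then compare observed and expected missense counts over each site's contact set; that step is omitted here because the abstract does not specify it.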
Affiliation(s)
- Bian Li
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37203, USA.
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA.
| | - Dan M Roden
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Departments of Pharmacology and Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
| | - John A Capra
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, 37203, USA.
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, 94143, USA.
|
20
|
Garcia-Prieto CA, Martínez-Jiménez F, Valencia A, Porta-Pardo E. Detection of oncogenic and clinically actionable mutations in cancer genomes critically depends on variant calling tools. Bioinformatics 2022; 38:3181-3191. [PMID: 35512388 PMCID: PMC9191211 DOI: 10.1093/bioinformatics/btac306] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/03/2021] [Revised: 02/09/2022] [Accepted: 05/01/2022] [Indexed: 11/22/2022] Open
Abstract
Motivation: The analysis of cancer genomes provides fundamental information about cancer etiology, the processes driving cell transformation and potential treatments. While researchers and clinicians are often only interested in the identification of oncogenic mutations, actionable variants or mutational signatures, the first crucial step in the analysis of any tumor genome is the identification of somatic variants in cancer cells (i.e. those that have been acquired during their evolution). For that purpose, a wide range of computational tools have been developed in recent years to detect somatic mutations in sequencing data from tumor samples. While there have been some efforts to benchmark somatic variant calling tools and strategies, the extent to which variant calling decisions impact the results of downstream analyses of tumor genomes remains unknown. Results: Here, we quantify the impact of variant calling decisions by comparing the results obtained in three important analyses of cancer genomics data (identification of cancer driver genes, quantification of mutational signatures and detection of clinically actionable variants) when changing the somatic variant caller (MuSE, MuTect2, SomaticSniper and VarScan2) or the strategy to combine them (Consensus of two, Consensus of three and Union) across all 33 cancer types from The Cancer Genome Atlas. Our results show that variant calling decisions have a significant impact on these analyses, creating important differences that could even impact treatment decisions for some patients. Moreover, the Consensus of three strategy for combining the output of multiple variant calling tools, which is very widely used by the research community, can lead to the loss of some cancer driver genes and actionable mutations. Overall, our results highlight the limitations of widespread practices within the cancer genomics community and point to important differences in critical analyses of tumor sequencing data depending on variant calling, affecting even the identification of clinically actionable variants.
Availability and implementation: Code is available at https://github.com/carlosgarciaprieto/VariantCallingClinicalBenchmark. Supplementary information: Supplementary data are available at Bioinformatics online.
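The three combination strategies named in the abstract can be sketched as a vote count over callers. This is an illustration under stated assumptions, not the authors' pipeline: "Consensus of N" is read here as "reported by at least N of the four callers", the function name `combine_calls` is hypothetical, and the variant tuples are made-up toy data.

```python
from collections import Counter

def combine_calls(calls_by_caller, min_callers):
    """Return the variants reported by at least `min_callers` callers.

    min_callers=1 gives the Union; 2 and 3 give the assumed reading of
    Consensus of two and Consensus of three.
    """
    votes = Counter()
    for variants in calls_by_caller.values():
        votes.update(set(variants))  # each caller votes once per variant
    return {variant for variant, n in votes.items() if n >= min_callers}

# Hypothetical toy call sets; each variant is (chrom, pos, ref, alt).
v1 = ("chr1", 100, "A", "T")
v2 = ("chr2", 200, "C", "T")
v3 = ("chr3", 300, "G", "A")
v4 = ("chr4", 400, "C", "G")
calls = {
    "MuSE": {v1, v2},
    "MuTect2": {v1, v2, v3},
    "SomaticSniper": {v1},
    "VarScan2": {v1, v3, v4},
}

union = combine_calls(calls, 1)
consensus_of_two = combine_calls(calls, 2)
consensus_of_three = combine_calls(calls, 3)
print(len(union), len(consensus_of_two), len(consensus_of_three))
```

In this toy input the stricter threshold drops variants supported by only two callers, which is the kind of loss the abstract attributes to the Consensus of three strategy.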
Affiliation(s)
- Carlos A Garcia-Prieto
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Francisco Martínez-Jiménez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Alfonso Valencia
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Eduard Porta-Pardo
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain; Barcelona Supercomputing Center (BSC), Barcelona, Spain
|
21
|
Li B, Jin B, Capra JA, Bush WS. Integration of Protein Structure and Population-Scale DNA Sequence Data for Disease Gene Discovery and Variant Interpretation. Annu Rev Biomed Data Sci 2022; 5:141-161. [PMID: 35508071 DOI: 10.1146/annurev-biodatasci-122220-112147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The experimental and computational techniques for capturing information about protein structures and genetic variation within the human genome have advanced dramatically in the past 20 years, generating extensive new data resources. In this review, we discuss these advances, along with new approaches for determining the impact a genetic variant has on protein function. We focus on the potential of new methods that integrate human genetic variation into protein structures to discover relationships to disease, including the discovery of mutational hotspots in cancer-related proteins, the localization of protein-altering variants within protein regions for common complex diseases, and the assessment of variants of unknown significance for Mendelian traits. We expect that approaches that integrate these data sources will play increasingly important roles in disease gene discovery and variant interpretation.
Affiliation(s)
- Bian Li
- Department of Biological Sciences and Center for Structural Biology, Vanderbilt University, Nashville, Tennessee, USA
| | - Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio, USA
| | - John A Capra
- Bakar Computational Health Sciences Institute and Department of Epidemiology and Biostatistics, University of California, San Francisco, California, USA
| | - William S Bush
- Cleveland Institute for Computational Biology, Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, USA
|
22
|
Jin B, Capra JA, Benchek P, Wheeler N, Naj AC, Hamilton-Nelson KL, Farrell JJ, Leung YY, Kunkle B, Vadarajan B, Schellenberg GD, Mayeux R, Wang LS, Farrer LA, Pericak-Vance MA, Martin ER, Haines JL, Crawford DC, Bush WS. An association test of the spatial distribution of rare missense variants within protein structures identifies Alzheimer's disease-related patterns. Genome Res 2022; 32:778-790. [PMID: 35210353 PMCID: PMC8997344 DOI: 10.1101/gr.276069.121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/09/2021] [Accepted: 02/17/2022] [Indexed: 11/24/2022]
Abstract
More than 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single variant and unit-based tests are limited in their statistical power to detect an association between rare variants and phenotypes. To best use missense rare variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, protein optimized kernel evaluation of missense nucleotides (POKEMON), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context to power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer's disease (AD) genes, the signal from the spatial cluster is stable even if we exclude known AD risk variants, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene that has a cluster of variants primarily shared by case subjects around the Sec6 domain. This cluster is also validated in an independent replication data set and a validation data set with a larger sample size.
Affiliation(s)
- Bowen Jin
- Graduate Program in Systems Biology and Bioinformatics, Department of Nutrition, School of Medicine, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - John A Capra
- The Bakar Computational Health Sciences Institute, Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California 94143, USA
| | - Penelope Benchek
- Cleveland Institute for Computational Biology, Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Nicholas Wheeler
- Cleveland Institute for Computational Biology, Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Adam C Naj
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Kara L Hamilton-Nelson
- The John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
| | - John J Farrell
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, Massachusetts 02118, USA
| | - Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Brian Kunkle
- The John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
- Dr. John T. Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
| | - Badri Vadarajan
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Neurology, Gertrude H. Sergievsky Center, Department of Neurology, Columbia University, New York, New York 10032, USA
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Richard Mayeux
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Department of Neurology, Gertrude H. Sergievsky Center, Department of Neurology, Columbia University, New York, New York 10032, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
| | - Lindsay A Farrer
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, Boston, Massachusetts 02118, USA
| | - Margaret A Pericak-Vance
- The John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
- Dr. John T. Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
| | - Eden R Martin
- The John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
- Dr. John T. Macdonald Foundation, Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida 33136, USA
| | - Jonathan L Haines
- Cleveland Institute for Computational Biology, Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - Dana C Crawford
- Cleveland Institute for Computational Biology, Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
| | - William S Bush
- Cleveland Institute for Computational Biology, Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
|
23
|
Analysis of missense variants in the human genome reveals widespread gene-specific clustering and improves prediction of pathogenicity. Am J Hum Genet 2022; 109:457-470. [PMID: 35120630 PMCID: PMC8948164 DOI: 10.1016/j.ajhg.2022.01.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 08/20/2021] [Accepted: 01/11/2022] [Indexed: 12/11/2022] Open
Abstract
We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.
|
24
|
Comprehensive patient-level classification and quantification of driver events in TCGA PanCanAtlas cohorts. PLoS Genet 2022; 18:e1009996. [PMID: 35030162 PMCID: PMC8759692 DOI: 10.1371/journal.pgen.1009996] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/24/2021] [Accepted: 12/14/2021] [Indexed: 12/14/2022] Open
Abstract
There is a growing need to develop novel therapeutics for targeted treatment of cancer. A prerequisite for success is knowing which types of molecular alterations predominantly drive tumorigenesis. To shed light on this subject, we used the largest database of human cancer mutations (TCGA PanCanAtlas) and multiple established algorithms for cancer driver prediction (2020plus, CHASMplus, CompositeDriver, dNdScv, DriverNet, HotMAPS, OncodriveCLUSTL, OncodriveFML), and developed four novel computational pipelines: SNADRIF (Single Nucleotide Alteration DRIver Finder), GECNAV (Gene Expression-based Copy Number Alteration Validator), ANDRIF (ANeuploidy DRIver Finder) and PALDRIC (PAtient-Level DRIver Classifier). A unified workflow integrating all these pipelines, algorithms and datasets at cohort and patient levels was created. We have found that there are on average 12 driver events per tumour, of which 0.6 are single nucleotide alterations (SNAs) in oncogenes, 1.5 are amplifications of oncogenes, 1.2 are SNAs in tumour suppressors, 2.1 are deletions of tumour suppressors, 1.5 are driver chromosome losses, 1 is a driver chromosome gain, 2 are driver chromosome arm losses, and 1.5 are driver chromosome arm gains. The average number of driver events per tumour increases with age (from 7 to 15) and cancer stage (from 10 to 15) and varies strongly between cancer types (from 1 to 24). Patients with 1 and 7 driver events per tumour are the most frequent, and there are very few patients with more than 40 events. In tumours having only one driver event, this event is most often an SNA in an oncogene. However, with increasing number of driver events per tumour, the contribution of SNAs decreases, whereas the contribution of copy-number alterations and aneuploidy events increases.
By analysing genomic and transcriptomic data from 10,000 cancer patients through our custom-built computational pipelines and previously established third-party algorithms, we have found that half of all driver events in a patient’s tumour appear to be gains and losses of chromosomal arms and whole chromosomes. We therefore suggest that future therapeutics development efforts should be focused on targeting aneuploidy. We have also found that approximately a third of driver events in a patient are whole gene amplifications and deletions. Thus, therapies aimed at copy-number alterations also appear very promising. On the other hand, drugs aimed at point mutations are predicted to be less successful, as these alterations are responsible for just a couple of drivers per tumour. One notable exception is patients having only one driver event in their tumours, as this event is almost always a single nucleotide alteration in an oncogene.
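The proportions quoted in the summary can be checked against the per-category averages given in the abstract. The sketch below only redoes that arithmetic; the grouping of the eight categories into three classes and all variable names are assumptions made for illustration. Note that the rounded category averages sum to 11.4 rather than the reported 12, a gap that may simply reflect rounding of the individual values.

```python
# Average driver events per tumour, as listed in the abstract.
driver_events = {
    "SNA in oncogenes": 0.6,
    "amplification of oncogenes": 1.5,
    "SNA in tumour suppressors": 1.2,
    "deletion of tumour suppressors": 2.1,
    "chromosome losses": 1.5,
    "chromosome gains": 1.0,
    "chromosome arm losses": 2.0,
    "chromosome arm gains": 1.5,
}

# Assumed grouping of the categories into the three classes discussed.
classes = {
    "point mutations": ["SNA in oncogenes", "SNA in tumour suppressors"],
    "copy-number alterations": [
        "amplification of oncogenes",
        "deletion of tumour suppressors",
    ],
    "aneuploidy": [
        "chromosome losses",
        "chromosome gains",
        "chromosome arm losses",
        "chromosome arm gains",
    ],
}

total = sum(driver_events.values())
shares = {
    name: sum(driver_events[k] for k in keys) / total
    for name, keys in classes.items()
}

print(f"sum of listed categories: {total:.1f}")
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Under this grouping, aneuploidy accounts for roughly half of the listed events, copy-number alterations for roughly a third, and point mutations for the remainder, in line with the statements above.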
|
25
|
Porta-Pardo E, Ruiz-Serra V, Valentini S, Valencia A. The structural coverage of the human proteome before and after AlphaFold. PLoS Comput Biol 2022; 18:e1009818. [PMID: 35073311 PMCID: PMC8812986 DOI: 10.1371/journal.pcbi.1009818] [Citation(s) in RCA: 50] [Impact Index Per Article: 25.0] [Received: 08/20/2021] [Revised: 02/03/2022] [Accepted: 01/07/2022] [Indexed: 12/12/2022] Open
Abstract
The protein structure field is experiencing a revolution, from the increased throughput of techniques to determine experimental structures, to developments such as cryo-EM that allow us to find the structures of large protein complexes and, more recently, the development of artificial intelligence tools, such as AlphaFold, that can predict with high accuracy the folding of proteins for which the availability of homology templates is limited. Here we quantify the effect of the recently released AlphaFold database of protein structural models on our knowledge of human proteins. Our results indicate that our current baseline for structural coverage of 48%, considering experimentally derived or template-based homology models, rises to 76% when including AlphaFold predictions. At the same time, the fraction of dark proteome is reduced from 26% to just 10% when AlphaFold models are considered. Furthermore, although the coverage of disease-associated genes and mutations was near complete before the AlphaFold release (69% of ClinVar pathogenic mutations and 88% of oncogenic mutations), AlphaFold models still provide an additional coverage of 3% to 13% of these critically important sets of biomedical genes and mutations. Finally, we show that the contribution of AlphaFold models to the structural coverage of non-human organisms, including important pathogenic bacteria, is significantly larger than that for the human proteome. Overall, our results show that the sequence-structure gap of human proteins has almost disappeared, an outstanding success with direct consequences for knowledge of the human genome and the derived medical applications.
Protein structures are key to understanding many biological phenomena at the molecular scale, from the effects of genetic variation to how different proteins interact with each other to create molecular pathways that, together, have a biological function. Obtaining experimental structures, however, is extremely costly in both time and resources. For this and other reasons, scientists have long worked to develop computational approaches that predict the structure of a protein using only its sequence as input. Recently, a group of scientists at DeepMind developed AlphaFold2, a computational tool that is extremely accurate at this task. Moreover, they have used this tool to predict the structures of all human proteins. In this manuscript we provide an overview of the structural coverage of the human proteome before AlphaFold models were released and how much we have gained thanks to these models. We also show how the gain affects our understanding of human pathogenic variants, both germline and somatic. Finally, we provide evidence suggesting that the gain in non-human organisms is larger than for the human proteome, particularly in the case of bacteria.
Affiliation(s)
- Eduard Porta-Pardo
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
| | - Victoria Ruiz-Serra
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
| | - Samuel Valentini
- Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento, Italy
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca Avançada (ICREA), Barcelona, Spain
|
26
|
Zheng F, Kelly MR, Ramms DJ, Heintschel ML, Tao K, Tutuncuoglu B, Lee JJ, Ono K, Foussard H, Chen M, Herrington KA, Silva E, Liu S, Chen J, Churas C, Wilson N, Kratz A, Pillich RT, Patel DN, Park J, Kuenzi B, Yu MK, Licon K, Pratt D, Kreisberg JF, Kim M, Swaney DL, Nan X, Fraley SI, Gutkind JS, Krogan NJ, Ideker T. Interpretation of cancer mutations using a multiscale map of protein systems. Science 2021; 374:eabf3067. [PMID: 34591613 PMCID: PMC9126298 DOI: 10.1126/science.abf3067] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.
Affiliation(s)
- Fan Zheng
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Marcus R. Kelly
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Dana J. Ramms
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
| | - Marissa L. Heintschel
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Kai Tao
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
| | - Beril Tutuncuoglu
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - John J. Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Keiichiro Ono
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Helene Foussard
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Michael Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Kari A. Herrington
- Department of Biochemistry and Biophysics Center for Advanced Light Microscopy at UCSF, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Erica Silva
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Sophie Liu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jing Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Christopher Churas
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Nicholas Wilson
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Anton Kratz
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Rudolf T. Pillich
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Devin N. Patel
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Brent Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Michael K. Yu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Katherine Licon
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Dexter Pratt
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jason F. Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Minkyu Kim
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Danielle L. Swaney
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Xiaolin Nan
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, 97201, USA
| | - Stephanie I. Fraley
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - J. Silvio Gutkind
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
| | - Nevan J. Krogan
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| |
|
27
|
Chen F, Wendl MC, Wyczalkowski MA, Bailey MH, Li Y, Ding L. Moving pan-cancer studies from basic research toward the clinic. NATURE CANCER 2021; 2:879-890. [PMID: 35121865 DOI: 10.1038/s43018-021-00250-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/30/2019] [Accepted: 07/21/2021] [Indexed: 06/14/2023]
Abstract
Although all cancers share common hallmarks, we have long realized that there is no silver-bullet treatment for the disease. Many clinical oncologists specialize in a single cancer type, based predominantly on the tissue of origin. With advances brought by genetics and cancer genomic research, we now know that cancers are profoundly different, both in origins and in genetic alterations. At the same time, commonalities such as key driver mutations, altered pathways, mutational, immune and microbial signatures and other areas (many revealed by pan-cancer studies) point to the intriguing possibility of targeting common traits across diverse cancer types with the same therapeutic strategies. Studies designed to delineate differences and similarities across cancer types are thus critical in discerning the basic dynamics of oncogenesis, as well as informing diagnoses, prognoses and therapies. We anticipate growing emphases on the development and application of therapies targeting underlying commonalities of different cancer types, while tailoring to the unique tissue environment and intrinsic molecular fingerprints of each cancer type and subtype. Here we summarize the facets of pan-cancer research and how they are pushing progress toward personalized medicine.
Affiliation(s)
- Feng Chen
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Department of Cell Biology and Physiology, Washington University in St. Louis, St. Louis, MO, USA
| | - Michael C Wendl
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
- Department of Mathematics, Washington University in St. Louis, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA
| | - Matthew A Wyczalkowski
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA
| | - Matthew H Bailey
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT, USA
- Eccles Institute of Human Genetics, University of Utah, Salt Lake City, UT, USA
| | - Yize Li
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA
| | - Li Ding
- Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA.
- Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA.
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA.
- McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO, USA.
| |
|
28
|
Ozturk K, Carter H. Predicting functional consequences of mutations using molecular interaction network features. Hum Genet 2021; 141:1195-1210. [PMID: 34432150 PMCID: PMC8873243 DOI: 10.1007/s00439-021-02329-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/20/2021] [Accepted: 07/31/2021] [Indexed: 12/13/2022]
Abstract
Variant interpretation remains a central challenge for precision medicine. Missense variants are particularly difficult to understand as they change only a single amino acid in a protein sequence yet can have large and varied effects on protein activity. Numerous tools have been developed to identify missense variants with putative disease consequences from protein sequence and structure. However, biological function arises through higher order interactions among proteins and molecules within cells. We therefore sought to capture information about the potential of missense mutations to perturb protein interaction networks by integrating protein structure and interaction data. We developed 16 network-based annotations for missense mutations that provide orthogonal information to features classically used to prioritize variants. We then evaluated them in the context of a proven machine-learning framework for variant effect prediction across multiple benchmark datasets to demonstrate their potential to improve variant classification. Interestingly, network features resulted in larger performance gains for classifying somatic mutations than for germline variants, possibly due to different constraints on what mutations are tolerated at the cellular versus organismal level. Our results suggest that modeling variant potential to perturb context-specific interactome networks is a fruitful strategy to advance in silico variant effect prediction.
Affiliation(s)
- Kivilcim Ozturk
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
| | - Hannah Carter
- Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
| |
|
29
|
Kan Y, Jiang L, Tang J, Guo Y, Guo F. A systematic view of computational methods for identifying driver genes based on somatic mutation data. Brief Funct Genomics 2021; 20:333-343. [PMID: 34312663 DOI: 10.1093/bfgp/elab032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 06/16/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022] Open
Abstract
Abnormal changes of driver genes are serious for human health and biomedical research. Identifying driver genes, exactly from enormous genes with mutations, promotes accurate diagnosis and treatment of cancer. A lot of works about uncovering driver genes have been developed over the past decades. By analyzing previous works, we find that computational methods are more efficient than traditional biological experiments when distinguishing driver genes from massive data. In this study, we summarize eight common computational algorithms only using somatic mutation data. We first group these methods into three categories according to mutation features they apply. Then, we conclude a general process of nominating candidate cancer driver genes. Finally, we evaluate three representative methods on 10 kinds of cancer derived from The Cancer Genome Atlas Program and five Chinese projects from the International Cancer Genome Consortium. In addition, we compare results of methods with various parameters. Evaluation is performed from four perspectives, including CGC, OG/TSG, Q-value and Q-Q (quantile-quantile) plot. To sum up, we present algorithms using somatic mutation data in order to offer a systematic view of various mutation features and lay the foundation of methods based on integration of mutation information and other types of data.
Affiliation(s)
- Yingxin Kan
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
| | - Limin Jiang
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
| | - Yan Guo
- Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, U.S
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, China
| |
|
30
|
Padhi EM, Hayeck TJ, Cheng Z, Chatterjee S, Mannion BJ, Byrska-Bishop M, Willems M, Pinson L, Redon S, Benech C, Uguen K, Audebert-Bellanger S, Le Marechal C, Férec C, Efthymiou S, Rahman F, Maqbool S, Maroofian R, Houlden H, Musunuri R, Narzisi G, Abhyankar A, Hunter RD, Akiyama J, Fries LE, Ng JK, Mehinovic E, Stong N, Allen AS, Dickel DE, Bernier RA, Gorkin DU, Pennacchio LA, Zody MC, Turner TN. Coding and noncoding variants in EBF3 are involved in HADDS and simplex autism. Hum Genomics 2021; 15:44. [PMID: 34256850 PMCID: PMC8278787 DOI: 10.1186/s40246-021-00342-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/03/2021] [Accepted: 06/17/2021] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Previous research in autism and other neurodevelopmental disorders (NDDs) has indicated an important contribution of protein-coding (coding) de novo variants (DNVs) within specific genes. The role of de novo noncoding variation has been observable as a general increase in genetic burden but has yet to be resolved to individual functional elements. In this study, we assessed whole-genome sequencing data in 2671 families with autism (discovery cohort of 516 families, replication cohort of 2155 families). We focused on DNVs in enhancers with characterized in vivo activity in the brain and identified an excess of DNVs in an enhancer named hs737. RESULTS We adapted the fitDNM statistical model to work in noncoding regions and tested enhancers for excess of DNVs in families with autism. We found only one enhancer (hs737) with nominal significance in the discovery (p = 0.0172), replication (p = 2.5 × 10^-3), and combined dataset (p = 1.1 × 10^-4). Each individual with a DNV in hs737 had shared phenotypes including being male, intact cognitive function, and hypotonia or motor delay. Our in vitro assessment of the DNVs showed they all reduce enhancer activity in a neuronal cell line. By epigenomic analyses, we found that hs737 is brain-specific and targets the transcription factor gene EBF3 in human fetal brain. EBF3 is genome-wide significant for coding DNVs in NDDs (missense p = 8.12 × 10^-35, loss-of-function p = 2.26 × 10^-13) and is widely expressed in the body. Through characterization of promoters bound by EBF3 in neuronal cells, we saw enrichment for binding to NDD genes (p = 7.43 × 10^-6, OR = 1.87) involved in gene regulation. Individuals with coding DNVs have greater phenotypic severity (hypotonia, ataxia, and delayed development syndrome [HADDS]) in comparison to individuals with noncoding DNVs that have autism and hypotonia. CONCLUSIONS In this study, we identify DNVs in the hs737 enhancer in individuals with autism. Through multiple approaches, we find hs737 targets the gene EBF3 that is genome-wide significant in NDDs. By assessment of noncoding variation and the genes they affect, we are beginning to understand their impact on gene regulatory networks in NDDs.
Affiliation(s)
- Evin M Padhi
- Department of Genetics, Washington University School of Medicine, 4523 Clayton Avenue, Campus Box 8232, St. Louis, MO, 63110, USA
| | - Tristan J Hayeck
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Zhang Cheng
- Center for Epigenomics, University of California San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA, 92093, USA
| | - Sumantra Chatterjee
- Center for Human Genetics and Genomics, NYU School of Medicine, New York, NY, 10016, USA
| | - Brandon J Mannion
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | | | - Marjolaine Willems
- University of Montpellier, département de Génétique, maladies rares médecine personnalisée, U 1298, CHU Montpellier, University of Montpellier, Montpellier, France
| | - Lucile Pinson
- University of Montpellier, département de Génétique, maladies rares médecine personnalisée, U 1298, CHU Montpellier, University of Montpellier, Montpellier, France
| | - Sylvia Redon
- CHU Brest, Inserm, Univ Brest, EFS,UMR 1078, GGB, F-29200, Brest, France
| | - Caroline Benech
- CHU Brest, Inserm, Univ Brest, EFS,UMR 1078, GGB, F-29200, Brest, France
| | - Kevin Uguen
- CHU Brest, Inserm, Univ Brest, EFS,UMR 1078, GGB, F-29200, Brest, France
| | | | - Cédric Le Marechal
- CHU Brest, Inserm, Univ Brest, EFS,UMR 1078, GGB, F-29200, Brest, France
| | - Claude Férec
- CHU Brest, Inserm, Univ Brest, EFS,UMR 1078, GGB, F-29200, Brest, France
| | - Stephanie Efthymiou
- Department of Neuromuscular Disorders, UCL Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Fatima Rahman
- Development and Behavioral Pediatrics Department, Institute of Child Health and Children Hospital, Lahore, Pakistan
| | - Shazia Maqbool
- Department of Neuromuscular Disorders, UCL Institute of Neurology, Queen Square, London, WC1N 3BG, UK
- Development and Behavioral Pediatrics Department, Institute of Child Health and Children Hospital, Lahore, Pakistan
| | - Reza Maroofian
- Department of Neuromuscular Disorders, UCL Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | - Henry Houlden
- Department of Neuromuscular Disorders, UCL Institute of Neurology, Queen Square, London, WC1N 3BG, UK
| | | | | | | | - Riana D Hunter
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Jennifer Akiyama
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Lauren E Fries
- Center for Human Genetics and Genomics, NYU School of Medicine, New York, NY, 10016, USA
| | - Jeffrey K Ng
- Department of Genetics, Washington University School of Medicine, 4523 Clayton Avenue, Campus Box 8232, St. Louis, MO, 63110, USA
| | - Elvisa Mehinovic
- Department of Genetics, Washington University School of Medicine, 4523 Clayton Avenue, Campus Box 8232, St. Louis, MO, 63110, USA
| | - Nick Stong
- Institute for Genomic Medicine, Columbia University, New York, NY, 10027, USA
| | - Andrew S Allen
- Center for Statistical Genetics and Genomics, Duke University, Durham, NC, 27708, USA
- Division of Integrative Genomics, Duke University, Durham, NC, 27708, USA
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, 27708, USA
| | - Diane E Dickel
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
| | - Raphael A Bernier
- Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, WA, 98195, USA
| | - David U Gorkin
- Center for Epigenomics, University of California San Diego School of Medicine, 9500 Gilman Drive, La Jolla, CA, 92093, USA
- Department of Biology, Emory University, Atlanta, GA, 30322, USA
| | - Len A Pennacchio
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA, 94598, USA
| | | | - Tychele N Turner
- Department of Genetics, Washington University School of Medicine, 4523 Clayton Avenue, Campus Box 8232, St. Louis, MO, 63110, USA.
| |
|
31
|
Varela NM, Guevara-Ramírez P, Acevedo C, Zambrano T, Armendáriz-Castillo I, Guerrero S, Quiñones LA, López-Cortés A. A New Insight for the Identification of Oncogenic Variants in Breast and Prostate Cancers in Diverse Human Populations, With a Focus on Latinos. Front Pharmacol 2021; 12:630658. [PMID: 33912047 PMCID: PMC8072346 DOI: 10.3389/fphar.2021.630658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/18/2020] [Accepted: 01/25/2021] [Indexed: 12/24/2022] Open
Abstract
Background: Breast cancer (BRCA) and prostate cancer (PRCA) are the most commonly diagnosed cancer types in Latin American women and men, respectively. Although in recent years large-scale efforts from international consortia have focused on improving precision oncology, a better understanding of genomic features of BRCA and PRCA in developing regions and racial/ethnic minority populations is still required. Methods: To fill in this gap, we performed integrated in silico analyses to elucidate oncogenic variants from BRCA and PRCA driver genes; to calculate their deleteriousness scores and allele frequencies from seven human populations worldwide, including Latinos; and to propose the most effective therapeutic strategies based on precision oncology. Results: We analyzed 339,100 variants belonging to 99 BRCA and 82 PRCA driver genes and identified 18,512 and 15,648 known/predicted oncogenic variants, respectively. Regarding known oncogenic variants, we prioritized the most frequent and deleterious variants of BRCA (n = 230) and PRCA (n = 167) from Latino, African, Ashkenazi Jewish, East Asian, South Asian, European Finnish, and European non-Finnish populations, to incorporate them into pharmacogenomics testing. Lastly, we identified which oncogenic variants may shape the response to anti-cancer therapies, detailing the current status of pharmacogenomics guidelines and clinical trials involved in BRCA and PRCA cancer driver proteins. Conclusion: It is imperative to unify efforts where developing countries might invest in obtaining databases of genomic profiles of their populations, and developed countries might incorporate racial/ethnic minority populations in future clinical trials and cancer researches with the overall objective of fomenting pharmacogenomics in clinical practice and public health policies.
Affiliation(s)
- Nelson M Varela
- Laboratory of Chemical Carcinogenesis and Pharmacogenetics, Department of Basic and Clinical Oncology, Faculty of Medicine, University of Chile, Santiago, Chile
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
| | - Patricia Guevara-Ramírez
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
| | - Cristian Acevedo
- Laboratory of Chemical Carcinogenesis and Pharmacogenetics, Department of Basic and Clinical Oncology, Faculty of Medicine, University of Chile, Santiago, Chile
- Department of Basic and Clinical Oncology, Clinical Hospital University of Chile, Santiago, Chile
| | - Tomás Zambrano
- Department of Medical Technology, Faculty of Medicine, University of Chile, Santiago, Chile
| | - Isaac Armendáriz-Castillo
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Santiago Guerrero
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
| | - Luis A Quiñones
- Laboratory of Chemical Carcinogenesis and Pharmacogenetics, Department of Basic and Clinical Oncology, Faculty of Medicine, University of Chile, Santiago, Chile
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
| | - Andrés López-Cortés
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Madrid, Spain
- Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito, Ecuador
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruna, A Coruña, Spain
| |
|
32
|
Mészáros B, Hajdu-Soltész B, Zeke A, Dosztányi Z. Mutations of Intrinsically Disordered Protein Regions Can Drive Cancer but Lack Therapeutic Strategies. Biomolecules 2021; 11:biom11030381. [PMID: 33806614 PMCID: PMC8000335 DOI: 10.3390/biom11030381] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/25/2021] [Revised: 02/22/2021] [Accepted: 02/24/2021] [Indexed: 12/22/2022] Open
Abstract
Many proteins contain intrinsically disordered regions (IDRs) which carry out important functions without relying on a single well-defined conformation. IDRs are increasingly recognized as critical elements of regulatory networks and have been also associated with cancer. However, it is unknown whether mutations targeting IDRs represent a distinct class of driver events associated with specific molecular and system-level properties, cancer types and treatment options. Here, we used an integrative computational approach to explore the direct role of intrinsically disordered protein regions driving cancer. We showed that around 20% of cancer drivers are primarily targeted through a disordered region. These IDRs can function in multiple ways which are distinct from the functional mechanisms of ordered drivers. Disordered drivers play a central role in context-dependent interaction networks and are enriched in specific biological processes such as transcription, gene expression regulation and protein degradation. Furthermore, their modulation represents an alternative mechanism for the emergence of all known cancer hallmarks. Importantly, in certain cancer patients, mutations of disordered drivers represent key driving events. However, treatment options for such patients are currently severely limited. The presented study highlights a largely overlooked class of cancer drivers associated with specific cancer types that need novel therapeutic options.
Affiliation(s)
- Bálint Mészáros
- Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
- EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
| | - Borbála Hajdu-Soltész
- Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
| | - András Zeke
- Institute of Enzymology, RCNS, P.O. Box 7, H-1518 Budapest, Hungary
| | - Zsuzsanna Dosztányi
- Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
- Correspondence: ; Tel.: +36-1-372 2500/8537
| |
|
33
|
Comprehensive characterization of protein-protein interactions perturbed by disease mutations. Nat Genet 2021; 53:342-353. [PMID: 33558758 DOI: 10.1038/s41588-020-00774-y] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 02/27/2020] [Accepted: 12/22/2020] [Indexed: 02/07/2023]
Abstract
Technological and computational advances in genomics and interactomics have made it possible to identify how disease mutations perturb protein-protein interaction (PPI) networks within human cells. Here, we show that disease-associated germline variants are significantly enriched in sequences encoding PPI interfaces compared to variants identified in healthy participants from the projects 1000 Genomes and ExAC. Somatic missense mutations are also significantly enriched in PPI interfaces compared to noninterfaces in 10,861 tumor exomes. We computationally identified 470 putative oncoPPIs in a pan-cancer analysis and demonstrate that oncoPPIs are highly correlated with patient survival and drug resistance/sensitivity. We experimentally validate the network effects of 13 oncoPPIs using a systematic binary interaction assay, and also demonstrate the functional consequences of two of these on tumor cell growth. In summary, this human interactome network framework provides a powerful tool for prioritization of alleles with PPI-perturbing mutations to inform pathobiological mechanism- and genotype-based therapeutic discovery.
|
34
|
Li Y, Dong YP, Qian YW, Yu LX, Wen W, Cui XL, Wang HY. Identification of important genes and drug repurposing based on clinical-centered analysis across human cancers. Acta Pharmacol Sin 2021; 42:282-289. [PMID: 32555508 DOI: 10.1038/s41401-020-0451-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/31/2019] [Accepted: 05/30/2020] [Indexed: 01/19/2023] Open
Abstract
Identification of the functional impact of mutated and altered genes in cancer is critical for implementing precision oncology and drug repurposing. In recent years, the emergence of multiomics data from large, well-characterized patient cohorts has provided us with an unprecedented opportunity to address this problem. In this study, we investigated survival-associated genes across 26 cancer types and found that these genes tended to be hub genes and had higher K-core values in biological networks. Moreover, the genes associated with adverse outcomes were mainly enriched in pathways related to genetic information processing and cellular processes, while the genes with favorable outcomes were enriched in metabolism and immune regulation pathways. We proposed using the number of survival-related neighbors to assess the impact of mutations. In addition, by integrating other databases including the Human Protein Atlas and the DrugBank database, we predicted novel targets and anticancer drugs using the drug repurposing strategy. Our results illustrated the significance of multidimensional analysis of clinical data in important gene identification and drug development.
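The two network quantities named in this abstract, the K-core value of a gene and its number of survival-related neighbors, can be illustrated with a small sketch. This is not the authors' pipeline: the gene names, the edge list, and the `survival_genes` set are hypothetical toy data, and the peeling routine is a standard textbook way to compute core numbers rather than the method used in the paper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def survival_neighbor_count(adj, survival_genes):
    """For each gene, count direct neighbors that are survival-associated."""
    return {gene: len(nbrs & survival_genes) for gene, nbrs in adj.items()}


def core_numbers(adj):
    """Core number per node, found by repeatedly peeling the lowest-degree node."""
    degree = {gene: len(nbrs) for gene, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Pick the remaining node with the smallest current degree (ties by name).
        gene = min(remaining, key=lambda g: (degree[g], g))
        k = max(k, degree[gene])
        core[gene] = k
        remaining.remove(gene)
        for nbr in adj[gene]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


# Hypothetical toy network: a triangle (A, B, C) with one pendant gene D.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_survival_genes = {"A", "D"}  # assumed survival-associated genes

toy_adj = build_adjacency(toy_edges)
neighbor_counts = survival_neighbor_count(toy_adj, toy_survival_genes)
cores = core_numbers(toy_adj)
```

In this toy network, gene C touches both survival-associated genes and sits in the densest core, which is the kind of gene the abstract describes as a likely hub.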
|
35
|
Chen S, He X, Li R, Duan X, Niu B. HotSpot3D web server: an integrated resource for mutation analysis in protein 3D structures. Bioinformatics 2020; 36:3944-3946. [PMID: 32315389 DOI: 10.1093/bioinformatics/btaa258] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/20/2019] [Revised: 03/25/2020] [Accepted: 04/15/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION HotSpot3D is a widely used software for identifying mutation hotspots on the 3D structures of proteins. To further assist users, we developed a new HotSpot3D web server to make this software more versatile, convenient and interactive. RESULTS The HotSpot3D web server performs data pre-processing, clustering, visualization and log-viewing on one stop. Users can interactively explore each cluster and easily re-visualize the mutational clusters within browsers. We also provide a database that allows users to search and visualize proximal mutations from 33 cancers in the Cancer Genome Atlas. AVAILABILITY AND IMPLEMENTATION http://niulab.scgrid.cn/HotSpot3D/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shanyu Chen
- Computer Network Information Center, Chinese Academy of Sciences
- University of Chinese Academy of Sciences, Beijing 100190, China
| | - Xiaoyu He
- Computer Network Information Center, Chinese Academy of Sciences
- University of Chinese Academy of Sciences, Beijing 100190, China
| | - Ruilin Li
- Computer Network Information Center, Chinese Academy of Sciences
| | - Xiaohong Duan
- ChosenMed Technology (Beijing) Co. Ltd, Beijing 100176, China
| | - Beifang Niu
- Computer Network Information Center, Chinese Academy of Sciences.,University of Chinese Academy of Sciences, Beijing 100190, China.,ChosenMed Technology (Beijing) Co. Ltd, Beijing 100176, China
|
36
|
Abbaspourkharyeki M, Anvekar NJ, Ramachandra NB. The Possible Role of Point Mutations and Activation of the CDC27 Gene in Progression of Multiple Myeloma. Meta Gene 2020. [DOI: 10.1016/j.mgene.2020.100761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
37
|
Porta‐Pardo E, Valencia A, Godzik A. Understanding oncogenicity of cancer driver genes and mutations in the cancer genomics era. FEBS Lett 2020; 594:4233-4246. [PMID: 32239503 PMCID: PMC7529711 DOI: 10.1002/1873-3468.13781] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/03/2019] [Revised: 01/23/2020] [Accepted: 02/09/2020] [Indexed: 12/12/2022]
Abstract
One of the key challenges of cancer biology is to catalogue and understand the somatic genomic alterations leading to cancer. Although alternative definitions and search methods have been developed to identify cancer driver genes and mutations, analyses of thousands of cancer genomes return a remarkably similar catalogue of around 300 genes that are mutated in at least one cancer type. Yet, many features of these genes and their role in cancer remain unclear, first and foremost when a somatic mutation is truly oncogenic. In this review, we first summarize some of the recent efforts in completing the catalogue of cancer driver genes. Then, we give an overview of different aspects that influence the oncogenicity of somatic mutations in the core cancer driver genes, including their interactions with the germline genome, other cancer driver mutations, the immune system, or their potential role in healthy tissues. In the coming years, this research holds promise to illuminate how, when, and why cancer driver genes and mutations are really drivers, and thereby move personalized cancer medicine and targeted therapies forward.
Affiliation(s)
- Eduard Porta‐Pardo
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
| | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
| | - Adam Godzik
- Division of Biomedical Sciences, University of California Riverside School of Medicine, Riverside, CA, USA
|
38
|
Martinez-Ledesma E, Flores D, Trevino V. Computational methods for detecting cancer hotspots. Comput Struct Biotechnol J 2020; 18:3567-3576. [PMID: 33304455 PMCID: PMC7711189 DOI: 10.1016/j.csbj.2020.11.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/13/2020] [Revised: 11/12/2020] [Accepted: 11/13/2020] [Indexed: 12/14/2022]
Abstract
Cancer mutations that are recurrently observed among patients are known as hotspots. Hotspots are highly relevant because they are presumed likely to be functional. Known hotspots in BRAF, PIK3CA, TP53, KRAS, and IDH1 support this idea. However, hundreds of hotspots have never been validated experimentally. The detection of hotspots is nevertheless challenging because background mutations obscure their statistical and computational identification. Although several algorithms have been applied to identify hotspots, they have not been reviewed before. Thus, in this mini-review, we summarize more than 40 computational methods applied to detect cancer hotspots in coding and non-coding DNA. We first organize the methods into cluster-based, 3D, position-specific, and miscellaneous categories to provide a general overview. Then, we describe their embedded procedures, implementations, variations, and differences. Finally, we discuss some advantages, provide some ideas for future developments, and mention opportunities such as application to viral integrations, translocations, and epigenetics.
Affiliation(s)
- Emmanuel Martinez-Ledesma
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
| | - David Flores
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- Universidad del Caribe, Departamento de Ciencias Básicas e Ingenierías, Cancún, Quintana Roo, Mexico
| | - Victor Trevino
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
|
39
|
Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants. Proc Natl Acad Sci U S A 2020; 117:28201-28211. [PMID: 33106425 PMCID: PMC7668189 DOI: 10.1073/pnas.2002660117] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/19/2022]
Abstract
Recent large-scale sequencing efforts have enabled the detection of millions of missense variants. Elucidating their functional effect is of crucial importance but challenging. We approach this problem by performing a wide-scale characterization of missense variants from 1,330 disease-associated genes using >14,000 protein structures. We identify 3D features associated with pathogenic and benign variants that unveiled the mutations’ effect at the molecular level. We further extend our analysis to account for the different essential structural regions in proteins performing different functions. By analyzing variants from 24 gene groups encoding for different protein functional families, we capture function-specific characteristics of missense variants, which match the experimental readouts. We show that our results derived using structural data will effectively inform variant interpretation. Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. 
We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.
|
40
|
Khalighi S, Singh S, Varadan V. Untangling a complex web: Computational analyses of tumor molecular profiles to decode driver mechanisms. J Genet Genomics 2020; 47:595-609. [PMID: 33423960 PMCID: PMC7902422 DOI: 10.1016/j.jgg.2020.11.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2020] [Revised: 11/04/2020] [Accepted: 11/14/2020] [Indexed: 12/19/2022]
Abstract
Genome-scale studies focusing on molecular profiling of cancers across tissue types have revealed a plethora of aberrations across the genomic, transcriptomic, and epigenomic scales. The significant molecular heterogeneity across individual tumors even within the same tissue context complicates decoding the key etiologic mechanisms of this disease. Furthermore, it is increasingly likely that biologic mechanisms underlying the pathobiology of cancer involve multiple molecular entities interacting across functional scales. This has motivated the development of computational approaches that integrate molecular measurements with prior biological knowledge in increasingly intricate ways to enable the discovery of driver genomic aberrations across cancers. Here, we review diverse methodological approaches that have powered significant advances in our understanding of the genomic underpinnings of cancer at the cohort and at the individual tumor scales. We outline the key advances and challenges in the computational discovery of cancer mechanisms while motivating the development of systems biology approaches to comprehensively decode the biologic drivers of this complex disease.
Affiliation(s)
- Sirvan Khalighi
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Salendra Singh
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
| | - Vinay Varadan
- Division of General Medical Sciences-Oncology, Case Comprehensive Cancer Center, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA.
|
41
|
Martínez-Jiménez F, Muiños F, Sentís I, Deu-Pons J, Reyes-Salazar I, Arnedo-Pac C, Mularoni L, Pich O, Bonet J, Kranas H, Gonzalez-Perez A, Lopez-Bigas N. A compendium of mutational cancer driver genes. Nat Rev Cancer 2020; 20:555-572. [PMID: 32778778 DOI: 10.1038/s41568-020-0290-x] [Citation(s) in RCA: 503] [Impact Index Per Article: 125.8] [Accepted: 07/02/2020] [Indexed: 12/11/2022]
Abstract
A fundamental goal in cancer research is to understand the mechanisms of cell transformation. This is key to developing more efficient cancer detection methods and therapeutic approaches. One milestone towards this objective is the identification of all the genes with mutations capable of driving tumours. Since the 1970s, the list of cancer genes has been growing steadily. Because cancer driver genes are under positive selection in tumorigenesis, their observed patterns of somatic mutations across tumours in a cohort deviate from those expected from neutral mutagenesis. These deviations, which constitute signals of positive selection, may be detected by carefully designed bioinformatics methods, which have become the state of the art in the identification of driver genes. A systematic approach combining several of these signals could lead to a compendium of mutational cancer genes. In this Review, we present the Integrative OncoGenomics (IntOGen) pipeline, an implementation of such an approach to obtain the compendium of mutational cancer drivers. Its application to somatic mutations of more than 28,000 tumours of 66 cancer types reveals 568 cancer genes and points towards their mechanisms of tumorigenesis. The application of this approach to the ever-growing datasets of somatic tumour mutations will support the continuous refinement of our knowledge of the genetic basis of cancer.
Affiliation(s)
- Francisco Martínez-Jiménez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Ferran Muiños
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Inés Sentís
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Jordi Deu-Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Iker Reyes-Salazar
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Claudia Arnedo-Pac
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Loris Mularoni
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Oriol Pich
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Jose Bonet
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Hanna Kranas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain
| | - Abel Gonzalez-Perez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain.
| | - Nuria Lopez-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain.
- Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain.
- Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.
|
42
|
Tang ZZ, Sliwoski GR, Chen G, Jin B, Bush WS, Li B, Capra JA. PSCAN: Spatial scan tests guided by protein structures improve complex disease gene discovery and signal variant detection. Genome Biol 2020; 21:217. [PMID: 32847609 PMCID: PMC7448521 DOI: 10.1186/s13059-020-02121-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/09/2019] [Accepted: 07/27/2020] [Indexed: 12/25/2022]
Abstract
Germline disease-causing variants are generally more spatially clustered in protein 3-dimensional structures than benign variants. Motivated by this tendency, we develop a fast and powerful protein-structure-based scan (PSCAN) approach for evaluating gene-level associations with complex disease and detecting signal variants. We validate PSCAN's performance on synthetic data and two real data sets for lipid traits and Alzheimer's disease. Our results demonstrate that PSCAN performs competitively with existing gene-level tests while increasing power and identifying more specific signal variant sets. Furthermore, PSCAN enables generation of hypotheses about the molecular basis for the associations in the context of protein structures and functional domains.
Affiliation(s)
- Zheng-Zheng Tang
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53715 WI USA
- Wisconsin Institute for Discovery, Madison, 53715 WI USA
| | - Gregory R. Sliwoski
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
| | - Guanhua Chen
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53715 WI USA
| | - Bowen Jin
- Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106 OH USA
| | - William S. Bush
- Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106 OH USA
- Institute for Computational Biology, Case Western Reserve University, Cleveland, 44106 OH USA
| | - Bingshan Li
- Department of Molecular Physiology & Biophysics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
| | - John A. Capra
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, 37232 TN USA
- Departments of Biological Sciences and Computer Science, Vanderbilt University, Nashville, 37232 TN USA
- Center for Structural Biology, Vanderbilt University, Nashville, 37232 TN USA
|
43
|
Kobren SN, Chazelle B, Singh M. PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities. Cell Syst 2020; 11:63-74.e7. [PMID: 32711844 PMCID: PMC7493809 DOI: 10.1016/j.cels.2020.06.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/02/2019] [Revised: 02/23/2020] [Accepted: 06/05/2020] [Indexed: 12/12/2022]
Abstract
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event. A fast, analytical framework called PertInInt enables efficient integration of multiple measures of protein site functionality—including interaction, domain, and evolutionary conservation—with gene-level mutation data in order to rapidly detect cancer driver genes along with their disrupted functionalities.
Affiliation(s)
- Shilpa Nadimpalli Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ, USA
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
|
44
|
Arnedo-Pac C, Mularoni L, Muiños F, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUSTL: a sequence-based clustering method to identify cancer drivers. Bioinformatics 2020; 35:4788-4790. [PMID: 31228182 PMCID: PMC6853674 DOI: 10.1093/bioinformatics/btz501] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/11/2018] [Revised: 04/25/2019] [Accepted: 06/18/2019] [Indexed: 12/12/2022]
Abstract
Motivation: Identification of the genomic alterations driving tumorigenesis is one of the main goals in oncogenomics research. Given the evolutionary principles of cancer development, computational methods that detect signals of positive selection in the pattern of tumor mutations have been effectively applied in the search for cancer genes. One of these signals is the abnormal clustering of mutations, which has been shown to be complementary to other signals in the detection of driver genes. Results: We have developed OncodriveCLUSTL, a new sequence-based clustering algorithm to detect significant clustering signals across genomic regions. OncodriveCLUSTL is based on a local background model derived from the simulation of mutations accounting for the composition of tri- or penta-nucleotide context substitutions observed in the cohort under study. Our method can identify known clusters and bona-fide cancer drivers across cohorts of tumor whole-exomes, outperforming the existing OncodriveCLUST algorithm and complementing other methods based on different signals of positive selection. Our results indicate that OncodriveCLUSTL can be applied to the analysis of non-coding genomic elements and non-human mutations data. Availability and implementation: OncodriveCLUSTL is available as an installable Python 3.5 package. The source code and running examples are freely available at https://bitbucket.org/bbglab/oncodriveclustl under GNU Affero General Public License. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claudia Arnedo-Pac
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Loris Mularoni
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Ferran Muiños
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain
| | - Abel Gonzalez-Perez
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain
| | - Nuria Lopez-Bigas
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology (BIST), Barcelona, Spain; Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona 08010, Spain
|
45
|
Loveday C, Litchfield K, Proszek PZ, Cornish AJ, Santo F, Levy M, Macintyre G, Holryod A, Broderick P, Dudakia D, Benton B, Bakir MA, Hiley C, Grist E, Swanton C, Huddart R, Powles T, Chowdhury S, Shipley J, O'Connor S, Brenton JD, Reid A, de Castro DG, Houlston RS, Turnbull C. Genomic landscape of platinum resistant and sensitive testicular cancers. Nat Commun 2020; 11:2189. [PMID: 32366847 PMCID: PMC7198558 DOI: 10.1038/s41467-020-15768-x] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 08/08/2019] [Accepted: 03/23/2020] [Indexed: 12/11/2022]
Abstract
While most testicular germ cell tumours (TGCTs) exhibit exquisite sensitivity to platinum chemotherapy, ~10% are platinum resistant. To gain insight into the underlying mechanisms, we undertake whole exome sequencing and copy number analysis in 40 tumours from 26 cases with platinum-resistant TGCT, and combine this with published genomic data on an additional 624 TGCTs. We integrate analyses for driver mutations, mutational burden, global, arm-level and focal copy number (CN) events, and SNV and CN signatures. Albeit preliminary and observational in nature, these analyses provide support for a possible mechanistic link between early driver mutations in RAS and KIT and the widespread copy number events by which TGCT is characterised.
Affiliation(s)
- Chey Loveday
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Kevin Litchfield
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
| | - Paula Z Proszek
- The Centre for Molecular Pathology, The Royal Marsden NHS Trust, Sutton, London, UK
| | - Alex J Cornish
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Flavia Santo
- The Centre for Molecular Pathology, The Royal Marsden NHS Trust, Sutton, London, UK
| | - Max Levy
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Geoff Macintyre
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Amy Holryod
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Peter Broderick
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Darshna Dudakia
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Barbara Benton
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
| | - Maise Al Bakir
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
| | - Crispin Hiley
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
| | - Emily Grist
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Charles Swanton
- Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK
- Cancer Research UK Lung Cancer Centre of Excellence, UCL Cancer Institute, London, UK
- Translational Cancer Therapeutics Laboratory, UCL Cancer Institute, London, UK
| | - Robert Huddart
- Academic Radiotherapy Unit, Institute of Cancer Research, London, UK
| | - Tom Powles
- Barts Cancer Institute, Queen Mary University, London, UK
| | - Simon Chowdhury
- Department of Oncology, Guys and St Thomas' NHS Foundation Trust, London, UK
| | - Janet Shipley
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
- Division of Cancer Therapeutics, The Institute of Cancer Research, London, UK
| | - Simon O'Connor
- The Centre for Molecular Pathology, The Royal Marsden NHS Trust, Sutton, London, UK
- Addenbrooke's Hospital, Cambridge, UK
- Department of Oncology, University of Cambridge, Cambridge, UK
| | - James D Brenton
- Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK
| | - Alison Reid
- Academic Uro-oncology Unit, The Royal Marsden NHS Foundation Trust, Sutton, London, UK
| | - Richard S Houlston
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK
- Division of Molecular Pathology, The Institute of Cancer Research, London, UK
| | - Clare Turnbull
- Division of Genetics & Epidemiology, The Institute of Cancer Research, London, UK.
- William Harvey Research Institute, Queen Mary University, London, UK.
- Guys and St Thomas' NHS Foundation Trust, Great Maze Pond, London, UK.
- Public Health England, National Cancer Registration and Analysis Service, London, UK.
|
46
|
Saito Y, Koya J, Araki M, Kogure Y, Shingaki S, Tabata M, McClure MB, Yoshifuji K, Matsumoto S, Isaka Y, Tanaka H, Kanai T, Miyano S, Shiraishi Y, Okuno Y, Kataoka K. Landscape and function of multiple mutations within individual oncogenes. Nature 2020; 582:95-99. [PMID: 32494066 DOI: 10.1038/s41586-020-2175-2] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 07/23/2019] [Accepted: 02/13/2020] [Indexed: 01/09/2023]
Abstract
Sporadic reports have described cancer cases in which multiple driver mutations (MMs) occur in the same oncogene (refs. 1, 2). However, the overall landscape and relevance of MMs remain elusive. Here we carried out a pan-cancer analysis of 60,954 cancer samples, and identified 14 pan-cancer and 6 cancer-type-specific oncogenes in which MMs occur more frequently than expected: 9% of samples with at least one mutation in these genes harboured MMs. In various oncogenes, MMs are preferentially present in cis and show markedly different mutational patterns compared with single mutations in terms of type (missense mutations versus in-frame indels), position and amino-acid substitution, suggesting a cis-acting effect on mutational selection. MMs show an overrepresentation of functionally weak, infrequent mutations, which confer enhanced oncogenicity in combination. Cells with MMs in the PIK3CA and NOTCH1 genes exhibit stronger dependencies on the mutated genes themselves, enhanced downstream signalling activation and/or greater sensitivity to inhibitory drugs than those with single mutations. Together, oncogenic MMs are a relatively common driver event, providing the underlying mechanism for clonal selection of suboptimal mutations that are individually rare but collectively account for a substantial proportion of oncogenic mutations.
Affiliation(s)
- Yuki Saito
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan; Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
| | - Junji Koya
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Mitsugu Araki
- Department of Clinical System Onco-Informatics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Yasunori Kogure
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Sumito Shingaki
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Mariko Tabata
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan; Department of Urology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
| | - Marni B McClure
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan
| | - Kota Yoshifuji
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan; Department of Hematology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Shigeyuki Matsumoto
- Medical Sciences Innovation Hub Program, RIKEN Cluster for Science, Technology and Innovation Hub, Yokohama, Japan
| | - Yuta Isaka
- Research and Development Group for In Silico Drug Discovery, Center for Cluster Development and Coordination, Foundation for Biomedical Research and Innovation, Kobe, Japan
| | - Hiroko Tanaka
- Laboratory of Sequence Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Takanori Kanai
- Department of Gastroenterology, Keio University School of Medicine, Tokyo, Japan
| | - Satoru Miyano
- Laboratory of Sequence Analysis, Human Genome Center, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
| | - Yuichi Shiraishi
- Center for Cancer Genomics and Advanced Therapeutics, National Cancer Center, Tokyo, Japan
| | - Yasushi Okuno
- Department of Clinical System Onco-Informatics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
| | - Keisuke Kataoka
- Division of Molecular Oncology, National Cancer Center Research Institute, Tokyo, Japan.
| |
|
47
|
Chen H, Li J, Wang Y, Ng PKS, Tsang YH, Shaw KR, Mills GB, Liang H. Comprehensive assessment of computational algorithms in predicting cancer driver mutations. Genome Biol 2020; 21:43. [PMID: 32079540 PMCID: PMC7033911 DOI: 10.1186/s13059-020-01954-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 05/20/2019] [Accepted: 02/07/2020] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient's tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models, and while some algorithms are cancer-specific, others are not. However, the relative performance of these algorithms has not been rigorously assessed. RESULTS We construct five complementary benchmark datasets: mutation clustering patterns in the protein 3D structures, literature annotation based on OncoKB, TP53 mutations based on their effects on target-gene transactivation, effects of cancer mutations on tumor formation in xenograft experiments, and functional annotation based on in vitro cell viability assays that we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose. CONCLUSIONS Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations and provides deep insights into the best practice of computationally prioritizing cancer mutation candidates for end-users and for the future development of new algorithms.
Affiliation(s)
- Hu Chen
- Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine, Houston, TX, 77030, USA.,Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Jun Li
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Yumeng Wang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Patrick Kwok-Shing Ng
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Yiu Huen Tsang
- Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University, Portland, OR, 97239, USA
| | - Kenna R Shaw
- Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
| | - Gordon B Mills
- Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University, Portland, OR, 97239, USA
| | - Han Liang
- Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.,Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA.
| |
|
48
|
Trevino V. HotSpotAnnotations-a database for hotspot mutations and annotations in cancer. Database (Oxford) 2020; 2020:baaa025. [PMID: 32386297 PMCID: PMC7211031 DOI: 10.1093/database/baaa025] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 09/06/2019] [Revised: 02/20/2020] [Accepted: 03/11/2020] [Indexed: 12/21/2022]
Abstract
Hotspots, recurrently mutated DNA positions in cancer, are thought to be oncogenic drivers because recurrence by random chance is unlikely and because clear examples of oncogenic hotspots are known in genes such as BRAF, IDH1, KRAS and NRAS, among many others. Hotspots are attractive because they provide opportunities for biomedical research and novel treatments. Nevertheless, recent evidence, such as DNA hairpins for APOBEC3A, suggests that a considerable fraction of hotspots are passengers rather than drivers. To document hotspots, the database HotSpotsAnnotations is proposed. For this, a statistical model was implemented to detect putative hotspots and applied to TCGA cancer datasets covering 33 cancer types, 10 182 patients and 3 175 929 mutations. Then, genes and hotspots were annotated by two published methods (APOBEC3A hairpins and dN/dS ratio) that may inform and warn researchers about possible false functional hotspots. Moreover, manual annotation from users can be added and shared. Of the 23 198 positions detected as possible hotspots, 4435 were selected after false discovery rate correction and a minimum mutation count filter. Of these, 305 were annotated as likely for APOBEC3A whereas 442 were annotated as unlikely. To date, this is the first database dedicated to annotating hotspots for possible false functional hotspots.
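The selection step described in this abstract (false discovery rate correction followed by a minimum mutation count) can be illustrated with a minimal sketch. The abstract does not name the correction procedure or the cutoffs, so the Benjamini-Hochberg method, the `alpha` and `min_count` defaults, and the function names below are all assumptions made for illustration only.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract only says "false discovery rate correction";
    BH is used here as a common choice, not as the paper's stated method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def select_hotspots(candidates, alpha=0.05, min_count=3):
    """Keep candidates that pass both the FDR and the count cutoffs.

    candidates: list of (position_id, mutation_count, p_value) tuples.
    alpha and min_count are hypothetical defaults, not the paper's values.
    """
    q_values = benjamini_hochberg([p for _, _, p in candidates])
    return [
        (pos, count, q)
        for (pos, count, _), q in zip(candidates, q_values)
        if q <= alpha and count >= min_count
    ]
```

Calling `select_hotspots` on a list of candidate positions returns only those that are both statistically significant after adjustment and supported by enough mutations, which mirrors the two-stage filter the abstract reports.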
Affiliation(s)
- Victor Trevino
- Tecnologico de Monterrey, Escuela de Medicina, Cátedra de Bioinformática, Morones Prieto No. 3000, Colonia Los Doctores, Monterrey, Nuevo León 64710, Mexico
| |
|
49
|
Shao XM, Bhattacharya R, Huang J, Sivakumar IKA, Tokheim C, Zheng L, Hirsch D, Kaminow B, Omdahl A, Bonsack M, Riemer AB, Velculescu VE, Anagnostou V, Pagel KA, Karchin R. High-Throughput Prediction of MHC Class I and II Neoantigens with MHCnuggets. Cancer Immunol Res 2019; 8:396-408. [PMID: 31871119 DOI: 10.1158/2326-6066.cir-19-0464] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Received: 06/20/2019] [Revised: 10/08/2019] [Accepted: 12/20/2019] [Indexed: 02/04/2023]
Abstract
Computational prediction of binding between neoantigen peptides and major histocompatibility complex (MHC) proteins can be used to predict patient response to cancer immunotherapy. Current neoantigen predictors focus on in silico estimation of MHC binding affinity and are limited by low predictive value for actual peptide presentation, inadequate support for rare MHC alleles, and poor scalability to high-throughput data sets. To address these limitations, we developed MHCnuggets, a deep neural network method that predicts peptide-MHC binding. MHCnuggets can predict binding for common or rare alleles of MHC class I or II with a single neural network architecture. Using a long short-term memory network (LSTM), MHCnuggets accepts peptides of variable length and is faster than other methods. When compared with methods that integrate binding affinity and MHC-bound peptide (HLAp) data from mass spectrometry, MHCnuggets yields a 4-fold increase in positive predictive value on independent HLAp data. We applied MHCnuggets to 26 cancer types in The Cancer Genome Atlas, processing 26.3 million allele-peptide comparisons in under 2.3 hours, yielding 101,326 unique predicted immunogenic missense mutations (IMM). Predicted IMM hotspots occurred in 38 genes, including 24 driver genes. Predicted IMM load was significantly associated with increased immune cell infiltration (P < 2 × 10⁻¹⁶), including CD8+ T cells. Only 0.16% of predicted IMMs were observed in more than 2 patients, with 61.7% of these derived from driver mutations. Thus, we describe a method for neoantigen prediction and its performance characteristics and demonstrate its utility in data sets representing multiple human cancers.
Affiliation(s)
- Xiaoshan M Shao
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Rohit Bhattacharya
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
| | - Justin Huang
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Computer Science, Johns Hopkins University, Baltimore, Maryland
| | - I K Ashok Sivakumar
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Computer Science, Johns Hopkins University, Baltimore, Maryland.,Applied Physics Laboratory, Johns Hopkins University, Laurel, Maryland
| | - Collin Tokheim
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Lily Zheng
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Dylan Hirsch
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Benjamin Kaminow
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Ashton Omdahl
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Maria Bonsack
- Immunotherapy and Immunoprevention, German Cancer Research Center (DKFZ), Heidelberg, Germany.,Molecular Vaccine Design, German Center for Infection Research (DZIF), partner site Heidelberg, Heidelberg, Germany.,Faculty of Biosciences, Heidelberg University, Heidelberg, Germany
| | - Angelika B Riemer
- Immunotherapy and Immunoprevention, German Cancer Research Center (DKFZ), Heidelberg, Germany.,Molecular Vaccine Design, German Center for Infection Research (DZIF), partner site Heidelberg, Heidelberg, Germany
| | - Victor E Velculescu
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.,The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Valsamo Anagnostou
- The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Kymberleigh A Pagel
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland
| | - Rachel Karchin
- Institute for Computational Medicine, Johns Hopkins University, Baltimore, Maryland.,Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland.,The Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, Maryland
| |
|
50
|
Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures. Proc Natl Acad Sci U S A 2019; 116:18962-18970. [PMID: 31462496 PMCID: PMC6754584 DOI: 10.1073/pnas.1901156116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/12/2022] Open
Abstract
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue-residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
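The idea of partitioning a structure into residue communities over a contact graph weighted by correlated motions can be sketched roughly as follows. This is a deliberately simplified stand-in: the abstract does not specify the community-detection algorithm, so the sketch drops weakly correlated contact edges and takes connected components instead. The function names, the `min_weight` threshold, and the input layout are all hypothetical.

```python
from collections import defaultdict


def residue_communities(contacts, correlation, min_weight=0.5):
    """Group residues into communities (simplified illustration).

    contacts: iterable of (i, j) residue pairs in spatial contact.
    correlation: dict mapping (i, j) to a motion correlation in [-1, 1].
    Assumption: contact edges with |correlation| below min_weight are
    dropped and the remaining connected components are the communities.
    A real pipeline would use a proper community-detection method.
    """
    neighbours = defaultdict(set)
    residues = set()
    for i, j in contacts:
        residues.update((i, j))
        weight = abs(correlation.get((i, j), correlation.get((j, i), 0.0)))
        if weight >= min_weight:
            neighbours[i].add(j)
            neighbours[j].add(i)

    communities, seen = [], set()
    for start in sorted(residues):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def mutations_per_community(communities, mutated_residues):
    """Count how many observed mutations fall in each community."""
    return [sum(mutated_residues.count(r) for r in c) for c in communities]
```

The per-community mutation counts are the raw quantity on which a test for positive selection would then operate; that statistical test is not sketched here because the abstract gives no detail about it.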
|