1. Soleymani S, Gravel N, Huang LC, Yeung W, Bozorgi E, Bendzunas NG, Kochut KJ, Kannan N. Dark kinase annotation, mining, and visualization using the Protein Kinase Ontology. PeerJ 2023; 11:e16087. [PMID: 38077442; PMCID: PMC10704995; DOI: 10.7717/peerj.16087] [Received: 04/05/2023; Accepted: 08/22/2023]
Abstract
The Protein Kinase Ontology (ProKinO) is an integrated knowledge graph that conceptualizes the complex relationships among protein kinase sequence, structure, function, and disease in a human- and machine-readable format. In this study, we have significantly expanded ProKinO by incorporating additional data on expression patterns and drug interactions. Furthermore, we have developed a completely new browser from the ground up to render the knowledge graph visible and interactive on the web. We have enriched ProKinO with new classes and relationships that capture information on kinase ligand binding sites, expression patterns, and functional features. These additions extend ProKinO's capabilities as a discovery tool, enabling it to uncover novel insights about understudied members of the protein kinase family. We next demonstrate the application of ProKinO. Specifically, through graph mining and aggregate SPARQL queries, we identify the p21-activated protein kinase 5 (PAK5) as one of the most frequently mutated dark kinases in human cancers, with abnormal expression in multiple cancers, including a previously unappreciated role in acute myeloid leukemia. We have identified recurrent oncogenic mutations in the PAK5 activation loop that are predicted to alter substrate binding and phosphorylation. Additionally, we have identified common ligand/drug binding residues in PAK family kinases, underscoring ProKinO's potential application in drug discovery. The updated ontology browser and the addition of a web component, ProtVista, which enables interactive mining of kinase sequence annotations in 3D structures and AlphaFold models, provide a valuable resource for the signaling community. The updated ProKinO database is accessible at https://prokino.uga.edu.
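The aggregate SPARQL queries mentioned above essentially group graph facts by kinase and count them (GROUP BY with COUNT). As a rough illustration only, the Python sketch below mimics that aggregation pattern over a few made-up (kinase, cancer type, mutation) records; the kinase names, records, and function names are hypothetical and do not reflect ProKinO's actual schema, predicates, or data.

```python
from collections import Counter, defaultdict

# Hypothetical toy records standing in for facts pulled from a knowledge graph:
# (kinase, cancer_type, mutation_id). These are NOT real ProKinO data.
records = [
    ("KinaseA", "AML", "m1"),
    ("KinaseA", "Melanoma", "m1"),
    ("KinaseA", "Melanoma", "m2"),
    ("KinaseB", "AML", "m3"),
    ("KinaseC", "Glioma", "m4"),
    ("KinaseC", "Glioma", "m5"),
]


def mutation_counts(rows):
    """Analogue of COUNT(?mutation) ... GROUP BY ?kinase (counts every record)."""
    return Counter(kinase for kinase, _, _ in rows)


def cancer_types_per_kinase(rows):
    """Analogue of COUNT(DISTINCT ?cancerType) ... GROUP BY ?kinase."""
    seen = defaultdict(set)
    for kinase, cancer, _ in rows:
        seen[kinase].add(cancer)
    return {kinase: len(types) for kinase, types in seen.items()}


def rank_kinases(rows):
    """Analogue of ORDER BY DESC(?count); ties are broken alphabetically."""
    counts = mutation_counts(rows)
    return sorted(counts, key=lambda kinase: (-counts[kinase], kinase))


if __name__ == "__main__":
    print(rank_kinases(records))             # ['KinaseA', 'KinaseC', 'KinaseB']
    print(cancer_types_per_kinase(records))  # {'KinaseA': 2, 'KinaseB': 1, 'KinaseC': 1}
```

In a real triple store the same grouping would be written in SPARQL and executed server-side; the plain-Python version only makes the aggregation logic explicit.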
Affiliation(s)
- Saber Soleymani
  - Department of Computer Science, University of Georgia, Athens, GA, United States
- Nathan Gravel
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Liang-Chin Huang
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Wayland Yeung
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States
- Elika Bozorgi
  - Department of Computer Science, University of Georgia, Athens, GA, United States
- Nathaniel G. Bendzunas
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States
- Krzysztof J. Kochut
  - Department of Computer Science, University of Georgia, Athens, GA, United States
- Natarajan Kannan
  - Institute of Bioinformatics, University of Georgia, Athens, GA, United States
  - Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, United States
2. Li MM, Awasthi S, Ghosh S, Bisht D, Coban Akdemir ZH, Sheynkman GM, Sahni N, Yi SS. Gain-of-Function Variomics and Multi-omics Network Biology for Precision Medicine. Methods Mol Biol 2023; 2660:357-372. [PMID: 37191809; PMCID: PMC10476052; DOI: 10.1007/978-1-0716-3163-8_24]
Abstract
Traditionally, disease-causing mutations were thought to disrupt gene function. However, it has become clear that many deleterious mutations can exhibit "gain-of-function" (GOF) behavior. Systematic investigation of such mutations has been lacking and largely overlooked. Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, contributing to diverse phenotypic consequences in disease. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities. In distinct cell types (with varying genotypes), precise signal transduction controls cell decisions, including gene regulation and phenotypic output. When signal transduction goes awry due to GOF mutations, various disease types can arise. A quantitative and molecular understanding of network perturbations by GOF mutations may help explain the "missing heritability" in previous genome-wide association studies. We envision that it will be instrumental to push the current paradigm toward thorough functional and quantitative modeling of all GOF mutations and the mechanistic molecular events involved in disease development and progression. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at various levels of regulation? How do interaction networks undergo rewiring upon GOF mutations? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, aiming to cure disease? To begin to address these questions, we cover a wide range of topics regarding GOF disease mutations and their characterization by multi-omic networks. We highlight the fundamental function of GOF mutations and discuss their potential mechanistic effects in the context of signaling networks. We also discuss advances in bioinformatic and computational resources that will greatly aid studies of the functional and phenotypic consequences of GOF mutations.
Affiliation(s)
- Mark M Li
  - Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Sharad Awasthi
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sumanta Ghosh
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Deepa Bisht
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zeynep H Coban Akdemir
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Gloria M Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
  - Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA, USA
  - Center for Public Health Genomics, and UVA Comprehensive Cancer Center, University of Virginia, Charlottesville, VA, USA
- Nidhi Sahni
  - Department of Epigenetics and Molecular Carcinogenesis, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Quantitative and Computational Biosciences Program, Baylor College of Medicine, Houston, TX, USA
- S Stephen Yi
  - Livestrong Cancer Institutes, Department of Oncology, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
  - Oden Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin, Austin, TX, USA
  - Department of Biomedical Engineering, Cockrell School of Engineering, The University of Texas at Austin, Austin, TX, USA
  - Interdisciplinary Life Sciences Graduate Programs (ILSGP), College of Natural Sciences, The University of Texas at Austin, Austin, TX, USA
3. Huang KL, Scott AD, Zhou DC, Wang LB, Weerasinghe A, Elmas A, Liu R, Wu Y, Wendl MC, Wyczalkowski MA, Baral J, Sengupta S, Lai CW, Ruggles K, Payne SH, Raphael B, Fenyö D, Chen K, Mills G, Ding L. Spatially interacting phosphorylation sites and mutations in cancer. Nat Commun 2021; 12:2313. [PMID: 33875650; PMCID: PMC8055881; DOI: 10.1038/s41467-021-22481-w] [Received: 02/16/2020; Accepted: 02/17/2021]
Abstract
Advances in mass spectrometry have generated increasingly large-scale proteomics datasets containing tens of thousands of phosphorylation sites (phosphosites) that require prioritization. We develop a bioinformatics tool called HotPho and systematically discover 3D co-clustering of phosphosites and cancer mutations on protein structures. HotPho identifies 474 such hybrid clusters containing 1,255 co-clustering phosphosites, including RET p.S904/Y928, the conserved HRAS/KRAS p.Y96, and IDH1 p.Y139/IDH2 p.Y179, which are adjacent to recurrent mutations on protein structures but are not found by linear proximity approaches. Hybrid clusters, enriched in histone and kinase domains, frequently include expression-associated mutations experimentally shown to be activating and to confer genetic dependency. Approximately 300 co-clustering phosphosites are verified in patient samples of 5 cancer types or previously implicated in cancer, including CTNNB1 p.S29/Y30, EGFR p.S720, MAPK1 p.S142, and PTPN12 p.S275. In summary, systematic 3D clustering analysis highlights nearly 3,000 likely functional mutations and over 1,000 cancer phosphosites for downstream investigation and evaluation of potential clinical relevance.
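The central idea is that residues far apart in sequence can sit close together in 3D. The following is a generic single-linkage sketch of finding "hybrid" clusters, not HotPho's actual algorithm: it assumes residue coordinates (for example, C-alpha positions) have already been extracted, links any two residues within a distance cutoff (10 Å here, an arbitrary choice), and keeps connected components that contain both a mutation and a phosphosite. The function name, labels, and coordinates are invented for illustration.

```python
import math
from itertools import combinations


def hybrid_clusters(sites, cutoff=10.0):
    """Link residues whose 3D distance is <= cutoff (single linkage via union-find)
    and return clusters containing at least one mutation and one phosphosite.

    sites: list of (label, kind, (x, y, z)) with kind "mutation" or "phosphosite".
    """
    parent = list(range(len(sites)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(sites)), 2):
        if math.dist(sites[i][2], sites[j][2]) <= cutoff:
            parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, site in enumerate(sites):
        groups.setdefault(find(i), []).append(site)

    hybrids = []
    for members in groups.values():
        kinds = {kind for _, kind, _ in members}
        if kinds == {"mutation", "phosphosite"}:
            hybrids.append(sorted(label for label, _, _ in members))
    return sorted(hybrids)


# Invented example: pS10 and m200 are far apart in sequence but 5 Å apart in space.
sites = [
    ("pS10", "phosphosite", (0.0, 0.0, 0.0)),
    ("m200", "mutation", (3.0, 4.0, 0.0)),
    ("m201", "mutation", (30.0, 0.0, 0.0)),
    ("pY300", "phosphosite", (60.0, 0.0, 0.0)),
    ("pT301", "phosphosite", (64.0, 0.0, 0.0)),
]
print(hybrid_clusters(sites))  # [['m200', 'pS10']]
```

A purely sequence-based (linear proximity) window would never pair residues 10 and 200, which is why the 3D view can surface clusters that linear methods miss.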
Affiliation(s)
- Kuan-Lin Huang
  - Department of Genetics and Genomics, Tisch Cancer Institute, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Adam D Scott
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Daniel Cui Zhou
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Liang-Bo Wang
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Amila Weerasinghe
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Abdulkadir Elmas
  - Department of Genetics and Genomics, Tisch Cancer Institute, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ruiyang Liu
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Yige Wu
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Michael C Wendl
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Matthew A Wyczalkowski
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Jessika Baral
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Sohini Sengupta
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
- Chin-Wen Lai
  - Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO, USA
- Kelly Ruggles
  - Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, NY, USA
- Samuel H Payne
  - Department of Biology, Brigham Young University, Provo, UT, USA
- Benjamin Raphael
  - Lewis-Sigler Institute, Princeton University, Princeton, NJ, USA
- David Fenyö
  - Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, NY, USA
- Ken Chen
  - Departments of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Gordon Mills
  - Knight Cancer Institute, Oregon Health & Science University, Portland, OR, USA
- Li Ding
  - Department of Medicine, McDonnell Genome Institute, Department of Genetics, Siteman Cancer Center, Washington University in St. Louis, St. Louis, MO, USA
4. Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. Database: The Journal of Biological Databases and Curation 2020; 2020:5710862. [PMID: 32016318; PMCID: PMC6997940; DOI: 10.1093/database/baz117] [Received: 04/12/2019; Revised: 06/03/2019; Accepted: 07/01/2019]
Abstract
Development of new computational methods and testing of their performance have to be carried out using experimental data. Method performance can be assessed only in comparison to existing knowledge. For that purpose, benchmark datasets with known and verified outcomes are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time-consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets, used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329,014,152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on the information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding regions, and structure-mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein properties (aggregation, binding free energy, disorder and stability). There are also several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level, including relevant cross-references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of the obtained performances with previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 with several benchmark studies and show that such comparisons are feasible and useful; however, there may be limitations due to a lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
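Because the abstract stresses heavy redundancy between datasets, one might measure the overlap before pooling datasets for training or testing. A minimal sketch, assuming each variant is a simple (chromosome, position, ref, alt) tuple and that inconsistent "chr" prefixes and letter case are the only formatting differences; real VariBench files vary in format, and proper variant normalization (for example, left-aligning indels) is outside this sketch. The dataset names, variants, and helper functions are hypothetical.

```python
def normalize(variant):
    """Build a comparable key from a (chrom, pos, ref, alt) tuple.

    Assumption: 'chr' prefixes and letter case are the only inconsistencies.
    """
    chrom, pos, ref, alt = variant
    chrom = str(chrom).lower().removeprefix("chr").upper()
    return (chrom, int(pos), ref.upper(), alt.upper())


def redundancy_report(datasets):
    """Return (total records, unique variants, variants shared by >1 dataset)."""
    total = 0
    owners = {}  # normalized variant -> names of datasets containing it
    for name, variants in datasets.items():
        for variant in variants:
            total += 1
            owners.setdefault(normalize(variant), set()).add(name)
    shared = sum(1 for names in owners.values() if len(names) > 1)
    return total, len(owners), shared


# Hypothetical datasets; set_B writes the first variant in a different style.
datasets = {
    "set_A": [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")],
    "set_B": [("1", "1000", "a", "g"), ("chrX", 42, "G", "A")],
    "set_C": [("chr2", 500, "C", "T")],
}
print(redundancy_report(datasets))  # (5, 3, 2)
```

Counting shared variants also hints at a practical risk: if the same variant appears in both a training set and a "blind" test set, reported performance can be inflated.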
Affiliation(s)
- Anasua Sarkar
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
- Yang Yang
  - School of Computer Science and Technology, Soochow University, No. 1 Shizi Street, Suzhou, 215006 Jiangsu, China
  - Provincial Key Laboratory for Computer Information Processing Technology, No. 1 Shizi Street, Soochow University, Suzhou, 215006 Jiangsu, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
5. Savage SR, Zhang B. Using phosphoproteomics data to understand cellular signaling: a comprehensive guide to bioinformatics resources. Clin Proteomics 2020; 17:27. [PMID: 32676006; PMCID: PMC7353784; DOI: 10.1186/s12014-020-09290-x] [Received: 12/01/2019; Accepted: 07/04/2020]
Abstract
Mass spectrometry-based phosphoproteomics is becoming an essential methodology for the study of global cellular signaling. Numerous bioinformatics resources are available to facilitate the translation of phosphopeptide identification and quantification results into novel biological and clinical insights, a critical step in phosphoproteomics data analysis. These resources include knowledge bases of kinases and phosphatases, phosphorylation sites, kinase inhibitors, and sequence variants affecting kinase function, as well as bioinformatics tools that can predict phosphorylation sites and the kinases that phosphorylate them, infer kinase activity, and predict the effects of mutations on kinase signaling. However, these resources exist in silos, and it is challenging to select among multiple resources with similar functions. Therefore, we put together a comprehensive collection of resources related to phosphoproteomics data interpretation, compared the use of tools with similar functions, and assessed their usability from the standpoint of typical biologists or clinicians. Overall, tools could be improved by standardization of enzyme names, flexibility of data input and output formats, consistent maintenance, and detailed manuals.
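One of the listed tasks, inferring kinase activity from phosphosite changes, is often done with a KSEA-style z-score: the mean log2 fold change of a kinase's known substrate sites is compared with the mean over all sites, scaled by the square root of the number of substrates and divided by the standard deviation across all sites. The sketch below assumes that formulation (using the sample standard deviation) with a hypothetical kinase-substrate map and toy data; it is not the implementation of any of the reviewed tools. It also shows why standardized enzyme names matter: kinases and sites are matched purely by dictionary key.

```python
import math
import statistics


def ksea_zscores(site_log2fc, kinase_substrates, min_substrates=2):
    """KSEA-style z-score per kinase (assumed formulation):
        z = (mean substrate log2FC - mean of all sites) * sqrt(m) / sd of all sites,
    where m is the number of the kinase's substrate sites present in the data.
    Kinases with fewer than min_substrates measured sites are skipped.
    """
    values = list(site_log2fc.values())
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation across all sites
    scores = {}
    for kinase, substrates in kinase_substrates.items():
        observed = [site_log2fc[s] for s in substrates if s in site_log2fc]
        if len(observed) < min_substrates:
            continue
        m = len(observed)
        scores[kinase] = (statistics.mean(observed) - mu) * math.sqrt(m) / sd
    return scores


# Toy log2 fold changes and a hypothetical kinase-substrate map.
site_log2fc = {"S1": 2.0, "S2": 2.0, "S3": 0.0, "S4": 0.0, "S5": -2.0, "S6": -2.0}
kinase_substrates = {
    "KIN1": ["S1", "S2"],
    "KIN2": ["S5", "S6", "S9"],  # S9 was not measured, so it is ignored
    "KIN3": ["S3"],              # only one measured substrate: skipped
}
scores = ksea_zscores(site_log2fc, kinase_substrates)
print({k: round(v, 3) for k, v in scores.items()})  # {'KIN1': 1.581, 'KIN2': -1.581}
```

A positive score suggests the kinase's substrates are more phosphorylated than average (inferred activation); whether a score is significant would still need a p-value and multiple-testing control.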
Affiliation(s)
- Sara R. Savage
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
- Bing Zhang
  - Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
6. Ochoa S, Martínez-Pérez E, Zea DJ, Molina-Vila MA, Marino-Buslje C. Comutation and exclusion analysis in human tumors: A tool for cancer biology studies and for rational selection of multitargeted therapeutic approaches. Hum Mutat 2019; 40:413-425. [PMID: 30629309; DOI: 10.1002/humu.23705] [Received: 05/08/2018; Revised: 12/20/2018; Accepted: 01/03/2019]
Abstract
Malignant tumors originate from somatic mutations and other genomic and epigenomic alterations, which lead to loss of control of the cellular circuitry. These alterations present patterns of co-occurrence and mutual exclusivity that can influence prognosis and modify response to drugs, highlighting the need for multitargeted therapies. Studies in this area have generally focused on particular malignancies and considered whole genes instead of specific mutations, ignoring the fact that different alterations in the same gene can have widely different effects. Here, we present a comprehensive analysis of co-dependencies of individual somatic mutations across the whole spectrum of human tumors. Combining multiple testing with conditional and expected mutational probabilities, we have discovered rules governing the co-dependencies of driver and nondriver mutations. We also uncovered pairs and networks of comutations and exclusions, some of them restricted to certain cancer types and others widespread. These pairs and networks are of both basic and clinical interest and can help in the selection of multitargeted antitumor therapies. In this respect, recurrent driver comutations suggest combinations of drugs that might be effective in the clinical setting, while recurrent exclusions indicate combinations unlikely to be useful.
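Co-occurrence and mutual exclusivity of two mutations across tumors are commonly tested with a one-sided Fisher's exact test, that is, with hypergeometric tail probabilities. The sketch below is that generic test plus a Bonferroni correction for testing many mutation pairs; it does not reproduce the authors' method, which combines multiple testing with conditional and expected mutational probabilities. Function names and counts are illustrative.

```python
from math import comb


def hypergeom_pmf(x, n_total, n_a, n_b):
    """P(x tumors carry both mutations) when n_a of n_total tumors carry A,
    n_b carry B, and the two occur independently (hypergeometric)."""
    return comb(n_a, x) * comb(n_total - n_a, n_b - x) / comb(n_total, n_b)


def comutation_pvalues(n_total, n_a, n_b, n_both):
    """One-sided Fisher's exact p-values: (co-occurrence, mutual exclusivity)."""
    lo = max(0, n_a + n_b - n_total)  # smallest feasible overlap
    hi = min(n_a, n_b)                # largest feasible overlap
    pmf = {x: hypergeom_pmf(x, n_total, n_a, n_b) for x in range(lo, hi + 1)}
    p_cooccur = sum(p for x, p in pmf.items() if x >= n_both)
    p_exclusive = sum(p for x, p in pmf.items() if x <= n_both)
    return min(1.0, p_cooccur), min(1.0, p_exclusive)


def bonferroni(pvalues):
    """Bonferroni correction for testing many mutation pairs."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# 10 tumors; 5 carry mutation A, 5 carry mutation B, and all 5 A carriers also carry B.
p_co, p_ex = comutation_pvalues(10, 5, 5, 5)
print(round(p_co, 5), round(p_ex, 5))  # 0.00397 1.0
```

Here the co-occurrence p-value is exactly 1/252, the chance that five independent B mutations all land on the five A carriers. A small exclusivity p-value (overlap far below expectation) would instead flag a candidate exclusion pair.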
Affiliation(s)
- Soledad Ochoa
  - Fundación Instituto Leloir, Avda. Patricias Argentinas 435, Buenos Aires, Argentina
- Diego Javier Zea
  - Fundación Instituto Leloir, Avda. Patricias Argentinas 435, Buenos Aires, Argentina
- Miguel Angel Molina-Vila
  - Laboratory of Oncology, Hospital Universitario Quirón Dexeus, C/Sabino Arana 5-19, 08028, Barcelona, Spain
7. Computational Approaches to Prioritize Cancer Driver Missense Mutations. Int J Mol Sci 2018; 19:ijms19072113. [PMID: 30037003; PMCID: PMC6073793; DOI: 10.3390/ijms19072113] [Received: 05/23/2018; Revised: 07/02/2018; Accepted: 07/05/2018]
Abstract
Cancer is a complex disease driven by genetic alterations. Genome-wide techniques have developed rapidly during the last decade, along with a significant drop in the cost of gene sequencing, which has made cancer genomic data widely available. However, the interpretation of genomic data and the prediction of associations between genetic variations and cancer and disease phenotypes still require significant improvement. Missense mutations, which can render proteins non-functional and provide a selective growth advantage to cancer cells, are frequently detected in cancer. Effects caused by missense mutations can be pinpointed by in silico modeling, which makes it more feasible to find a treatment and reverse the effect. Specific human phenotypes are largely determined by the stability, activity, and interactions of proteins and other biomolecules that work together to execute specific cellular functions. Therefore, analysis of the effects of missense mutations on proteins and their complexes would provide important clues for identifying functionally important missense mutations, understanding the molecular mechanisms of cancer progression, and facilitating treatment and prevention. Herein, we summarize the major computational approaches and tools that provide not only the classification of missense mutations as cancer drivers or passengers but also the molecular mechanisms induced by driver mutations. This review focuses on annotation and prediction methods based on structural and biophysical data, analysis of somatic cancer missense mutations in 3D structures of proteins and their complexes, prediction of the effects of missense mutations on protein stability and on protein-protein and protein-nucleic acid interactions, and assessment of mutation-induced conformational changes in proteins.
8. Rodrigues CHM, Ascher DB, Pires DEV. Kinact: a computational approach for predicting activating missense mutations in protein kinases. Nucleic Acids Res 2018; 46:W127-W132. [PMID: 29788456; PMCID: PMC6031004; DOI: 10.1093/nar/gky375] [Received: 01/31/2018; Revised: 04/15/2018; Accepted: 04/28/2018]
Abstract
Protein phosphorylation is tightly regulated due to its vital role in many cellular processes. While gain-of-function mutations leading to constitutive activation of protein kinases are known to be driver events in many cancers, identifying these mutations has proven challenging. Here we present Kinact, a novel machine learning approach for predicting kinase-activating missense mutations using information from sequence and structure. By adapting our graph-based signatures, Kinact represents both structural and sequence information, which are used as evidence to train predictive models. We show that combining structural and sequence features significantly improves overall accuracy compared with considering either primary or tertiary structure alone, highlighting their complementarity. Kinact achieved precisions of 87% and 94% and areas under the ROC curve of 0.89 and 0.92 on 10-fold cross-validation and on blind tests, respectively, outperforming well-established tools (P < 0.01). We further show that Kinact performs equally well on homology models built using templates with sequence identity as low as 33%. Kinact is freely available as a user-friendly web server at http://biosig.unimelb.edu.au/kinact/.
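The two reported metrics can be computed directly from predictor scores. This sketch assumes binary labels (1 = activating) and a 0.5 decision threshold chosen only for illustration; the helper names and toy data are hypothetical, and this is not Kinact's code. It also includes a simple 10-fold index split; in practice folds are usually shuffled and often stratified by label.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney formulation: the probability
    that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def precision(labels, scores, threshold=0.5):
    """Fraction of predicted positives (score >= threshold) that are true positives."""
    predicted = [y for y, s in zip(labels, scores) if s >= threshold]
    return sum(predicted) / len(predicted) if predicted else 0.0


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


labels = [1, 1, 0, 0]          # 1 = activating, 0 = not activating (toy labels)
scores = [0.9, 0.4, 0.6, 0.1]  # toy predictor outputs
print(roc_auc(labels, scores), precision(labels, scores))  # 0.75 0.5
```

AUC is threshold-free while precision depends on the chosen cut-off, which is why the two can move independently between cross-validation and blind tests.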
Affiliation(s)
- Carlos HM Rodrigues
  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne
- David B Ascher
  - Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne
  - Department of Biochemistry, University of Cambridge
  - Instituto René Rachou, Fundação Oswaldo Cruz
9. Kalaivani R, Reema R, Srinivasan N. Recognition of sites of functional specialisation in all known eukaryotic protein kinase families. PLoS Comput Biol 2018; 14:e1005975. [PMID: 29438395; PMCID: PMC5826538; DOI: 10.1371/journal.pcbi.1005975] [Received: 08/05/2017; Revised: 02/26/2018; Accepted: 01/13/2018]
Abstract
The conserved function of protein phosphorylation, catalysed by members of the protein kinase superfamily, is regulated in different ways in different kinase families. Further, differences in activating triggers, cellular localisation, domain architecture and substrate specificity between kinase families are also well known. While the transfer of the γ-phosphate from ATP to the hydroxyl group of Ser/Thr/Tyr is mediated by a conserved Asp, the characteristic functional and regulatory sites are specialised at the level of families or subfamilies. Such family-specific sites of functional specialisation are unknown for most kinase families. In this work, we systematically identify family-specific residue features by comparing the extent of conservation of physicochemical properties, Shannon entropy and the statistical probability of residue distributions between kinase families. An integrated discriminatory score, which combines these three features, is developed to demarcate the functionally specialised sites in a kinase family from other sites. We achieved an area under the ROC curve of 0.992 for the discrimination of kinase families. Our approach was extensively tested on the well-studied CDK and MAPK families, wherein specific protein interaction sites and substrate recognition sites were successfully detected (p-value < 0.05). We also find that the known family-specific oncogenic driver mutation sites were scored high by our method. The method was applied to all known kinases, encompassing 107 families from diverse eukaryotic organisms, leading to a comprehensive list of family-specific functional sites. Apart from other uses, our method facilitates identification of specific protein interaction sites and drug target sites in a kinase family.
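Shannon entropy is one of the three features the authors combine. A minimal sketch of per-column entropy over a multiple sequence alignment follows; ignoring gap characters is an assumption of this sketch. The `specificity_gap` helper is a hypothetical, crude illustration of the "conserved within a family but variable elsewhere" idea, not the paper's integrated discriminatory score, and the sequences are toy fragments.

```python
import math
from collections import Counter


def shannon_entropy(column, base=2):
    """Shannon entropy of the residue distribution in one alignment column.
    Gap characters ('-') are ignored, which is an assumption of this sketch."""
    residues = [r for r in column.upper() if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(residues).values())


def column_entropies(alignment):
    """Per-column entropy for equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    return [shannon_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


def specificity_gap(family, others, position):
    """Hypothetical helper: pooled entropy minus within-family entropy at one column.
    A large gap means the column is conserved in the family but varies elsewhere."""
    within = shannon_entropy("".join(seq[position] for seq in family))
    pooled = shannon_entropy("".join(seq[position] for seq in family + others))
    return pooled - within


family = ["GKT", "GRT"]  # toy aligned fragments from one family
others = ["GKA", "GRS"]  # toy fragments from other families
print(column_entropies(family))            # [0.0, 1.0, 0.0]
print(specificity_gap(family, others, 2))  # 1.5
```

Column 0 (all G) is conserved everywhere, so it scores no gap, whereas column 2 is invariant within the toy family but variable across families, which is the pattern a family-specific site would show.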
Affiliation(s)
- Raju Kalaivani
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
- Raju Reema
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
10. Analysis of somatic mutations across the kinome reveals loss-of-function mutations in multiple cancer types. Sci Rep 2017; 7:6418. [PMID: 28743916; PMCID: PMC5527104; DOI: 10.1038/s41598-017-06366-x] [Received: 07/14/2016; Accepted: 06/13/2017]
Abstract
In this study we use somatic cancer mutations to identify important functional residues within sets of related genes. We focus on protein kinases, a superfamily of phosphotransferases that share homologous sequences and structural motifs and have many connections to cancer. We develop several statistical tests for identifying Significantly Mutated Positions (SMPs), which are positions in an alignment with mutations that show signs of selection. We apply our methods to 21,917 mutations that map to the alignment of human kinases and identify 23 SMPs. SMPs occur throughout the alignment, with many in the important A-loop region, and others spread between the N and C lobes of the kinase domain. Since mutations are pooled across the superfamily, these positions may be important to many protein kinases. We select eleven mutations from these positions for functional validation. All eleven mutations cause a reduction or loss of function in the affected kinase. The tested mutations are from four genes, including two tumor suppressors (TGFBR1 and CHEK2) and two oncogenes (KDR and ERBB2). They also represent multiple cancer types, and include both recurrent and non-recurrent events. Many of these mutations warrant further investigation as potential cancer drivers.
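A simple way to flag alignment positions with more mutations than expected, in the spirit of the SMP tests described above, is a binomial test against a uniform null in which every mutation is equally likely to land at any alignment position. That null, the Bonferroni correction, and the function names are assumptions of this sketch; the authors developed several tests of their own, and a realistic null would account for position-specific mutability and coverage. Terms are summed in log space so the code stays stable for tens of thousands of mutations.

```python
from math import exp, lgamma, log


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summing pmf terms computed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_pmf)
    return min(1.0, total)


def significantly_mutated_positions(counts, alpha=0.05):
    """Return alignment positions whose mutation count exceeds a uniform null.

    counts[i] = number of mutations mapped to alignment position i.
    Null (assumed): each mutation lands on any position with equal probability.
    P-values are Bonferroni-corrected across positions.
    """
    n_pos = len(counts)
    if n_pos < 2:
        raise ValueError("need at least two alignment positions")
    n_total = sum(counts)
    p_null = 1.0 / n_pos
    hits = []
    for pos, k in enumerate(counts):
        p_value = binom_sf(k, n_total, p_null)
        if min(1.0, p_value * n_pos) < alpha:
            hits.append(pos)
    return hits


counts = [1] * 9 + [11]  # toy data: 20 mutations over 10 positions, one hotspot
print(significantly_mutated_positions(counts))  # [9]
```

Pooling mutations from many kinases onto one alignment, as the study does, raises the per-position counts enough for such tests to have power even when each individual kinase is rarely mutated.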
11. Li M, Goncearenco A, Panchenko AR. Annotating Mutational Effects on Proteins and Protein Interactions: Designing Novel and Revisiting Existing Protocols. Methods Mol Biol 2017; 1550:235-260. [PMID: 28188534; PMCID: PMC5388446; DOI: 10.1007/978-1-4939-6747-6_17]
Abstract
In this review we describe a protocol to annotate the effects of missense mutations on proteins, their functions, stability, and binding. For this purpose we present a collection of the most comprehensive databases that store different types of sequencing data on missense mutations, and we discuss their relationships, possible intersections, and unique features. Next, we suggest an annotation workflow using state-of-the-art methods and highlight their usability, advantages, and limitations for different cases. Finally, we address the particularly difficult problem of deciphering the molecular mechanisms by which mutations affect proteins and protein complexes, in order to understand the origins and mechanisms of diseases.
Affiliation(s)
- Minghui Li
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Alexander Goncearenco
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Anna R Panchenko
  - National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
12. Unsupervised detection of cancer driver mutations with parsimony-guided learning. Nat Genet 2016; 48:1288-94. [PMID: 27618449; DOI: 10.1038/ng.3658] [Received: 06/06/2016; Accepted: 08/05/2016]
Abstract
Methods are needed to reliably prioritize biologically active driver mutations over inactive passengers in high-throughput sequencing cancer data sets. We present ParsSNP, an unsupervised functional impact predictor that is guided by parsimony. ParsSNP uses an expectation-maximization framework to find mutations that explain tumor incidence broadly, without using predefined training labels that can introduce biases. We compare ParsSNP to five existing tools (CanDrA, CHASM, FATHMM Cancer, TransFIC, and Condel) across five distinct benchmarks. ParsSNP outperformed the existing tools in 24 of 25 comparisons. To investigate the real-world benefit of these improvements, we applied ParsSNP to an independent data set of 30 patients with diffuse-type gastric cancer. ParsSNP identified many known and likely driver mutations that other methods did not detect, including truncation mutations in known tumor suppressors and the recurrent driver substitution RHOA p.Tyr42Cys. In conclusion, ParsSNP uses an innovative, parsimony-based approach to prioritize cancer driver mutations and provides dramatic improvements over existing methods.
13
Pons T, Vazquez M, Matey-Hernandez ML, Brunak S, Valencia A, Izarzugaza JM. KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily. BMC Genomics 2016; 17 Suppl 2:396. [PMID: 27357839 PMCID: PMC4928150 DOI: 10.1186/s12864-016-2723-1]
Abstract
Background The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease. Results KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership in a KinBase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified. A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2. Conclusions KinMutRF is capable of classifying kinase variation with good performance.
Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e., SIFT, PolyPhen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and likely account for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users.
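The KinMutRF abstract reports five standard binary-classification metrics. The sketch below shows how they are usually derived from confusion-matrix counts and checks that the reported F-score is consistent with the reported precision and recall (2 x 0.82 x 0.75 / 1.57 is about 0.78). It is an illustrative sketch, not the authors' evaluation code, and the confusion-matrix counts in the example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with 'pathogenic' as the positive class."""
    tp: int
    fp: int
    tn: int
    fn: int


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)


def f_score(p: float, r: float) -> float:
    """F1: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


def mcc(c: Confusion) -> float:
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0


# Consistency check using only the precision and recall quoted in the abstract.
reported_f = f_score(0.82, 0.75)
print(round(reported_f, 2))  # 0.78

# Hypothetical counts, for illustration only (not taken from the paper).
example = Confusion(tp=75, fp=16, tn=180, fn=25)
print(round(accuracy(example), 3), round(precision(example), 3),
      round(recall(example), 3), round(mcc(example), 3))
```

MCC is included because, unlike accuracy, it stays informative when neutral and pathogenic variants are unevenly represented in the training set.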
Affiliation(s)
- Tirso Pons
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- Miguel Vazquez
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- María Luisa Matey-Hernandez
- Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kemitorvet, Building 208, 2800 Kgs. Lyngby, Denmark
- Søren Brunak
- Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kemitorvet, Building 208, 2800 Kgs. Lyngby, Denmark; Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Blegdamsvej 3A, 2200, Copenhagen, Denmark
- Alfonso Valencia
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernández Almagro, 3, 28029, Madrid, Spain
- Jose M G Izarzugaza
- Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kemitorvet, Building 208, 2800 Kgs. Lyngby, Denmark.
14
Damodaran S, Miya J, Kautto E, Zhu E, Samorodnitsky E, Datta J, Reeser JW, Roychowdhury S. Cancer Driver Log (CanDL): Catalog of Potentially Actionable Cancer Mutations. J Mol Diagn 2015; 17:554-9. [PMID: 26320871 DOI: 10.1016/j.jmoldx.2015.05.002] [Received: 01/18/2015] [Revised: 04/12/2015] [Accepted: 05/04/2015]
Abstract
Massively parallel sequencing technologies have enabled characterization of genomic alterations across multiple tumor types. Efforts have focused on identifying driver mutations because they represent potential targets for therapy. However, because tumors carry both driver and passenger mutations, it is often challenging to assign clinical relevance to specific mutations observed in patients. Currently, multiple databases and tools provide in silico assessment of potential drivers; however, there is no comprehensive resource for mutations with functional characterization. Therefore, we created an expert-curated database of potentially actionable driver mutations for molecular pathologists to facilitate annotation of cancer genomic testing. We reviewed the scientific literature to identify variants that have been functionally characterized in vitro or in vivo as driver mutations. We obtained the chromosome location and all possible nucleotide positions for each amino acid change and uploaded them to the Cancer Driver Log (CanDL) database with associated literature references indicating functional driver evidence. In addition to a simple interface, the database allows users to download all or selected genes as a comma-separated values file for incorporation into their own analysis pipelines. Furthermore, the database includes a mechanism for third-party contributions to support updates for novel driver mutations. Overall, this freely available database will facilitate rapid annotation of mutations from cancer genomic testing in molecular pathology laboratories.
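The abstract describes downloading CanDL genes as a comma-separated values file for use in a local pipeline. A minimal sketch of that incorporation step is shown below: it indexes catalog rows by (gene, protein change) and flags which observed variants have curated driver evidence. The column names (`gene`, `protein_change`, `evidence`) and all rows are hypothetical placeholders, so check the header of an actual export before adapting it. Because CanDL also stores genomic positions, a real pipeline might key on chromosome and position instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export layout and placeholder rows; real CanDL columns may differ.
EXPORT = """gene,protein_change,evidence
GENE_A,p.R100W,ref-1
GENE_A,p.G12D,ref-2
GENE_B,p.E50K,ref-3
"""


def load_catalog(text: str) -> dict[tuple[str, str], list[str]]:
    """Index catalog rows by (gene, protein change), collecting evidence references."""
    catalog: defaultdict[tuple[str, str], list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        catalog[(row["gene"], row["protein_change"])].append(row["evidence"])
    return dict(catalog)


def annotate(variants, catalog):
    """Return (gene, change, evidence list or None) for each observed variant."""
    return [(gene, change, catalog.get((gene, change))) for gene, change in variants]


catalog = load_catalog(EXPORT)
observed = [("GENE_A", "p.G12D"), ("GENE_B", "p.E51K")]
for gene, change, evidence in annotate(observed, catalog):
    print(gene, change, evidence or "not in catalog")
```

Keeping a list of references per key allows one variant to carry several supporting publications.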
Affiliation(s)
- Senthilkumar Damodaran
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio
- Jharna Miya
- Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio
- Esko Kautto
- Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio
- Eliot Zhu
- Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio
- Jharna Datta
- Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio
- Julie W Reeser
- Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio
- Sameek Roychowdhury
- Division of Medical Oncology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio; Comprehensive Cancer Center, The Ohio State University, Columbus, Ohio; Department of Pharmacology, The Ohio State University, Columbus, Ohio.
15
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Received: 11/16/2015] [Accepted: 03/07/2016]
Affiliation(s)
- Abhishek Niroula
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
- Mauno Vihinen
- Department of Experimental Medical Science; Lund University; BMC B13 Lund SE-22184 Sweden
16
Vazquez M, Pons T, Brunak S, Valencia A, Izarzugaza JMG. wKinMut-2: Identification and Interpretation of Pathogenic Variants in Human Protein Kinases. Hum Mutat 2015; 37:36-42. [PMID: 26443060 DOI: 10.1002/humu.22914] [Received: 04/02/2015] [Accepted: 09/22/2015]
Abstract
Most genomic alterations are tolerated, while only a minor fraction disrupts molecular function sufficiently to drive disease. Protein kinases play central biological roles, and the functional consequences of their variants are abundantly characterized. However, this heterogeneous information is often scattered across different sources, which makes integrative analysis complex and laborious. wKinMut-2 constitutes a solution to facilitate the interpretation of the consequences of human protein kinase variation. Nine methods predict variant pathogenicity, including a kinase-specific random forest approach. To understand the biological mechanisms causing human diseases and cancer, information from pertinent reference knowledge bases and the literature is automatically mined, digested, and homogenized. Variants are visualized in their structural contexts, and residues affecting catalysis and drug binding are identified. Known protein-protein interactions are reported. Altogether, this information is intended to assist the generation of new working hypotheses to be corroborated by subsequent experimental work. The wKinMut-2 system, along with a user manual and examples, is freely accessible at http://kinmut2.bioinfo.cnio.es; the code for local installations can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.
Affiliation(s)
- Miguel Vazquez
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
- Tirso Pons
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
- Søren Brunak
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen 2200, Denmark; Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kongens Lyngby 2800, Denmark
- Alfonso Valencia
- Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, 28029, Spain
- Jose M G Izarzugaza
- Center for Biological Sequence Analysis (CBS), Systems Biology Department, Technical University of Denmark (DTU), Kongens Lyngby 2800, Denmark
17
Anoosha P, Huang LT, Sakthivel R, Karunagaran D, Gromiha MM. Discrimination of driver and passenger mutations in epidermal growth factor receptor in cancer. Mutat Res 2015; 780:24-34. [PMID: 26264175 DOI: 10.1016/j.mrfmmm.2015.07.005] [Received: 02/18/2015] [Revised: 05/21/2015] [Accepted: 07/07/2015]
Abstract
Cancer is one of the most life-threatening diseases, and mutations in several genes are a major cause of tumorigenesis. Protein kinases play essential roles in cancer progression; specifically, epidermal growth factor receptor (EGFR) is an important target for cancer therapy. In this work, we have developed a method to classify single amino acid polymorphisms (SAPs) in EGFR into disease-causing (driver) and neutral (passenger) mutations using both sequence- and structure-based features of the mutation site and machine learning approaches. We compiled a set of 222 features and selected a subset of 21 properties using feature selection methods to maximize prediction performance. In a set of 540 mutants, we obtained an overall classification accuracy of 67.8% with 10-fold cross-validation using support vector machines. Further, the mutations were grouped into four sets based on secondary structure and accessible surface area, which raised the classification accuracy to 80.2%, 81.9%, 77.9% and 75.1% for helix, strand, coil-buried and coil-exposed mutants, respectively. The method was tested on a blind dataset of 60 mutations, which showed an average accuracy of 85.4%. These accuracy levels are superior to other methods available in the literature for EGFR mutants, with an increase of more than 30%. Moreover, we have screened all possible SAPs in EGFR and suggested probable driver and passenger mutations, which could help in the development of mutation-specific drugs for cancer treatment.
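The accuracy gain reported above comes from splitting mutations into four structural groups before training. A minimal sketch of that grouping step follows, assuming 3-state secondary-structure labels (H, E, C) and a relative solvent accessibility cutoff of 0.25 to separate buried from exposed coil residues. Both the label scheme and the cutoff are assumptions rather than values taken from the paper, and the mutation sites are hypothetical.

```python
from collections import defaultdict

BURIED_RSA_CUTOFF = 0.25  # assumed threshold; the paper's actual cutoff may differ


def structural_group(sec_struct: str, rsa: float) -> str:
    """Assign a mutation site to one of the four groups named in the abstract.

    sec_struct: 3-state label, 'H' (helix), 'E' (strand) or 'C' (coil).
    rsa: relative solvent accessibility of the residue, in [0, 1].
    """
    if sec_struct == "H":
        return "helix"
    if sec_struct == "E":
        return "strand"
    if sec_struct == "C":
        return "coil-buried" if rsa < BURIED_RSA_CUTOFF else "coil-exposed"
    raise ValueError(f"unknown secondary-structure label: {sec_struct!r}")


def partition(sites):
    """Group (mutation_id, sec_struct, rsa) tuples; each group would get its own classifier."""
    groups = defaultdict(list)
    for mut_id, ss, rsa in sites:
        groups[structural_group(ss, rsa)].append(mut_id)
    return dict(groups)


# Hypothetical sites, for illustration only.
sites = [("m1", "H", 0.40), ("m2", "C", 0.05), ("m3", "C", 0.60), ("m4", "E", 0.10)]
print(partition(sites))
```

Training one model per group lets each model learn feature weights suited to its structural context, which is one plausible reason per-group accuracy can exceed the pooled 67.8%.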
Affiliation(s)
- P Anoosha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- Liang-Tsung Huang
- Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan
- R Sakthivel
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- D Karunagaran
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600 036, Tamil Nadu, India.
18
Kumar RD, Searleman AC, Swamidass SJ, Griffith OL, Bose R. Statistically identifying tumor suppressors and oncogenes from pan-cancer genome-sequencing data. Bioinformatics 2015. [PMID: 26209800 DOI: 10.1093/bioinformatics/btv430]
Abstract
MOTIVATION Several tools exist to identify cancer driver genes based on somatic mutation data. However, these tools do not account for subclasses of cancer genes: oncogenes, which undergo gain-of-function events, and tumor suppressor genes (TSGs), which undergo loss-of-function events. A method that accounts for these subclasses could improve performance while also suggesting a mechanism of action for new putative cancer genes. RESULTS We develop a panel of five complementary statistical tests and assess their performance against a curated set of 99 HiConf cancer genes using a pan-cancer dataset of 1.7 million mutations. We identify patient bias as a novel signal for cancer gene discovery and use it to significantly improve detection of oncogenes over existing methods (AUROC = 0.894). Additionally, our test of truncation event rate separates oncogenes and TSGs from one another (AUROC = 0.922). Finally, a random forest integrating the five tests further improves performance and identifies new cancer genes, including CACNG3, HDAC2, HIST1H1E, NXF1, GPS2 and HLA-DRB1. AVAILABILITY AND IMPLEMENTATION All mutation data, instructions, functions for computing the statistics and integrating them, as well as the HiConf gene panel, are available at www.github.com/Bose-Lab/Improved-Detection-of-Cancer-Genes. CONTACT rbose@dom.wustl.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
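The abstract summarizes how well the truncation-rate test separates TSGs from oncogenes as an AUROC. The sketch below illustrates that idea in simplified form: it scores each gene by the raw fraction of its mutations that are truncating (a stand-in, not the paper's statistical test) and computes AUROC as the probability that a randomly chosen TSG outscores a randomly chosen oncogene, with ties counted as one half (the Mann-Whitney formulation). Gene names and mutation counts are hypothetical.

```python
from itertools import product


def truncation_fraction(n_truncating: int, n_total: int) -> float:
    """Fraction of a gene's mutations that are truncating (e.g. nonsense, frameshift)."""
    return n_truncating / n_total if n_total else 0.0


def auroc(positive_scores, negative_scores) -> float:
    """AUROC via the Mann-Whitney statistic: P(pos > neg), ties counted as 0.5."""
    pairs = list(product(positive_scores, negative_scores))
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return wins / len(pairs)


# Hypothetical (truncating, total) mutation counts per gene, for illustration only.
tsgs = {"TSG_1": (60, 100), "TSG_2": (35, 90), "TSG_3": (12, 80)}
oncogenes = {"ONC_1": (2, 150), "ONC_2": (5, 70), "ONC_3": (15, 60)}

tsg_scores = [truncation_fraction(*counts) for counts in tsgs.values()]
onc_scores = [truncation_fraction(*counts) for counts in oncogenes.values()]
print(round(auroc(tsg_scores, onc_scores), 3))  # 0.889
```

Here 8 of the 9 TSG/oncogene pairs are ordered correctly (only TSG_3 at 0.15 falls below ONC_3 at 0.25), giving 8/9. The pairwise form is quadratic in gene count, which is fine for a panel of about 99 genes; larger sets would typically use a rank-based formula.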
Affiliation(s)
- Runjun D Kumar
- Division of Oncology, Department of Medicine, Washington University School of Medicine, Computational and Systems Biology Program, Washington University in St Louis
- Adam C Searleman
- Division of Oncology, Department of Medicine, Washington University School of Medicine
- S Joshua Swamidass
- Computational and Systems Biology Program, Washington University in St Louis, Department of Pathology and Immunology, Washington University School of Medicine and
- Obi L Griffith
- Division of Oncology, Department of Medicine, Washington University School of Medicine
- Ron Bose
- Division of Oncology, Department of Medicine, Washington University School of Medicine