1
Martinez-Ledesma E, Flores D, Trevino V. Computational methods for detecting cancer hotspots. Comput Struct Biotechnol J 2020; 18:3567-3576. [PMID: 33304455 PMCID: PMC7711189 DOI: 10.1016/j.csbj.2020.11.020] [Received: 05/13/2020] [Revised: 11/12/2020] [Accepted: 11/13/2020]
Abstract
Cancer mutations that are recurrently observed among patients are known as hotspots. Hotspots are highly relevant because they are presumably functional. Known hotspots in BRAF, PIK3CA, TP53, KRAS, and IDH1 support this idea. However, hundreds of hotspots have never been validated experimentally. Detecting hotspots is nevertheless challenging because background mutations obscure their statistical and computational identification. Although several algorithms have been applied to identify hotspots, they have not been reviewed before. Thus, in this mini-review, we summarize more than 40 computational methods applied to detect cancer hotspots in coding and non-coding DNA. We first organize the methods into cluster-based, 3D, position-specific, and miscellaneous categories to provide a general overview. Then, we describe their underlying procedures, implementations, variations, and differences. Finally, we discuss some advantages, provide some ideas for future developments, and mention opportunities such as application to viral integrations, translocations, and epigenetics.
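The simplest position-specific idea among the methods surveyed is to ask whether a single residue carries more mutations than a uniform background would predict. The sketch below is illustrative only and is not taken from the review: it assumes mutations fall uniformly across the protein, uses an exact binomial upper tail with a Bonferroni cutoff, and the function names and example counts are hypothetical. Real tools typically use gene- and context-specific background rates instead of a uniform one.

```python
from math import comb


def binomial_sf(k: int, n: int, p: float) -> float:
    """Exact upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def position_hotspots(counts: dict[int, int], protein_length: int,
                      alpha: float = 0.05) -> list[tuple[int, int, float]]:
    """Return (position, count, p-value) for positions with excess mutations.

    counts maps residue position -> number of mutations observed there.
    Assumes a uniform per-residue background (a hypothetical simplification)
    and a Bonferroni correction over every residue in the protein.
    """
    n = sum(counts.values())            # total mutations observed in the gene
    p = 1.0 / protein_length            # uniform background probability per residue
    cutoff = alpha / protein_length     # Bonferroni-adjusted significance level
    hits = []
    for pos, k in sorted(counts.items()):
        pval = binomial_sf(k, n, p)
        if pval < cutoff:
            hits.append((pos, k, pval))
    return hits


# Hypothetical mutation counts for a 189-residue protein.
example_counts = {12: 40, 13: 10, 61: 5, 100: 1, 150: 1}
for pos, k, pval in position_hotspots(example_counts, protein_length=189):
    print(f"position {pos}: {k} mutations, p = {pval:.2e}")
```

With only 57 mutations spread over 189 residues, a single mutation at a position is unremarkable, whereas a handful at the same residue already falls far below the corrected cutoff.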
Affiliation(s)
- Emmanuel Martinez-Ledesma
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- David Flores
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
- Universidad del Caribe, Departamento de Ciencias Básicas e Ingenierías, Cancún, Quintana Roo, Mexico
- Victor Trevino
- Tecnologico de Monterrey, Escuela de Medicina y Ciencias de la Salud, Bioinformática y Diagnóstico Clínico, Monterrey, Nuevo León, Mexico
2
Kobren SN, Chazelle B, Singh M. PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities. Cell Syst 2020; 11:63-74.e7. [PMID: 32711844 PMCID: PMC7493809 DOI: 10.1016/j.cels.2020.06.005] [Received: 10/02/2019] [Revised: 02/23/2020] [Accepted: 06/05/2020]
Abstract
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event. A fast, analytical framework called PertInInt enables efficient integration of multiple measures of protein site functionality—including interaction, domain, and evolutionary conservation—with gene-level mutation data in order to rapidly detect cancer driver genes along with their disrupted functionalities.
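The central computational point here is that analytical calculations replace permutation-based significance tests. The sketch below illustrates that general idea on a deliberately simplified model and is not PertInInt's actual scoring scheme: it assumes each mutation lands independently and uniformly on one of L residues, scores a gene by summing hypothetical per-residue functional weights (for example 1 for an interaction site, 0 otherwise) at the mutated positions, and derives the null mean n·mean(w) and variance n·Var(w) in closed form. A seeded permutation version is included only to show that the two agree.

```python
import math
import random


def analytical_zscore(weights: list[float], mutated_positions: list[int]) -> float:
    """Z-score of the observed weighted score using closed-form null moments.

    Null model (assumption): each of the n mutations independently falls on a
    uniformly random residue, so the score is a sum of n i.i.d. draws from weights.
    """
    L = len(weights)
    n = len(mutated_positions)
    mean_w = sum(weights) / L
    mean_w2 = sum(w * w for w in weights) / L
    expected = n * mean_w                       # E[S] = n * mean(w)
    variance = n * (mean_w2 - mean_w ** 2)      # Var[S] = n * Var(w)
    if variance <= 0:
        raise ValueError("weights are constant; z-score is undefined")
    observed = sum(weights[p] for p in mutated_positions)
    return (observed - expected) / math.sqrt(variance)


def permutation_zscore(weights: list[float], mutated_positions: list[int],
                       n_perm: int = 20000, seed: int = 0) -> float:
    """Same quantity estimated by simulation, for comparison only."""
    rng = random.Random(seed)
    L, n = len(weights), len(mutated_positions)
    observed = sum(weights[p] for p in mutated_positions)
    null = [sum(weights[rng.randrange(L)] for _ in range(n)) for _ in range(n_perm)]
    mu = sum(null) / n_perm
    sd = math.sqrt(sum((x - mu) ** 2 for x in null) / (n_perm - 1))
    return (observed - mu) / sd


# Hypothetical gene: residues 0-9 are interaction sites (weight 1), 90 others weight 0.
site_weights = [1.0] * 10 + [0.0] * 90
mutations = [0, 1, 2, 3, 4, 5, 6, 7, 8, 50]        # 9 of 10 mutations hit sites
print(analytical_zscore(site_weights, mutations))   # exact under the stated model
print(permutation_zscore(site_weights, mutations))  # Monte Carlo estimate
```

The analytical version costs one pass over the weights, while the permutation version scales with the number of permutations, which is the kind of saving that makes combining several functional measures per gene computationally feasible.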
Affiliation(s)
- Shilpa Nadimpalli Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
3
Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures. Proc Natl Acad Sci U S A 2019; 116:18962-18970. [PMID: 31462496 PMCID: PMC6754584 DOI: 10.1073/pnas.1901156116]
Abstract
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue-residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search these residue communities for signals of positive selection to identify putative driver genes, applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict one or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
4
Malhotra S, Alsulami AF, Heiyun Y, Ochoa BM, Jubb H, Forbes S, Blundell TL. Understanding the impacts of missense mutations on structures and functions of human cancer-related genes: A preliminary computational analysis of the COSMIC Cancer Gene Census. PLoS One 2019; 14:e0219935. [PMID: 31323058 PMCID: PMC6641202 DOI: 10.1371/journal.pone.0219935] [Received: 03/06/2019] [Accepted: 07/03/2019]
Abstract
Genomics and genome screening are proving central to the study of cancer. However, a good appreciation of the protein structures coded by cancer genes is also invaluable, especially for the understanding of functions, for assessing ligandability of potential targets, and for designing new drugs. To complement the wealth of information on the genetics of cancer in COSMIC, the most comprehensive database for cancer somatic mutations available, structural information obtained experimentally has been brought together recently in COSMIC-3D. Even where structural information is available for a gene in the Cancer Gene Census, a list of genes in COSMIC with substantial evidence supporting their impacts in cancer, this information is quite often for a single domain in a larger protein or for a single protomer in a multiprotein assembly. Here, we show that over 60% of the genes included in the Cancer Gene Census are predicted to possess multiple domains. Many are also multicomponent and membrane-associated molecular assemblies, with mutations recorded in COSMIC affecting such assemblies. However, only 469 of the gene products have a structure represented in the PDB, and of these only 87 structures have 90–100% coverage over the sequence and 69 have less than 10% coverage. As a first step to bridging gaps in our knowledge in the many cases where individual protein structures and domains are lacking, we discuss our attempts at protein structure modelling using our pipeline, at investigating the effects of mutations using two of our in-house methods (SDM2 and mCSM), and at identifying potential driver mutations. This allows us to begin to understand the effects of mutations not only on protein stability but also on protein-protein, protein-ligand and protein-nucleic acid interactions. In addition, we consider ways to combine the structural information with the wealth of mutation data available in COSMIC.
We discuss the impacts of COSMIC missense mutations on protein structure in order to identify and assess the molecular consequences of cancer-driving mutations.
Affiliation(s)
- Sony Malhotra
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Ali F. Alsulami
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Yang Heiyun
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
- Harry Jubb
- Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Simon Forbes
- Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Tom L. Blundell
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom
5
Functional characterization of 3D protein structures informed by human genetic diversity. Proc Natl Acad Sci U S A 2019; 116:8960-8965. [PMID: 30988206 PMCID: PMC6500140 DOI: 10.1073/pnas.1820813116]
Abstract
Sequence variation data of the human proteome can be used to analyze 3D protein structures to derive functional insights. We used genetic variant data from nearly 140,000 individuals to analyze 3D positional conservation in 4,715 proteins and 3,951 homology models using 860,292 missense and 465,886 synonymous variants. Sixty percent of protein structures harbor at least one intolerant 3D site as defined by significant depletion of observed over expected missense variation. Structural intolerance data correlated with deep mutational scanning functional readouts for PPARG, MAPK1/ERK2, UBE2I, SUMO1, PTEN, CALM1, CALM2, and TPK1 and with shallow mutagenesis data for 1,026 proteins. The 3D structural intolerance analysis revealed different features for ligand binding pockets and orthosteric and allosteric sites. Large-scale data on human genetic variation support a definition of functional 3D sites proteome-wide.
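The notion of a 3D site that is intolerant to variation (significant depletion of observed relative to expected missense variation around it) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-residue observed missense counts and per-residue expected counts from some neutral mutation-rate model are already available, pools both over all residues within a hypothetical 6 Å radius of each residue, and applies a one-sided Poisson test for depletion. The function names, radius, and toy data are hypothetical.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """Lower tail P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def intolerance_3d(coords, observed, expected, radius=6.0):
    """Pool observed and expected counts over each residue's 3D neighbourhood.

    coords: list of (x, y, z) residue coordinates (e.g. one atom per residue).
    observed: observed missense variant count per residue.
    expected: expected missense count per residue under a neutral model (assumed given).
    Returns (index, O/E ratio, depletion p-value) for every residue.
    """
    results = []
    for i, ci in enumerate(coords):
        nbrs = [j for j, cj in enumerate(coords) if math.dist(ci, cj) <= radius]
        o = sum(observed[j] for j in nbrs)
        e = sum(expected[j] for j in nbrs)
        ratio = o / e if e > 0 else float("nan")
        results.append((i, ratio, poisson_cdf(o, e)))
    return results


# Toy chain of 10 residues spaced 4 Å apart; residues 0-2 carry no variants.
toy_coords = [(4.0 * i, 0.0, 0.0) for i in range(10)]
toy_observed = [0, 0, 0, 2, 2, 2, 2, 2, 2, 2]
toy_expected = [2.0] * 10
for idx, ratio, pval in intolerance_3d(toy_coords, toy_observed, toy_expected):
    print(idx, round(ratio, 2), f"{pval:.3g}")
```

Pooling over a spatial neighbourhood rather than a single residue is what turns sparse per-residue counts into a testable signal; with 140,000 individuals the expected counts per sphere become large enough for depletion to reach significance.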
6
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4] [Received: 09/03/2018] [Accepted: 11/13/2018]
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionarily related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include recently validated cancer associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Affiliation(s)
- Paul Ashford
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK; Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK.
7
Spacial models of malfunctioned protein complexes help to elucidate signal transduction critical for insulin release. Biosystems 2018; 177:48-55. [PMID: 30395892 DOI: 10.1016/j.biosystems.2018.11.001] [Received: 09/01/2018] [Revised: 10/30/2018] [Accepted: 11/01/2018]
Abstract
Mutations in the gene KCNJ11, which encodes the Kir6.2 subunit of the ATP-sensitive potassium channel (KATP), a representative of a quite complex biosystem, may affect insulin release from pancreatic beta-cells. Both gain and loss of channel activity are observed, leading to varied clinical phenotypes ranging from neonatal diabetes to congenital hyperinsulinism. To better understand the mechanisms of channel function, we mapped known medically relevant Kir6.2/SUR1 mutations, collected through a literature review, onto the recently (2017) determined CryoEM 3D structures of this complex. We used a clustering algorithm to find hot spots in the 3D structure, which allows us to hypothesize about their nano-mechanical role in channel gating and insulin level control. We also adapted a simple model of channel gating to cover all currently known factors that can influence the functions of the KATP biosystem.
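The abstract does not name the clustering algorithm used to find hot spots in the Kir6.2/SUR1 structure. As a generic illustration only, the sketch below groups mutated residues by single-linkage clustering with a distance cutoff, implemented with a union-find structure: two mutations end up in the same cluster if they are connected by a chain of pairwise distances at or below the cutoff. The 8 Å cutoff, the use of one coordinate per residue, and the example labels and coordinates are all hypothetical.

```python
import math


def cluster_mutations(coords: dict[str, tuple[float, float, float]],
                      cutoff: float = 8.0) -> list[set[str]]:
    """Single-linkage clusters of mutated residues (only clusters of size >= 2)."""
    labels = list(coords)
    parent = {m: m for m in labels}

    def find(x: str) -> str:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair of residues within the cutoff distance.
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for m in labels:
        groups.setdefault(find(m), set()).add(m)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical residue coordinates (in Å) for four mutated positions.
example = {"mut_a": (0.0, 0.0, 0.0), "mut_b": (5.0, 0.0, 0.0),
           "mut_c": (10.0, 0.0, 0.0), "mut_d": (50.0, 0.0, 0.0)}
print(cluster_mutations(example))
```

Here mut_a and mut_c are 10 Å apart but still share a cluster through mut_b, which is the characteristic chaining behaviour of single linkage; a stricter method would also test whether such a cluster is larger than expected by chance.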
8
Computational Approaches to Prioritize Cancer Driver Missense Mutations. Int J Mol Sci 2018; 19:ijms19072113. [PMID: 30037003 PMCID: PMC6073793 DOI: 10.3390/ijms19072113] [Received: 05/23/2018] [Revised: 07/02/2018] [Accepted: 07/05/2018]
Abstract
Cancer is a complex disease that is driven by genetic alterations. There has been a rapid development of genome-wide techniques during the last decade along with a significant lowering of the cost of gene sequencing, which has generated widely available cancer genomic data. However, the interpretation of genomic data and the prediction of the association of genetic variations with cancer and disease phenotypes still require significant improvement. Missense mutations, which can render proteins non-functional and provide a selective growth advantage to cancer cells, are frequently detected in cancer. Effects caused by missense mutations can be pinpointed by in silico modeling, which makes it more feasible to find a treatment and reverse the effect. Specific human phenotypes are largely determined by stability, activity, and interactions between proteins and other biomolecules that work together to execute specific cellular functions. Therefore, analysis of missense mutations’ effects on proteins and their complexes would provide important clues for identifying functionally important missense mutations, understanding the molecular mechanisms of cancer progression and facilitating treatment and prevention. Herein, we summarize the major computational approaches and tools that provide not only the classification of missense mutations as cancer drivers or passengers but also the molecular mechanisms induced by driver mutations. This review focuses on the discussion of annotation and prediction methods based on structural and biophysical data, analysis of somatic cancer missense mutations in 3D structures of proteins and their complexes, predictions of the effects of missense mutations on protein stability, protein-protein and protein-nucleic acid interactions, and assessment of mutation-induced conformational changes in proteins.
9
Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 2017; 14:782-788. [PMID: 28714987 DOI: 10.1038/nmeth.4364] [Received: 01/18/2017] [Accepted: 06/16/2017]
Abstract
Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally, most algorithms for cancer-driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for nonrandom distribution of mutations within proteins as a signal for the driving role of mutations in cancer. Here we classify and review such subgene-resolution algorithms, compare their findings on four distinct cancer data sets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes.
10
3D clusters of somatic mutations in cancer reveal numerous rare mutations as functional targets. Genome Med 2017; 9:4. [PMID: 28115009 PMCID: PMC5260099 DOI: 10.1186/s13073-016-0393-x] [Received: 08/24/2016] [Accepted: 12/15/2016]
Abstract
Many mutations in cancer are of unknown functional significance. Standard methods use statistically significant recurrence of mutations in tumor samples as an indicator of functional impact. We extend such analyses into the long tail of rare mutations by considering recurrence of mutations in clusters of spatially close residues in protein structures. Analyzing 10,000 tumor exomes, we identify more than 3000 rarely mutated residues in proteins as potentially functional and experimentally validate several in RAC1 and MAP2K1. These potential driver mutations (web resources: 3dhotspots.org and cBioPortal.org) can extend the scope of genomically informed clinical trials and of personalized choice of therapy.
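Extending recurrence analysis to clusters of spatially close residues can be illustrated with a permutation test: count the mutations within a sphere around a residue of interest and compare that count with what random placement of the same number of mutations would produce. This is a simplified sketch, not the published 3D-hotspot method; it assumes every residue is equally mutable, and the radius, coordinates, and counts below are hypothetical.

```python
import math
import random


def cluster_pvalue(coords: list[tuple[float, float, float]],
                   mutation_counts: list[int], center: int,
                   radius: float = 5.0, n_perm: int = 10000,
                   seed: int = 0) -> tuple[int, float]:
    """Permutation p-value for an excess of mutations near residue `center`.

    Null model (assumption): each mutation is reassigned to a uniformly
    random residue of the structure, ignoring sequence context.
    """
    nbhd = {j for j, c in enumerate(coords)
            if math.dist(coords[center], c) <= radius}
    observed = sum(mutation_counts[j] for j in nbhd)
    n_mut = sum(mutation_counts)
    n_res = len(coords)
    rng = random.Random(seed)
    hits = 0
    for _perm in range(n_perm):
        # Scatter all mutations at random and count how many land in the sphere.
        k = sum(1 for _ in range(n_mut) if rng.randrange(n_res) in nbhd)
        if k >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical 100-residue chain with 12 mutations on three adjacent residues.
chain = [(3.8 * i, 0.0, 0.0) for i in range(100)]
counts = [0] * 100
counts[49] = counts[50] = counts[51] = 4
print(cluster_pvalue(chain, counts, center=50))
```

No single residue here is strongly recurrent (four mutations each), yet the sphere around residue 50 collects all twelve, which is how spatial pooling can promote rare mutations to candidate functional sites.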
11
Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, Wyczalkowski MA, Liang WW, Zhang Q, McLellan MD, Sun SQ, Tripathi P, Lou C, Ye K, Mashl RJ, Wallis J, Wendl MC, Chen F, Ding L. Protein-structure-guided discovery of functional mutations across 19 cancer types. Nat Genet 2016; 48:827-37. [PMID: 27294619 PMCID: PMC5315576 DOI: 10.1038/ng.3586] [Received: 03/21/2016] [Accepted: 05/13/2016]
Abstract
Local concentrations of mutations are well known in human cancers. However, their three-dimensional spatial relationships in the encoded protein have yet to be systematically explored. We developed a computational tool, HotSpot3D, to identify such spatial hotspots (clusters) and to interpret the potential function of variants within them. We applied HotSpot3D to >4,400 TCGA tumors across 19 cancer types, discovering >6,000 intra- and intermolecular clusters, some of which showed tumor and/or tissue specificity. In addition, we identified 369 rare mutations in genes including TP53, PTEN, VHL, EGFR, and FBXW7 and 99 medium-recurrence mutations in genes such as RUNX1, MTOR, CA3, PI3, and PTPN11, all mapping within clusters having potential functional implications. As a proof of concept, we validated our predictions in EGFR using high-throughput phosphorylation data and cell-line-based experimental evaluation. Finally, mutation-drug cluster and network analysis predicted over 800 promising candidates for druggable mutations, raising new possibilities for designing personalized treatments for patients carrying specific mutations.
Affiliation(s)
- Beifang Niu
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Adam D. Scott
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Sohini Sengupta
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Matthew H. Bailey
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Prag Batra
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Jie Ning
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Nephrology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Matthew A. Wyczalkowski
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Wen-Wei Liang
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Qunyuan Zhang
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- Michael D. McLellan
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Sam Q. Sun
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Piyush Tripathi
- Division of Nephrology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Carolyn Lou
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Kai Ye
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- R. Jay Mashl
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- John Wallis
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Michael C. Wendl
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Department of Genetics, Washington University, St. Louis, Missouri 63108, USA
- Department of Mathematics, Washington University, St. Louis, Missouri 63108, USA
- Feng Chen
- Division of Nephrology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
- Department of Cell Biology and Physiology, Washington University, St. Louis, Missouri 63108, USA
- Li Ding
- McDonnell Genome Institute, Washington University, St. Louis, Missouri 63108, USA
- Division of Oncology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Division of Nephrology, Department of Medicine, Washington University, St. Louis, Missouri 63108, USA
- Siteman Cancer Center, Washington University, St. Louis, Missouri 63108, USA
12
Ryslik GA, Cheng Y, Modis Y, Zhao H. Leveraging protein quaternary structure to identify oncogenic driver mutations. BMC Bioinformatics 2016; 17:137. [PMID: 27001666 PMCID: PMC4802602 DOI: 10.1186/s12859-016-0963-3] [Received: 10/14/2015] [Accepted: 02/18/2016]
Abstract
BACKGROUND Identifying key "driver" mutations that are responsible for tumorigenesis is critical in the development of new oncology drugs. Due to multiple pharmacological successes in treating cancers that are caused by such driver mutations, a large body of methods has been developed to differentiate these mutations from the benign "passenger" mutations, which occur in the tumor but do not drive disease progression. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html . CONCLUSION QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure.
Affiliation(s)
- Gregory A. Ryslik
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
- Yuwei Cheng
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
- Yorgo Modis
- Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH UK
- Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
13
Chung IF, Chen CY, Su SC, Li CY, Wu KJ, Wang HW, Cheng WC. DriverDBv2: a database for human cancer driver gene research. Nucleic Acids Res 2015; 44:D975-9. [PMID: 26635391 PMCID: PMC4702919 DOI: 10.1093/nar/gkv1314] [Received: 09/15/2015] [Accepted: 11/10/2015]
Abstract
We previously presented DriverDB, a database that incorporates ∼6000 cases of exome-seq data, in addition to annotation databases and published bioinformatics algorithms dedicated to driver gene/mutation identification. The database provides two points of view, ‘Cancer’ and ‘Gene’, to help researchers visualize the relationships between cancers and driver genes/mutations. In the updated DriverDBv2 database (http://ngs.ym.edu.tw/driverdb) presented herein, we incorporated >9500 cancer-related RNA-seq datasets and >7000 more exome-seq datasets from The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium (ICGC), and published papers. Seven additional computational algorithms developed for driver gene identification (bringing the total in the updated database to 15) are incorporated into our analysis pipeline, and the results are provided in the ‘Cancer’ section. Furthermore, there are two main new features, ‘Expression’ and ‘Hotspot’, in the ‘Gene’ section. ‘Expression’ displays two expression profiles of a gene in terms of sample types and mutation types, respectively. ‘Hotspot’ indicates the hotspot mutation regions of a gene according to the results provided by four bioinformatics tools. A new function, ‘Gene Set’, allows users to investigate the relationships among mutations, expression levels and clinical data for a set of genes, a specific dataset and clinical features.
Affiliation(s)
- I-Fang Chung
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan; Center for Systems and Synthetic Biology, National Yang-Ming University, Taipei, 11221, Taiwan
- Chen-Yang Chen
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei 11221, Taiwan
- Shih-Chieh Su
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan
- Chia-Yang Li
- Department of Genome Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 80708, Taiwan; Center for Infectious Disease and Cancer Research, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
- Kou-Juey Wu
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan; Graduate Institute of Cancer Biology, China Medical University, Taichung, 40402, Taiwan
- Hsei-Wei Wang
- VGH-YM Genomic Research Center, National Yang-Ming University, Taipei 11221, Taiwan; Institute of Clinical Medicine, Medical College, National Yang-Ming University, Taipei 11221, Taiwan; Institute of Microbiology and Immunology, National Yang-Ming University, Taipei 11221, Taiwan; Department of Education and Research, Taipei City Hospital, Taipei 10341, Taiwan
- Wei-Chung Cheng
- Research Center for Tumour Medical Science, China Medical University, Taichung, 40402, Taiwan; Graduate Institute of Cancer Biology, China Medical University, Taichung, 40402, Taiwan
14
Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc Natl Acad Sci U S A 2015; 112:E5486-95. [PMID: 26392535 DOI: 10.1073/pnas.1516373112]
Abstract
Large-scale tumor sequencing projects have enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins, including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Besides these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore; these mutations could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.
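The idea of 3D clustering can be illustrated with a pairwise score that rewards mutated residues lying close together in the folded structure, compared against a permutation null. The sketch below is loosely modeled on Gaussian distance-weighted pair scores and is a simplified illustration, not the statistic used in the study above. Assumptions (all hypothetical): one coordinate per residue (for example its C-alpha atom), raw mutation counts as weights, only pairs of distinct residues scored, a distance scale `t` of 6 Å, and a null that shuffles the per-residue counts across every residue in the structure. `pair_score` and `clustering_pvalue` are illustrative names.

```python
import math
import random


def pair_score(coords, counts, t=6.0):
    """Gaussian-weighted sum over pairs of distinct mutated residues.

    coords: residue -> (x, y, z) in angstroms; counts: residue -> mutation count.
    Nearby mutated pairs contribute close to counts[q] * counts[r]; distant
    pairs contribute almost nothing.
    """
    mutated = [r for r, c in counts.items() if c > 0]
    score = 0.0
    for i, q in enumerate(mutated):
        for r in mutated[i + 1:]:
            d = math.dist(coords[q], coords[r])
            score += counts[q] * counts[r] * math.exp(-d * d / (2 * t * t))
    return score


def clustering_pvalue(coords, counts, n_perm=1000, t=6.0, seed=0):
    """Permutation p-value: shuffle per-residue counts across all residues."""
    rng = random.Random(seed)
    observed = pair_score(coords, counts, t)
    residues = list(coords)
    values = [counts.get(r, 0) for r in residues]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if pair_score(coords, dict(zip(residues, values)), t) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return observed, (at_least + 1) / (n_perm + 1)


# Toy structure (invented): 50 residues on a straight line, 3.8 angstroms apart.
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 51)}
print(clustering_pvalue(coords, {10: 5, 11: 5, 12: 5}))  # small p: tight cluster
print(clustering_pvalue(coords, {5: 5, 25: 5, 45: 5}))   # large p: spread out
```

Shuffling the counts, rather than redrawing mutations independently, keeps the number of mutated residues and their mutation counts fixed, so only their placement in 3D changes under the null.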
15
Vuong H, Cheng F, Lin CC, Zhao Z. Functional consequences of somatic mutations in cancer using protein pocket-based prioritization approach. Genome Med 2014; 6:81. [PMID: 25360158 PMCID: PMC4213513 DOI: 10.1186/s13073-014-0081-7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/19/2014] [Accepted: 10/03/2014] [Indexed: 12/12/2022]
Abstract
Background: Recently, a number of large-scale cancer genome sequencing projects have generated a large volume of somatic mutations; however, identifying the functional consequences and roles of somatic mutations in tumorigenesis remains a major challenge. Researchers have found that protein pocket regions play critical roles in the interactions of proteins with small molecules, enzymes, and nucleic acids. Investigating the features of somatic mutations in protein pocket regions therefore provides a promising approach to identifying new genotype-phenotype relationships in cancer.
Methods: In this study, we developed a protein pocket-based computational approach to uncover the functional consequences of somatic mutations in cancer. We mapped 1.2 million somatic mutations across 36 cancer types from the COSMIC database and The Cancer Genome Atlas (TCGA) onto the protein pocket regions of over 5,000 three-dimensional protein structures. We further integrated cancer cell line mutation profiles and drug pharmacological data from the Cancer Cell Line Encyclopedia (CCLE) with the protein pocket regions in order to identify putative biomarkers of anticancer drug response.
Results: We found that genes harboring protein pocket somatic mutations were significantly enriched in cancer driver genes. Furthermore, genes harboring pocket somatic mutations tended to be highly co-expressed in a co-expressed protein interaction network. Using a statistical framework, we identified four putative cancer genes (RWDD1, NCF1, PLEK, and VAV3) whose expression profiles were associated with poor overall survival in melanoma, lung, or colorectal cancer patients. Finally, genes harboring protein pocket mutations were more likely to be associated with drug sensitivity or drug resistance. In a case study, we showed that the BAX gene was associated with sensitivity to three anticancer drugs (midostaurin, vinorelbine, and tipifarnib).
Conclusions: This study provides novel insights into the functional consequences of somatic mutations during tumorigenesis and for anticancer drug responses. The computational approach might benefit the study of somatic mutations in the era of cancer precision medicine.
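The abstract reports that genes with pocket mutations were significantly enriched in cancer driver genes but does not name the statistical test used. A common choice for this kind of gene-level enrichment is a one-sided Fisher's exact test on a 2×2 table (pocket-mutated or not, driver or not). The sketch below computes it exactly from the hypergeometric distribution using only the standard library; the function names `fisher_greater` and `odds_ratio` and the table layout are assumptions for illustration, and the example counts are invented.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in the 2x2 table

                      driver   non-driver
      pocket-mutated     a          b
      other genes        c          d

    Returns P(top-left cell >= a) under the hypergeometric null with all
    row and column margins fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c); infinite when b*c == 0."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Invented counts, for illustration only.
print(fisher_greater(30, 170, 100, 4700))  # tiny p: strong enrichment
print(fisher_greater(5, 195, 125, 4675))   # p well above 0.05: no enrichment
print(odds_ratio(30, 170, 100, 4700))
```

Summing exact integer binomial coefficients avoids floating-point underflow in the individual hypergeometric terms; Python's integer division to float handles the very large numerator and denominator correctly.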
Affiliation(s)
- Huy Vuong
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
- Feixiong Cheng
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
- Chen-Ching Lin
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA
- Zhongming Zhao
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, 2525 West End Avenue, Suite 600, Nashville, TN 37203 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN 37232 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA