1
Kamal MM, Mia MS, Faruque MO, Rabby MG, Islam MN, Talukder MEK, Wani TA, Rahman MA, Hasan MM. In silico functional, structural and pathogenicity analysis of missense single nucleotide polymorphisms in human MCM6 gene. Sci Rep 2024; 14:11607. [PMID: 38773180 PMCID: PMC11109216 DOI: 10.1038/s41598-024-62299-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2024] [Accepted: 05/15/2024] [Indexed: 05/23/2024]
Abstract
Single nucleotide polymorphisms (SNPs) are one of the most common determinants and potential biomarkers of human disease pathogenesis. SNPs could alter amino acid residues, leading to the loss of structural and functional integrity of the encoded protein. In humans, members of the minichromosome maintenance (MCM) family play a vital role in cell proliferation and have a significant impact on tumorigenesis. Among the MCM members, the molecular mechanism of how missense SNPs of minichromosome maintenance complex component 6 (MCM6) contribute to DNA replication and tumor pathogenesis is underexplored and needs to be elucidated. Hence, a series of sequence and structure-based computational tools were utilized to determine how mutations affect the corresponding MCM6 protein. From the dbSNP database, among 15,009 SNPs in the MCM6 gene, 642 missense SNPs (4.28%), 291 synonymous SNPs (1.94%), and 12,500 intron SNPs (83.28%) were observed. Out of the 642 missense SNPs, 33 were found to be deleterious during the SIFT analysis. Among these, 11 missense SNPs (I123S, R207C, R222C, L449F, V456M, D463G, H556Y, R602H, R633W, R658C, and P815T) were found as deleterious, probably damaging, affective and disease-associated. Then, I123S, R207C, R222C, V456M, D463G, R602H, R633W, and R658C missense SNPs were found to be highly harmful. Six missense SNPs (I123S, R207C, V456M, D463G, R602H, and R633W) had the potential to destabilize the corresponding protein as predicted by DynaMut2. Interestingly, five high-risk mutations (I123S, V456M, D463G, R602H, and R633W) were distributed in two domains (PF00493 and PF14551). During molecular dynamics simulations analysis, consistent fluctuation in RMSD and RMSF values, high Rg and hydrogen bonds in mutant proteins compared to wild-type revealed that these mutations might alter the protein structure and stability of the corresponding protein. 
Hence, the results from the analyses guide the exploration of the mechanism by which these missense SNPs of the MCM6 gene alter the structural integrity and functional properties of the protein, which could guide the identification of ways to minimize the harmful effects of these mutations in humans.
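The SNP category percentages quoted in the abstract above can be re-derived from the reported counts. The following is a minimal arithmetic sketch only; the variable and function names are illustrative and not taken from the paper.

```python
# Counts as reported in the abstract (dbSNP entries for the MCM6 gene).
TOTAL_SNPS = 15_009
counts = {"missense": 642, "synonymous": 291, "intron": 12_500}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


# Percentage of all MCM6 SNPs that falls into each category.
percentages = {kind: share(n, TOTAL_SNPS) for kind, n in counts.items()}

for kind, pct in percentages.items():
    print(f"{kind}: {pct}%")
```

The three categories do not sum to 100% because the abstract lists only a subset of the dbSNP consequence classes.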
Affiliation(s)
- Md Mostafa Kamal
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Sohel Mia
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Omar Faruque
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Golam Rabby
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
- Md Numan Islam
  - Department of Food Engineering, North Pacific International University of Bangladesh, Dhaka, Bangladesh
- Tanveer A Wani
  - Department of Pharmaceutical Chemistry, College of Pharmacy, King Saud University, 11451, Riyadh, Saudi Arabia
- M Atikur Rahman
  - Department of Biological Sciences, Alabama State University, 915 S Jackson St, Montgomery, AL, 36104, USA
- Md Mahmudul Hasan
  - Department of Nutrition and Food Technology, Jashore University of Science and Technology, Jashore, 7408, Bangladesh
2
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024]
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
- Fábio Madeira
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Thiago Britto-Borges
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
  - Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
- Geoffrey J Barton
  - Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, Scotland, UK
3
Chitluri KK, Emerson IA. The importance of protein domain mutations in cancer therapy. Heliyon 2024; 10:e27655. [PMID: 38509890 PMCID: PMC10950675 DOI: 10.1016/j.heliyon.2024.e27655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 02/28/2024] [Accepted: 03/05/2024] [Indexed: 03/22/2024]
Abstract
Cancer is a complex disease that is caused by multiple genetic factors. Researchers have been studying protein domain mutations to understand how they affect the progression and treatment of cancer. These mutations can significantly impact the development and spread of cancer by changing the protein structure, function, and signalling pathways. As a result, there is a growing interest in how these mutations can be used as prognostic indicators for cancer prognosis. Recent studies have shown that protein domain mutations can provide valuable information about the severity of the disease and the patient's response to treatment. They may also be used to predict the response and resistance to targeted therapy in cancer treatment. The clinical implications of protein domain mutations in cancer are significant, and they are regarded as essential biomarkers in oncology. However, additional techniques and approaches are required to characterize changes in protein domains and predict their functional effects. Machine learning and other computational tools offer promising solutions to this challenge, enabling the prediction of the impact of mutations on protein structure and function. Such predictions can aid in the clinical interpretation of genetic information. Furthermore, the development of genome editing tools like CRISPR/Cas9 has made it possible to validate the functional significance of mutants more efficiently and accurately. In conclusion, protein domain mutations hold great promise as prognostic and predictive biomarkers in cancer. Overall, considerable research is still needed to better define genetic and molecular heterogeneity and to resolve the challenges that remain, so that their full potential can be realized.
Affiliation(s)
- Kiran Kumar Chitluri
  - Bioinformatics Programming Lab, Department of Bio-Sciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, TN, 632014, India
- Isaac Arnold Emerson
  - Bioinformatics Programming Lab, Department of Bio-Sciences, School of Bio Sciences and Technology, Vellore Institute of Technology, Vellore, TN, 632014, India
4
Ruiz-Serra V, Valentini S, Madroñero S, Valencia A, Porta-Pardo E. 3Dmapper: a command line tool for BioBank-scale mapping of variants to protein structures. Bioinformatics 2024; 40:btae171. [PMID: 38565273 PMCID: PMC11018535 DOI: 10.1093/bioinformatics/btae171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 02/09/2024] [Accepted: 03/30/2024] [Indexed: 04/04/2024]
Abstract
MOTIVATION: The interpretation of genomic data is crucial to understand the molecular mechanisms of biological processes. Protein structures play a vital role in facilitating this interpretation by providing functional context to genetic coding variants. However, mapping genes to proteins is a tedious and error-prone task due to inconsistencies in data formats. Over the past two decades, numerous tools and databases have been developed to automatically map annotated positions and variants to protein structures. However, most of these tools are web-based and not well-suited for large-scale genomic data analysis. RESULTS: To address this issue, we introduce 3Dmapper, a stand-alone command-line tool developed in Python and R. It systematically maps annotated protein positions and variants to protein structures, providing a solution that is both efficient and reliable. AVAILABILITY AND IMPLEMENTATION: https://github.com/vicruiser/3Dmapper.
Affiliation(s)
- Victoria Ruiz-Serra
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Samuel Valentini
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento 38123, Italy
- Sergi Madroñero
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
- Alfonso Valencia
  - Barcelona Supercomputing Center (BSC)
  - Institució Catalana de Recerca Avançada (ICREA)
- Eduard Porta-Pardo
  - Barcelona Supercomputing Center (BSC)
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona 08916, Spain
5
Gracia B, Montes P, Gutierrez AM, Arun B, Karras GI. Protein-folding chaperones predict structure-function relationships and cancer risk in BRCA1 mutation carriers. Cell Rep 2024; 43:113803. [PMID: 38368609 PMCID: PMC10941025 DOI: 10.1016/j.celrep.2024.113803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 12/28/2023] [Accepted: 02/01/2024] [Indexed: 02/20/2024]
Abstract
Predicting the risk of cancer mutations is critical for early detection and prevention, but differences in allelic severity of human carriers confound risk predictions. Here, we elucidate protein folding as a cellular mechanism driving differences in mutation severity of tumor suppressor BRCA1. Using a high-throughput protein-protein interaction assay, we show that protein-folding chaperone binding patterns predict the pathogenicity of variants in the BRCA1 C-terminal (BRCT) domain. HSP70 selectively binds 94% of pathogenic BRCA1-BRCT variants, most of which engage HSP70 more than HSP90. Remarkably, the magnitude of HSP70 binding linearly correlates with loss of folding and function. We identify a prevalent class of human hypomorphic BRCA1 variants that bind moderately to chaperones and retain partial folding and function. Furthermore, chaperone binding signifies greater mutation penetrance and earlier cancer onset in the clinic. Our findings demonstrate the utility of chaperones as quantitative cellular biosensors of variant folding, phenotypic severity, and cancer risk.
Affiliation(s)
- Brant Gracia
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Patricia Montes
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Angelica Maria Gutierrez
  - Department of Breast Medical Oncology and Clinical Cancer Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Banu Arun
  - Department of Breast Medical Oncology and Clinical Cancer Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Georgios Ioannis Karras
  - Department of Genetics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
  - Genetics and Epigenetics Graduate Program, The University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA
6
Gracia B, Montes P, Gutierrez AM, Arun B, Karras GI. Protein-Folding Chaperones Predict Structure-Function Relationships and Cancer Risk in BRCA1 Mutation Carriers. bioRxiv 2023; 2023.09.14.557795. [PMID: 37745493 PMCID: PMC10515940 DOI: 10.1101/2023.09.14.557795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Identifying pathogenic mutations and predicting their impact on protein structure, function and phenotype remain major challenges in genome sciences. Protein-folding chaperones participate in structure-function relationships by facilitating the folding of protein variants encoded by mutant genes. Here, we utilize a high-throughput protein-protein interaction assay to test HSP70 and HSP90 chaperone interactions as predictors of pathogenicity for variants in the tumor suppressor BRCA1. Chaperones bind 77% of pathogenic BRCA1-BRCT variants, most of which engaged HSP70 more than HSP90. Remarkably, the magnitude of chaperone binding to variants is proportional to the degree of structural and phenotypic defect induced by BRCA1 mutation. Quantitative chaperone interactions identified BRCA1-BRCT separation-of-function variants and hypomorphic alleles missed by pathogenicity prediction algorithms. Furthermore, increased chaperone binding signified greater cancer risk in human BRCA1 carriers. Altogether, our study showcases the utility of chaperones as quantitative cellular biosensors of variant folding and phenotypic severity. HIGHLIGHTS: Chaperones detect an abundance of pathogenic folding variants of BRCA1-BRCT. Degree of chaperone binding reflects severity of structural and phenotypic defect. Chaperones identify separation-of-function and hypomorphic variants. Chaperone interactions indicate penetrance and expressivity of BRCA1 alleles.
7
Pandey M, Gromiha MM. MutBLESS: A tool to identify disease-prone sites in cancer using deep learning. Biochim Biophys Acta Mol Basis Dis 2023; 1869:166721. [PMID: 37105446 DOI: 10.1016/j.bbadis.2023.166721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/07/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Understanding the molecular basis and impact of mutations at different stages of cancer are long-standing challenges in cancer biology. Identification of driver mutations from experiments is expensive and time intensive. In the present study, we collected the data for experimentally known driver mutations in 22 different cancer types and classified them into six categories: breast cancer (BRCA), acute myeloid leukaemia (LAML), endometrial carcinoma (EC), stomach cancer (STAD), skin cancer (SKCM), and other cancer types which contains 5747 disease prone and 5514 neutral sites in 516 proteins. The analysis of amino acid distribution along mutant sites revealed that the motifs AAA and LR are preferred in disease-prone sites whereas QPP and QF are dominant in neutral sites. Further, we developed a method using deep neural networks to predict disease-prone sites with amino acid sequence-based features such as physicochemical properties, secondary structure, tri-peptide motifs and conservation scores. We obtained an average AUC of 0.97 in five cancer types BRCA, LAML, EC, STAD and SKCM in a test dataset and 0.72 in all other cancer types together. Our method showed excellent performance for identifying cancer-specific mutations with an average sensitivity, specificity, and accuracy of 96.56 %, 97.39 %, and 97.64 %, respectively. We developed a web server for identifying cancer-prone sites, and it is available at https://web.iitm.ac.in/bioinfo2/MutBLESS/index.html. We suggest that our method can serve as an effective method to identify disease-prone sites and assist to develop therapeutic strategies.
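For readers unfamiliar with the performance measures quoted in the abstract above, the sketch below shows the standard definitions of sensitivity, specificity and accuracy for a binary classifier. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not MutBLESS results, and the class and function names are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (disease-prone vs neutral)."""
    tp: int  # disease-prone sites predicted as disease-prone
    fp: int  # neutral sites predicted as disease-prone
    tn: int  # neutral sites predicted as neutral
    fn: int  # disease-prone sites predicted as neutral


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true disease-prone sites that are recovered."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true neutral sites that are recovered."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all sites that are classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=90, fp=5, tn=95, fn=10)
print(sensitivity(example), specificity(example), accuracy(example))
```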
Affiliation(s)
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
8
Recurrent high-impact mutations at cognate structural positions in class A G protein-coupled receptors expressed in tumors. Proc Natl Acad Sci U S A 2021; 118:2113373118. [PMID: 34916293 PMCID: PMC8713800 DOI: 10.1073/pnas.2113373118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 11/01/2021] [Indexed: 12/23/2022]
Abstract
GPCRs and GPCR pathways are increasingly being implicated in human malignancies, placing them among the most promising cancer drug candidates. Our results reveal enrichment of highly impactful, recurrent GPCR mutations within cancers. We found that cognate mutations in selected class A GPCRs have deleterious effects on signaling function. The results also suggest that olfactory receptors, often considered inconsequential, display a nonrandom mutation pattern in tumors in which they are expressed. These findings support the idea that protein paralogs can act in parallel as members of an onco-group. G protein-coupled receptors (GPCRs) are the largest family of human proteins. They have a common structure and, signaling through a much smaller set of G proteins, arrestins, and effectors, activate downstream pathways that often modulate hallmark mechanisms of cancer. Because there are many more GPCRs than effectors, mutations in different receptors could perturb signaling similarly so as to favor a tumor. We hypothesized that somatic mutations in tumor samples may not be enriched within a single gene but rather that cognate mutations with similar effects on GPCR function are distributed across many receptors. To test this possibility, we systematically aggregated somatic cancer mutations across class A GPCRs and found a nonrandom distribution of positions with variant amino acid residues. Individual cancer types were enriched for highly impactful, recurrent mutations at selected cognate positions of known functional motifs. We also discovered that no single receptor drives this pattern, but rather multiple receptors contain amino acid substitutions at a few cognate positions. Phenotypic characterization suggests these mutations induce perturbation of G protein activation and/or β-arrestin recruitment. These data suggest that recurrent impactful oncogenic mutations perturb different GPCRs to subvert signaling and promote tumor growth or survival. 
The possibility that multiple different GPCRs could moonlight as drivers or enablers of a given cancer through mutations located at cognate positions across GPCR paralogs opens a window into cancer mechanisms and potential approaches to therapeutics.
9
Chen J, Guo JT. Structural and functional analysis of somatic coding and UTR indels in breast and lung cancer genomes. Sci Rep 2021; 11:21178. [PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Insertions and deletions (Indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared to the germline indels in healthy genomes, somatic indels contain more coding indels with higher than expected frame-shift (FS) indels in cancer genomes. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to locate in sequences with important functions, which can affect the core secondary structures of proteins and have a bigger overlap with predicted TFBSs in coding regions than the germline indels. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.
Affiliation(s)
- Jing Chen
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
  - Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
10
Grillo E, Ravelli C, Corsini M, Zammataro L, Mitola S. Protein domain-based approaches for the identification and prioritization of therapeutically actionable cancer variants. Biochim Biophys Acta Rev Cancer 2021; 1876:188614. [PMID: 34403770 DOI: 10.1016/j.bbcan.2021.188614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 01/04/2023]
Abstract
The tremendous number of cancer variants that can be detected by NGS analyses has required the development of computational approaches to prioritize mutations on the basis of their biological and clinical significance. Standard strategies take a gene-centric approach to the problem, allowing exclusively the identification of highly frequent variants. On the contrary, protein domain (PD)-based approaches allow to identify functionally relevant low frequency variants by searching for mutations that recur on analogous residues across homologous proteins (i.e. containing the same PD). Such approaches enable to transfer information about the effects and druggability from one known mutation to unknown ones. Here we describe how PD-based strategies work, and discuss how they could be exploited for mutation prioritization. The principle that mutations clustered on specific residues of PDs have the same functional consequences and are therapeutically actionable in a similar manner could help the choice of patient-specific targeted drugs, eventually improving the management of cancer patients.
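The domain-based idea described in the abstract above, counting mutations that recur at analogous residues across proteins sharing a domain, can be sketched roughly as follows. This is a simplified illustration, not the method of any cited tool: it assumes every domain instance aligns without gaps, so an analogous residue is just the offset from the domain start, whereas real pipelines would use a multiple sequence alignment. All protein names, domain boundaries and mutation positions are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    domain: str  # domain identifier
    start: int   # first residue of the domain in the protein (1-based)
    end: int     # last residue of the domain in the protein (inclusive)


# Hypothetical annotations: three proteins carrying the same domain.
DOMAIN_HITS = {
    "PROT_A": DomainHit("DomainX", 100, 199),
    "PROT_B": DomainHit("DomainX", 40, 139),
    "PROT_C": DomainHit("DomainX", 300, 399),
}

# Hypothetical missense mutations as (protein, residue position).
MUTATIONS = [
    ("PROT_A", 124), ("PROT_B", 64), ("PROT_C", 324),
    ("PROT_A", 150), ("PROT_B", 10),
]


def domain_position_recurrence(mutations, domain_hits):
    """Count distinct proteins mutated at each (domain, domain position)."""
    proteins_at = defaultdict(set)
    for protein, residue in mutations:
        hit = domain_hits.get(protein)
        if hit is None or not (hit.start <= residue <= hit.end):
            continue  # mutation lies outside any annotated domain
        # Ungapped-alignment assumption: position is the offset in the domain.
        key = (hit.domain, residue - hit.start + 1)
        proteins_at[key].add(protein)
    return {key: len(proteins) for key, proteins in proteins_at.items()}


recurrence = domain_position_recurrence(MUTATIONS, DOMAIN_HITS)
print(recurrence)
```

In this toy input each protein is mutated only once or twice, so a gene-centric count would flag nothing, while domain position 25 is hit in all three proteins.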
Affiliation(s)
- Elisabetta Grillo
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Cosetta Ravelli
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Michela Corsini
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Luca Zammataro
  - Division of Artificial Intelligence Systems for Immunoinformatics, Kiromic BioPharma, Inc., Houston, USA
- Stefania Mitola
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
11
Dragomir I, Akbar A, Cassidy JW, Patel N, Clifford HW, Contino G. Identifying Cancer Drivers Using DRIVE: A Feature-Based Machine Learning Model for a Pan-Cancer Assessment of Somatic Missense Mutations. Cancers (Basel) 2021; 13:cancers13112779. [PMID: 34205004 PMCID: PMC8199862 DOI: 10.3390/cancers13112779] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/19/2021] [Revised: 04/14/2021] [Accepted: 04/21/2021] [Indexed: 11/16/2022]
Abstract
Sporadic cancer develops from the accrual of somatic mutations. Out of all small-scale somatic aberrations in coding regions, 95% are base substitutions, with 90% being missense mutations. While multiple studies focused on the importance of this mutation type, a machine learning method based on the number of protein-protein interactions (PPIs) has not been fully explored. This study aims to develop an improved computational method for driver identification, validation and evaluation (DRIVE), which is compared to other methods for assessing its performance. DRIVE aims at distinguishing between driver and passenger mutations using a feature-based learning approach comprising two levels of biological classification for a pan-cancer assessment of somatic mutations. Gene-level features include the maximum number of protein-protein interactions, the biological process and the type of post-translational modifications (PTMs) while mutation-level features are based on pathogenicity scores. Multiple supervised classification algorithms were trained on Genomics Evidence Neoplasia Information Exchange (GENIE) project data and then tested on an independent dataset from The Cancer Genome Atlas (TCGA) study. Finally, the most powerful classifier using DRIVE was evaluated on a benchmark dataset, which showed a better overall performance compared to other state-of-the-art methodologies, however, considerable care must be taken due to the reduced size of the dataset. DRIVE outlines the outstanding potential that multiple levels of a feature-based learning model will play in the future of oncology-based precision medicine.
Affiliation(s)
- Ionut Dragomir
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
  - Centre for Computational Biology, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
- Adnan Akbar
  - Cambridge Cancer Genomics, Cambridge CB2 1QN, UK
- John W. Cassidy
  - Cambridge Cancer Genomics, Cambridge CB2 1QN, UK
- Nirmesh Patel
  - Cambridge Cancer Genomics, Cambridge CB2 1QN, UK
- Harry W. Clifford
  - Cambridge Cancer Genomics, Cambridge CB2 1QN, UK
- Gianmarco Contino
  - Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham B15 2TT, UK
  - Von Hügel Institute, St Edmund College, University of Cambridge, Cambridge CB3 0BN, UK
  - Queen Elizabeth Hospital, University of Birmingham Hospital Trust, Edgbaston, Birmingham B15 2GW, UK
12
Chung SS, Ng JCF, Laddach A, Thomas NSB, Fraternali F. Short loop functional commonality identified in leukaemia proteome highlights crucial protein sub-networks. NAR Genom Bioinform 2021; 3:lqab010. [PMID: 33709075 PMCID: PMC7936661 DOI: 10.1093/nargab/lqab010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2020] [Revised: 12/19/2020] [Accepted: 01/26/2021] [Indexed: 11/13/2022]
Abstract
Direct drug targeting of mutated proteins in cancer is not always possible and efficacy can be nullified by compensating protein-protein interactions (PPIs). Here, we establish an in silico pipeline to identify specific PPI sub-networks containing mutated proteins as potential targets, which we apply to mutation data of four different leukaemias. Our method is based on extracting cyclic interactions of a small number of proteins topologically and functionally linked in the Protein-Protein Interaction Network (PPIN), which we call short loop network motifs (SLM). We uncover a new property of PPINs named 'short loop commonality' to measure indirect PPIs occurring via common SLM interactions. This detects 'modules' of PPI networks enriched with annotated biological functions of proteins containing mutation hotspots, exemplified by FLT3 and other receptor tyrosine kinase proteins. We further identify functional dependency or mutual exclusivity of short loop commonality pairs in large-scale cellular CRISPR-Cas9 knockout screening data. Our pipeline provides a new strategy for identifying new therapeutic targets for drug discovery.
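As a rough illustration of what extracting cyclic interactions among a small number of proteins can look like, the sketch below enumerates loops of length three (triangles) in an undirected interaction network. It is an assumption-laden toy: the abstract does not state which loop lengths or filters the authors use, the protein labels are placeholders, and the function name is invented for this example.

```python
from itertools import combinations

# Hypothetical undirected protein-protein interactions (placeholder labels).
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]


def find_triangles(edges):
    """Return every 3-protein loop as a sorted tuple, in sorted order."""
    # Build an adjacency map, ignoring self-interactions.
    adjacency = {}
    for a, b in edges:
        if a == b:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    triangles = set()
    for node, neighbours in adjacency.items():
        # Two neighbours of the same node that also interact close a loop.
        for u, v in combinations(sorted(neighbours), 2):
            if v in adjacency[u]:
                triangles.add(tuple(sorted((node, u, v))))
    return sorted(triangles)


loops = find_triangles(EDGES)
print(loops)
```

Proteins D and E take part in interactions but in no loop, so they are absent from the result; longer loops would need a separate search.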
Affiliation(s)
- Sun Sook Chung
  - Department of Haematological Medicine, King's College London, London, SE5 9NU, UK
- Joseph C F Ng
  - Randall Centre for Cell and Molecular Biophysics, King's College London, London, SE1 1UL, UK
- Anna Laddach
  - Randall Centre for Cell and Molecular Biophysics, King's College London, London, SE1 1UL, UK
- N Shaun B Thomas
  - Department of Haematological Medicine, King's College London, London, SE5 9NU, UK
- Franca Fraternali
  - Randall Centre for Cell and Molecular Biophysics, King's College London, London, SE1 1UL, UK
13
Mészáros B, Hajdu-Soltész B, Zeke A, Dosztányi Z. Mutations of Intrinsically Disordered Protein Regions Can Drive Cancer but Lack Therapeutic Strategies. Biomolecules 2021; 11:biom11030381. [PMID: 33806614 PMCID: PMC8000335 DOI: 10.3390/biom11030381] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/25/2021] [Revised: 02/22/2021] [Accepted: 02/24/2021] [Indexed: 12/22/2022]
Abstract
Many proteins contain intrinsically disordered regions (IDRs) which carry out important functions without relying on a single well-defined conformation. IDRs are increasingly recognized as critical elements of regulatory networks and have been also associated with cancer. However, it is unknown whether mutations targeting IDRs represent a distinct class of driver events associated with specific molecular and system-level properties, cancer types and treatment options. Here, we used an integrative computational approach to explore the direct role of intrinsically disordered protein regions driving cancer. We showed that around 20% of cancer drivers are primarily targeted through a disordered region. These IDRs can function in multiple ways which are distinct from the functional mechanisms of ordered drivers. Disordered drivers play a central role in context-dependent interaction networks and are enriched in specific biological processes such as transcription, gene expression regulation and protein degradation. Furthermore, their modulation represents an alternative mechanism for the emergence of all known cancer hallmarks. Importantly, in certain cancer patients, mutations of disordered drivers represent key driving events. However, treatment options for such patients are currently severely limited. The presented study highlights a largely overlooked class of cancer drivers associated with specific cancer types that need novel therapeutic options.
Affiliation(s)
- Bálint Mészáros
  - Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
  - EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
- Borbála Hajdu-Soltész
  - Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
- András Zeke
  - Institute of Enzymology, RCNS, P.O. Box 7, H-1518 Budapest, Hungary
- Zsuzsanna Dosztányi
  - Department of Biochemistry, ELTE Eötvös Loránd University, H-1117 Budapest, Hungary
  - Correspondence: Tel.: +36-1-372 2500/8537

14
Yu T, Choi KP, Chen ES, Zhang L. Stage-specific protein-domain mutational profile of invasive ductal breast cancer. BMC Med Genomics 2020; 13:150. [PMID: 33087126 PMCID: PMC7580001 DOI: 10.1186/s12920-020-00777-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Understanding the mechanisms underlying the malignant progression of cancer cells is crucial for early diagnosis and therapeutic treatment of cancer. Mutational heterogeneity of breast cancer suggests that about a dozen cancer genes are consistently mutated in patients, together with many other genes that are mutated occasionally.
METHODS: Using the whole-exome sequences and clinical information of 468 patients in the TCGA project data portal, we analyzed mutated protein domains and signaling pathway alterations in order to understand how infrequent mutations contribute in aggregate to tumor progression in different stages.
RESULTS: Our findings suggest that while the spectrum of mutated domains was diverse, mutations were aggregated in Pkinase, Pkinase Tyr, Y-Phosphatase and Src-homology 2 domains, highlighting the genetic heterogeneity in activating the protein tyrosine kinase signaling pathways in invasive ductal breast cancer.
CONCLUSIONS: The study provides new clues to the functional role of infrequent mutations in protein domain regions in different stages of invasive ductal breast cancer, yielding biological insights into metastasis of invasive ductal breast cancer.
Affiliation(s)
- Ting Yu
  - Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076 Singapore
  - Computational Biology Programme, National University of Singapore, 8 Medical Drive, Singapore, 117596 Singapore
- Kwok Pui Choi
  - Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076 Singapore
  - Department of Statistics and Applied Probability, National University of Singapore, 6 Science Drive 2, Singapore, 117546 Singapore
- Ee Sin Chen
  - Department of Biochemistry, National University of Singapore, 8 Medical Drive, Singapore, 117596 Singapore
- Louxin Zhang
  - Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore, 119076 Singapore
  - Computational Biology Programme, National University of Singapore, 8 Medical Drive, Singapore, 117596 Singapore

15
Lung J, Hung MS, Lin YC, Jiang YY, Fang YH, Lu MS, Hsieh CC, Wang CS, Kuan FC, Lu CH, Chen PT, Lin CM, Chou YL, Lin CK, Yang TM, Chen FF, Lin PY, Hsieh MJ, Tsai YH. A highly sensitive and specific real-time quantitative PCR for BRAF V600E/K mutation screening. Sci Rep 2020; 10:16943. [PMID: 33037234 PMCID: PMC7547094 DOI: 10.1038/s41598-020-72809-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/07/2020] [Accepted: 08/06/2020] [Indexed: 02/07/2023] Open
Abstract
Mutations that lead to constitutive activation of key regulators in cellular processes are one of the most important drivers behind vigorous growth of cancer cells, and are thus prime targets in cancer treatment. BRAF V600E mutation transduces strong growth and survival signals for cancer cells, and is widely present in various types of cancers including lung cancer. A combination of BRAF inhibitor (dabrafenib) and MEK inhibitor (trametinib) has recently been approved and significantly improved the survival of patients with advanced NSCLC harboring BRAF V600E/K mutation. To improve the detection of BRAF V600E/K mutation and investigate the incidence and clinicopathological features of the mutation in lung cancer patients of southern Taiwan, a highly sensitive and specific real-time quantitative PCR (RT-qPCR) method, able to detect single-digit copies of mutant DNA, was established and compared with BRAF V600E-specific immunohistochemistry. Results showed that the BRAF V600E mutation was present at low frequency (0.65%, 2/306) in the studied patient group, and the detection sensitivity and specificity of the new RT-qPCR and V600E-specific immunohistochemistry both reached 100% and 97.6%, respectively. Screening the BRAF V600E/K mutation with the RT-qPCR and V600E-specific immunohistochemistry simultaneously could help improve detection accuracy.
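For readers less familiar with the two figures quoted in this abstract, the sketch below shows how sensitivity and specificity are conventionally computed from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of true mutation carriers that are detected.
    specificity = TN / (TN + FP): share of non-carriers correctly reported negative.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts for illustration only (not taken from the study).
sens, spec = sensitivity_specificity(tp=2, fn=0, tn=98, fp=2)
```

With a mutation as rare as the one reported here, sensitivity rests on very few positive samples, so the estimate is far less stable than the specificity computed from the many negatives.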
Affiliation(s)
- Jrhau Lung
  - Department of Medical Research and Development, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Ming-Szu Hung
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
  - Department of Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Department of Respiratory Care, Chang Gung University of Science and Technology, Chiayi Campus, Chiayi, Taiwan
- Yu-Ching Lin
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
  - Department of Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Department of Respiratory Care, Chang Gung University of Science and Technology, Chiayi Campus, Chiayi, Taiwan
- Yuan Yuan Jiang
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Yu-Hung Fang
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Ming-Shian Lu
  - Division of Thoracic and Cardiovascular Surgery, Department of Surgery, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Ching-Chuan Hsieh
  - Department of General Surgery, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Chia-Siu Wang
  - Department of General Surgery, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Feng-Che Kuan
  - Department of Hematology and Oncology, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Chang-Hsien Lu
  - Department of Hematology and Oncology, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Ping-Tsung Chen
  - Department of Hematology and Oncology, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Chieh-Mo Lin
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Yen-Li Chou
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Chin-Kuo Lin
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Tsung-Ming Yang
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Fen Fen Chen
  - Department of Pathology, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
- Paul Yann Lin
  - Department of Anatomic Pathology, Dalin Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Chiayi, Taiwan
- Meng-Jer Hsieh
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
  - Department of Respiratory Care, College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Ying Huang Tsai
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Chiayi Branch, Chiayi, Taiwan
  - Department of Respiratory Care, College of Medicine, Chang Gung University, Taoyuan, Taiwan
  - Department of Pulmonary and Critical Care Medicine, Chang Gung Memorial Hospital, Linkou Branch, Linkou, Taiwan

16
Kralovicova J, Borovska I, Kubickova M, Lukavsky PJ, Vorechovsky I. Cancer-Associated Substitutions in RNA Recognition Motifs of PUF60 and U2AF65 Reveal Residues Required for Correct Folding and 3' Splice-Site Selection. Cancers (Basel) 2020; 12:1865. [PMID: 32664474 PMCID: PMC7408900 DOI: 10.3390/cancers12071865] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/15/2020] [Revised: 07/05/2020] [Accepted: 07/07/2020] [Indexed: 12/22/2022] Open
Abstract
U2AF65 (U2AF2) and PUF60 (PUF60) are splicing factors important for recruitment of the U2 small nuclear ribonucleoprotein to lariat branch points and selection of 3′ splice sites (3′ss). Both proteins preferentially bind uridine-rich sequences upstream of 3′ss via their RNA recognition motifs (RRMs). Here, we examined 36 RRM substitutions reported in cancer patients to identify variants that alter 3′ss selection, RNA binding and protein properties. Employing PUF60- and U2AF65-dependent 3′ss previously identified by RNA-seq of depleted cells, we found that 43% (10/23) and 15% (2/13) of independent RRM mutations in U2AF65 and PUF60, respectively, conferred splicing defects. At least three RRM mutations increased skipping of internal U2AF2 (~9%, 2/23) or PUF60 (~8%, 1/13) exons, indicating that cancer-associated RRM mutations can have both cis- and trans-acting effects on splicing. We also report residues required for correct folding/stability of each protein and map functional RRM substitutions on to existing high-resolution structures of U2AF65 and PUF60. These results identify new RRM residues critical for 3′ss selection and provide relatively simple tools to detect clonal RRM mutations that enhance the mRNA isoform diversity.
Affiliation(s)
- Jana Kralovicova
  - Faculty of Medicine, University of Southampton, Southampton SO16 6YD, UK
  - Institute of Molecular Physiology and Genetics, Center of Biosciences, Slovak Academy of Sciences, 840 05 Bratislava, Slovakia
- Ivana Borovska
  - Institute of Molecular Physiology and Genetics, Center of Biosciences, Slovak Academy of Sciences, 840 05 Bratislava, Slovakia
- Monika Kubickova
  - CEITEC, Masaryk University, 625 00 Brno, Czech Republic
- Peter J. Lukavsky
  - CEITEC, Masaryk University, 625 00 Brno, Czech Republic
- Igor Vorechovsky
  - Faculty of Medicine, University of Southampton, Southampton SO16 6YD, UK
  - Correspondence: Tel.: +44-2381-206425; Fax: +44-2381-204264

17
Yadav A, Vidal M, Luck K. Precision medicine - networks to the rescue. Curr Opin Biotechnol 2020; 63:177-189. [PMID: 32199228 PMCID: PMC7308189 DOI: 10.1016/j.copbio.2020.02.005] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/31/2019] [Accepted: 02/13/2020] [Indexed: 12/11/2022]
Abstract
Genetic variants are often not predictive of the phenotypic outcome. Individuals carrying the same pathogenic variant, associated with Mendelian or complex disease, can manifest to different extents, from severe-to-mild to no disease. Improving the accuracy of predicted clinical manifestations of genetic variants has emerged as one of the biggest challenges in precision medicine, which can only be addressed by understanding the mechanisms underlying genotype-phenotype relationships. Efforts to understand the molecular basis of these relationships have identified complex systems of interacting biomolecules that underlie cellular function. Here, we review recent advances in how modeling cellular systems as networks of interacting proteins has fueled identification of disease-associated processes, delineation of underlying molecular mechanisms, and prediction of the pathogenicity of variants. This review is intended to be inspiring for clinicians, geneticists, and network biologists alike who aim to jointly advance our understanding of human disease and accelerate progress toward precision medicine.
Affiliation(s)
- Anupama Yadav
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Marc Vidal
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
- Katja Luck
  - Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Current address: Institute of Molecular Biology, Mainz, Germany

18
Biophysical prediction of protein-peptide interactions and signaling networks using machine learning. Nat Methods 2020; 17:175-183. [PMID: 31907444 PMCID: PMC7004877 DOI: 10.1038/s41592-019-0687-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Received: 05/20/2019] [Accepted: 11/15/2019] [Indexed: 12/17/2022]
Abstract
In mammalian cells, much of signal transduction is mediated by weak protein-protein interactions between globular peptide-binding domains (PBDs) and unstructured peptidic motifs in partner proteins. The number and diversity of these PBDs (over 1,800 are known), low binding affinities, and sensitivity of binding properties to minor sequence variation represent a substantial challenge to experimental and computational analysis of PBD specificity and the networks PBDs create. Here we introduce a bespoke machine learning approach, hierarchical statistical mechanical modelling (HSM), capable of accurately predicting the affinities of PBD-peptide interactions across multiple protein families. By synthesizing biophysical priors within a modern machine learning framework, HSM outperforms existing computational methods and high-throughput experimental assays. HSM models are interpretable in familiar biophysical terms at three spatial scales: the energetics of protein-peptide binding, the multi-dentate organization of protein-protein interactions, and the global architecture of signaling networks.
19
Mechanics of actin filaments in cancer onset and progress. Int Rev Cell Mol Biol 2020; 355:205-243. [DOI: 10.1016/bs.ircmb.2020.05.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/17/2022]
20
V K MA, Chandrasekaran VM, Pandurangan S. Protein Domain Level Cancer Drug Targets in the Network of MAPK Pathways. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:2057-2065. [PMID: 29993692 DOI: 10.1109/tcbb.2018.2829507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Proteins in the MAPK pathways are considered potential drug targets for cancer treatment. Cross-talk between the pathways widens their scope, so that they can be viewed as a network of MAPK pathways. Side-effect-causing targeted domains act as a proxy for drug targets due to their structural similarity and the frequent reuse of their variants. We proposed to identify non-repeatable protein domains as the drug targets to disrupt signal transduction, rather than targeting the whole protein. A network-based approach is used to understand the contribution of 52 domains in non-hub, non-essential, and intra-pathway cancerous nodes and to identify potential drug target domains. 34 distinct domains in the cancerous proteins play vital roles in making cancer a complex disease and pose challenges for identifying potential drug targets. The distribution of domain families follows a power law in the network. Single promiscuous domains contribute to the formation of hubs like Pkinase, Pkinase Tyr, and Ras. Hub nodes are positively correlated with domain coverage, and targeting them would disrupt the functional properties of the proteins. EIF 4EBP, alpha Kinase, Sel1, ROKNT, and KH 1 are the domains identified as potential domain targets for disrupting the signaling mechanism involved in cancer.
21
Lin CY, Vennam S, Purington N, Lin E, Varma S, Han S, Desa M, Seto T, Wang NJ, Stehr H, Troxell ML, Kurian AW, West RB. Genomic landscape of ductal carcinoma in situ and association with progression. Breast Cancer Res Treat 2019; 178:307-316. [PMID: 31420779 PMCID: PMC6800639 DOI: 10.1007/s10549-019-05401-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 05/03/2019] [Accepted: 08/07/2019] [Indexed: 01/07/2023]
Abstract
PURPOSE: The detection rate of breast ductal carcinoma in situ (DCIS) has increased significantly, raising the concern that DCIS is overdiagnosed and overtreated. Therefore, there is an unmet clinical need to better predict the risk of progression among DCIS patients. Our hypothesis is that by combining molecular signatures with clinicopathologic features, we can elucidate the biology of breast cancer progression, and risk-stratify patients with DCIS.
METHODS: Targeted exon sequencing with a custom panel of 223 genes/regions was performed for 125 DCIS cases. Among them, 60 were from cases having concurrent or subsequent invasive breast cancer (IBC) (DCIS + IBC group), and 65 from cases with no IBC development over a median follow-up of 13 years (DCIS-only group). Copy number alterations in chromosome 1q32, 8q24, and 11q13 were analyzed using fluorescence in situ hybridization (FISH). Multivariable logistic regression models were fit to the outcome of DCIS progression to IBC as functions of demographic and clinical features.
RESULTS: We observed recurrent variants of known IBC-related mutations, and the most commonly mutated genes in DCIS were PIK3CA (34.4%) and TP53 (18.4%). There was an inverse association between PIK3CA kinase domain mutations and progression (Odds Ratio [OR] 10.2, p < 0.05). Copy number variations in 1q32 and 8q24 were associated with progression (OR 9.3 and 46, respectively; both p < 0.05).
CONCLUSIONS: PIK3CA kinase domain mutations and the absence of copy number gains in DCIS are protective against progression to IBC. These results may guide efforts to distinguish low-risk from high-risk DCIS.
MESH Headings
- Aged
- Aged, 80 and over
- Carcinoma, Ductal, Breast/genetics
- Carcinoma, Ductal, Breast/pathology
- Carcinoma, Ductal, Breast/therapy
- Carcinoma, Intraductal, Noninfiltrating/genetics
- Carcinoma, Intraductal, Noninfiltrating/pathology
- DNA Copy Number Variations
- Female
- Genetic Predisposition to Disease
- Genome-Wide Association Study/methods
- Genomics/methods
- Humans
- In Situ Hybridization, Fluorescence
- Middle Aged
- Neoplasm Metastasis
- Neoplasm Staging
- Tumor Burden
Affiliation(s)
- Chieh-Yu Lin
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
  - Department of Pathology and Immunology, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Sujay Vennam
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Natasha Purington
  - Department of Medicine, Quantitative Sciences Unit, Stanford University, Stanford, CA, USA
- Eric Lin
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Sushama Varma
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Summer Han
  - Department of Medicine, Quantitative Sciences Unit, Stanford University, Stanford, CA, USA
- Manisha Desa
  - Department of Medicine and of Biomedical Data Science, Quantitative Sciences Unit, Stanford University, Stanford, CA, USA
- Tina Seto
  - Research Information Technology, Stanford University School of Medicine, Stanford, CA, USA
- Nicholas J Wang
  - Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, USA
- Henning Stehr
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Megan L Troxell
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
- Allison W Kurian
  - Departments of Medicine and of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
- Robert B West
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA

22
Leveraging protein dynamics to identify cancer mutational hotspots using 3D structures. Proc Natl Acad Sci U S A 2019; 116:18962-18970. [PMID: 31462496 PMCID: PMC6754584 DOI: 10.1073/pnas.1901156116] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 12/12/2022] Open
Abstract
Large-scale exome sequencing of tumors has enabled the identification of cancer drivers using recurrence-based approaches. Some of these methods also employ 3D protein structures to identify mutational hotspots in cancer-associated genes. In determining such mutational clusters in structures, existing approaches overlook protein dynamics, despite its essential role in protein function. We present a framework to identify cancer driver genes using a dynamics-based search of mutational hotspot communities. Mutations are mapped to protein structures, which are partitioned into distinct residue communities. These communities are identified in a framework where residue-residue contact edges are weighted by correlated motions (as inferred by dynamics-based models). We then search for signals of positive selection among these residue communities to identify putative driver genes, while applying our method to the TCGA (The Cancer Genome Atlas) PanCancer Atlas missense mutation catalog. Overall, we predict 1 or more mutational hotspots within the resolved structures of proteins encoded by 434 genes. These genes were enriched among biological processes associated with tumor progression. Additionally, a comparison between our approach and existing cancer hotspot detection methods using structural data suggests that including protein dynamics significantly increases the sensitivity of driver detection.
23
Tokheim C, Karchin R. CHASMplus Reveals the Scope of Somatic Missense Mutations Driving Human Cancers. Cell Syst 2019; 9:9-23.e8. [PMID: 31202631 DOI: 10.1016/j.cels.2019.05.005] [Citation(s) in RCA: 63] [Impact Index Per Article: 12.6] [Received: 08/29/2018] [Revised: 02/13/2019] [Accepted: 05/11/2019] [Indexed: 01/01/2023]
Abstract
Large-scale cancer sequencing studies of patient cohorts have statistically implicated many genes driving cancer growth and progression, and their identification has yielded substantial translational impact. However, a remaining challenge is to increase the resolution of driver prediction from the level of genes to mutations because mutation-level predictions are more closely aligned with the goal of precision cancer medicine. Here, we present CHASMplus, a computational method that is uniquely capable of identifying driver missense mutations, including those specific to a cancer type, as evidenced by significantly superior performance on diverse benchmarks. Applied to 8,657 tumor samples across 32 cancer types in The Cancer Genome Atlas (TCGA), CHASMplus identifies over 4,000 unique driver missense mutations in 240 genes, supporting a prominent role for rare driver mutations. We show which TCGA cancer types are likely to yield discovery of new driver missense mutations by additional sequencing, which has important implications for public policy.
Affiliation(s)
- Collin Tokheim
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
- Rachel Karchin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA
  - Department of Oncology, Johns Hopkins University, Baltimore, MD 21204, USA

24
Shim JE, Kim JH, Shin J, Lee JE, Lee I. Pathway-specific protein domains are predictive for human diseases. PLoS Comput Biol 2019; 15:e1007052. [PMID: 31075101 PMCID: PMC6530867 DOI: 10.1371/journal.pcbi.1007052] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/14/2018] [Revised: 05/22/2019] [Accepted: 04/19/2019] [Indexed: 01/04/2023] Open
Abstract
Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.

Protein domains are basic functional units of proteins, yet domain-based pathway annotations for proteins are challenging tasks because many domains are pervasive among diverse pathways. Therefore, we developed a network-based scoring scheme to measure pathway specificity of domains, and then used it to identify pathway-specific domains. Surprisingly, we observed substantially more disease mutations in pathway-specific domains than non-specific domains. We found evidence that mutations of pathway-specific domains tend to perturb pathway integrity by disrupting within-pathway protein-protein interactions. We also demonstrated the prediction capacity of pathway-specific domains for complex diseases with experimental validations. Our study demonstrated the usefulness of pathway information for protein domains in interpreting the non-random distribution of disease mutations among domains and in identifying disease genes and variants.
Affiliation(s)
- Jung Eun Shim
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
  - Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Ji Hyun Kim
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
- Junha Shin
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Ji Eun Lee
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
  - Samsung Biomedical Research Institute, Samsung Medical Center, Seoul, Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea

25
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/03/2018] [Accepted: 11/13/2018] [Indexed: 12/31/2022] Open
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionary related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include recently validated cancer associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Affiliation(s)
- Paul Ashford
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK; Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
- Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
|
26
|
Studying how genetic variants affect mechanism in biological systems. Essays Biochem 2018; 62:575-582. [PMID: 30315099 DOI: 10.1042/ebc20180021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2018] [Revised: 09/13/2018] [Accepted: 09/14/2018] [Indexed: 11/17/2022]
Abstract
Genetic variants are currently a major component of system-wide investigations into biological function or disease. Approaches to select variants (often out of thousands of candidates) that are responsible for a particular phenomenon have many clinical applications and can help illuminate differences between individuals. Selecting meaningful variants is greatly aided by integration with information about molecular mechanism, whether known from protein structures or interactions or biological pathways. In this review we discuss the nature of genetic variants, and recent studies highlighting what is currently known about the relationship between genetic variation, biomolecular function, and disease.
|
27
|
Buljan M, Blattmann P, Aebersold R, Boutros M. Systematic characterization of pan-cancer mutation clusters. Mol Syst Biol 2018; 14:e7974. [PMID: 29572294 PMCID: PMC5866917 DOI: 10.15252/msb.20177974] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Indexed: 12/21/2022] Open
Abstract
Cancer genome sequencing has shown that driver genes can often be distinguished not only by the elevated mutation frequency but also by specific nucleotide positions that accumulate changes at a high rate. However, properties associated with a residue's potential to drive tumorigenesis when mutated have not yet been systematically investigated. Here, using a novel methodological approach, we identify and characterize a compendium of 180 hotspot residues within 160 human proteins which occur with a significant frequency and are likely to have functionally relevant impact. We find that such mutations (i) are more prominent in proteins that can exist in the on and off state, (ii) reflect the identity of a tumor of origin, and (iii) often localize within interfaces which mediate interactions with other proteins or ligands. We then further examine structural data for human protein complexes and identify a number of additional protein interfaces that accumulate cancer mutations at a high rate. Jointly, these analyses suggest that disruption and dysregulation of protein interactions can be instrumental in switching functions of cancer proteins and activating downstream changes.
Affiliation(s)
- Marija Buljan
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Peter Blattmann
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland
- Ruedi Aebersold
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, Switzerland; Faculty of Science, University of Zurich, Zurich, Switzerland
- Michael Boutros
- Division Signaling and Functional Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Department Cell and Molecular Biology, Faculty of Medicine Mannheim, Heidelberg University, Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
|
28
|
González-Sánchez JC, Raimondi F, Russell RB. Cancer genetics meets biomolecular mechanism-bridging an age-old gulf. FEBS Lett 2018; 592:463-474. [PMID: 29364530 DOI: 10.1002/1873-3468.12988] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/13/2017] [Revised: 01/15/2018] [Accepted: 01/19/2018] [Indexed: 12/21/2022]
Abstract
Increasingly available genomic sequencing data are exploited to identify genes and variants contributing to diseases, particularly cancer. Traditionally, methods to find such variants have relied heavily on allele frequency and/or familial history, often neglecting to consider any mechanistic understanding of their functional consequences. Thus, while the set of known cancer-related genes has increased, for many, their mechanistic role in the disease is not completely understood. This issue highlights a wide gap between the disciplines of genetics, which largely aims to correlate genetic events with phenotype, and molecular biology, which ultimately aims at a mechanistic understanding of biological processes. Fortunately, new methods and several systematic studies have proved illuminating for many disease genes and variants by integrating sequencing with mechanistic data, including biomolecular structures and interactions. These have provided new interpretations for known mutations and suggested new disease-relevant variants and genes. Here, we review these approaches and discuss particular examples where these have had a profound impact on the understanding of human cancers.
Affiliation(s)
- Francesco Raimondi
- Bioquant, Heidelberg University, Germany; Heidelberg University Biochemistry Center (BZH), Germany
- Robert B Russell
- Bioquant, Heidelberg University, Germany; Heidelberg University Biochemistry Center (BZH), Germany
|
29
|
Baeissa H, Benstead-Hume G, Richardson CJ, Pearl FMG. Identification and analysis of mutational hotspots in oncogenes and tumour suppressors. Oncotarget 2017; 8:21290-21304. [PMID: 28423505 PMCID: PMC5400584 DOI: 10.18632/oncotarget.15514] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/01/2016] [Accepted: 02/07/2017] [Indexed: 01/25/2023] Open
Abstract
Background: The key to interpreting the contribution of a disease-associated mutation in the development and progression of cancer is an understanding of the consequences of that mutation both on the function of the affected protein and on the pathways in which that protein is involved. Protein domains encapsulate function, and position-specific domain-based analysis of mutations has been shown to help elucidate their phenotypes.
Results: In this paper we examine the domain biases in oncogenes and tumour suppressors, and find that their domain compositions substantially differ. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, we mapped over one million mutations to their respective Pfam domains to identify which domains are enriched in any of three different classes of mutation: missense, indels or truncations. Next, we identified the mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. We find that gain of function mutations from oncogenes and loss of function mutations from tumour suppressors are normally found in different domain families and, when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain.
Conclusions: By considering hotspots in tumour suppressors and oncogenes independently, we find that there are different specific positions within domain families that are particularly suited to accommodate either a loss or a gain of function mutation. The position is also dependent on the class of mutation. We find rare mutations co-located with well-known functional mutation hotspots, in members of homologous domain superfamilies, and we detect novel mutation hotspots in domain families previously unconnected with cancer. The results of this analysis can be accessed through the MOKCa database (http://strubiol.icr.ac.uk/extra/MOKCa).
Affiliation(s)
- Hanadi Baeissa
- School of Life Sciences, University of Sussex, Falmer, Brighton, UK
|
30
|
Hashemi S, Nowzari Dalini A, Jalali A, Banaei-Moghaddam AM, Razaghi-Moghadam Z. Cancerouspdomains: comprehensive analysis of cancer type-specific recurrent somatic mutations in proteins and domains. BMC Bioinformatics 2017; 18:370. [PMID: 28814324 PMCID: PMC5559820 DOI: 10.1186/s12859-017-1779-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/27/2017] [Accepted: 08/02/2017] [Indexed: 01/19/2023] Open
Abstract
Background: Discriminating driver mutations from the ones that play no role in cancer is a severe bottleneck in elucidating molecular mechanisms underlying cancer development. Since protein domains are representatives of functional regions within proteins, mutations in them may disturb protein functionality. Therefore, studying mutations at the domain level may point researchers to a more accurate assessment of the functional impact of the mutations.
Results: This article presents a comprehensive study to map mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. In addition, interactome analysis of the specific proteins of each cancer type showed high levels of interconnectivity among them, which implies their functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis.
Conclusions: This study has provided researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir.
Affiliation(s)
- Adrin Jalali
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123, Saarbrücken, Germany
- Zahra Razaghi-Moghadam
- Faculty of New Sciences and Technologies, University of Tehran, North Kargar St, Tehran, Tehran, 1439957131, Iran
|
31
|
Climente-González H, Porta-Pardo E, Godzik A, Eyras E. The Functional Impact of Alternative Splicing in Cancer. Cell Rep 2017; 20:2215-2226. [DOI: 10.1016/j.celrep.2017.08.012] [Citation(s) in RCA: 355] [Impact Index Per Article: 50.7] [Received: 06/08/2017] [Revised: 07/15/2017] [Accepted: 07/26/2017] [Indexed: 12/29/2022] Open
|
32
|
Comparison of algorithms for the detection of cancer drivers at subgene resolution. Nat Methods 2017; 14:782-788. [PMID: 28714987 DOI: 10.1038/nmeth.4364] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Received: 01/18/2017] [Accepted: 06/16/2017] [Indexed: 12/19/2022]
Abstract
Understanding genetic events that lead to cancer initiation and progression remains one of the biggest challenges in cancer biology. Traditionally, most algorithms for cancer-driver identification look for genes that have more mutations than expected from the average background mutation rate. However, there is now a wide variety of methods that look for nonrandom distribution of mutations within proteins as a signal for the driving role of mutations in cancer. Here we classify and review such subgene-resolution algorithms, compare their findings on four distinct cancer data sets from The Cancer Genome Atlas and discuss how predictions from these algorithms can be interpreted in the emerging paradigms that challenge the simple dichotomy between driver and passenger genes.
|
33
|
Yang F, Sun S, Tan G, Costanzo M, Hill DE, Vidal M, Andrews BJ, Boone C, Roth FP. Identifying pathogenicity of human variants via paralog-based yeast complementation. PLoS Genet 2017; 13:e1006779. [PMID: 28542158 PMCID: PMC5466341 DOI: 10.1371/journal.pgen.1006779] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/25/2016] [Revised: 06/09/2017] [Accepted: 04/25/2017] [Indexed: 11/21/2022] Open
Abstract
To better understand the health implications of personal genomes, we now face a largely unmet challenge to identify functional variants within disease-associated genes. Functional variants can be identified by trans-species complementation, e.g., by failure to rescue a yeast strain bearing a mutation in an orthologous human gene. Although orthologous complementation assays are powerful predictors of pathogenic variation, they are available for only a few percent of human disease genes. Here we systematically examine the question of whether complementation assays based on paralogy relationships can expand the number of human disease genes with functional variant detection assays. We tested over 1,000 paralogous human-yeast gene pairs for complementation, yielding 34 complementation relationships, of which 33 (97%) were novel. We found that paralog-based assays identified disease variants with success on par with that of orthology-based assays. Combining all homology-based assay results, we found that complementation can often identify pathogenic variants outside the homologous sequence region, presumably because of global effects on protein folding or stability. Within our search space, paralogy-based complementation more than doubled the number of human disease genes with a yeast-based complementation assay for disease variation.
Functional complementation assays of human disease-associated gene variants can reveal many more human disease variants at high confidence than current computational approaches, even using highly-diverged model organisms. However, this has generally only been possible for a minority of human disease genes for which orthologous complementation is known in the relevant model organism, so that alternative assays are urgently needed. Here we show that complementation relationships can be found for many additional human disease genes by exploiting paralogous human-yeast gene relationships, and that disease variant identification using paralogy-based assays performs on par with orthology-based assays.
Affiliation(s)
- Fan Yang
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Song Sun
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Guihong Tan
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Michael Costanzo
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- David E. Hill
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Marc Vidal
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
- Brenda J. Andrews
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Charles Boone
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
- Frederick P. Roth
- Donnelly Centre, Toronto, Ontario, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Canadian Institute for Advanced Research, Toronto, Ontario, Canada
|
34
|
Peterson TA, Gauran IIM, Park J, Park D, Kann MG. Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples. PLoS Comput Biol 2017; 13:e1005428. [PMID: 28426665 PMCID: PMC5398485 DOI: 10.1371/journal.pcbi.1005428] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/29/2016] [Accepted: 02/28/2017] [Indexed: 12/28/2022] Open
Abstract
The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are 'gene-centric' in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new 'domain-centric' method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots' unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods.
Affiliation(s)
- Thomas A. Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- University of California, San Francisco, Institute for Computational Health Science, San Francisco, California, United States of America
- Iris Ivy M. Gauran
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Junyong Park
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- DoHwan Park
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Maricel G. Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
|
35
|
Raimondi F, Singh G, Betts MJ, Apic G, Vukotic R, Andreone P, Stein L, Russell RB. Insights into cancer severity from biomolecular interaction mechanisms. Sci Rep 2016; 6:34490. [PMID: 27698488 PMCID: PMC5048291 DOI: 10.1038/srep34490] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 04/27/2016] [Accepted: 09/14/2016] [Indexed: 12/11/2022] Open
Abstract
To attain a deeper understanding of diseases like cancer, it is critical to couple genetics with biomolecular mechanisms. High-throughput sequencing has identified thousands of somatic mutations across dozens of cancers, and there is a pressing need to identify the few that are pathologically relevant. Here we use protein structure and interaction data to interrogate nonsynonymous somatic cancer mutations, identifying a set of 213 molecular interfaces (protein-protein, -small molecule or -nucleic acid) most often perturbed in cancer, highlighting several potentially novel cancer genes. Over half of these interfaces involve protein-small-molecule interactions, highlighting their overall importance in cancer. We found distinct differences in the predominance of perturbed interfaces between cancers and histological subtypes, and the presence or absence of certain interfaces appears to correlate with cancer severity.
Affiliation(s)
- Francesco Raimondi
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, University of Heidelberg, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg, Im Neuenheimer Feld 328, University of Heidelberg, 69120 Heidelberg, Germany
- Gurdeep Singh
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, University of Heidelberg, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg, Im Neuenheimer Feld 328, University of Heidelberg, 69120 Heidelberg, Germany
- Matthew J Betts
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, University of Heidelberg, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg, Im Neuenheimer Feld 328, University of Heidelberg, 69120 Heidelberg, Germany
- Gordana Apic
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, University of Heidelberg, 69120 Heidelberg, Germany; Cambridge Cell Networks, St. John's Innovation Centre, Cowley Road, Cambridge CB4 0WS, UK
- Ranka Vukotic
- Department of Medical and Surgical Sciences, University of Bologna and Azienda Ospedaliero-Universitaria di Bologna, Policlinico Sant'Orsola Malpighi, 40138 Bologna, Italy
- Pietro Andreone
- Department of Medical and Surgical Sciences, University of Bologna and Azienda Ospedaliero-Universitaria di Bologna, Policlinico Sant'Orsola Malpighi, 40138 Bologna, Italy
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A1, Canada
- Robert B Russell
- CellNetworks, Bioquant, Im Neuenheimer Feld 267, University of Heidelberg, 69120 Heidelberg, Germany; Biochemie Zentrum Heidelberg, Im Neuenheimer Feld 328, University of Heidelberg, 69120 Heidelberg, Germany
|
36
|
Qi H, Dong C, Chung WK, Wang K, Shen Y. Deep Genetic Connection Between Cancer and Developmental Disorders. Hum Mutat 2016; 37:1042-50. [PMID: 27363847 DOI: 10.1002/humu.23040] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/04/2016] [Revised: 06/15/2016] [Accepted: 06/23/2016] [Indexed: 12/19/2022]
Abstract
Cancer and developmental disorders (DDs) share dysregulated cellular processes such as proliferation and differentiation. There are well-known genes implicated in both cancer and DDs. In this study, we aim to quantify this genetic connection using publicly available data. We found that among DD patients, germline damaging de novo variants are more enriched in cancer driver genes than non-drivers. We estimate that cancer driver genes comprise about a third of DD risk genes. Additionally, de novo likely-gene-disrupting variants are more enriched in tumor suppressors, and about 40% of implicated de novo damaging missense variants are located in cancer somatic mutation hotspots, indicating that many genes have a similar mode of action in cancer and DDs. Our results suggest that we can view tumors as natural laboratories for assessing the deleterious effects of mutations that are applicable to germline variants and identification of causal genes and variants in DDs.
Affiliation(s)
- Hongjian Qi
- Department of Applied Physics and Applied Mathematics, Columbia University, New York, New York; Department of Systems Biology, Columbia University Medical Center, New York, New York
- Chengliang Dong
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California; Biostatistics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Wendy K Chung
- Departments of Pediatrics and Medicine, Columbia University Medical Center, New York, New York
- Kai Wang
- Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California; Biostatistics Division, Department of Preventive Medicine, University of Southern California, Los Angeles, California
- Yufeng Shen
- Department of Systems Biology, Columbia University Medical Center, New York, New York; Department of Biomedical Informatics, Columbia University Medical Center, New York, New York; JP Sulzberger Columbia Genome Center, Columbia University Medical Center, New York, New York
|
37
|
Zhu Z, Ihle NT, Rejto PA, Zarrinkar PP. Outlier analysis of functional genomic profiles enriches for oncology targets and enables precision medicine. BMC Genomics 2016; 17:455. [PMID: 27296290 PMCID: PMC4907009 DOI: 10.1186/s12864-016-2807-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/06/2015] [Accepted: 05/27/2016] [Indexed: 01/22/2023] Open
Abstract
Background: Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, which are limited by biological complexities as well as the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency.
Results: Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention, but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles.
Conclusions: The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.
Affiliation(s)
- Zhou Zhu
- Oncology Research Unit, Pfizer Worldwide Research & Development, La Jolla Laboratories, 10777 Science Center Drive, San Diego, CA, 92121, USA.
- Nathan T Ihle
- Oncology Research Unit, Pfizer Worldwide Research & Development, La Jolla Laboratories, 10777 Science Center Drive, San Diego, CA, 92121, USA
- Paul A Rejto
- Oncology Research Unit, Pfizer Worldwide Research & Development, La Jolla Laboratories, 10777 Science Center Drive, San Diego, CA, 92121, USA
- Patrick P Zarrinkar
- Oncology Research Unit, Pfizer Worldwide Research & Development, La Jolla Laboratories, 10777 Science Center Drive, San Diego, CA, 92121, USA.
|
38
|
Lees JG, Dawson NL, Sillitoe I, Orengo CA. Functional innovation from changes in protein domains and their combinations. Curr Opin Struct Biol 2016; 38:44-52. [DOI: 10.1016/j.sbi.2016.05.016] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Received: 03/22/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 10/21/2022]
|
39
|
Mészáros B, Zeke A, Reményi A, Simon I, Dosztányi Z. Systematic analysis of somatic mutations driving cancer: uncovering functional protein regions in disease development. Biol Direct 2016; 11:23. [PMID: 27150584 PMCID: PMC4858844 DOI: 10.1186/s13062-016-0125-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 02/11/2016] [Accepted: 04/20/2016] [Indexed: 11/16/2022] Open
Abstract
Background: Recent advances in sequencing technologies enable the large-scale identification of genes that are affected by various genetic alterations in cancer. However, understanding tumor development requires insights into how these changes cause altered protein function and impaired network regulation in general and/or in specific cancer types.
Results: In this work we present a novel method called iSiMPRe that identifies regions that are significantly enriched in somatic mutations and short in-frame insertions or deletions (indels). Applying this unbiased method to the complete human proteome, by using data enriched through various cancer genome projects, we identified around 500 protein regions which could be linked to one or more of 27 distinct cancer types. These regions covered the majority of known cancer genes, surprisingly even tumor suppressors. Additionally, iSiMPRe also identified novel genes and regions that have not yet been associated with cancer.
Conclusions: While local somatic mutations correspond to only a subset of genetic variations that can lead to cancer, our systematic analyses revealed that they represent an accompanying feature of most cancer driver genes regardless of the primary mechanism by which they are perturbed during tumorigenesis. These results indicate that the accumulation of local somatic mutations can be used to pinpoint genes responsible for cancer formation and can also help to understand the effect of cancer mutations at the level of functional modules in a broad range of cancer driver genes.
Reviewers: This article was reviewed by Sándor Pongor, Michael Gromiha and Zoltán Gáspári.
Affiliation(s)
- Bálint Mészáros
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 2 Magyar Tudósok krt, Budapest, H-1117, Hungary
- András Zeke
  - Lendület Protein Interaction Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 2 Magyar Tudósok krt, Budapest, H-1117, Hungary
- Attila Reményi
  - Lendület Protein Interaction Group, Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 2 Magyar Tudósok krt, Budapest, H-1117, Hungary
- István Simon
  - Institute of Enzymology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, 2 Magyar Tudósok krt, Budapest, H-1117, Hungary
- Zsuzsanna Dosztányi
  - MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, 11/c Pázmány Péter stny, Budapest, H-1117, Hungary
|
40
|
Zhang N, Liu H, Yue G, Zhang Y, You J, Wang H. Molecular Heterogeneity of Ewing Sarcoma as Detected by Ion Torrent Sequencing. PLoS One 2016; 11:e0153546. [PMID: 27077911 PMCID: PMC4831808 DOI: 10.1371/journal.pone.0153546] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/04/2015] [Accepted: 03/31/2016] [Indexed: 12/26/2022] Open
Abstract
Ewing sarcoma (ES) is the second most common malignant bone and soft tissue tumor in children and adolescents. Despite advances in comprehensive treatment, patients with ES metastases still suffer poor outcomes, thus, emphasizing the need for detailed genetic profiles of ES patients to identify suitable molecular biomarkers for improved prognosis and development of effective and targeted therapies. In this study, the next generation sequencing Ion AmpliSeq™ Cancer Hotspot Panel v2 was used to identify cancer-related gene mutations in the tissue samples from 20 ES patients. This platform targeted 207 amplicons of 2800 loci in 50 cancer-related genes. Among the 20 tissue specimens, 62 nonsynonymous hotspot mutations were identified in 26 cancer-related genes, revealing the molecular heterogeneity of ES. Among these, five novel mutations in cancer-related genes (KDR, STK11, MLH1, KRAS, and PTPN11) were detected in ES, and these mutations were confirmed with traditional Sanger sequencing. ES patients with KDR, STK11, and MLH1 mutations had higher Ki-67 proliferation indices than the ES patients lacking such mutations. Notably, more than half of the ES patients harbored one or two possible ‘druggable’ mutations that have been previously linked to a clinical cancer treatment option. Our results provided the foundation to not only elucidate possible mechanisms involved in ES pathogenesis but also indicated the utility of Ion Torrent sequencing as a sensitive and cost-effective tool to screen key oncogenes and tumor suppressors in order to develop personalized therapy for ES patients.
Affiliation(s)
- Nana Zhang
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
- Haijing Liu
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
- Guanjun Yue
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
- Yan Zhang
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
- Jiangfeng You
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
- Hua Wang
  - Department of Pathology, School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China; Department of Pathology, Peking University Third Hospital, Beijing, China
|
41
|
Engin HB, Kreisberg JF, Carter H. Structure-Based Analysis Reveals Cancer Missense Mutations Target Protein Interaction Interfaces. PLoS One 2016; 11:e0152929. [PMID: 27043210 PMCID: PMC4820104 DOI: 10.1371/journal.pone.0152929] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 01/09/2016] [Accepted: 03/20/2016] [Indexed: 01/06/2023] Open
Abstract
Recently it has been shown that cancer mutations selectively target protein-protein interactions. We hypothesized that mutations affecting distinct protein interactions involving established cancer genes could contribute to tumor heterogeneity, and that novel mechanistic insights might be gained into tumorigenesis by investigating protein interactions under positive selection in cancer. To identify protein interactions under positive selection in cancer, we mapped over 1.2 million nonsynonymous somatic cancer mutations onto 4,896 experimentally determined protein structures and analyzed their spatial distribution. In total, 20% of mutations on the surface of known cancer genes perturbed protein-protein interactions (PPIs), and this enrichment for PPI interfaces was observed for both tumor suppressors (Odds Ratio 1.28, P-value < 10⁻⁴) and oncogenes (Odds Ratio 1.17, P-value < 10⁻³). To study this further, we constructed a bipartite network representing structurally resolved PPIs from all available human complexes in the Protein Data Bank (2,864 proteins, 3,072 PPIs). Analysis of frequently mutated cancer genes within this network revealed that tumor-suppressors, but not oncogenes, are significantly enriched with functional mutations in homo-oligomerization regions (Odds Ratio 3.68, P-Value < 10⁻⁸). We present two important examples, TP53 and beta-2-microglobulin, for which the patterns of somatic mutations at interfaces provide insights into specifically perturbed biological circuits. In patients with TP53 mutations, patient survival correlated with the specific interactions that were perturbed. Moreover, we investigated mutations at the interface of protein-nucleotide interactions and observed an unexpected number of missense mutations but not silent mutations occurring within DNA and RNA binding sites. Finally, we provide a resource of 3,072 PPI interfaces ranked according to their mutation rates. Analysis of this list highlights 282 novel candidate cancer genes that encode proteins participating in interactions that are perturbed recurrently across tumors. In summary, mutation of specific protein interactions is an important contributor to tumor heterogeneity and may have important implications for clinical outcomes.
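For readers unfamiliar with the enrichment statistic quoted in this abstract, an odds ratio compares the odds that a mutated residue lies at an interface with the odds for the remaining residues. The sketch below is only an illustration: the helper name `odds_ratio` and the counts are invented, and it does not reproduce the authors' pipeline or their significance testing.

```python
def odds_ratio(mut_interface, unmut_interface, mut_other, unmut_other):
    """Odds ratio for a 2x2 table of residues.

    Rows: interface vs. other surface residues.
    Columns: mutated vs. not mutated.
    All four arguments are hypothetical counts supplied by the caller.
    """
    if unmut_interface == 0 or mut_other == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (mut_interface * unmut_other) / (unmut_interface * mut_other)


# Toy counts, invented for illustration only.
example = odds_ratio(mut_interface=30, unmut_interface=70,
                     mut_other=20, unmut_other=80)
print(f"odds ratio = {example:.2f}")
```

A value above 1 would indicate that mutations fall at interface residues more often than expected from the other residues; whether such a difference is significant needs a separate test (for example, Fisher's exact test), which this sketch leaves out.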
Affiliation(s)
- H. Billur Engin
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, United States of America
- Jason F. Kreisberg
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, United States of America
- Hannah Carter
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, 9500 Gilman Dr., La Jolla, CA, 92093, United States of America
|
42
|
Melloni GEM, de Pretis S, Riva L, Pelizzola M, Céol A, Costanza J, Müller H, Zammataro L. LowMACA: exploiting protein family analysis for the identification of rare driver mutations in cancer. BMC Bioinformatics 2016; 17:80. [PMID: 26860319 PMCID: PMC4748640 DOI: 10.1186/s12859-016-0935-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 11/03/2015] [Accepted: 02/05/2016] [Indexed: 01/18/2023] Open
Abstract
Background: The increasing availability of resequencing data has led to a better understanding of the most important genes in cancer development. Nevertheless, the mutational landscape of many tumor types is heterogeneous and encompasses a long tail of potential driver genes that are systematically excluded by currently available methods due to the low frequency of their mutations. We developed LowMACA (Low frequency Mutations Analysis via Consensus Alignment), a method that combines the mutations of various proteins sharing the same functional domains to identify conserved residues that harbor clustered mutations in multiple sequence alignments. LowMACA is designed to visualize and statistically assess potential driver genes through the identification of their mutational hotspots. Results: We analyzed the Ras superfamily exploiting the known driver mutations of the trio K-N-HRAS, identifying new putative driver mutations and genes belonging to less known members of the Rho, Rab and Rheb subfamilies. Furthermore, we applied the same concept to a list of known and candidate driver genes, and observed that low confidence genes show similar patterns of mutation compared to high confidence genes of the same protein family. Conclusions: LowMACA is a software for the identification of gain-of-function mutations in putative oncogenic families, increasing the amount of information on functional domains and their possible role in cancer. In this context LowMACA emphasizes the role of genes mutated at low frequency otherwise undetectable by classical single gene analysis. LowMACA is an R package available at http://www.bioconductor.org/packages/release/bioc/html/LowMACA.html. It is also available as a GUI standalone downloadable at: https://cgsb.genomics.iit.it/wiki/projects/LowMACA Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0935-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Giorgio E M Melloni
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Stefano de Pretis
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Laura Riva
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Mattia Pelizzola
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Arnaud Céol
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Jole Costanza
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Heiko Müller
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
- Luca Zammataro
  - Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), Via Adamello 16, 20139, Milan, Italy
|
43
|
Gauthier NP, Reznik E, Gao J, Sumer SO, Schultz N, Sander C, Miller ML. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer. Nucleic Acids Res 2016; 44:D986-91. [PMID: 26590264 PMCID: PMC4702822 DOI: 10.1093/nar/gkv1132] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/15/2015] [Revised: 10/10/2015] [Accepted: 10/15/2015] [Indexed: 12/21/2022] Open
Abstract
The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in currently (mid-2015) more than 5000 cancer patient samples across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects.
Affiliation(s)
- Nicholas Paul Gauthier
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ed Reznik
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jianjiong Gao
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Selcuk Onur Sumer
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Nikolaus Schultz
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Chris Sander
  - Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Martin L Miller
  - Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, UK
|
44
|
|
45
|
Miller ML, Reznik E, Gauthier NP, Aksoy BA, Korkut A, Gao J, Ciriello G, Schultz N, Sander C. Pan-Cancer Analysis of Mutation Hotspots in Protein Domains. Cell Syst 2015; 1:197-209. [PMID: 27135912 PMCID: PMC4982675 DOI: 10.1016/j.cels.2015.08.014] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 11/20/2014] [Revised: 07/05/2015] [Accepted: 08/28/2015] [Indexed: 02/07/2023]
Abstract
In cancer genomics, recurrence of mutations in independent tumor samples is a strong indicator of functional impact. However, rare functional mutations can escape detection by recurrence analysis owing to lack of statistical power. We enhance statistical power by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. Domain mutation analysis also sharpens the functional interpretation of the impact of mutations, as domains more succinctly embody function than entire genes. By mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of domains, we confirm well-known functional mutation hotspots, identify uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in another gene, detect previously unknown mutation hotspots, and provide hypotheses about molecular mechanisms and downstream effects of domain mutations. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis will likely provide many more leads linking mutations in proteins to the cancer phenotype.
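The core idea in this abstract, pooling mutations from several genes at equivalent positions of a shared domain alignment, can be sketched in a few lines. Everything named below (`aggregate_by_alignment_column`, the `GENE_*` labels, the residue-to-column mapping) is hypothetical toy input rather than data or code from the paper, and the statistical assessment of hotspots is omitted.

```python
from collections import Counter


def aggregate_by_alignment_column(mutations, residue_to_column):
    """Count mutations per column of a domain multiple sequence alignment.

    mutations: iterable of (gene, residue_number) tuples.
    residue_to_column: dict mapping (gene, residue_number) to an alignment
        column index; residues outside the domain are simply absent.
    """
    counts = Counter()
    for gene, residue in mutations:
        column = residue_to_column.get((gene, residue))
        if column is not None:  # skip residues that do not map to the domain
            counts[column] += 1
    return counts


# Toy mapping: three paralogs whose residues 12, 40 and 77 share column 5.
residue_to_column = {
    ("GENE_A", 12): 5,
    ("GENE_B", 40): 5,
    ("GENE_C", 77): 5,
    ("GENE_A", 61): 9,
}
mutations = [("GENE_A", 12), ("GENE_B", 40), ("GENE_C", 77),
             ("GENE_A", 61), ("GENE_B", 3)]

column_counts = aggregate_by_alignment_column(mutations, residue_to_column)
print(column_counts.most_common())
```

In this toy input each gene carries a single mutation at the shared position, so no per-gene count stands out, yet the pooled count for column 5 is three. That is the gain in statistical power the abstract describes.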
Affiliation(s)
- Martin L Miller
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA; Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Ed Reznik
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Nicholas P Gauthier
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Bülent Arman Aksoy
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Anil Korkut
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Jianjiong Gao
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Giovanni Ciriello
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Nikolaus Schultz
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Chris Sander
  - Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
|
46
|
Comprehensive assessment of cancer missense mutation clustering in protein structures. Proc Natl Acad Sci U S A 2015; 112:E5486-95. [PMID: 26392535 DOI: 10.1073/pnas.1516373112] [Citation(s) in RCA: 155] [Impact Index Per Article: 17.2] [Indexed: 01/07/2023] Open
Abstract
Large-scale tumor sequencing projects enabled the identification of many new cancer gene candidates through computational approaches. Here, we describe a general method to detect cancer genes based on significant 3D clustering of mutations relative to the structure of the encoded protein products. The approach can also be used to search for proteins with an enrichment of mutations at binding interfaces with a protein, nucleic acid, or small molecule partner. We applied this approach to systematically analyze the PanCancer compendium of somatic mutations from 4,742 tumors relative to all known 3D structures of human proteins in the Protein Data Bank. We detected significant 3D clustering of missense mutations in several previously known oncoproteins including HRAS, EGFR, and PIK3CA. Although clustering of missense mutations is often regarded as a hallmark of oncoproteins, we observed that a number of tumor suppressors, including FBXW7, VHL, and STK11, also showed such clustering. Beside these known cases, we also identified significant 3D clustering of missense mutations in NUF2, which encodes a component of the kinetochore, that could affect chromosome segregation and lead to aneuploidy. Analysis of interaction interfaces revealed enrichment of mutations in the interfaces between FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA, SRSF2-RNA, and others. Together, our results indicate that systematic consideration of 3D structure can assist in the identification of cancer genes and in the understanding of the functional role of their mutations.
|
47
|
Cheng F, Zhao J, Zhao Z. Advances in computational approaches for prioritizing driver mutations and significantly mutated genes in cancer genomes. Brief Bioinform 2015; 17:642-56. [PMID: 26307061 DOI: 10.1093/bib/bbv068] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Received: 05/31/2015] [Indexed: 12/27/2022] Open
Abstract
Cancer is often driven by the accumulation of genetic alterations, including single nucleotide variants, small insertions or deletions, gene fusions, copy-number variations, and large chromosomal rearrangements. Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data and catalog somatic mutations in both common and rare cancer types. So far, the somatic mutation landscapes and signatures of >10 major cancer types have been reported; however, pinpointing driver mutations and cancer genes from millions of available cancer somatic mutations remains a monumental challenge. To tackle this important task, many methods and computational tools have been developed during the past several years and, thus, a review of its advances is urgently needed. Here, we first summarize the main features of these methods and tools for whole-exome, whole-genome and whole-transcriptome sequencing data. Then, we discuss major challenges like tumor intra-heterogeneity, tumor sample saturation and functionality of synonymous mutations in cancer, all of which may result in false-positive discoveries. Finally, we highlight new directions in studying regulatory roles of noncoding somatic mutations and quantitatively measuring circulating tumor DNA in cancer. This review may help investigators find an appropriate tool for detecting potential driver or actionable mutations in rapidly emerging precision cancer medicine.
|