1
Desai S, Ahmad S, Bawaskar B, Rashmi S, Mishra R, Lakhwani D, Dutt A. Singleton mutations in large-scale cancer genome studies: uncovering the tail of cancer genome. NAR Cancer 2024; 6:zcae010. [PMID: 38487301; PMCID: PMC10939354; DOI: 10.1093/narcan/zcae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Accepted: 02/23/2024] [Indexed: 03/17/2024] [Open access]
Abstract
Singleton or low-frequency driver mutations are challenging to identify. We present a domain driver mutation estimator (DOME) to identify rare candidate driver mutations. DOME analyzes positions analogous to known statistical hotspots and resistant mutations in combination with their functional and biochemical residue context as determined by protein structures and somatic mutation propensity within conserved PFAM domains, integrating the CADD scoring scheme. Benchmarked against seven other tools, DOME exhibited superior or comparable accuracy in the prediction of functional cancer drivers relative to all evaluated tools except one. DOME identified a unique set of 32 917 high-confidence predicted driver mutations from the analysis of whole proteome missense variants within domain boundaries across 1331 genes, including 1192 non-Cancer Gene Census genes, emphasizing its unique place in cancer genome analysis. Additionally, analysis of 8799 TCGA (The Cancer Genome Atlas) and in-house tumor samples revealed 847 potential driver mutations, with mutations in tyrosine kinase members forming the dominant burden, underscoring their higher significance in cancer. Overall, DOME complements current approaches for identifying novel, low-frequency drivers and resistant mutations in personalized therapy.
Affiliation(s)
- Sanket Desai
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
- Suhail Ahmad
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
- Bhargavi Bawaskar
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Sonal Rashmi
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Rohit Mishra
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Deepika Lakhwani
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
- Amit Dutt
  - Integrated Cancer Genomics Laboratory, Advanced Centre for Treatment, Research, and Education in Cancer, Kharghar, Navi Mumbai 410210, Maharashtra, India
  - Homi Bhabha National Institute, Training School Complex, Anushakti Nagar, Mumbai 400094, Maharashtra, India
  - Department of Genetics, University of Delhi, South Campus, New Delhi 110021, India
2
Wiel L, Hampstead JE, Venselaar H, Vissers LE, Brunner HG, Pfundt R, Vriend G, Veltman JA, Gilissen C. De novo mutation hotspots in homologous protein domains identify function-altering mutations in neurodevelopmental disorders. Am J Hum Genet 2023; 110:92-104. [PMID: 36563679; PMCID: PMC9892778; DOI: 10.1016/j.ajhg.2022.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/11/2022] [Accepted: 12/02/2022] [Indexed: 12/24/2022] [Open access]
Abstract
Variant interpretation remains a major challenge in medical genetics. We developed Meta-Domain HotSpot (MDHS) to identify mutational hotspots across homologous protein domains. We applied MDHS to a dataset of 45,221 de novo mutations (DNMs) from 31,058 individuals with neurodevelopmental disorders (NDDs) and identified three significantly enriched missense DNM hotspots in the ion transport protein domain family (PF00520). The 37 unique missense DNMs that drive enrichment affect 25 genes, 19 of which were previously associated with NDDs. 3D protein structure modeling supports the hypothesis of function-altering effects of these mutations. Hotspot genes have a unique expression pattern in tissue, and we used this pattern alongside in silico predictors and population constraint information to identify candidate NDD-associated genes. We also propose a lenient version of our method, which identifies 32 hotspot positions across 16 different protein domains. These positions are enriched for likely pathogenic variation in clinical databases and DNMs in other genetic disorders.
Affiliation(s)
- Laurens Wiel
  - Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
  - Department of Medicine, Division of Cardiovascular Medicine, School of Medicine, Stanford University, Stanford, CA, USA
- Juliet E. Hampstead
  - Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
- Hanka Venselaar
  - Centre for Molecular and Biomolecular Informatics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
- Lisenka E.L.M. Vissers
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
- Han G. Brunner
  - Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
- Rolph Pfundt
  - Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
- Gerrit Vriend
  - Baco Institute of Protein Science, Baco, 5201 Mindoro, Philippines
- Joris A. Veltman
  - Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE1 3BZ, UK
- Christian Gilissen
  - Department of Human Genetics, Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen 6525 GA, the Netherlands
  - Corresponding author
3
Grillo E, Ravelli C, Corsini M, Zammataro L, Mitola S. Protein domain-based approaches for the identification and prioritization of therapeutically actionable cancer variants. Biochim Biophys Acta Rev Cancer 2021; 1876:188614. [PMID: 34403770; DOI: 10.1016/j.bbcan.2021.188614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/22/2021] [Revised: 08/11/2021] [Accepted: 08/11/2021] [Indexed: 01/04/2023]
Abstract
The tremendous number of cancer variants that can be detected by NGS analyses has required the development of computational approaches to prioritize mutations on the basis of their biological and clinical significance. Standard strategies take a gene-centric approach to the problem, which identifies only highly frequent variants. In contrast, protein domain (PD)-based approaches can identify functionally relevant low-frequency variants by searching for mutations that recur on analogous residues across homologous proteins (i.e. proteins containing the same PD). Such approaches make it possible to transfer information about effects and druggability from one known mutation to unknown ones. Here we describe how PD-based strategies work, and discuss how they could be exploited for mutation prioritization. The principle that mutations clustered on specific residues of PDs have the same functional consequences and are therapeutically actionable in a similar manner could help the choice of patient-specific targeted drugs, eventually improving the management of cancer patients.
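The core idea of searching for mutations that recur on analogous residues across homologous proteins can be illustrated with a minimal Python sketch. It is not taken from the review: the gene names, the residue-to-alignment-column map `DOMAIN_COLUMN`, the function `domain_hotspots` and the `min_count` threshold are all hypothetical, and a real analysis would derive the column map from a domain alignment and apply a statistical test rather than a raw count.

```python
from collections import Counter

# Hypothetical toy data: for each protein, a map from its residue number to
# the column of a shared protein-domain alignment (illustrative values only).
DOMAIN_COLUMN = {
    "GENE_A": {100: 1, 101: 2, 102: 3},
    "GENE_B": {55: 1, 56: 2, 57: 3},
    "GENE_C": {210: 1, 211: 2, 212: 3},
}


def domain_hotspots(mutations, domain_column, min_count=2):
    """Count mutations per domain alignment column across all proteins.

    `mutations` is an iterable of (gene, residue) pairs. Mutations that fall
    outside the mapped domain are ignored. Columns hit at least `min_count`
    times are returned as {column: count}.
    """
    counts = Counter()
    for gene, residue in mutations:
        column = domain_column.get(gene, {}).get(residue)
        if column is not None:
            counts[column] += 1
    return {col: n for col, n in counts.items() if n >= min_count}


# Each mutation is rare within its own gene, but three of them land on the
# same domain column once the homologous proteins are pooled.
mutations = [
    ("GENE_A", 101),
    ("GENE_B", 56),
    ("GENE_C", 211),
    ("GENE_A", 100),
    ("GENE_B", 99),  # outside the domain, so it is skipped
]
print(domain_hotspots(mutations, DOMAIN_COLUMN))  # {2: 3}
```

In this toy input no single gene carries a recurrent mutation, yet pooling by domain column exposes column 2 as a candidate hotspot, which is the kind of signal a gene-centric count would miss.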
Affiliation(s)
- Elisabetta Grillo
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Cosetta Ravelli
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Michela Corsini
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
- Luca Zammataro
  - Division of Artificial Intelligence Systems for Immunoinformatics, Kiromic BioPharma, Inc., Houston, USA
- Stefania Mitola
  - Department of Molecular and Translational Medicine, University of Brescia, Brescia, Italy
4
Kan Y, Jiang L, Tang J, Guo Y, Guo F. A systematic view of computational methods for identifying driver genes based on somatic mutation data. Brief Funct Genomics 2021; 20:333-343. [PMID: 34312663; DOI: 10.1093/bfgp/elab032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/01/2021] [Revised: 06/16/2021] [Accepted: 06/22/2021] [Indexed: 11/13/2022] [Open access]
Abstract
Abnormal changes in driver genes are of serious concern for human health and biomedical research. Accurately identifying driver genes among the enormous number of mutated genes promotes accurate diagnosis and treatment of cancer. Many methods for uncovering driver genes have been developed over the past decades. By analyzing previous work, we find that computational methods are more efficient than traditional biological experiments when distinguishing driver genes from massive data. In this study, we summarize eight common computational algorithms that use only somatic mutation data. We first group these methods into three categories according to the mutation features they apply. Then, we outline a general process for nominating candidate cancer driver genes. Finally, we evaluate three representative methods on 10 kinds of cancer derived from The Cancer Genome Atlas Program and five Chinese projects from the International Cancer Genome Consortium. In addition, we compare the results of the methods under various parameters. Evaluation is performed from four perspectives: CGC, OG/TSG, Q-value and Q-Q (quantile-quantile) plot. To sum up, we present algorithms using somatic mutation data in order to offer a systematic view of various mutation features and lay the foundation for methods based on the integration of mutation information and other types of data.
Affiliation(s)
- Yingxin Kan
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Limin Jiang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Jijun Tang
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - School of Computational Science and Engineering, University of South Carolina, Columbia, U.S
- Yan Guo
  - Comprehensive cancer center, Department of Internal Medicine, University of New Mexico, Albuquerque, U.S
- Fei Guo
  - School of Computer Science and Engineering, Central South University, Changsha, China
5
Gemović B, Perović V, Davidović R, Drljača T, Veljkovic N. Alignment-free method for functional annotation of amino acid substitutions: Application on epigenetic factors involved in hematologic malignancies. PLoS One 2021; 16:e0244948. [PMID: 33395407; PMCID: PMC7781373; DOI: 10.1371/journal.pone.0244948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2020] [Accepted: 12/21/2020] [Indexed: 11/19/2022] [Open access]
Abstract
For the last couple of decades, there has been a significant growth in sequencing data, leading to an extraordinary increase in the number of gene variants. This places a challenge on the bioinformatics research community to develop and improve computational tools for functional annotation of new variants. Genes coding for epigenetic regulators have important roles in cancer pathogenesis and mutations in these genes show great potential as clinical biomarkers, especially in hematologic malignancies. Therefore, we developed a model that specifically focuses on these genes, with an assumption that it would outperform general models in predicting the functional effects of amino acid substitutions. EpiMut is standalone software that implements a sequence-based alignment-free method. We applied a two-step approach for generating sequence-based features, relying on the biophysical and biochemical indices of amino acids and the Fourier Transform as a sequence transformation method. For each gene in the dataset, the Naïve Bayes machine learning algorithm was used to build a model for predicting the neutral or disease-related status of variants. EpiMut outperformed the state-of-the-art tools used for comparison: PolyPhen-2, SIFT and SNAP2. Additionally, EpiMut showed the highest performance on the subset of variants positioned outside conserved functional domains of the analysed proteins, which represents an important group of cancer-related variants. These results imply that EpiMut can be applied as a first-choice tool in research on the impact of gene variants in epigenetic regulators, especially in light of their biomarker role in hematologic malignancies. EpiMut is freely available at https://www.vin.bg.ac.rs/180/tools/epimut.php.
Affiliation(s)
- Branislava Gemović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
  - Corresponding author
- Vladimir Perović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Radoslav Davidović
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Tamara Drljača
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
- Nevena Veljkovic
  - Laboratory for Bioinformatics and Computational Chemistry, Vinča Institute of Nuclear Sciences, National Institute of the Republic of Serbia, University of Belgrade, Belgrade, Serbia
  - Heliant d.o.o., Belgrade, Serbia
6
Tang ZZ, Sliwoski GR, Chen G, Jin B, Bush WS, Li B, Capra JA. PSCAN: Spatial scan tests guided by protein structures improve complex disease gene discovery and signal variant detection. Genome Biol 2020; 21:217. [PMID: 32847609; PMCID: PMC7448521; DOI: 10.1186/s13059-020-02121-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/09/2019] [Accepted: 07/27/2020] [Indexed: 12/25/2022] [Open access]
Abstract
Germline disease-causing variants are generally more spatially clustered in protein 3-dimensional structures than benign variants. Motivated by this tendency, we develop a fast and powerful protein-structure-based scan (PSCAN) approach for evaluating gene-level associations with complex disease and detecting signal variants. We validate PSCAN's performance on synthetic data and two real data sets for lipid traits and Alzheimer's disease. Our results demonstrate that PSCAN performs competitively with existing gene-level tests while increasing power and identifying more specific signal variant sets. Furthermore, PSCAN enables generation of hypotheses about the molecular basis for the associations in the context of protein structures and functional domains.
Affiliation(s)
- Zheng-Zheng Tang
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53715 WI USA
  - Wisconsin Institute for Discovery, Madison, 53715 WI USA
- Gregory R. Sliwoski
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
- Guanhua Chen
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, 53715 WI USA
- Bowen Jin
  - Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106 OH USA
- William S. Bush
  - Department for Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, 44106 OH USA
  - Institute for Computational Biology, Case Western Reserve University, Cleveland, 44106 OH USA
- Bingshan Li
  - Department of Molecular Physiology & Biophysics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
- John A. Capra
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, 37232 TN USA
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, 37232 TN USA
  - Departments of Biological Sciences and Computer Science, Vanderbilt University, Nashville, 37232 TN USA
  - Center for Structural Biology, Vanderbilt University, Nashville, 37232 TN USA
7
Shim JE, Kim JH, Shin J, Lee JE, Lee I. Pathway-specific protein domains are predictive for human diseases. PLoS Comput Biol 2019; 15:e1007052. [PMID: 31075101; PMCID: PMC6530867; DOI: 10.1371/journal.pcbi.1007052] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/14/2018] [Revised: 05/22/2019] [Accepted: 04/19/2019] [Indexed: 01/04/2023] [Open access]
Abstract
Protein domains are basic functional units of proteins. Many protein domains are pervasive among diverse biological processes, yet some are associated with specific pathways. Human complex diseases are generally viewed as pathway-level disorders. Therefore, we hypothesized that pathway-specific domains could be highly informative for human diseases. To test the hypothesis, we developed a network-based scoring scheme to quantify specificity of domain-pathway associations. We first generated domain profiles for human proteins, then constructed a co-pathway protein network based on the associations between domain profiles. Based on the score, we classified human protein domains into pathway-specific domains (PSDs) and non-specific domains (NSDs). We found that PSDs contained more pathogenic variants than NSDs. PSDs were also enriched for disease-associated mutations that disrupt protein-protein interactions (PPIs) and tend to have a moderate number of domain interactions. These results suggest that mutations in PSDs are likely to disrupt within-pathway PPIs, resulting in functional failure of pathways. Finally, we demonstrated the prediction capacity of PSDs for disease-associated genes with experimental validations in zebrafish. Taken together, the network-based quantitative method of modeling domain-pathway associations presented herein suggested underlying mechanisms of how protein domains associated with specific pathways influence mutational impacts on diseases via perturbations in within-pathway PPIs, and provided a novel genomic feature for interpreting genetic variants to facilitate the discovery of human disease genes.

Author summary: Protein domains are basic functional units of proteins, yet domain-based pathway annotation of proteins is challenging because many domains are pervasive among diverse pathways. Therefore, we developed a network-based scoring scheme to measure pathway specificity of domains, and then used it to identify pathway-specific domains. Surprisingly, we observed substantially more disease mutations in pathway-specific domains than non-specific domains. We found evidence that mutations of pathway-specific domains tend to perturb pathway integrity via disrupting within-pathway protein-protein interactions. We also demonstrated prediction capacity of pathway-specific domains for complex diseases with experimental validations. Our study demonstrated the usefulness of pathway information for protein domains in interpreting non-random distribution of disease mutations among domains and identification of disease genes and variants.
Affiliation(s)
- Jung Eun Shim
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
  - Yonsei Biomedical Research Institute, Yonsei University College of Medicine, Seoul, Korea
- Ji Hyun Kim
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
- Junha Shin
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
- Ji Eun Lee
  - Department of Health Sciences and Technology, SAIHST, Sungkyunkwan University, Seoul, Korea
  - Samsung Biomedical Research Institute, Samsung Medical Center, Seoul, Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Korea
  - Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
  - Corresponding author
8
Marceau West R, Lu W, Rotroff DM, Kuenemann MA, Chang SM, Wu MC, Wagner MJ, Buse JB, Motsinger-Reif AA, Fourches D, Tzeng JY. Identifying individual risk rare variants using protein structure guided local tests (POINT). PLoS Comput Biol 2019; 15:e1006722. [PMID: 30779729; PMCID: PMC6396946; DOI: 10.1371/journal.pcbi.1006722] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/28/2018] [Revised: 03/01/2019] [Accepted: 12/17/2018] [Indexed: 01/08/2023] [Open access]
Abstract
Rare variants are of increasing interest to genetic association studies because of their etiological contributions to human complex diseases. Due to the rarity of the mutant events, rare variants are routinely analyzed on an aggregate level. While aggregation analyses improve the detection of global-level signal, they are not able to pinpoint causal variants within a variant set. To perform inference on a localized level, additional information, e.g., biological annotation, is often needed to boost the information content of a rare variant. Following the observation that important variants are likely to cluster together on functional domains, we propose a protein structure guided local test (POINT) to provide variant-specific association information using structure-guided aggregation of signal. Constructed under a kernel machine framework, POINT performs local association testing by borrowing information from neighboring variants in the 3-dimensional protein space in a data-adaptive fashion. Besides merely providing a list of promising variants, POINT assigns each variant a p-value to permit variant ranking and prioritization. We assess the selection performance of POINT using simulations and illustrate how it can be used to prioritize individual rare variants in PCSK9, ANGPTL4 and CETP in the Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial data.
Affiliation(s)
- Rachel Marceau West
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Wenbin Lu
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
- Daniel M. Rotroff
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, Ohio, United States of America
- Melaine A. Kuenemann
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Sheng-Mao Chang
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Michael C. Wu
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
- Michael J. Wagner
  - Center for Pharmacogenomics and Individualized Therapy, University of North Carolina, Chapel Hill, North Carolina, United States of America
- John B. Buse
  - Department of Medicine, University of North Carolina School of Medicine, Chapel Hill, North Carolina, United States of America
- Alison A. Motsinger-Reif
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
- Denis Fourches
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Chemistry, North Carolina State University, Raleigh, North Carolina, United States of America
- Jung-Ying Tzeng
  - Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
  - Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
  - Corresponding author
9
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742; PMCID: PMC6343001; DOI: 10.1038/s41598-018-36401-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 09/03/2018] [Accepted: 11/13/2018] [Indexed: 12/31/2022] [Open access]
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionary related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include recently validated cancer associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Affiliation(s)
- Paul Ashford
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
  - Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
10
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887; PMCID: PMC6097914; DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
  - Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
  - Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
  - Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
  - CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada
11
Wang MH, Weng H, Sun R, Lee J, Wu WKK, Chong KC, Zee BCY. A Zoom-Focus algorithm (ZFA) to locate the optimal testing region for rare variant association tests. Bioinformatics 2018; 33:2330-2336. [PMID: 28334355; DOI: 10.1093/bioinformatics/btx130] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/26/2016] [Accepted: 03/09/2017] [Indexed: 01/24/2023] [Open access]
Abstract
Motivation: Increasing amounts of whole exome or genome sequencing data present the challenge of analysing rare variants with extremely small minor allele frequencies. Various statistical tests have been proposed that are specifically configured to increase power for rare variants by conducting the test within a certain bin, such as a gene or a pathway. However, a gene may contain from several to thousands of markers, and not all of them are related to the phenotype. Combining functional and non-functional variants in an arbitrary genomic region could impair the testing power. Results: We propose a Zoom-Focus algorithm (ZFA) to locate the optimal testing region within a given genomic region. It can be applied as a wrapper function in existing rare variant association tests to increase testing power. The algorithm consists of two steps. In the first step, Zooming, a given genomic region is partitioned by an order of two, and the best partition is located. In the second step, Focusing, the boundaries of the zoomed region are refined. Simulation studies showed that ZFA substantially increased the statistical power of rare variant tests, including the SKAT, SKAT-O, burden test and the W-test. The algorithm was applied to real exome sequencing data of hypertensive disorder, and identified genetic markers biologically relevant to metabolic disorders that were undetectable by a gene-based method. The proposed algorithm is an efficient and powerful tool to enhance the power of association studies for whole exome or genome sequencing data. Availability and Implementation: The ZFA software is available at http://www2.ccrb.cuhk.edu.hk/statgene/software.html. Contact: maggiew@cuhk.edu.hk or bzee@cuhk.edu.hk. Supplementary information: Supplementary data are available at Bioinformatics online.
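The two steps named in the abstract above can be sketched roughly as follows. This is a minimal illustration only, not the authors' ZFA software: the function names `toy_score`, `zoom` and `focus`, the greedy one-marker boundary shifts, and the toy scoring rule are all assumptions made for the example, and the score is a stand-in for a real region-level test statistic such as SKAT or the W-test.

```python
import math
from typing import Callable, Sequence, Tuple

Score = Callable[[int, int], float]  # scores the half-open marker window [start, end)


def toy_score(signal: Sequence[float]) -> Score:
    """Hypothetical stand-in for a region-level test statistic (not SKAT or the W-test)."""
    def score(start: int, end: int) -> float:
        return sum(signal[start:end]) / math.sqrt(end - start)
    return score


def zoom(n_markers: int, score: Score) -> Tuple[int, int]:
    """Zooming: halve the region repeatedly, keeping the better half while it improves the score."""
    start, end = 0, n_markers
    best = score(start, end)
    while end - start >= 2:
        mid = (start + end) // 2
        left, right = score(start, mid), score(mid, end)
        candidate = (start, mid) if left >= right else (mid, end)
        if max(left, right) <= best:
            break  # neither half beats the current region
        (start, end), best = candidate, max(left, right)
    return start, end


def focus(start: int, end: int, n_markers: int, score: Score) -> Tuple[int, int]:
    """Focusing: shift either boundary by one marker at a time while the score improves."""
    best = score(start, end)
    improved = True
    while improved:
        improved = False
        for s, e in ((start - 1, end), (start + 1, end), (start, end - 1), (start, end + 1)):
            if 0 <= s < e <= n_markers and score(s, e) > best:
                start, end, best = s, e, score(s, e)
                improved = True
                break
    return start, end


signal = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # toy per-marker evidence
score = toy_score(signal)
zoomed = zoom(len(signal), score)
refined = focus(*zoomed, len(signal), score)
print(zoomed, refined)  # (4, 8) (5, 8)
```

In this toy run the binary partition lands on a window that still contains one uninformative marker, and the boundary refinement trims it, which is the kind of gain the Focusing step is described as providing.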
Affiliation(s)
- Maggie Haitian Wang
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
- Haoyi Weng
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
- Rui Sun
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
- Jack Lee
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
- William Ka Kei Wu
- Department of Anaesthesia and Intensive Care, The Chinese University of Hong Kong, Hong Kong SAR
- Ka Chun Chong
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
- Benny Chung-Ying Zee
- Division of Biostatistics and Centre for Clinical Research and Biostatistics, JC School of Public Health and Primary Care, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong SAR; CUHK Shenzhen Research Institute, Shenzhen, China
12
Baeissa H, Benstead-Hume G, Richardson CJ, Pearl FMG. Identification and analysis of mutational hotspots in oncogenes and tumour suppressors. Oncotarget 2017; 8:21290-21304. [PMID: 28423505 PMCID: PMC5400584 DOI: 10.18632/oncotarget.15514] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/01/2016] [Accepted: 02/07/2017] [Indexed: 01/25/2023] Open
Abstract
Background: The key to interpreting the contribution of a disease-associated mutation in the development and progression of cancer is an understanding of the consequences of that mutation both on the function of the affected protein and on the pathways in which that protein is involved. Protein domains encapsulate function, and position-specific domain-based analysis of mutations has been shown to help elucidate their phenotypes. Results: In this paper we examine the domain biases in oncogenes and tumour suppressors, and find that their domain compositions substantially differ. Using data from over 30 different cancers from whole-exome sequencing cancer genomic projects, we mapped over one million mutations to their respective Pfam domains to identify which domains are enriched in any of three different classes of mutation: missense, indels or truncations. Next, we identified the mutational hotspots within domain families by mapping small mutations to equivalent positions in multiple sequence alignments of protein domains. We find that gain-of-function mutations from oncogenes and loss-of-function mutations from tumour suppressors are normally found in different domain families and, when observed in the same domain families, hotspot mutations are located at different positions within the multiple sequence alignment of the domain. Conclusions: By considering hotspots in tumour suppressors and oncogenes independently, we find that there are different specific positions within domain families that are particularly suited to accommodate either a loss-of-function or a gain-of-function mutation. The position is also dependent on the class of mutation. We find rare mutations co-located with well-known functional mutation hotspots in members of homologous domain superfamilies, and we detect novel mutation hotspots in domain families previously unconnected with cancer. The results of this analysis can be accessed through the MOKCa database (http://strubiol.icr.ac.uk/extra/MOKCa).
Affiliation(s)
- Hanadi Baeissa
- School of Life Sciences, University of Sussex, Falmer, Brighton, UK
13
Hashemi S, Nowzari Dalini A, Jalali A, Banaei-Moghaddam AM, Razaghi-Moghadam Z. Cancerouspdomains: comprehensive analysis of cancer type-specific recurrent somatic mutations in proteins and domains. BMC Bioinformatics 2017; 18:370. [PMID: 28814324 PMCID: PMC5559820 DOI: 10.1186/s12859-017-1779-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 03/27/2017] [Accepted: 08/02/2017] [Indexed: 01/19/2023] Open
Abstract
Background: Discriminating driver mutations from the ones that play no role in cancer is a severe bottleneck in elucidating molecular mechanisms underlying cancer development. Since protein domains represent functional regions within proteins, mutations in them may disturb protein function. Therefore, studying mutations at the domain level may point researchers to a more accurate assessment of the functional impact of the mutations. Results: This article presents a comprehensive study to map mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. In addition, interactome analysis of the specific proteins of each cancer type showed high levels of interconnectivity among them, which implies a functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis. Conclusions: This study has provided researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1779-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Adrin Jalali
- Max Planck Institute for Informatics, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Zahra Razaghi-Moghadam
- Faculty of New Sciences and Technologies, University of Tehran, North Kargar St, Tehran, Tehran, 1439957131, Iran.
14
Peterson TA, Gauran IIM, Park J, Park D, Kann MG. Oncodomains: A protein domain-centric framework for analyzing rare variants in tumor samples. PLoS Comput Biol 2017; 13:e1005428. [PMID: 28426665 PMCID: PMC5398485 DOI: 10.1371/journal.pcbi.1005428] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/29/2016] [Accepted: 02/28/2017] [Indexed: 12/28/2022] Open
Abstract
The fight against cancer is hindered by its highly heterogeneous nature. Genome-wide sequencing studies have shown that individual malignancies contain many mutations that range from those commonly found in tumor genomes to rare somatic variants present only in a small fraction of lesions. Such rare somatic variants dominate the landscape of genomic mutations in cancer, yet efforts to correlate somatic mutations found in one or few individuals with functional roles have been largely unsuccessful. Traditional methods for identifying somatic variants that drive cancer are 'gene-centric' in that they consider only somatic variants within a particular gene and make no comparison to other similar genes in the same family that may play a similar role in cancer. In this work, we present oncodomain hotspots, a new 'domain-centric' method for identifying clusters of somatic mutations across entire gene families using protein domain models. Our analysis confirms that our approach creates a framework for leveraging structural and functional information encapsulated by protein domains into the analysis of somatic variants in cancer, enabling the assessment of even rare somatic variants by comparison to similar genes. Our results reveal a vast landscape of somatic variants that act at the level of domain families altering pathways known to be involved with cancer such as protein phosphorylation, signaling, gene regulation, and cell metabolism. Due to oncodomain hotspots' unique ability to assess rare variants, we expect our method to become an important tool for the analysis of sequenced tumor genomes, complementing existing methods.
Affiliation(s)
- Thomas A. Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- University of California, San Francisco, Institute for Computational Health Science, San Francisco, California, United States of America
- Iris Ivy M. Gauran
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Junyong Park
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- DoHwan Park
- Department of Mathematics and Statistics, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
- Maricel G. Kann
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, Maryland, United States of America
15
Gallion J, Wilkins AD, Lichtarge O. Human kinases display mutational hotspots at cognate positions within cancer. Pacific Symposium on Biocomputing 2016; 22:414-425. [PMID: 27896994 DOI: 10.1142/9789813207813_0039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The discovery of driver genes is a major pursuit of cancer genomics, usually based on observing the same mutation in different patients. But the heterogeneity of cancer pathways plus the high background mutational frequency of tumor cells often cloud the distinction between less frequent drivers and innocent passenger mutations. Here, to overcome these disadvantages, we grouped together mutations from close kinase paralogs under the hypothesis that cognate mutations may functionally favor cancer cells in similar ways. Indeed, we find that kinase paralogs often bear mutations to the same substituted amino acid at the same aligned positions and with a large predicted Evolutionary Action. Functionally, these high Evolutionary Action, non-random mutations affect known kinase motifs, but strikingly, they do so differently among different kinase types and cancers, consistent with differences in selective pressures. Taken together, these results suggest that cancer pathways may flexibly distribute a dependence on a given functional mutation among multiple close kinase paralogs. The recognition of this "mutational delocalization" of cancer drivers among groups of paralogs is a new phenomenon that may help better identify relevant mechanisms and therefore eventually guide personalized therapy.
Affiliation(s)
- Jonathan Gallion
- Structural Computational Biology and Molecular Biophysics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA. †The authors gratefully acknowledge support from the National Institutes of Health (GM066099 and GM079656), from the National Science Foundation (DBI-1356569), and from DARPA (N66001-15-C-4042).
16
Mutational patterns in oncogenes and tumour suppressors. Biochem Soc Trans 2016; 44:925-31. [DOI: 10.1042/bst20160001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/07/2016] [Indexed: 12/24/2022]
Abstract
All cancers depend upon mutations in critical genes, which confer a selective advantage to the tumour cell. Knowledge of these mutations is crucial to understanding the biology of cancer initiation and progression, and to the development of targeted therapeutic strategies. The key to understanding the contribution of a disease-associated mutation to the development and progression of cancer comes from an understanding of the consequences of that mutation on the function of the affected protein, and the impact on the pathways in which that protein is involved. In this paper we examine the mutation patterns observed in oncogenes and tumour suppressors, and discuss different approaches that have been developed to identify driver mutations within cancers that contribute to disease progression. We also discuss the MOKCa database, where we have developed an automatic pipeline that structurally and functionally annotates all proteins from the human proteome that are mutated in cancer.
17
Gauthier NP, Reznik E, Gao J, Sumer SO, Schultz N, Sander C, Miller ML. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer. Nucleic Acids Res 2016; 44:D986-91. [PMID: 26590264 PMCID: PMC4702822 DOI: 10.1093/nar/gkv1132] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/15/2015] [Revised: 10/10/2015] [Accepted: 10/15/2015] [Indexed: 12/21/2022] Open
Abstract
The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in currently (mid-2015) more than 5000 cancer patient samples across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues for the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects.
Affiliation(s)
- Nicholas Paul Gauthier
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Ed Reznik
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Jianjiong Gao
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Selcuk Onur Sumer
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Nikolaus Schultz
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Chris Sander
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- Martin L Miller
- Cancer Research UK, Cambridge Institute, University of Cambridge, Cambridge, CB2 0RE, UK
18
Li J, Drubay D, Michiels S, Gautheret D. Mining the coding and non-coding genome for cancer drivers. Cancer Lett 2015; 369:307-15. [PMID: 26433158 DOI: 10.1016/j.canlet.2015.09.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 06/03/2015] [Revised: 09/24/2015] [Accepted: 09/24/2015] [Indexed: 12/20/2022]
Abstract
Progress in next-generation sequencing provides unprecedented opportunities to fully characterize the spectrum of somatic mutations of cancer genomes. Given the large number of somatic mutations identified by such technologies, the prioritization of cancer-driving events is a consistent bottleneck. Most bioinformatics tools concentrate on driver mutations in the coding fraction of the genome, those causing changes in protein products. As more non-coding pathogenic variants are identified and characterized, the development of computational approaches to effectively prioritize cancer-driving variants within the non-coding fraction of the human genome is becoming critical. After a short summary of methods for coding variant prioritization, we here review the highly diverse non-coding elements that may act as cancer drivers and describe recent methods that attempt to evaluate the deleteriousness of sequence variation in these elements. With such tools, the prioritization and identification of cancer-implicated regulatory elements and non-coding RNAs is becoming a reality.
Affiliation(s)
- Jia Li
- Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Université Paris-Sud, Université Paris-Saclay, 91198 Gif sur Yvette, France
- Damien Drubay
- Service de Biostatistique et d'Epidemiologie, Gustave Roussy, Villejuif, France; INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, France
- Stefan Michiels
- Service de Biostatistique et d'Epidemiologie, Gustave Roussy, Villejuif, France; INSERM U1018, CESP, Université Paris-Sud, Université Paris-Saclay, Villejuif, France
- Daniel Gautheret
- Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Université Paris-Sud, Université Paris-Saclay, 91198 Gif sur Yvette, France.
19
Miller ML, Reznik E, Gauthier NP, Aksoy BA, Korkut A, Gao J, Ciriello G, Schultz N, Sander C. Pan-Cancer Analysis of Mutation Hotspots in Protein Domains. Cell Syst 2015; 1:197-209. [PMID: 27135912 PMCID: PMC4982675 DOI: 10.1016/j.cels.2015.08.014] [Citation(s) in RCA: 82] [Impact Index Per Article: 9.1] [Received: 11/20/2014] [Revised: 07/05/2015] [Accepted: 08/28/2015] [Indexed: 02/07/2023]
Abstract
In cancer genomics, recurrence of mutations in independent tumor samples is a strong indicator of functional impact. However, rare functional mutations can escape detection by recurrence analysis owing to lack of statistical power. We enhance statistical power by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. Domain mutation analysis also sharpens the functional interpretation of the impact of mutations, as domains more succinctly embody function than entire genes. By mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of domains, we confirm well-known functional mutation hotspots, identify uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in another gene, detect previously unknown mutation hotspots, and provide hypotheses about molecular mechanisms and downstream effects of domain mutations. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis will likely provide many more leads linking mutations in proteins to the cancer phenotype.
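The core idea in the abstract above, pooling mutations from different genes at equivalent positions of a domain alignment, can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' pipeline: the gene names `GENE_A`, `GENE_B` and `GENE_C`, the aligned strings, the domain start coordinates, and the helper names `residue_to_column` and `column_counts` are all hypothetical, and no statistical test for hotspot significance is included.

```python
from collections import Counter
from typing import Dict, List, Tuple


def residue_to_column(aligned_seq: str, domain_start: int) -> Dict[int, int]:
    """Map 1-based protein positions to 0-based alignment columns, skipping gap characters."""
    mapping: Dict[int, int] = {}
    position = domain_start
    for column, residue in enumerate(aligned_seq):
        if residue not in "-.":
            mapping[position] = column
            position += 1
    return mapping


def column_counts(
    alignment: Dict[str, Tuple[str, int]],
    mutations: List[Tuple[str, int]],
) -> Counter:
    """Count mutations per alignment column, pooled over all genes in the domain family."""
    counts: Counter = Counter()
    for gene, position in mutations:
        if gene not in alignment:
            continue  # gene has no instance of this domain
        aligned_seq, domain_start = alignment[gene]
        column = residue_to_column(aligned_seq, domain_start).get(position)
        if column is not None:
            counts[column] += 1
    return counts


# Toy domain alignment: gene -> (aligned domain sequence, protein position of first residue)
alignment = {
    "GENE_A": ("MK-LV", 10),
    "GENE_B": ("MKQLV", 20),
}
# Toy mutation calls: (gene, mutated protein position)
mutations = [("GENE_A", 12), ("GENE_B", 23), ("GENE_B", 23), ("GENE_A", 10), ("GENE_C", 5)]

counts = column_counts(alignment, mutations)
print(counts.most_common())  # [(3, 3), (0, 1)]
```

Here position 12 in the first gene and position 23 in the second fall in the same alignment column, so a single mutation in one gene is counted together with two in the other. That pooling is what gives rare variants more statistical support than per-gene recurrence alone.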
Affiliation(s)
- Martin L Miller
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA; Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK.
- Ed Reznik
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Nicholas P Gauthier
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Bülent Arman Aksoy
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Anil Korkut
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Jianjiong Gao
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Giovanni Ciriello
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Nikolaus Schultz
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA
- Chris Sander
- Computational Biology Program, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
20
Turner TN, Douville C, Kim D, Stenson PD, Cooper DN, Chakravarti A, Karchin R. Proteins linked to autosomal dominant and autosomal recessive disorders harbor characteristic rare missense mutation distribution patterns. Hum Mol Genet 2015; 24:5995-6002. [PMID: 26246501 DOI: 10.1093/hmg/ddv309] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/09/2015] [Accepted: 07/28/2015] [Indexed: 01/27/2023] Open
Abstract
The role of rare missense variants in disease causation remains difficult to interpret. We explore whether the clustering pattern of rare missense variants (MAF < 0.01) in a protein is associated with mode of inheritance. Mutations in genes associated with autosomal dominant (AD) conditions are known to result in either loss or gain of function, whereas mutations in genes associated with autosomal recessive (AR) conditions invariably result in loss of function. Loss-of-function mutations tend to be distributed uniformly along the protein sequence, whereas gain-of-function mutations tend to localize to key regions. It has not previously been ascertained whether these patterns hold in general for rare missense mutations. We consider the extent to which rare missense variants are located within annotated protein domains and whether they form clusters, using a new unbiased method called CLUstering by Mutation Position. These approaches quantified a significant difference in clustering between AD and AR diseases. Proteins linked to AD diseases exhibited more clustering of rare missense mutations than those linked to AR diseases (Wilcoxon P = 5.7 × 10⁻⁴, permutation P = 8.4 × 10⁻⁴). Rare missense mutations in proteins linked to either AD or AR diseases were more clustered than controls (1000G) (Wilcoxon P = 2.8 × 10⁻¹⁵ for AD and P = 4.5 × 10⁻⁴ for AR, permutation P = 3.1 × 10⁻¹² for AD and P = 0.03 for AR). The differences in clustering patterns persisted even after removal of the most prominent genes. Testing for such non-random patterns may reveal novel aspects of disease etiology in large sample studies.
Affiliation(s)
- Tychele N Turner
- Predoctoral Training Program in Human Genetics and Molecular Biology, McKusick-Nathans Institute of Genetic Medicine, Center for Complex Disease Genomics
- Christopher Douville
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21210, USA
- Dewey Kim
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21210, USA
- Peter D Stenson
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- David N Cooper
- Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Rachel Karchin
- Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21210, USA; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
21
Ferguson BD, Carol Tan YH, Kanteti RS, Liu R, Gayed MJ, Vokes EE, Ferguson MK, John Iafrate A, Gill PS, Salgia R. Novel EPHB4 Receptor Tyrosine Kinase Mutations and Kinomic Pathway Analysis in Lung Cancer. Sci Rep 2015; 5:10641. [PMID: 26073592 PMCID: PMC4466581 DOI: 10.1038/srep10641] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 05/08/2014] [Accepted: 04/28/2015] [Indexed: 12/11/2022] Open
Abstract
Lung cancer outcomes remain poor despite the identification of several potential therapeutic targets. The EPHB4 receptor tyrosine kinase (RTK) has recently emerged as an oncogenic factor in many cancers, including lung cancer. Mutations of EPHB4 in lung cancers have previously been identified, though their significance remains unknown. Here, we report the identification of novel EPHB4 mutations that lead to putative structural alterations as well as increased cellular proliferation and motility. We also conducted a bioinformatic analysis of these mutations to demonstrate that they are mutually exclusive from other common RTK variants in lung cancer, that they correspond to analogous sites of other RTKs’ variations in cancers, and that they are predicted to be oncogenic based on biochemical, evolutionary, and domain-function constraints. Finally, we show that EPHB4 mutations can induce broad changes in the kinome signature of lung cancer cells. Taken together, these data illuminate the role of EPHB4 in lung cancer and further identify EPHB4 as a potentially important therapeutic target.
Affiliation(s)
- Benjamin D Ferguson
- Department of Surgery, University of Chicago, Chicago, Illinois, United States of America
- Yi-Hung Carol Tan
- Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, Illinois, United States of America
- Rajani S Kanteti
- Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, Illinois, United States of America
- Ren Liu
- Department of Medicine, Division of Medical Oncology, University of Southern California, Los Angeles, California, United States of America
- Matthew J Gayed
- Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, Illinois, United States of America
- Everett E Vokes
- Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, Illinois, United States of America
- Mark K Ferguson
- Department of Surgery, University of Chicago, Chicago, Illinois, United States of America; Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
- A John Iafrate
- Department of Pathology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Parkash S Gill
- Department of Medicine, Division of Medical Oncology, University of Southern California, Los Angeles, California, United States of America
- Ravi Salgia
- Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, Illinois, United States of America; Comprehensive Cancer Center, University of Chicago, Chicago, Illinois, United States of America
22
McCallum KJ, Ionita-Laza I. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies. Biometrics 2015; 71:1111-20. [PMID: 26033425 DOI: 10.1111/biom.12331] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 06/01/2014] [Revised: 03/01/2015] [Accepted: 03/01/2015] [Indexed: 12/30/2022]
Abstract
Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes.
Affiliation(s)
- Kenneth J McCallum
- Department of Biostatistics, Columbia University, New York, New York 10032, U.S.A
- Iuliana Ionita-Laza
- Department of Biostatistics, Columbia University, New York, New York 10032, U.S.A
23
Teer JK. An improved understanding of cancer genomics through massively parallel sequencing. Transl Cancer Res 2014; 3:243-259. [PMID: 26146607 PMCID: PMC4486294 DOI: 10.3978/j.issn.2218-676x.2014.05.05] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/21/2022]
Abstract
DNA sequencing technology advances have enabled genetic investigation of more samples in a shorter time than has previously been possible. Furthermore, the ability to analyze and understand large sequencing datasets has improved due to concurrent advances in sequence data analysis methods and software tools. Constant improvements to both technology and analytic approaches in this fast moving field are evidenced by many recent publications of computational methods, as well as biological results linking genetic events to human disease. Cancer in particular has been the subject of intense investigation, owing to the genetic underpinnings of this complex collection of diseases. New massively-parallel sequencing (MPS) technologies have enabled the investigation of thousands of samples, divided across tens of different tumor types, resulting in new driver gene identification, mutagenic pattern characterization, and other newly uncovered features of tumor biology. This review will focus both on methods and recent results: current analytical approaches to DNA and RNA sequencing will be presented followed by a review of recent pan-cancer sequencing studies. This overview of methods and results will not only highlight the recent advances in cancer genomics, but also the methods and tools used to accomplish these advancements in a constantly and rapidly improving field.
Affiliation(s)
- Jamie K Teer
- H. Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Dr., Tampa, FL 33612, Tel: 813-745-2650
|
24
|
Gemovic B, Perovic V, Glisic S, Veljkovic N. Feature-based classification of amino acid substitutions outside conserved functional protein domains. ScientificWorldJournal 2013; 2013:948617. [PMID: 24348198 PMCID: PMC3855963 DOI: 10.1155/2013/948617] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/30/2013] [Accepted: 09/24/2013] [Indexed: 01/01/2023] Open
Abstract
There are more than 500 amino acid substitutions in each human genome, and bioinformatics tools contribute irreplaceably to the determination of their functional effects. We have developed a feature-based algorithm for the detection of mutations outside conserved functional domains (CFDs) and compared its classification efficacy with the most commonly used phylogeny-based tools, PolyPhen-2 and SIFT. The new algorithm is based on the informational spectrum method (ISM), a feature-based technique, and statistical analysis. Our dataset contained neutral polymorphisms and mutations associated with myeloid malignancies from the epigenetic regulators ASXL1, DNMT3A, EZH2, and TET2. PolyPhen-2 and SIFT had significantly lower accuracies in predicting the effects of amino acid substitutions outside CFDs than expected, with especially low sensitivity. On the other hand, only the ISM algorithm showed statistically significant classification of these sequences. It outperformed PolyPhen-2 and SIFT by 15% and 13%, respectively. These results suggest that feature-based methods, like ISM, are more suitable for the classification of amino acid substitutions outside CFDs than phylogeny-based tools.
Affiliation(s)
- Branislava Gemovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Vladimir Perovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Sanja Glisic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
- Nevena Veljkovic
- Centre for Multidisciplinary Research and Engineering, Vinca Institute of Nuclear Sciences, University of Belgrade, 12-14 Mihajla Petrovica Alasa, 11001 Belgrade, Serbia
|
25
|
Tamborero D, Gonzalez-Perez A, Lopez-Bigas N. OncodriveCLUST: exploiting the positional clustering of somatic mutations to identify cancer genes. Bioinformatics 2013; 29:2238-44. [PMID: 23884480 DOI: 10.1093/bioinformatics/btt395] [Citation(s) in RCA: 303] [Impact Index Per Article: 27.5] [Indexed: 12/30/2022]
Abstract
MOTIVATION Gain-of-function mutations often cluster in specific protein regions, a signal that those mutations provide an adaptive advantage to cancer cells and consequently are positively selected during clonal evolution of tumours. We sought to determine the overall extent of this feature in cancer and the possibility to use this feature to identify drivers. RESULTS We have developed OncodriveCLUST, a method to identify genes with a significant bias towards mutation clustering within the protein sequence. This method constructs the background model by assessing coding-silent mutations, which are assumed not to be under positive selection and thus may reflect the baseline tendency of somatic mutations to be clustered. OncodriveCLUST analysis of the Catalogue of Somatic Mutations in Cancer retrieved a list of genes enriched by the Cancer Gene Census, prioritizing those with dominant phenotypes but also highlighting some recessive cancer genes, which showed wider but still delimited mutation clusters. Assessment of datasets from The Cancer Genome Atlas demonstrated that OncodriveCLUST selected cancer genes that were nevertheless missed by methods based on frequency and functional impact criteria. This stressed the benefit of combining approaches based on complementary principles to identify driver mutations. We propose OncodriveCLUST as an effective tool for that purpose. AVAILABILITY OncodriveCLUST has been implemented as a Python script and is freely available from http://bg.upf.edu/oncodriveclust CONTACT nuria.lopez@upf.edu or abel.gonzalez@upf.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David Tamborero
- Research Unit on Biomedical Informatics, Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Dr. Aiguader 88, 08003 Barcelona and Institució Catalana de Recerca i Estudis Avançats ICREA, Passeig Lluis Companys, 23, 08010 Barcelona, Spain
|
26
|
Peterson TA, Park D, Kann MG. A protein domain-centric approach for the comparative analysis of human and yeast phenotypically relevant mutations. BMC Genomics 2013; 14 Suppl 3:S5. [PMID: 23819456 PMCID: PMC3665522 DOI: 10.1186/1471-2164-14-s3-s5] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] Open
Abstract
Background The body of disease mutations with known phenotypic relevance continues to increase and is expected to do so even faster with the advent of new experimental techniques such as whole-genome sequencing coupled with disease association studies. However, genomic association studies are limited by the molecular complexity of the phenotype being studied and the population size needed to have adequate statistical power. One way to circumvent this problem, which is critical for the study of rare diseases, is to study the molecular patterns emerging from functional studies of existing disease mutations. Current gene-centric analyses to study mutations in coding regions are limited by their inability to account for the functional modularity of the protein. Previous studies of the functional patterns of known human disease mutations have shown a significant tendency to cluster at protein domain positions, namely position-based domain hotspots of disease mutations. However, the limited number of known disease mutations remains the main factor hindering the advancement of mutation studies at a functional level. In this paper, we address this problem by incorporating mutations known to be disruptive of phenotypes in other species. Focusing on two evolutionarily distant organisms, human and yeast, we describe the first inter-species analysis of mutations of phenotypic relevance at the protein domain level. Results The results of this analysis reveal that phenotypic mutations from yeast cluster at specific positions on protein domains, a characteristic previously revealed to be displayed by human disease mutations. We found over one hundred domain hotspots in yeast with approximately 50% in the exact same domain position as known human disease mutations. 
Conclusions We describe an analysis using protein domains as a framework for transferring functional information by studying domain hotspots in human and yeast and relating phenotypic changes in yeast to diseases in human. This first-of-a-kind study of phenotypically relevant yeast mutations in relation to human disease mutations demonstrates the utility of a multi-species analysis for advancing the understanding of the relationship between genetic mutations and phenotypic changes at the organismal level.
Affiliation(s)
- Thomas A Peterson
- Department of Biological Sciences, University of Maryland, Baltimore County, Baltimore, MD, USA
|
27
|
Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics 2013; 14 Suppl 3:S7. [PMID: 23819521 PMCID: PMC3665581 DOI: 10.1186/1471-2164-14-s3-s7] [Citation(s) in RCA: 125] [Impact Index Per Article: 11.4] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND Recent advances in sequencing technologies have greatly increased the identification of mutations in cancer genomes. However, it remains a significant challenge to identify cancer-driving mutations, since most observed missense changes are neutral passenger mutations. Various computational methods have been developed to predict the effects of amino acid substitutions on protein function and classify mutations as deleterious or benign. These include approaches that rely on evolutionary conservation, structural constraints, or physicochemical attributes of amino acid substitutions. Here we review existing methods and further examine eight tools: SIFT, PolyPhen2, Condel, CHASM, mCluster, logRE, SNAP, and MutationAssessor, with respect to their coverage, accuracy, availability and dependence on other tools. RESULTS Single nucleotide polymorphisms with high minor allele frequencies were used as a negative (neutral) set for testing, and recurrent mutations from the COSMIC database as well as novel recurrent somatic mutations identified in very recent cancer studies were used as positive (non-neutral) sets. Conservation-based methods generally had moderately high accuracy in distinguishing neutral from deleterious mutations, whereas the performance of machine learning based predictors with comprehensive feature spaces varied between assessments using different positive sets. MutationAssessor consistently provided the highest accuracies. For certain combinations metapredictors slightly improved the performance of included individual methods, but did not outperform MutationAssessor as stand-alone tool. CONCLUSIONS Our independent assessment of existing tools reveals various performance disparities. Cancer-trained methods did not improve upon more general predictors. 
No method or combination of methods exceeds 81% accuracy, indicating there is still significant room for improvement for driver mutation prediction, and perhaps more sophisticated feature integration is needed to develop a more robust tool.
|
28
|
Recurrent R-spondin fusions in colon cancer. Nature 2012; 488:660-4. [PMID: 22895193 DOI: 10.1038/nature11282] [Citation(s) in RCA: 743] [Impact Index Per Article: 61.9] [Received: 12/23/2011] [Accepted: 06/06/2012] [Indexed: 12/15/2022]
Abstract
Identifying and understanding changes in cancer genomes is essential for the development of targeted therapeutics. Here we analyse systematically more than 70 pairs of primary human colon tumours by applying next-generation sequencing to characterize their exomes, transcriptomes and copy-number alterations. We have identified 36,303 protein-altering somatic changes that include several new recurrent mutations in the Wnt pathway gene TCF7L2, chromatin-remodelling genes such as TET2 and TET3 and receptor tyrosine kinases including ERBB3. Our analysis for significantly mutated cancer genes identified 23 candidates, including the cell cycle checkpoint kinase ATM. Copy-number and RNA-seq data analysis identified amplifications and corresponding overexpression of IGF2 in a subset of colon tumours. Furthermore, using RNA-seq data we identified multiple fusion transcripts including recurrent gene fusions involving R-spondin family members RSPO2 and RSPO3 that together occur in 10% of colon tumours. The RSPO fusions were mutually exclusive with APC mutations, indicating that they probably have a role in the activation of Wnt signalling and tumorigenesis. Consistent with this we show that the RSPO fusion proteins were capable of potentiating Wnt signalling. The R-spondin gene fusions and several other gene mutations identified in this study provide new potential opportunities for therapeutic intervention in colon cancer.
|
29
|
Liu J, Lee W, Jiang Z, Chen Z, Jhunjhunwala S, Haverty PM, Gnad F, Guan Y, Gilbert HN, Stinson J, Klijn C, Guillory J, Bhatt D, Vartanian S, Walter K, Chan J, Holcomb T, Dijkgraaf P, Johnson S, Koeman J, Minna JD, Gazdar AF, Stern HM, Hoeflich KP, Wu TD, Settleman J, de Sauvage FJ, Gentleman RC, Neve RM, Stokoe D, Modrusan Z, Seshagiri S, Shames DS, Zhang Z. Genome and transcriptome sequencing of lung cancers reveal diverse mutational and splicing events. Genome Res 2012; 22:2315-27. [PMID: 23033341 PMCID: PMC3514662 DOI: 10.1101/gr.140988.112] [Citation(s) in RCA: 153] [Impact Index Per Article: 12.8] [Indexed: 12/13/2022]
Abstract
Lung cancer is a highly heterogeneous disease in terms of both underlying genetic lesions and response to therapeutic treatments. We performed deep whole-genome sequencing and transcriptome sequencing on 19 lung cancer cell lines and three lung tumor/normal pairs. Overall, our data show that cell line models exhibit similar mutation spectra to human tumor samples. Smoker and never-smoker cancer samples exhibit distinguishable patterns of mutations. A number of epigenetic regulators, including KDM6A, ASH1L, SMARCA4, and ATAD2, are frequently altered by mutations or copy number changes. A systematic survey of splice-site mutations identified 106 splice site mutations associated with cancer specific aberrant splicing, including mutations in several known cancer-related genes. RAC1b, an isoform of the RAC1 GTPase that includes one additional exon, was found to be preferentially up-regulated in lung cancer. We further show that its expression is significantly associated with sensitivity to a MAP2K (MEK) inhibitor PD-0325901. Taken together, these data present a comprehensive genomic landscape of a large number of lung cancer samples and further demonstrate that cancer-specific alternative splicing is a widespread phenomenon that has potential utility as therapeutic biomarkers. The detailed characterizations of the lung cancer cell lines also provide genomic context to the vast amount of experimental data gathered for these lines over the decades, and represent highly valuable resources for cancer biology.
Affiliation(s)
- Jinfeng Liu
- Department of Bioinformatics and Computational Biology
|
30
|
Abstract
Background Large-scale tumor sequencing projects are now underway to identify genetic mutations that drive tumor initiation and development. Most studies take a gene-based approach to identifying driver mutations, highlighting genes mutated in a large percentage of tumor samples as those likely to contain driver mutations. However, this gene-based approach usually does not consider the position of the mutation within the gene or the functional context the position of the mutation provides. Here we introduce a novel method for mapping mutations to distinct protein domains, not just individual genes, in which they occur, thus providing the functional context for how the mutation contributes to disease. Furthermore, aggregating mutations from all genes containing a specific protein domain enables the identification of mutations that are rare at the gene level, but that occur frequently within the specified domain. These highly mutated domains potentially reveal disruptions of protein function necessary for cancer development. Results We mapped somatic mutations from the protein coding regions of 100 colon adenocarcinoma tumor samples to the genes and protein domains in which they occurred, and constructed topographical maps to depict the “mutational landscapes” of gene and domain mutation frequencies. We found significant mutation frequency in a number of genes previously known to be somatically mutated in colon cancer patients including APC, TP53 and KRAS. In addition, we found significant mutation frequency within specific domains located in these genes, as well as within other domains contained in genes having low mutation frequencies. These domain “peaks” were enriched with functions important to cancer development including kinase activity, DNA binding and repair, and signal transduction. 
Conclusions Using our method to create the domain landscapes of mutations in colon cancer, we were able to identify somatic mutations with high potential to drive cancer development. Interestingly, the majority of the genes involved have a low mutation frequency. Therefore, the method shows good potential for identifying rare driver mutations in current, large-scale tumor sequencing projects. In addition, mapping mutations to specific domains provides the necessary functional context for understanding how the mutations contribute to the disease, and may reveal novel or more refined gene and domain target regions for drug development.
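The domain-level aggregation described in this abstract can be illustrated with a minimal sketch. It assumes mutations have already been mapped to a protein domain and to a position within that domain's alignment; the gene names, domain labels, positions, and function names below are hypothetical toy values chosen for illustration and are not taken from the cited study.

```python
from collections import Counter, defaultdict

# Hypothetical toy input: (gene, domain, position within the domain alignment).
mutations = [
    ("GENE_A", "Pkinase", 120),
    ("GENE_B", "Pkinase", 120),
    ("GENE_C", "Pkinase", 120),
    ("GENE_A", "Pkinase", 45),
    ("GENE_D", "SH2", 10),
]

def gene_counts(muts):
    """Count mutations per gene (the gene-based view)."""
    return Counter(gene for gene, _, _ in muts)

def domain_position_counts(muts):
    """Count mutations per (domain, position), pooled across all genes
    that contain the domain, and record which genes contributed."""
    counts = Counter()
    genes = defaultdict(set)
    for gene, domain, pos in muts:
        counts[(domain, pos)] += 1
        genes[(domain, pos)].add(gene)
    return counts, genes

per_gene = gene_counts(mutations)
per_site, site_genes = domain_position_counts(mutations)
```

In this toy input each of GENE_B and GENE_C carries a single mutation, so neither stands out at the gene level, yet the pooled domain position ("Pkinase", 120) is hit three times across three genes. A real analysis would additionally need a background model to decide whether such a count is significant.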
Affiliation(s)
- Nathan L Nehrt
- Department of Biological Sciences, University of Maryland, Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, USA.
|
31
|
Ionita-Laza I, Makarov V, Buxbaum JD. Scan-statistic approach identifies clusters of rare disease variants in LRP2, a gene linked and associated with autism spectrum disorders, in three datasets. Am J Hum Genet 2012; 90:1002-13. [PMID: 22578327 DOI: 10.1016/j.ajhg.2012.04.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 01/25/2012] [Revised: 02/27/2012] [Accepted: 04/19/2012] [Indexed: 01/20/2023] Open
Abstract
Cluster-detection approaches, commonly used in epidemiology and astronomy, can be applied in the context of genetic sequence data for the identification of genetic regions significantly enriched with rare disease-risk variants (DRVs). Unlike existing association tests for sequence data, the goal of cluster-detection methods is to localize significant disease mutation clusters within a gene or region of interest. Here, we focus on a chromosome 2q replicated linkage region that is associated with autism spectrum disorder (ASD) and that has been sequenced in three independent datasets. We found that variants in one gene, LRP2, residing on 2q are associated with ASD in two datasets (the combined variable-threshold-test p value is 1.2 × 10⁻⁵). Using a cluster-detection method, we show that in the discovery and replication datasets, variants associated with ASD cluster preponderantly in 25 kb windows (adjusted p values are p₁ = 0.003 and p₂ = 0.002), and the two windows are highly overlapping. Furthermore, for the third dataset, a 25 kb region similar to those in the other two datasets shows significant evidence of enrichment of rare DRVs. The region implicated by all three studies is involved in ligand binding, suggesting that subtle alterations in either LRP2 expression or LRP2 primary sequence modulate the uptake of LRP2 ligands. BMP4 is a ligand of particular interest given its role in forebrain development, and modest changes in BMP4 binding, which binds to LRP2 near the mutation cluster, might subtly affect development and could lead to autism-associated phenotypes.
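The idea of localizing a variant cluster with a sliding window can be sketched as follows. This is a simplified illustration only: it scores each fixed-width window by the excess of case variants over control variants, which is an assumption made here for brevity and not the scan statistic or the adjusted p-value procedure used in the cited study. The function name and the toy positions are hypothetical.

```python
def best_window(case_pos, control_pos, width, step):
    """Slide a fixed-width window along the region and return the window
    with the largest excess of case variants over control variants.

    Returns (score, start, end, n_case, n_control), or None if there are
    no variants. Windows are half-open intervals [start, end).
    """
    positions = sorted(case_pos + control_pos)
    if not positions:
        return None
    start, last = positions[0], positions[-1]
    best = None
    while start <= last:
        end = start + width
        n_case = sum(start <= p < end for p in case_pos)
        n_ctrl = sum(start <= p < end for p in control_pos)
        score = n_case - n_ctrl  # simplified score, not a likelihood ratio
        if best is None or score > best[0]:
            best = (score, start, end, n_case, n_ctrl)
        start += step
    return best

# Hypothetical toy positions (base pairs) of rare variants.
cases = [100, 110, 120, 5000]
controls = [3000, 5100]
result = best_window(cases, controls, width=50, step=10)
```

In practice the significance of the best window would be assessed by permuting case and control labels and recomputing the maximum score, since taking the maximum over many windows inflates the apparent enrichment.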
|
32
|
Peterson TA, Nehrt NL, Park D, Kann MG. Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer. J Am Med Inform Assoc 2012; 19:275-83. [PMID: 22319177 DOI: 10.1136/amiajnl-2011-000655] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/04/2022] Open
Abstract
BACKGROUND AND OBJECTIVE With recent breakthroughs in high-throughput sequencing, identifying deleterious mutations is one of the key challenges for personalized medicine. At the gene and protein level, it has proven difficult to determine the impact of previously unknown variants. A statistical method has been developed to assess the significance of disease mutation clusters on protein domains by incorporating domain functional annotations to assist in the functional characterization of novel variants. METHODS Disease mutations aggregated from multiple databases were mapped to domains, and were classified as either cancer- or non-cancer-related. The statistical method for identifying significantly disease-associated domain positions was applied to both sets of mutations and to randomly generated mutation sets for comparison. To leverage the known function of protein domain regions, the method optionally distributes significant scores to associated functional feature positions. RESULTS Most disease mutations are localized within protein domains and display a tendency to cluster at individual domain positions. The method identified significant disease mutation hotspots in both the cancer and non-cancer datasets. The domain significance scores (DS-scores) for cancer form a bimodal distribution with hotspots in oncogenes forming a second peak at higher DS-scores than non-cancer, and hotspots in tumor suppressors have scores more similar to non-cancers. In addition, on an independent mutation benchmarking set, the DS-score method identified mutations known to alter protein function with very high precision. CONCLUSION By aggregating mutations with known disease association at the domain level, the method was able to discover domain positions enriched with multiple occurrences of deleterious mutations while incorporating relevant functional annotations. 
The method can be incorporated into translational bioinformatics tools to characterize rare and novel variants within large-scale sequencing studies.
Affiliation(s)
- Thomas A Peterson
- University of Maryland, Baltimore County, Baltimore, Maryland 21250, USA
|
33
|
Greenfield EM, Tatro JM, Smith MV, Schnaser EA, Wu D. PI3Kγ deletion reduces variability in the in vivo osteolytic response induced by orthopaedic wear particles. J Orthop Res 2011; 29:1649-53. [PMID: 21538508 PMCID: PMC3338193 DOI: 10.1002/jor.21440] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 12/07/2010] [Accepted: 03/31/2011] [Indexed: 02/06/2023]
Abstract
Orthopedic wear particles activate a number of intracellular signaling pathways associated with inflammation in macrophages and we have previously shown that the phosphoinositol-3-kinase (PI3K)/Akt pathway is one of the signal transduction pathways that mediates the in vitro activation of macrophages by orthopedic wear particles. Since PI3Kγ is primarily responsible for PI3K activity during inflammation, we hypothesized that PI3Kγ mediates particle-induced osteolysis in vivo. Our results do not strongly support the hypothesis that PI3Kγ regulates the overall amount of particle-induced osteolysis in the murine calvarial model. However, our results strongly support the conclusion that variability in the amount of particle-induced osteolysis between individual mice is reduced in the PI3Kγ(-/-) mice. These results suggest that PI3Kγ contributes to osteolysis to different degrees in individual mice and that the mice, and patients, that are most susceptible to osteolysis may be so, in part, due to an increased contribution from PI3Kγ.
Affiliation(s)
- Edward M. Greenfield
- Department of Orthopaedics, Case Western Reserve University, University Hospitals Case Medical Center, Biomedical Research Building, Room 331, 2109 Adelbert Road, Cleveland, Ohio 44106; Department of Pathology, Case Western Reserve University, Cleveland, Ohio
- Joscelyn M. Tatro
- Department of Orthopaedics, Case Western Reserve University, University Hospitals Case Medical Center, Biomedical Research Building, Room 331, 2109 Adelbert Road, Cleveland, Ohio 44106
- Matthew V. Smith
- Department of Orthopaedics, Washington University, St. Louis, Missouri
- Erik A. Schnaser
- Department of Orthopaedics, Case Western Reserve University, University Hospitals Case Medical Center, Biomedical Research Building, Room 331, 2109 Adelbert Road, Cleveland, Ohio 44106
- Dianqing Wu
- Vascular Biology and Therapeutics Program, Yale University, New Haven, Connecticut; Department of Pharmacology, Yale University, New Haven, Connecticut
|
34
|
Shi Z, Moult J. Structural and functional impact of cancer-related missense somatic mutations. J Mol Biol 2011; 413:495-512. [PMID: 21763698 DOI: 10.1016/j.jmb.2011.06.046] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 02/03/2011] [Revised: 05/13/2011] [Accepted: 06/28/2011] [Indexed: 01/11/2023]
Abstract
A number of large-scale cancer somatic genome sequencing projects are now identifying genetic alterations in cancers. Evaluation of the effects of these mutations is essential for understanding their contribution to tumorigenesis. We have used SNPs3D, a software suite originally developed for analyzing nonsynonymous germ-line variants, to identify single-base mutations with a high impact on protein structure and function. Two machine learning methods are used: one identifying mutations that destabilize protein three-dimensional structure and the other utilizing sequence conservation and detecting all types of effects on in vivo protein function. Incorporation of detailed structure information into the analysis allows detailed interpretation of the functional effects of mutations in specific cases. Data from a set of breast and colorectal tumors were analyzed. In known cancer genes, approaching 100% of mutations are found to impact protein function, supporting the view that these methods are appropriate for identifying driver mutations. Overall, 50-60% of all somatic missense mutations are predicted to have a high impact on structural stability or to more generally affect the function of the corresponding proteins. This value is similar to the fraction of all possible missense mutations that have a high impact and is much higher than the corresponding one for human population single-nucleotide polymorphisms, at about 30%. The majority of mutations in tumor suppressors destabilize protein structure, while mutations in oncogenes operate in more varied ways, including destabilization of less active conformational states. The set of high-impact mutations encompasses the possible drivers.
Affiliation(s)
- Zhen Shi
- Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, MD 20850, USA
|
35
|
Stehr H, Jang SHJ, Duarte JM, Wierling C, Lehrach H, Lappe M, Lange BMH. The structural impact of cancer-associated missense mutations in oncogenes and tumor suppressors. Mol Cancer 2011; 10:54. [PMID: 21575214 PMCID: PMC3123651 DOI: 10.1186/1476-4598-10-54] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Received: 01/17/2011] [Accepted: 05/16/2011] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Current large-scale cancer sequencing projects have identified large numbers of somatic mutations covering an increasing number of different cancer tissues and patients. However, the characterization of these mutations at the structural and functional level remains a challenge. RESULTS We present results from an analysis of the structural impact of frequent missense cancer mutations using an automated method. We find that inactivation of tumor suppressors in cancer correlates frequently with destabilizing mutations preferably in the core of the protein, while enhanced activity of oncogenes is often linked to specific mutations at functional sites. Furthermore, our results show that this alteration of oncogenic activity is often associated with mutations at ATP or GTP binding sites. CONCLUSIONS With our findings we can confirm and statistically validate the hypotheses for the gain-of-function and loss-of-function mechanisms of oncogenes and tumor suppressors, respectively. We show that the distinct mutational patterns can potentially be used to pre-classify newly identified cancer-associated genes with yet unknown function.
Affiliation(s)
- Henning Stehr
- Max-Planck Institute for Molecular Genetics, Structural Proteomics/Bioinformatics Group, Otto-Warburg Laboratory, Boltzmannstrasse 12, 14195 Berlin, Germany
|
36
|
Abstract
A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferably onto loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.
|
37
|
Diverse somatic mutation patterns and pathway alterations in human cancers. Nature 2010; 466:869-73. [PMID: 20668451 DOI: 10.1038/nature09208] [Citation(s) in RCA: 798] [Impact Index Per Article: 57.0] [Received: 07/23/2009] [Accepted: 05/27/2010] [Indexed: 12/24/2022]
Abstract
The systematic characterization of somatic mutations in cancer genomes is essential for understanding the disease and for developing targeted therapeutics. Here we report the identification of 2,576 somatic mutations across approximately 1,800 megabases of DNA representing 1,507 coding genes from 441 tumours comprising breast, lung, ovarian and prostate cancer types and subtypes. We found that mutation rates and the sets of mutated genes varied substantially across tumour types and subtypes. Statistical analysis identified 77 significantly mutated genes including protein kinases, G-protein-coupled receptors such as GRM8, BAI3, AGTRL1 (also called APLNR) and LPHN3, and other druggable targets. Integrated analysis of somatic mutations and copy number alterations identified another 35 significantly altered genes including GNAS, indicating an expanded role for Galpha subunits in multiple cancer types. Furthermore, our experimental analyses demonstrate the functional roles of mutant GNAO1 (a Galpha subunit) and mutant MAP2K4 (a member of the JNK signalling pathway) in oncogenesis. Our study provides an overview of the mutational spectra across major human cancers and identifies several potential therapeutic targets.
|