1
Ng JK, Chen Y, Akinwe TM, Heins HB, Mehinovic E, Chang Y, Payne ZL, Manuel JG, Karchin R, Turner TN. Proteome-Wide Assessment of Clustering of Missense Variants in Neurodevelopmental Disorders Versus Cancer. medRxiv 2024:2024.02.02.24302238. [PMID: 38352539 PMCID: PMC10863034 DOI: 10.1101/2024.02.02.24302238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/19/2024]
Abstract
Missense de novo variants (DNVs) and missense somatic variants contribute to neurodevelopmental disorders (NDDs) and cancer, respectively. Proteins with statistical enrichment based on analyses of these variants exhibit convergence in the differing NDD and cancer phenotypes. Herein, the question of why some of the same proteins are identified in both phenotypes is examined through investigation of clustering of missense variation at the protein level. Our hypothesis is that missense variation is present in different protein locations in the two phenotypes leading to the distinct phenotypic outcomes. We tested this hypothesis in 1D protein space using our software CLUMP. Furthermore, we newly developed 3D-CLUMP that uses 3D protein structures to spatially test clustering of missense variation for proteome-wide significance. We examined missense DNVs in 39,883 parent-child sequenced trios with NDDs and missense somatic variants from 10,543 sequenced tumors covering five TCGA cancer types and two COSMIC pan-cancer aggregates of tissue types. There were 57 proteins with proteome-wide significant missense variation clustering in NDDs when compared to cancers and 79 proteins with proteome-wide significant missense clustering in cancers compared to NDDs. While our main objective was to identify differences in patterns of missense variation, we also identified a novel NDD protein BLTP2. Overall, our study is innovative, provides new insights into differential missense variation in NDDs and cancer at the protein-level, and contributes necessary information toward building a framework for thinking about prognostic and therapeutic aspects of these proteins.
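The idea of testing whether missense variants cluster along a protein's 1D sequence can be illustrated with a small permutation test. This is a minimal sketch under stated assumptions, not the CLUMP algorithm: the statistic (mean pairwise distance), the uniform null model, and the names `mean_pairwise_distance` and `clustering_p_value` are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def mean_pairwise_distance(positions):
    """Average absolute distance between all pairs of variant positions.

    Needs at least two positions; smaller values mean tighter clustering.
    """
    pairs = [(a, b) for i, a in enumerate(positions) for b in positions[i + 1:]]
    return mean(abs(a - b) for a, b in pairs)


def clustering_p_value(positions, protein_length, n_perm=2000, seed=0):
    """Empirical p-value that variants are at least this clustered by chance.

    Assumption: under the null, each variant falls uniformly at random on
    residues 1..protein_length (a simplification; real null models may
    account for mutability or domain structure).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(positions)
    hits = 0
    for _ in range(n_perm):
        simulated = [rng.randint(1, protein_length) for _ in positions]
        if mean_pairwise_distance(simulated) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: six variants packed into residues 100-105,
# versus six variants spread across a 1,000-residue protein.
p_clustered = clustering_p_value([100, 102, 103, 105, 101, 104], 1000)
p_dispersed = clustering_p_value([10, 200, 400, 600, 800, 990], 1000)
```

Running the same test separately on variants from two phenotypes would be one simple way to ask whether only one of them shows clustering in a given protein.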
Affiliation(s)
- Jeffrey K. Ng
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yilin Chen
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Titilope M. Akinwe
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Molecular Genetics & Genomics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Hillary B. Heins
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Elvisa Mehinovic
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Yoonhoo Chang
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Human & Statistical Genetics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Zachary L. Payne
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Molecular Genetics & Genomics Graduate Program, Washington University School of Medicine, St. Louis, MO 63110, USA
- Juana G. Manuel
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Rachel Karchin
  - Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - The Sidney Kimmel Comprehensive Cancer Center, School of Medicine, Johns Hopkins University, Baltimore, MD, USA
  - Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA
- Tychele N. Turner
  - Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St. Louis, MO, USA
2
Kaczanowski S. Detection of positive selection acting on protein surfaces at the whole-genome scale in the human malaria parasite Plasmodium falciparum. Infect Genet Evol 2023; 107:105397. [PMID: 36572055 DOI: 10.1016/j.meegid.2022.105397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Revised: 09/20/2022] [Accepted: 12/21/2022] [Indexed: 12/24/2022]
Abstract
The host-parasite evolutionary arms race is a fundamental process with medical implications. During this process, the host develops parasite resistance, and the parasite develops host immune evasion strategies. Thus, this process accelerates the evolution of the relevant proteins. This study tests the hypotheses that proteins whose sequence evolution is subject to structural constraints play a crucial role in this process and that these constraints hinder the modification of such proteins. These hypotheses were tested using the Plasmodium falciparum model and protein structures predicted for the entire proteome by the AlphaFold method. Based on dN/dS test results and comparisons between P. falciparum and P. reichenowi, the presented approach identified proteins subject to purifying selection acting on the whole sequence and on buried residues (dN < dS) and to positive selection on nonburied residues. Among the 26 proteins identified, several known antigens (ring-exported protein 3, RAP protein, erythrocyte binding antigen-140, and protein P47) targeted by the host immune system are promising vaccine candidates. The set also contained 11 enzymes, including FIKK kinase, which modifies host proteins. This set was compared with genes for which the dN/dS test suggested that positive selection acts on the whole gene (i.e., dN > dS). The present study found that such genes encode enzymes and antigenic vaccine candidates less frequently than genes for which evolution is not subject to selection constraints and positive selection acts only on exposed residues. The analysis was repeated comparing P. falciparum with P. adleri, which is more distantly related. The study discusses the potential implications of the presented methodology for rational vaccine design and for the fields of parasitology and evolutionary biology.
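The buried-versus-exposed comparison described above can be illustrated with a toy dN/dS calculation per residue class. This is a rough sketch, not the study's pipeline: it uses uncorrected proportions with no multiple-hit correction or codon model, and the function names and all counts are hypothetical.

```python
def dn_ds_ratio(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Return dN/dS from raw counts, using uncorrected proportions.

    dN = nonsynonymous changes per nonsynonymous site,
    dS = synonymous changes per synonymous site.
    """
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


def classify(ratio):
    """Label a dN/dS ratio by the direction of selection it suggests."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Hypothetical counts for one protein, split by solvent exposure
# (for example, as inferred from a predicted structure).
counts = {
    "buried": dict(nonsyn_changes=2, nonsyn_sites=300, syn_changes=6, syn_sites=100),
    "exposed": dict(nonsyn_changes=18, nonsyn_sites=300, syn_changes=3, syn_sites=100),
}
ratios = {label: dn_ds_ratio(**c) for label, c in counts.items()}
calls = {label: classify(r) for label, r in ratios.items()}
```

In this made-up example the buried class has dN < dS while the exposed class has dN > dS, which is the pattern the abstract describes for the selected proteins.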
Affiliation(s)
- Szymon Kaczanowski
  - Department of Bioinformatics, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
3
Spatial homogeneity pursuit of regression coefficients for hand, foot and mouth disease in Xinjiang Uygur Autonomous Region in 2018. Sci Rep 2022; 12:21439. [PMID: 36509834 PMCID: PMC9744827 DOI: 10.1038/s41598-022-26003-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022]
Abstract
To explore the complex spatial pattern between the incidence of hand, foot, and mouth disease (HFMD) and meteorological factors [average temperature (AT), average relative humidity (ARH), average air pressure (AP), average wind speed (AW)], this paper constructed a Spatial Clustering coefficient (SCC) regression model to detect spatial clustering patterns of each regression coefficient in different seasons. The results revealed that, compared with geographically weighted regression (GWR), the coefficients estimated by the SCC method were smoother, with clearly identified spatial patterns and improved edge effects. Interesting spatial patterns were therefore easy to identify in the SCC-estimated coefficients. The SCC method also had better accuracy in estimating the relationship between potential meteorological factors and HFMD cases. The significance of each meteorological factor's effect on HFMD incidence depended on the season. Specifically, AT was negatively correlated with HFMD in summer and winter, especially in the Altay region, Bayingoleng Mongolian Autonomous Prefecture, Turpan region and Hami region. AW had a positive effect on HFMD in summer but a negative effect across the whole of Xinjiang in winter. In summer, ARH showed a strong negative correlation in Tianshan district, Shayibake district, Shuimogou district and other districts, but a strong positive correlation in Alar city; in winter, ARH showed a strong negative correlation in the Altay region, the Aksu region and other places, and a strong positive correlation in Shayibake district. Finally, AP had a strong positive correlation with HFMD in summer in Shayibake district, but in winter it showed a strong negative correlation in Altay district and Buxel Mongolia Autonomous County.
In summary, Xinjiang should adapt measures to local conditions, and formulate appropriate HFMD prevention strategies according to the characteristics of different regions, time, and meteorological factors.
4
Bruce SA, Huang YH, Kamath PL, van Heerden H, Turner WC. The roles of antimicrobial resistance, phage diversity, isolation source and selection in shaping the genomic architecture of Bacillus anthracis. Microb Genom 2021; 7. [PMID: 34402777 PMCID: PMC8549369 DOI: 10.1099/mgen.0.000616] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
Bacillus anthracis, the causative agent of anthrax disease, is a worldwide threat to livestock, wildlife and public health. While analyses of genetic data from across the globe have increased our understanding of this bacterium’s population genomic structure, the influence of selective pressures on this successful pathogen is not well understood. In this study, we investigate the effects of antimicrobial resistance, phage diversity, geography and isolation source in shaping population genomic structure. We also identify a suite of candidate genes potentially under selection, driving patterns of diversity across 356 globally extant B. anthracis genomes. We report ten antimicrobial resistance genes and 11 different prophage sequences, resulting in the first large-scale documentation of these genetic anomalies for this pathogen. Results of random forest classification suggest genomic structure may be driven by a combination of antimicrobial resistance, geography and isolation source, specific to the population cluster examined. We found strong evidence that a recombination event linked to a gene involved in protein synthesis may be responsible for phenotypic differences between comparatively disparate populations. We also offer a list of genes for further examination of B. anthracis evolution, based on high-impact single nucleotide polymorphisms (SNPs) and clustered mutations. The information presented here sheds new light on the factors driving genomic structure in this notorious pathogen and may act as a road map for future studies aimed at understanding functional differences in terms of B. anthracis biogeography, virulence and evolution.
Affiliation(s)
- Spencer A Bruce
  - Department of Biological Sciences, University at Albany - State University of New York, Albany, NY 12222, USA
- Yen-Hua Huang
  - Wisconsin Cooperative Wildlife Research Unit, Department of Forest and Wildlife Ecology, University of Wisconsin-Madison, Madison, WI, USA
- Pauline L Kamath
  - School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
- Henriette van Heerden
  - Department of Veterinary Tropical Diseases, University of Pretoria, Onderstepoort, South Africa
- Wendy C Turner
  - U.S. Geological Survey, Wisconsin Cooperative Wildlife Research Unit, Department of Forest and Wildlife Ecology, University of Wisconsin-Madison, Madison, WI, USA
5
Cui Y, Schmid BV, Cao H, Dai X, Du Z, Ryan Easterday W, Fang H, Guo C, Huang S, Liu W, Qi Z, Song Y, Tian H, Wang M, Wu Y, Xu B, Yang C, Yang J, Yang X, Zhang Q, Jakobsen KS, Zhang Y, Stenseth NC, Yang R. Evolutionary selection of biofilm-mediated extended phenotypes in Yersinia pestis in response to a fluctuating environment. Nat Commun 2020; 11:281. [PMID: 31941912 PMCID: PMC6962365 DOI: 10.1038/s41467-019-14099-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/07/2018] [Accepted: 12/04/2019] [Indexed: 12/16/2022]
Abstract
Yersinia pestis is transmitted from fleas to rodents when the bacterium develops an extensive biofilm in the foregut of a flea, starving it into a feeding frenzy, or, alternatively, during a brief period directly after feeding on a bacteremic host. These two transmission modes are in a trade-off regulated by the amount of biofilm produced by the bacterium. Here, by investigating 446 globally isolated Y. pestis genomes, including 78 newly sequenced isolates sampled over 40 years from a plague focus in China, we provide evidence for strong selection pressures on the RNA polymerase ω-subunit encoding gene rpoZ. We demonstrate that rpoZ variants have an increased rate of biofilm production in vitro, and that they evolve in the ecosystem during colder and drier periods. Our results support the notion that the bacterium is constantly adapting, through extended phenotype changes in the fleas, in response to climate-driven changes in the niche.
Affiliation(s)
- Yujun Cui
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Boris V Schmid
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Blindern, N-0316, Oslo, Norway
- Hanli Cao
  - The Center for Disease Control and Prevention of Xinjiang Uygur Autonomous Region, Urumqi, 830002, China
- Xiang Dai
  - The Center for Disease Control and Prevention of Xinjiang Uygur Autonomous Region, Urumqi, 830002, China
- Zongmin Du
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- W Ryan Easterday
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Blindern, N-0316, Oslo, Norway
- Haihong Fang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Chenyi Guo
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Shanqian Huang
  - State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing, 100875, China
- Wanbing Liu
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Zhizhen Qi
  - Key Laboratory for Plague Prevention and Control of Qinghai Province, Qinghai Institute for Endemic Diseases Prevention and Control, Xining, 811602, China
- Yajun Song
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Huaiyu Tian
  - State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing, 100875, China
- Min Wang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Yarong Wu
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Bing Xu
  - State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing, 100875, China
- Chao Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Jing Yang
  - State Key Laboratory of Remote Sensing Science, College of Global Change and Earth System Science, Beijing Normal University, Beijing, 100875, China
- Xianwei Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
- Qingwen Zhang
  - Key Laboratory for Plague Prevention and Control of Qinghai Province, Qinghai Institute for Endemic Diseases Prevention and Control, Xining, 811602, China
- Kjetill S Jakobsen
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Blindern, N-0316, Oslo, Norway
- Yujiang Zhang
  - The Center for Disease Control and Prevention of Xinjiang Uygur Autonomous Region, Urumqi, 830002, China
- Nils Chr Stenseth
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Blindern, N-0316, Oslo, Norway
  - Ministry of Education Key Laboratory for Earth System Modeling, Department of Earth System Science, Tsinghua University, Beijing, 100084, China
- Ruifu Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing, 100071, China
6
Ryslik GA, Cheng Y, Modis Y, Zhao H. Leveraging protein quaternary structure to identify oncogenic driver mutations. BMC Bioinformatics 2016; 17:137. [PMID: 27001666 PMCID: PMC4802602 DOI: 10.1186/s12859-016-0963-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/14/2015] [Accepted: 02/18/2016] [Indexed: 11/23/2022]
Abstract
BACKGROUND: Identifying key "driver" mutations which are responsible for tumorigenesis is critical in the development of new oncology drugs. Due to multiple pharmacological successes in treating cancers that are caused by such driver mutations, a large body of methods has been developed to differentiate these mutations from the benign "passenger" mutations which occur in the tumor but do not further progress the disease. Under the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of algorithms that identify these clusters has become a critical area of research. RESULTS: We have developed a novel methodology, QuartPAC (Quaternary Protein Amino acid Clustering), that identifies non-random mutational clustering while utilizing the protein quaternary structure in 3D space. By integrating the spatial information in the Protein Data Bank (PDB) and the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC), QuartPAC is able to identify clusters which are otherwise missed in a variety of proteins. The R package is available on Bioconductor at: http://bioconductor.jp/packages/3.1/bioc/html/QuartPAC.html . CONCLUSION: QuartPAC provides a unique tool to identify mutational clustering while accounting for the complete folded protein quaternary structure.
Affiliation(s)
- Gregory A. Ryslik
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
- Yuwei Cheng
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
- Yorgo Modis
  - Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge, CB2 0QH UK
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT USA
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT USA
7
Meyer MJ, Lapcevic R, Romero AE, Yoon M, Das J, Beltrán JF, Mort M, Stenson PD, Cooper DN, Paccanaro A, Yu H. mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome. Hum Mutat 2016; 37:447-56. [PMID: 26841357 DOI: 10.1002/humu.22963] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Received: 10/27/2015] [Accepted: 01/14/2016] [Indexed: 12/20/2022]
Abstract
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported.
Affiliation(s)
- Michael J Meyer
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, New York, 10065
- Ryan Lapcevic
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
- Alfonso E Romero
  - Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Mark Yoon
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
- Jishnu Das
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
- Juan Felipe Beltrán
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
- Matthew Mort
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Peter D Stenson
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- David N Cooper
  - Institute of Medical Genetics, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK
- Alberto Paccanaro
  - Department of Computer Science and Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham TW20 0EX, UK
- Haiyuan Yu
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York, 14853
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, 14853
8
A spatial simulation approach to account for protein structure when identifying non-random somatic mutations. BMC Bioinformatics 2014; 15:231. [PMID: 24990767 PMCID: PMC4227039 DOI: 10.1186/1471-2105-15-231] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 11/15/2013] [Accepted: 05/27/2014] [Indexed: 02/08/2023]
Abstract
Background: Current research suggests that a small set of “driver” mutations are responsible for tumorigenesis while a larger body of “passenger” mutations occur in the tumor but do not progress the disease. Due to recent pharmacological successes in treating cancers caused by driver mutations, a variety of methodologies that attempt to identify such mutations have been developed. Based on the hypothesis that driver mutations tend to cluster in key regions of the protein, the development of cluster identification algorithms has become critical. Results: We have developed a novel methodology, SpacePAC (Spatial Protein Amino acid Clustering), that identifies mutational clustering by considering the protein tertiary structure directly in 3D space. By combining the mutational data in the Catalogue of Somatic Mutations in Cancer (COSMIC) and the spatial information in the Protein Data Bank (PDB), SpacePAC is able to identify novel mutation clusters in many proteins such as FGFR3 and CHRM2. In addition, SpacePAC is better able to localize the most significant mutational hotspots as demonstrated in the cases of BRAF and ALK. The R package is available on Bioconductor at: http://www.bioconductor.org/packages/release/bioc/html/SpacePAC.html. Conclusion: SpacePAC adds a valuable tool to the identification of mutational clusters while considering protein tertiary structure.
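The general idea of testing for mutational clustering directly in 3D space can be sketched as a sphere-counting simulation. This is an illustrative sketch only and does not reproduce SpacePAC or its R interface: the statistic (the largest number of mutations inside any residue-centered sphere), the uniform null model, the toy coordinates, and the function names are all assumptions.

```python
import math
import random


def max_sphere_count(coords, mutated, radius):
    """Largest number of mutations inside any sphere centered on a residue.

    coords:  list of (x, y, z) residue coordinates, e.g. C-alpha atoms
    mutated: list of residue indices, one per observed mutation
             (an index may repeat if a residue is mutated in several samples)
    """
    best = 0
    for center in coords:
        count = sum(1 for i in mutated if math.dist(center, coords[i]) <= radius)
        best = max(best, count)
    return best


def sphere_p_value(coords, mutated, radius, n_sim=1000, seed=0):
    """Empirical p-value for the observed maximum sphere count.

    Assumption: under the null, each mutation lands on a residue chosen
    uniformly at random, independent of the others.
    """
    rng = random.Random(seed)
    observed = max_sphere_count(coords, mutated, radius)
    n_residues = len(coords)
    hits = 0
    for _ in range(n_sim):
        simulated = [rng.randrange(n_residues) for _ in mutated]
        if max_sphere_count(coords, simulated, radius) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Toy "structure": 200 residues on a straight line, 3.8 angstroms apart.
# A real analysis would read coordinates from a structure file instead.
coords = [(3.8 * i, 0.0, 0.0) for i in range(200)]
mutated = [50, 51, 52, 50, 51, 120]
observed, p_value = sphere_p_value(coords, mutated, radius=10.0)
```

Because distances are measured in 3D rather than along the sequence, the same function would also pick up residues that are far apart in the primary sequence but close together in the folded protein.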
9
Tusche C, Steinbrück L, McHardy AC. Detecting patches of protein sites of influenza A viruses under positive selection. Mol Biol Evol 2012; 29:2063-71. [PMID: 22427709 PMCID: PMC3408068 DOI: 10.1093/molbev/mss095] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 01/12/2023]
Abstract
Influenza A viruses are single-stranded RNA viruses capable of evolving rapidly to adapt to environmental conditions. Examples include the establishment of a virus in a novel host or an adaptation to increasing immunity within the host population due to prior infection or vaccination against a circulating strain. Knowledge of the viral protein regions under positive selection is therefore crucial for surveillance. We have developed a method for detecting positively selected patches of sites on the surface of viral proteins, which we assume to be relevant for adaptive evolution. We measure positive selection based on dN/dS ratios of genetic changes inferred by considering the phylogenetic structure of the data and suggest a graph-cut algorithm to identify such regions. Our algorithm searches for dense and spatially distinct clusters of sites under positive selection on the protein surface. For the hemagglutinin protein of human influenza A viruses of the subtypes H3N2 and H1N1, our predicted sites significantly overlap with known antigenic and receptor-binding sites. From the structure and sequence data of the 2009 swine-origin influenza A/H1N1 hemagglutinin and PB2 protein, we identified regions that provide evidence of evolution under positive selection since introduction of the virus into the human population. The changes in PB2 overlap with sites reported to be associated with mammalian adaptation of the influenza A virus. Application of our technique to the protein structures of viruses of yet unknown adaptive behavior could identify further candidate regions that are important for host–virus interaction.
Affiliation(s)
- Christina Tusche
  - Max Planck Research Group for Computational Genomics and Epidemiology, Max Planck Institute for Informatics, Saarbrücken, Germany
10
McFerrin LG, Stone EA. The non-random clustering of non-synonymous substitutions and its relationship to evolutionary rate. BMC Genomics 2011; 12:415. [PMID: 21846337 PMCID: PMC3176261 DOI: 10.1186/1471-2164-12-415] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/07/2011] [Accepted: 08/16/2011] [Indexed: 01/11/2023]
Abstract
Background: Protein sequences are subject to a mosaic of constraint. Changes to functional domains and buried residues, for example, are more apt to disrupt protein structure and function than are changes to residues participating in loops or exposed to solvent. Regions of constraint on the tertiary structure of a protein often result in loose segmentation of its primary structure into stretches of slowly- and rapidly-evolving amino acids. This clustering can be exploited, and existing methods have done so by relying on local sequence conservation as a signature of selection to help identify functionally important regions within proteins. We invert this paradigm by leveraging the regional nature of protein structure and function to both illuminate and make use of genome-wide patterns of local sequence conservation. Results: Our hypothesis is that the regional nature of structural and functional constraints will assert a positive autocorrelation on the evolutionary rates of neighboring sites, which, in a pairwise comparison of orthologous proteins, will manifest itself as the clustering of non-synonymous changes across the amino acid sequence. We introduce a dispersion ratio statistic to test this and related hypotheses. Using genome-wide interspecific comparisons of orthologous protein pairs, we reveal a strong log-linear relationship between the degree of clustering and the intensity of constraint. We further demonstrate how this relationship varies with the evolutionary distance between the species being compared. We provide some evidence that proteins with a history of positive selection deviate from genome-wide trends. Conclusions: We find a significant association between the evolutionary rate of a protein and the degree to which non-synonymous changes cluster along its primary sequence.
We show that clustering is a non-redundant predictor of evolutionary rate, and we speculate that conflicting signals of clustering and constraint may be indicative of a historical period of relaxed selection.
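The abstract does not define its dispersion ratio statistic, so the sketch below uses a generic stand-in: the variance-to-mean ratio (index of dispersion) of non-synonymous change counts in fixed-width windows along the sequence. The window width, the function name, and the example sites are all hypothetical, and this should not be read as the authors' statistic.

```python
import math
from statistics import mean, pvariance


def index_of_dispersion(changed_sites, protein_length, window=10):
    """Variance-to-mean ratio of change counts per sequence window.

    changed_sites: 1-based positions of non-synonymous differences
    Values near 1 are what a random (Poisson-like) scatter would give,
    values well above 1 suggest clustering, and values below 1 suggest
    changes that are spread more evenly than chance.
    """
    n_windows = math.ceil(protein_length / window)
    counts = [0] * n_windows
    for site in changed_sites:
        counts[(site - 1) // window] += 1
    average = mean(counts)
    if average == 0:
        raise ValueError("no changes observed; the ratio is undefined")
    return pvariance(counts) / average


# Hypothetical 100-residue protein with ten changes in each scenario.
clustered = index_of_dispersion(list(range(1, 11)), 100)    # all in one window
even = index_of_dispersion(list(range(5, 100, 10)), 100)    # one per window
```

With all ten changes in a single window the ratio is 9, and with one change per window it is 0, which brackets the two extremes of clustering and even spacing.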
Affiliation(s)
- Lisa G McFerrin
  - Graduate program in Bioinformatics, North Carolina State University, Raleigh, NC 27695-7566, USA
11
Huzurbazar S, Kolesov G, Massey SE, Harris KC, Churbanov A, Liberles DA. Lineage-specific differences in the amino acid substitution process. J Mol Biol 2010; 396:1410-21. [PMID: 20004669 DOI: 10.1016/j.jmb.2009.11.075] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/21/2009] [Revised: 11/25/2009] [Accepted: 11/30/2009] [Indexed: 11/19/2022]
Abstract
In Darwinian evolution, mutations occur approximately at random in a gene, turned into amino acid mutations by the genetic code. Some mutations are fixed to become substitutions and some are eliminated from the population. Partitioning pairs of closely related species with complete genome sequences by average population size of each pair, we looked at the substitution matrices generated for these partitions and compared the substitution patterns between species. We estimated a population genetic model that relates the relative fixation probabilities of different types of mutations to the selective pressure and population size. Parameterizations of the average and distribution of selective pressures for different amino acid substitution types in different population size comparisons were generated with a Bayesian framework. We found that partitions in population size as well as in substitution type are required to explain the substitution data. Selection coefficients were found to decrease with increasingly radical amino acid substitution and with increasing effective population size. To further explore the role of underlying processes in amino acid substitution, we analyzed embryophyte (plant) gene families from TAED (The Adaptive Evolution Database), where solved structures for at least one member exist in the Protein Data Bank. Using PAML, we assigned branches to three categories: strong negative selection, moderate negative selection/neutrality, and positive diversifying selection. Focusing on the first and third categories, we identified sites changing along gene family lineages and observed the spatial patterns of substitution. Selective sweeps were expected to create primary sequence clustering under positive diversifying selection. Co-evolution through direct physical interaction was expected to cause tertiary structural clustering. Under both positive and negative selection, the substitution patterns were found to be nonrandom. 
Under positive diversifying selection, significant independent signals were found for primary sequence and tertiary structural clustering, suggesting roles for both selective sweeps and direct physical interaction. Under strong negative selection, the signals were not found to be independent. Altogether, a complex interplay of population genetic and protein thermodynamic forces is suggested.