1
Konecki DM, Hamrick S, Wang C, Agosto MA, Wensel TG, Lichtarge O. CovET: A covariation-evolutionary trace method that identifies protein structure-function modules. J Biol Chem 2023; 299:104896. [PMID: 37290531 PMCID: PMC10338321 DOI: 10.1016/j.jbc.2023.104896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/25/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/10/2023]
Abstract
Measuring the relative effect that any two sequence positions have on each other may improve protein design or help better interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences, which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods in predicting individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains. It also correlates better with data from large-scale epistasis screens. In the dopamine D2 receptor, top CovET residue pairs accurately recovered the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest those sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.
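CovET's phylogeny-aware scoring is not described here in enough detail to reproduce. For orientation only, the sketch below computes plain mutual information between two alignment columns, a generic statistical covariation measure and not CovET itself. It deliberately ignores shared ancestry, which is the limitation the CovET abstract sets out to address. Columns are assumed to be ungapped strings holding one residue per sequence, in the same sequence order.

```python
import math
from collections import Counter


def mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in nats) between two alignment columns.

    col_i[k] and col_j[k] are the residues of sequence k at the two positions.
    This is a generic covariation score; it does not correct for phylogeny.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    p_i = Counter(col_i)                # single-column residue counts
    p_j = Counter(col_j)
    p_ij = Counter(zip(col_i, col_j))   # joint residue-pair counts
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Perfectly covarying columns: K always pairs with E, R with D
print(mutual_information("KKRR", "EEDD"))  # ln 2, about 0.693
# Independent columns: every residue combination appears equally often
print(mutual_information("KKRR", "EDED"))  # 0.0
```

If the four sequences above all descended from two closely related ancestors, the first score would be just as high. That is why phylogeny-aware methods weigh covariation event by event rather than over the whole alignment at once.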
Affiliation(s)
- Daniel M Konecki
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Spencer Hamrick
- Chemical, Physical, and Structural Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Chen Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
- Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
- Theodore G Wensel
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
- Olivier Lichtarge
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas, USA.
2
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
- Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
3
Weaver RJ, Rabinowitz S, Thueson K, Havird JC. Genomic Signatures of Mitonuclear Coevolution in Mammals. Mol Biol Evol 2022; 39:6775223. [PMID: 36288802 PMCID: PMC9641969 DOI: 10.1093/molbev/msac233] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 11/14/2022]
Abstract
Mitochondrial (mt) and nuclear-encoded proteins are integrated in aerobic respiration, requiring co-functionality among gene products from fundamentally different genomes. Different evolutionary rates, inheritance mechanisms, and selection pressures set the stage for incompatibilities between interacting products of the two genomes. The mitonuclear coevolution hypothesis posits that incompatibilities may be avoided if evolution in one genome selects for complementary changes in interacting genes encoded by the other genome. Nuclear compensation, in which deleterious mtDNA changes are offset by compensatory nuclear changes, is often invoked as the primary mechanism for mitonuclear coevolution. Yet, direct evidence supporting nuclear compensation is rare. Here, we used data from 58 mammalian species representing eight orders to show strong correlations between evolutionary rates of mt and nuclear-encoded mt-targeted (N-mt) proteins, but not between mt and non-mt-targeted nuclear proteins, providing strong support for mitonuclear coevolution across mammals. N-mt genes with direct mt interactions also showed the strongest correlations. Although most N-mt genes had elevated dN/dS ratios compared to mt genes (as predicted under nuclear compensation), N-mt sites in close contact with mt proteins were not overrepresented for signs of positive selection compared to noncontact N-mt sites (contrary to predictions of nuclear compensation). Furthermore, temporal patterns of N-mt and mt amino acid substitutions did not support predictions of nuclear compensation, even in positively selected, functionally important residues with direct mitonuclear contacts. Overall, our results strongly support mitonuclear coevolution across ∼170 million years of mammalian evolution but fail to support nuclear compensation as the major mode of mitonuclear coevolution.
Affiliation(s)
- Ryan J Weaver
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA; Department of Natural Resource Ecology and Management, Iowa State University, Ames, IA
- Kiley Thueson
- Department of Integrative Biology, University of Texas, Austin, TX
- Justin C Havird
- Department of Integrative Biology, University of Texas, Austin, TX
4
Robins WP, Mekalanos JJ. Covariance predicts conserved protein residue interactions important for the emergence and continued evolution of SARS-CoV-2 as a human pathogen. PLoS One 2022; 17:e0270276. [PMID: 35895734 PMCID: PMC9328546 DOI: 10.1371/journal.pone.0270276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2022] [Accepted: 06/07/2022] [Indexed: 12/03/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition within related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and their emergence as human pathogens. Here we report a comprehensive analysis of amino acid residue changes that have occurred in lineage B β-coronaviruses and that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2-related β-coronaviruses. Using an independent pair model followed by a tandem model approach, we identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenetic lineages. Remarkably, the most abundantly represented covariant residues and their respective regions are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
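The abstract does not define its independent pair or tandem models, so the sketch below is not the authors' statistic. It illustrates one generic way to score covariance between two aligned positions: the phi coefficient between the binary traits "position i differs from a reference" and "position j differs from the reference" across genomes. The reference and the toy alignment are hypothetical.

```python
import math


def phi_covariance(seqs: list[str], ref: str, i: int, j: int) -> float:
    """Phi coefficient between two binary traits across aligned sequences:
    'position i differs from the reference' and 'position j differs from it'.
    A generic covariance score, assumed here for illustration only."""
    x = [s[i] != ref[i] for s in seqs]
    y = [s[j] != ref[j] for s in seqs]
    # 2x2 contingency counts: both changed, only i, only j, neither
    n11 = sum(a and b for a, b in zip(x, y))
    n10 = sum(a and not b for a, b in zip(x, y))
    n01 = sum(b and not a for a, b in zip(x, y))
    n00 = len(seqs) - n11 - n10 - n01
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # an invariant position carries no covariance signal
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical toy alignment: positions 0 and 3 always change together
ref = "ACDE"
seqs = ["ACDE", "GCDK", "GCDK", "ACDE", "ACFE"]
print(phi_covariance(seqs, ref, 0, 3))  # 1.0
print(phi_covariance(seqs, ref, 0, 2))  # negative: these changes never co-occur
```

A score like this treats every genome as independent. Tracing allele combinations "genome by genome within different phylogenetic lineages", as the abstract describes, is one way to check whether a high score reflects repeated co-occurrence or a single shared ancestor.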
Affiliation(s)
- William P. Robins
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
- John J. Mekalanos
- Department of Microbiology, Harvard Medical School, Boston, Massachusetts, United States of America
5
Abstract
Compensatory substitutions happen when one mutation is advantageously selected because it restores the loss of fitness induced by a previous deleterious mutation. How frequently such mutations occur in evolution, and what structural and functional context permits their emergence, remain open questions. We built an atlas of intra-protein compensatory substitutions using a phylogenetic approach and a dataset of 1,630 bacterial protein families for which high-quality sequence alignments and experimentally derived protein structures were available. We identified more than 51,000 positions coevolving by means of predicted compensatory mutations. Using the evolutionary and structural properties of the analyzed positions, we demonstrate that compensatory mutations are scarce (typically only a few in the protein history) but widespread (the majority of proteins experienced at least one). Typical coevolving residues evolve slowly, are located in the protein core outside secondary structure motifs, and are more often in contact than expected by chance, even after accounting for their evolutionary rate and solvent exposure. An exception to this general scheme is residues coevolving for charge compensation, which evolve faster than noncoevolving sites, in contradiction with predictions from simple coevolutionary models, but similar to stem pairs in RNA. While sites with a significant pattern of coevolution by compensatory mutations are rare, the comparative analysis of hundreds of structures ultimately permits a better understanding of the link between the three-dimensional structure of a protein and its fitness landscape.
Affiliation(s)
- Shilpi Chaurasia
- RG Molecular Systems Evolution, Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany; Excelra Knowledge Solutions Pvt Ltd, Hyderabad, India
- Julien Y Dutheil
- RG Molecular Systems Evolution, Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Biology, August-Thienemann-Straße 2, 24306 Plön, Germany; Institute of Evolution Sciences of Montpellier (ISEM), CNRS, University of Montpellier, IRD, EPHE, 34095 Montpellier, France
6
Rallapalli KL, Ranzau BL, Ganapathy KR, Paesani F, Komor AC. Combined Theoretical, Bioinformatic, and Biochemical Analyses of RNA Editing by Adenine Base Editors. CRISPR J 2022; 5:294-310. [PMID: 35353638 DOI: 10.1089/crispr.2021.0131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Adenine base editors (ABEs) have been subjected to multiple rounds of mutagenesis with the goal of optimizing their function as efficient and precise genome editing agents. Despite an ever-expanding data set of ABE mutants and their corresponding DNA- or RNA-editing activity, the molecular mechanisms underlying these changes remain to be elucidated. In this study, we provide a systematic interpretation of the nature of these mutations using an entropy-based classification model that relies on evolutionary data from extant protein sequences. Using this model in conjunction with experimental analyses, we identify two previously reported mutations that form an epistatic pair in the RNA-editing functional landscape of ABEs. Molecular dynamics simulations reveal the atomistic details of how these two mutations affect substrate binding and catalytic activity, via both individual and cooperative effects, thereby providing insights into the mechanisms through which these two mutations are epistatically coupled.
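The abstract names an entropy-based classification model without giving its form. A common starting point for such models is the Shannon entropy of each alignment column, which is low at conserved positions and high at variable ones. The sketch below computes only that quantity; it is not the authors' classifier, and treating gaps as an ordinary symbol is an assumption.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of the residue distribution in one alignment column.

    Gap characters, if present, are counted as just another symbol.
    """
    if not column:
        raise ValueError("column must be non-empty")
    n = len(column)
    # H = sum over residues a of p_a * log2(1 / p_a), with p_a = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(column).values())


print(column_entropy("AAAA"))  # 0.0: fully conserved
print(column_entropy("AACC"))  # 1.0: two residues, equally frequent
print(column_entropy("ACDE"))  # 2.0: four residues, equally frequent
```

Mutations at low-entropy positions would be expected to be more disruptive than those at high-entropy positions; how the authors turn entropy into classes is not stated here.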
Affiliation(s)
- Kartik L Rallapalli
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, USA
- Brodie L Ranzau
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, USA
- Kaushik R Ganapathy
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, California, USA
- Francesco Paesani
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, USA; Materials Science and Engineering, University of California San Diego, La Jolla, California, USA; San Diego Supercomputer Center, University of California San Diego, La Jolla, California, USA
- Alexis C Komor
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, California, USA
7
Robins WP, Mekalanos JJ. Covariance predicts conserved protein residue interactions important to the emergence and continued evolution of SARS-CoV-2 as a human pathogen. bioRxiv 2022: 2022.01.13.476204. [PMID: 35169805 PMCID: PMC8845505 DOI: 10.1101/2022.01.13.476204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that likely emerged from animal reservoirs. Differences in nucleotide and protein sequence composition within related β-coronaviruses are often used to better understand CoV evolution, host adaptation, and their emergence as human pathogens. Here we report a comprehensive analysis of amino acid residue changes that have occurred in lineage B β-coronaviruses and that show covariance with each other. This analysis revealed patterns of covariance within conserved viral proteins that potentially define conserved interactions within and between core proteins encoded by SARS-CoV-2-related β-coronaviruses. Using an independent pair model followed by a tandem model approach, we identified not only individual pairs but also networks of amino acid residues that exhibited statistically high frequencies of covariance with each other. Using 149 different CoV genomes that vary in their relatedness, we identified networks of unique combinations of alleles that can be incrementally traced genome by genome within different phylogenetic lineages. Remarkably, the most abundantly represented covariant residues and their respective regions are implicated in the emergence of SARS-CoV-2 and are also enriched in dominant SARS-CoV-2 variants.
Affiliation(s)
- William P Robins
- Department of Microbiology, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115
- John J Mekalanos
- Department of Microbiology, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115
8
Camenares D. ACES: A co-evolution simulator generates co-varying protein and nucleic acid sequences. J Bioinform Comput Biol 2020; 18:2050039. [PMID: 33215964 DOI: 10.1142/s0219720020500390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Sequence-specific and consequential interactions within or between proteins and/or RNAs can be predicted by identifying co-evolution of residues in these molecules. Different algorithms have been used to detect co-evolution, often using biological data to benchmark a method's ability to discriminate against indirect co-evolution. Such a benchmark is problematic, because not all the interactions and evolutionary constraints underlying real data can be known a priori. Instead, sequences generated in silico to simulate co-evolution would be preferable, and can be obtained using ACES, the software tool presented here. Conservation and co-evolution constraints can be specified for any residue across a number of molecules, allowing the user to capture a complex, realistic set of interactions. The resulting alignments were used to benchmark several co-evolution detection tools for their ability to separate signal from background as well as to discriminate direct from indirect signals. This approach can aid in the refinement of these algorithms. In addition, systematic tuning of these constraints sheds new light on how they drive co-evolution between residues. A better understanding of how to detect co-evolution, and of the residue interactions it predicts, can lead to a wide range of insights important for synthetic biologists interested in engineering new, orthogonal interactions between two macromolecules.
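ACES's own constraint model is not described in detail in the abstract. The toy sketch below only illustrates the benchmarking idea: when covariation is planted in simulated sequences, the true coupled pair is known in advance. The rule used here (two RNA positions must always form a Watson-Crick pair while every other position is uniform noise) is an assumption for illustration, and the generator has no phylogeny, unlike a realistic simulator.

```python
import random

WATSON_CRICK = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
NUCLEOTIDES = "ACGU"


def simulate_alignment(n_seqs: int, length: int, pair: tuple[int, int],
                       seed: int = 0) -> list[str]:
    """Toy co-evolution simulator (hypothetical, not ACES).

    Every position is drawn uniformly at random, except the two positions
    in `pair`, which are forced to form a Watson-Crick base pair.
    """
    i, j = pair
    if i == j or not (0 <= i < length and 0 <= j < length):
        raise ValueError("pair must name two distinct positions in range")
    rng = random.Random(seed)  # seeded so the benchmark is reproducible
    seqs = []
    for _ in range(n_seqs):
        seq = [rng.choice(NUCLEOTIDES) for _ in range(length)]
        seq[i], seq[j] = rng.choice(WATSON_CRICK)  # plant the constraint
        seqs.append("".join(seq))
    return seqs


aln = simulate_alignment(n_seqs=5, length=8, pair=(1, 6))
for s in aln:
    print(s)  # positions 1 and 6 always pair; the rest is noise
```

A covariation detector run on many such alignments should rank the planted pair (1, 6) above all background pairs. A benchmark with real biological data cannot guarantee that kind of ground truth.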
Affiliation(s)
- Devin Camenares
- Department of Biochemistry, Alma College, 614 West Superior St, Alma, Michigan 48801, USA
9
Meyer X, Dib L, Salamin N. CoevDB: a database of intramolecular coevolution among protein-coding genes of the bony vertebrates. Nucleic Acids Res 2020; 47:D50-D54. [PMID: 30357342 PMCID: PMC6324051 DOI: 10.1093/nar/gky986] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/14/2018] [Accepted: 10/10/2018] [Indexed: 01/15/2023]
Abstract
The study of molecular coevolution, because of its potential to identify gene regions under functional or structural constraints, has recently been the subject of numerous scientific inquiries. Particular efforts have been made to develop methods that predict the presence of coevolution in molecular sequences. Among these methods, a few aim to model the underlying evolutionary process of coevolution, which makes it possible to distinguish coevolution from the shared history of genes and thus improves their accuracy. However, such methods remain sparsely used because of their high computational cost and the lack of resources alleviating this issue. Here we present CoevDB (http://phylodb.unil.ch/CoevDB), a database containing the results of a large-scale analysis of intramolecular coevolution in 8201 protein-coding genes of bony vertebrates. The web interface of CoevDB gives access to the results of 800 million statistical tests, corresponding to all the pairs of sites analyzed. Several types of queries enable users to explore the database, either by targeting specific genes or by discovering genes with promising estimates of coevolution.
Affiliation(s)
- Xavier Meyer
- Department of Computational Biology, University of Lausanne, Biophore, 1015 Lausanne, Switzerland; Department of Integrative Biology, University of California, 3060 Valley Life Sciences Bldg, Berkeley, CA 94720-3140, USA
- Linda Dib
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Nicolas Salamin
- Department of Computational Biology, University of Lausanne, Biophore, 1015 Lausanne, Switzerland; Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
10
Statistical characteristics of amino acid covariance as possible descriptors of viral genomic complexity. Sci Rep 2019; 9:18410. [PMID: 31804522 PMCID: PMC6895170 DOI: 10.1038/s41598-019-54720-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 10/17/2017] [Accepted: 07/08/2019] [Indexed: 12/25/2022]
Abstract
At the sequence level, it is hard to describe the complexity of viruses that allows them to challenge the host immune system, some for a few weeks and others up to complete compromise. Paradoxically, viral genomes are both complex and simple: complex because amino acid mutation rates are very high, and yet viruses remain functional; simple because they have only around 10 types of proteins, so viral protein-protein interaction networks are not insightful. In this work we use fine-grained amino-acid-level information and its evolutionary characteristics, obtained from large-scale genomic data, to develop a statistical panel, towards the goal of developing quantitative descriptors for the biological complexity of viruses. Networks were constructed from pairwise covariation of amino acids and were statistically analyzed. Three differentiating factors arise: predominantly intra- vs inter-protein covariance relations, the nature of the node degree distribution, and network density. Interestingly, the covariance relations were primarily intra-protein in avian influenza and inter-protein in HIV. The degree distributions showed two universality classes: a power law with exponent −1 in HIV and avian influenza, and random behavior in human flu and dengue. The calculated covariance network density correlates well with the mortality strengths of viruses on the viral-Richter scale. These observations suggest the potential utility of the statistical metrics for describing the covariance patterns in viruses. Our host-virus interaction analysis points to the possibility that host proteins which can interact with multiple viral proteins may be responsible for shaping the inter-protein covariance relations. With the available data, it appears that network density might be a surrogate for the viral Richter scale; however, the hypothesis needs re-examination when large-scale complete genome data for more viruses become available.
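Two of the three descriptors in this abstract have standard graph definitions: the density of an undirected network with N nodes and E edges is 2E / (N(N - 1)), and a node's degree is its number of edges. The sketch below computes those, plus the fraction of intra-protein edges. It assumes nodes are labelled "protein:position" (a hypothetical scheme) and that covariance pairs arrive as an edge list; the toy edges are made up, and fitting the degree distribution is left out.

```python
from collections import Counter


def covariance_network_stats(edges: list[tuple[str, str]]):
    """Density, node degrees and intra-protein edge fraction of an undirected
    covariance network whose nodes are labelled 'protein:position'."""
    # Deduplicate so (u, v) and (v, u) count once, and drop self-loops.
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    nodes = {v for e in unique for v in e}
    n, m = len(nodes), len(unique)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    degrees = Counter(v for e in unique for v in e)
    # An edge is intra-protein if both endpoints carry the same protein label.
    intra = sum(u.split(":")[0] == v.split(":")[0] for u, v in unique)
    intra_fraction = intra / m if m else 0.0
    return density, degrees, intra_fraction


edges = [("A:12", "A:45"), ("A:45", "B:7"), ("A:12", "B:7"), ("B:7", "C:30")]
density, degrees, intra_fraction = covariance_network_stats(edges)
print(round(density, 3))  # 0.667: 4 of the 6 possible edges among 4 nodes
print(degrees["B:7"])     # 3
print(intra_fraction)     # 0.25: only A:12-A:45 is intra-protein
```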
11
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022]
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species to correlations in amino acid usage. Genomic databases provide rapidly growing data on variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30–50% positives out of the 1000 highest-scoring pairs). By combining these with direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interactions between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Our work thereby demonstrates the strong potential of global statistical modeling approaches for genome-wide coevolutionary analysis, far beyond their established use for individual protein complexes and domain-domain interactions.
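Direct phyletic couplings require a global statistical model and are not reproduced here. The simpler baseline the abstract compares against, correlation-based phylogenetic profiling, can be sketched as a Pearson correlation between the presence/absence vectors of two domain families across the same set of genomes. The profiles below are made up for illustration.

```python
import math


def profile_correlation(p: list[int], q: list[int]) -> float:
    """Pearson correlation between two 0/1 phylogenetic profiles, where
    entry k records whether a domain family is present in genome k."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # a family present (or absent) everywhere has no profile signal
    return cov / math.sqrt(var_p * var_q)


# Hypothetical profiles over six genomes
family_a = [1, 1, 0, 1, 0, 0]
family_b = [1, 1, 0, 1, 0, 0]   # co-occurs with family_a in every genome
family_c = [1, 0, 1, 0, 1, 0]   # unrelated pattern
print(profile_correlation(family_a, family_b))  # 1.0
print(profile_correlation(family_a, family_c))  # weakly negative
```

Pairwise correlation of this kind cannot separate direct from indirect co-occurrence (A with B only because both track C). The global model behind direct phyletic couplings is aimed at that confounding, which is consistent with the accuracy gap the abstract reports.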
Affiliation(s)
- Giancarlo Croce
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
12
Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep 2019; 9:12603. [PMID: 31471543 PMCID: PMC6717244 DOI: 10.1038/s41598-019-48913-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/10/2017] [Accepted: 08/14/2019] [Indexed: 11/08/2022]
Abstract
Proteins often work as oligomers or multimers in vivo. Therefore, elucidating their oligomeric or multimeric form (quaternary structure) is crucially important to ascertain their function. X-ray crystal structures of numerous proteins have accumulated, providing information related to their biological units. Extracting information about biological units from protein crystal structures is a meaningful task for modern biology. Nevertheless, although many methods have been proposed for identifying biological units in protein crystal structures, it remains difficult to distinguish biological protein-protein interfaces from crystallographic ones. We therefore developed a simple but highly accurate classifier that infers biological units in protein crystal structures by using large amounts of protein sequence information and a modern contact prediction method to exploit covariation signals (CSs) in proteins. We demonstrate that the proposed method is promising even for weak signals of biological interfaces. We also discuss the relation between classification accuracy and conservation of biological units, and illustrate how the selection of sequences included in multiple sequence alignments as sources for obtaining CSs affects the results. With increasing amounts of sequence data, the proposed method is expected to become increasingly useful.
13
Pathogenicity of the H1N1 influenza virus enhanced by functional synergy between the NPV100I and NAD248N pair. PLoS One 2019; 14:e0217691. [PMID: 31150476 PMCID: PMC6544299 DOI: 10.1371/journal.pone.0217691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2018] [Accepted: 05/16/2019] [Indexed: 11/20/2022]
Abstract
By comparing and measuring covariations of viral protein sequences from isolates of the 2009 pH1N1 influenza A virus (IAV), we identified specific substitutions that co-occur in the NP-NA pair. To investigate the effect of these co-occurring substitution pairs, the V100I substitution in NP and the D248N substitution in NA were introduced into laboratory-adapted WSN IAVs. The recombinant WSN with the covarying NPV100I-NAD248N pair exhibited enhanced pathogenicity, as characterized by increased viral production, increased death and inflammation of host cells, and high mortality in infected mice. Although direct interactions between the NPV100I and NAD248N proteins were not detected, the RNA-binding ability of NPV100I was increased, and was further strengthened by NAD248N, in expression-plasmid-transfected cells. Additionally, the NAD248N protein was frequently recruited within lipid rafts, indirectly affecting the RNA-binding ability of NP as well as viral release. Altogether, our data indicate that the two substitutions of the covarying NPV100I-NAD248N pair, obtained from 2009 pH1N1 IAV sequence information, act together to synergistically augment viral assembly and release, which may explain the observed enhanced viral pathogenicity.
14
Simultaneous Bayesian inference of phylogeny and molecular coevolution. Proc Natl Acad Sci U S A 2019; 116:5027-5036. [PMID: 30808804 DOI: 10.1073/pnas.1813836116] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
Patterns of molecular coevolution can reveal structural and functional constraints within or among organic molecules. These patterns are better understood when considering the underlying evolutionary process, which enables us to disentangle the signal of the dependent evolution of sites (coevolution) from the effects of shared ancestry of genes. Conversely, disregarding the dependent evolution of sites when studying the history of genes negatively impacts the accuracy of the inferred phylogenetic trees. Although molecular coevolution and phylogenetic history are interdependent, analyses of the two processes are conducted separately, a choice dictated by computational convenience, but at the expense of accuracy. We present a Bayesian method and associated software to infer how many and which sites of an alignment evolve according to an independent or a pairwise dependent evolutionary process, and to simultaneously estimate the phylogenetic relationships among sequences. We validate our method on synthetic datasets and challenge our predictions of coevolution on the 16S rRNA molecule by comparing them with its known molecular structure. Finally, we assess the accuracy of phylogenetic trees inferred under the assumption of independence among sites using synthetic datasets, the 16S rRNA molecule and 10 additional alignments of protein-coding genes of eukaryotes. Our results demonstrate that inferring phylogenetic trees while accounting for dependent site evolution significantly impacts the estimates of the phylogeny and the evolutionary process.
15
Abstract
The comparative study of homologous proteins can provide abundant information about the functional and structural constraints on protein evolution. For example, an amino acid substitution that is deleterious may become permissive in the presence of another substitution at a second site of the protein. A popular approach for detecting coevolving residues is to look for correlated substitution events on branches of the molecular phylogeny relating the protein-coding sequences. Here we describe a machine learning method (Bayesian graphical models), implemented in the open-source phylogenetic software package HyPhy (http://hyphy.org), for extracting a network of coevolving residues from a sequence alignment.
16
Koehl P, Orland H, Delarue M. Numerical Encodings of Amino Acids in Multivariate Gaussian Modeling of Protein Multiple Sequence Alignments. Molecules 2018; 24:E104. [PMID: 30597916 PMCID: PMC6337344 DOI: 10.3390/molecules24010104] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/02/2018] [Revised: 12/21/2018] [Accepted: 12/24/2018] [Indexed: 11/17/2022]
Abstract
Residues in proteins that are in close spatial proximity are more likely to covary, as their interactions are likely to be preserved by structural and evolutionary constraints. If such covariation can be detected and quantified, physical contacts may then be predicted in the structure of a protein solely from the sequences that decorate it. To carry out such predictions, and following the work of others, we have implemented a multivariate Gaussian model to analyze correlation in multiple sequence alignments. We have explored and tested several numerical encodings of amino acids within this model. We have shown that 1D encodings based on amino acid biochemical and biophysical properties, as well as higher-dimensional encodings computed from the principal components of experimentally derived mutation/substitution matrices, do not perform as well as a simple twenty-dimensional encoding in which each amino acid is represented by a vector with a one along its own dimension and zeros elsewhere. The optimum obtained from representations based on substitution matrices is reached using 10 to 12 principal components; the corresponding performance is lower than that obtained with the 20-dimensional binary encoding. We also highlight the importance of the prior when constructing the multivariate Gaussian model of a multiple sequence alignment.
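The 20-dimensional binary (one-hot) encoding described above can be sketched as follows, together with the empirical covariance that a multivariate Gaussian model is built from. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, gaps and nonstandard letters are mapped to an all-zero block, and no prior is added, even though the abstract stresses that the choice of prior matters.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sequence(seq: str) -> list[int]:
    """Flatten an aligned sequence of length L into a 20*L binary vector.

    Each position contributes a block of 20 entries with a single 1 at the
    residue's index; gaps and nonstandard letters (an assumption) stay all-zero.
    """
    vec = [0] * (20 * len(seq))
    for pos, aa in enumerate(seq.upper()):
        i = INDEX.get(aa)
        if i is not None:
            vec[20 * pos + i] = 1
    return vec


def empirical_covariance(msa: list[str]) -> list[list[float]]:
    """Population covariance of the one-hot features over the aligned sequences.

    No prior or pseudocount is added, so this matrix is usually singular and
    would need regularization before being inverted in a Gaussian model.
    """
    if any(len(s) != len(msa[0]) for s in msa):
        raise ValueError("sequences must be aligned to the same length")
    X = [one_hot_sequence(s) for s in msa]
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    return [
        [sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in X) / n for b in range(d)]
        for a in range(d)
    ]


cov = empirical_covariance(["AA", "CC"])
print(len(cov), cov[0][20], cov[0][1])  # 40 0.25 -0.25
```

In the toy alignment, "A at position 1" and "A at position 2" always co-occur, giving a positive covariance, while "A" and "C" at the same position are mutually exclusive, giving a negative one.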
Affiliation(s)
- Patrice Koehl
- Department of Computer Science, University of California, Davis, CA 95211, USA.
- Henri Orland
- Institut de Physique Théorique, CEA Saclay, 91191 Gif-sur-Yvette CEDEX, France.
- Marc Delarue
- Department of Structural Biology and Chemistry and UMR 3528 du CNRS, Institut Pasteur, 75015 Paris, France.
17
Castiglione GM, Chang BS. Functional trade-offs and environmental variation shaped ancient trajectories in the evolution of dim-light vision. eLife 2018; 7:35957. [PMID: 30362942 PMCID: PMC6203435 DOI: 10.7554/elife.35957] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 02/14/2018] [Accepted: 09/09/2018] [Indexed: 12/11/2022]
Abstract
Trade-offs between protein stability and activity can restrict access to evolutionary trajectories, but widespread epistasis may facilitate indirect routes to adaptation. This may be enhanced by natural environmental variation, but in multicellular organisms this process is poorly understood. We investigated a paradoxical trajectory taken during the evolution of tetrapod dim-light vision: in the rod visual pigment rhodopsin, E122, a residue associated with increased active-state (MII) stability but greatly diminished rod photosensitivity, was fixed 350 million years ago. Here, we demonstrate that high MII stability could likely have evolved without E122; instead, selection appears to have entrenched E122 in tetrapods via epistatic interactions with nearby coevolving sites. In fishes, by contrast, selection may have exploited these epistatic effects to explore alternative trajectories, but via indirect routes with low MII stability. Our results suggest that within tetrapods, E122 and high MII stability cannot be sacrificed, not even for improvements to rod photosensitivity.
Affiliation(s)
- Gianni M Castiglione
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada
- Belinda Sw Chang
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
18
Zahradník J, Kolářová L, Pařízková H, Kolenko P, Schneider B. Interferons type II and their receptors R1 and R2 in fish species: Evolution, structure, and function. Fish Shellfish Immunol 2018; 79:140-152. [PMID: 29742458 DOI: 10.1016/j.fsi.2018.05.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 02/15/2018] [Revised: 04/27/2018] [Accepted: 05/02/2018] [Indexed: 06/08/2023]
Abstract
Interferon gamma (IFN-γ) is one of the key players in the immune system of vertebrates. The evolution and properties of IFN-γ and its receptors in fish species are of special interest as they point to the origin of innate immunity in vertebrates. We studied the phylogeny and the biophysical and structural properties of IFN-γ and its receptors. Our phylogenetic analysis suggests the existence of two groups of IFN-γ-related proteins, one specific for Acanthomorpha, the other for Cypriniformes, Characiformes and Siluriformes. The analysis further shows an ancient duplication of the gene for IFN-γ receptor 1 (IFN-γR1) and the parallel existence of the duplicated genes in all current teleost fish species. In contrast, only one gene can be found for receptor 2, IFN-γR2. The specificity of the interaction between IFN-γ and both types of IFN-γR1 was determined by microscale thermophoresis measurements of the equilibrium dissociation constants for the proteins from three fish species. The measured preference of IFN-γ for one of the two forms of receptor 1 agrees with the bioinformatic analysis of the coevolution between IFN-γ and receptor 1. To elucidate structural relationships between IFN-γ of fish and other vertebrate species, we determined the crystal structure of IFN-γ from olive flounder (Paralichthys olivaceus, PoliIFN-γ) at a crystallographic resolution of 2.3 Å and the low-resolution structures of Takifugu rubripes, Oreochromis niloticus, and Larimichthys crocea IFN-γ by small-angle X-ray scattering. The overall PoliIFN-γ fold is the same as the fold of the other known IFN-γ structures, but there are some significant structural differences, namely the additional C-terminal helix G and a different angle between helices C and D in PoliIFN-γ.
Affiliation(s)
- Jiří Zahradník
- Laboratory of Biomolecular Recognition, Institute of Biotechnology of the Czech Academy of Sciences, v. v. i., BIOCEV, Průmyslová 595, CZ-252 42 Vestec, Czech Republic.
- Lucie Kolářová
- Laboratory of Biomolecular Recognition, Institute of Biotechnology of the Czech Academy of Sciences, v. v. i., BIOCEV, Průmyslová 595, CZ-252 42 Vestec, Czech Republic
- Hana Pařízková
- Laboratory of Biomolecular Recognition, Institute of Biotechnology of the Czech Academy of Sciences, v. v. i., BIOCEV, Průmyslová 595, CZ-252 42 Vestec, Czech Republic
- Petr Kolenko
- Laboratory of Biomolecular Recognition, Institute of Biotechnology of the Czech Academy of Sciences, v. v. i., BIOCEV, Průmyslová 595, CZ-252 42 Vestec, Czech Republic
- Bohdan Schneider
- Laboratory of Biomolecular Recognition, Institute of Biotechnology of the Czech Academy of Sciences, v. v. i., BIOCEV, Průmyslová 595, CZ-252 42 Vestec, Czech Republic.
19
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States.
20
Prediction of Structures and Interactions from Genome Information. Adv Exp Med Biol 2018; 1105:123-152. [DOI: 10.1007/978-981-13-2200-6_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
21
Meixenberger K, Yousef KP, Smith MR, Somogyi S, Fiedler S, Bartmeyer B, Hamouda O, Bannert N, von Kleist M, Kücherer C. Molecular evolution of HIV-1 integrase during the 20 years prior to the first approval of integrase inhibitors. Virol J 2017; 14:223. [PMID: 29137637 PMCID: PMC5686839 DOI: 10.1186/s12985-017-0887-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/12/2017] [Accepted: 10/31/2017] [Indexed: 12/12/2022]
Abstract
BACKGROUND Detailed knowledge of the evolutionary potential of polymorphic sites in a viral protein is important for understanding the development of drug resistance in the presence of an inhibitor. We therefore set out to analyse the molecular evolution of the HIV-1 subtype B integrase at the inter-patient level in Germany during a 20-year period prior to the first introduction of integrase strand transfer inhibitors (INSTIs).
METHODS We determined 337 HIV-1 integrase subtype B sequences (amino acids 1-278) from stored plasma samples of antiretroviral treatment-naïve individuals newly diagnosed with HIV-1 between 1986 and 2006. Shannon entropy was calculated to determine the variability at each amino acid position. Time trends in the frequency of amino acid variants were identified by linear regression. Direct coupling analysis was applied to detect covarying sites.
RESULTS Twenty-two time trends in the frequency of amino acid variants demonstrated either single amino acid exchanges or variation in the degree of polymorphism. Covariation was observed for 17 amino acid variants with a temporal trend. Some minor INSTI resistance mutations (T124A, V151I, K156N, T206S, S230N) and some INSTI-selected mutations (M50I, L101I, T122I, T124N, T125A, M154I, G193E, V201I) were identified at overall frequencies >5%. Among these, the frequencies of L101I, T122I, and V201I increased over time, whereas the frequency of M154I decreased. Moreover, L101I, T122I, T124A, T125A, M154I, and V201I covaried with non-resistance-associated variants.
CONCLUSIONS Time-trending, covarying polymorphisms indicate that long-term evolutionary changes of the HIV-1 integrase involve defined clusters of possibly structurally or functionally associated sites, independent of selective pressure through INSTIs at the inter-patient level. Linkage between polymorphic resistance- and non-resistance-associated sites can impact the selection of INSTI resistance mutations in complex ways. Identification of these sites can help in improving genotypic resistance assays, resistance prediction algorithms, and the development of new integrase inhibitors.
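The per-position Shannon entropy mentioned under METHODS is H = -sum over residues a of p_a log2 p_a, where p_a is the frequency of residue a in one alignment column. A minimal sketch follows; the abstract does not state the logarithm base or how gaps were treated, so base 2 (bits) and gap exclusion are assumptions here, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, ignore_gaps: bool = True) -> float:
    """Shannon entropy (bits) of the residue distribution in one alignment column."""
    residues = [c for c in column if not (ignore_gaps and c == "-")]
    if not residues:
        return 0.0
    n = len(residues)
    # Written as p * log2(1/p) so an invariant column yields 0.0 rather than -0.0.
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def per_position_entropy(msa: list[str]) -> list[float]:
    """Entropy at each aligned position; all sequences must have equal length."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy("".join(s[i] for s in msa)) for i in range(length)]


# Toy alignment: position 1 is invariant, position 2 carries four residues equally often.
print(per_position_entropy(["MA", "MC", "MD", "ME"]))  # [0.0, 2.0]
```

An invariant position scores 0 bits, and a position where k residues occur equally often scores log2(k) bits, up to a maximum of about 4.32 bits for all 20 amino acids.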
Affiliation(s)
- Kaveh Pouran Yousef
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Maureen Rebecca Smith
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Sybille Somogyi
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Stefan Fiedler
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Barbara Bartmeyer
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
- Osamah Hamouda
- HIV/AIDS, STI and Blood-borne Infections, Robert Koch Institute, Berlin, Germany
- Norbert Bannert
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
- Max von Kleist
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany
- Claudia Kücherer
- HIV and other Retroviruses, Robert Koch Institute, Berlin, Germany
22
Ghadie MA, Coulombe-Huntington J, Xia Y. Interactome evolution: insights from genome-wide analyses of protein-protein interactions. Curr Opin Struct Biol 2017; 50:42-48. [PMID: 29112911 DOI: 10.1016/j.sbi.2017.10.012] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/05/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 12/12/2022]
Abstract
We highlight new evolutionary insights enabled by recent genome-wide studies on protein-protein interaction (PPI) networks ('interactomes'). While most PPIs are mediated by a single sequence region promoting or inhibiting interactions, many PPIs are mediated by multiple sequence regions acting cooperatively. Most PPIs perform important functions maintained by negative selection: we estimate that less than ∼10% of the human interactome is effectively neutral upon perturbation (i.e. 'junk' PPIs), and the rest are deleterious upon perturbation; interfacial sites evolve more slowly than other sites; many conserved PPIs show signatures of co-evolution at the interface; PPIs evolve more slowly than protein sequence. At the same time, many PPIs undergo rewiring during evolution for lineage-specific adaptation. Finally, chaperone-protein and host-pathogen interactomes are governed by distinct evolutionary principles.
Affiliation(s)
- Mohamed A Ghadie
- Department of Bioengineering, McGill University, Montreal, Quebec H3C 0C3, Canada
- Jasmin Coulombe-Huntington
- Institute for Research in Immunology and Cancer, University of Montreal, Montreal, Quebec H3C 3J7, Canada
- Yu Xia
- Department of Bioengineering, McGill University, Montreal, Quebec H3C 0C3, Canada.
23
Shamsi Z, Moffett AS, Shukla D. Enhanced unbiased sampling of protein dynamics using evolutionary coupling information. Sci Rep 2017; 7:12700. [PMID: 28983093 PMCID: PMC5629199 DOI: 10.1038/s41598-017-12874-7] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 10/06/2016] [Accepted: 09/14/2017] [Indexed: 12/25/2022]
Abstract
One of the major challenges in atomistic simulations of proteins is efficient sampling of pathways associated with rare conformational transitions. Recent developments in statistical methods for computation of direct evolutionary couplings between amino acids within and across polypeptide chains have allowed for inference of native residue contacts, informing accurate prediction of protein folds and multimeric structures. In this study, we assess the use of distances between evolutionarily coupled residues as natural choices for reaction coordinates which can be incorporated into Markov state model-based adaptive sampling schemes and potentially used to predict not only functional conformations but also pathways of conformational change, protein folding, and protein-protein association. We demonstrate the utility of evolutionary couplings in sampling and predicting activation pathways of the β2-adrenergic receptor (β2-AR), folding of the FiP35 WW domain, and dimerization of the E. coli molybdopterin synthase subunits. We find that the times required for β2-AR activation and WW domain folding are greatly reduced using evolutionary coupling-guided adaptive sampling. Additionally, we were able to identify putative molybdopterin synthase association pathways and near-crystal-structure complexes from protein-protein association simulations.
Affiliation(s)
- Zahra Shamsi
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA
- Alexander S Moffett
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA.
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA.
- Department of Plant Biology, University of Illinois, Urbana, IL, 61801, USA.
- National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801, USA.
24
Lopez T, Dalton K, Tomlinson A, Pande V, Frydman J. An information theoretic framework reveals a tunable allosteric network in group II chaperonins. Nat Struct Mol Biol 2017; 24:726-733. [PMID: 28741612 PMCID: PMC5986071 DOI: 10.1038/nsmb.3440] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/05/2016] [Accepted: 06/22/2017] [Indexed: 12/19/2022]
Abstract
ATP-dependent allosteric regulation of the ring-shaped group II chaperonins remains ill defined, in part because their complex oligomeric topology has limited the success of structural techniques in suggesting allosteric determinants. Further, their high sequence conservation has hindered the prediction of allosteric networks using mathematical covariation approaches. Here, we develop an information theoretic strategy that is robust to residue conservation and apply it to group II chaperonins. We identify a contiguous network of covarying residues that connects all nucleotide-binding pockets within each chaperonin ring. An interfacial residue between the networks of neighboring subunits controls positive cooperativity by communicating nucleotide occupancy within each ring. Strikingly, chaperonin allostery is tunable through single mutations at this position. Naturally occurring variants at this position that double the extent of positive cooperativity are less prevalent in nature. We propose that being less cooperative than attainable allows chaperonins to support robust folding over a wider range of metabolic conditions.
Affiliation(s)
- Tom Lopez
- Department of Biology, Stanford University, Stanford, California, USA
- Kevin Dalton
- Biophysics Program, Stanford University, Stanford, California, USA
- Anthony Tomlinson
- Department of Biology, Stanford University, Stanford, California, USA
- Vijay Pande
- Biophysics Program, Stanford University, Stanford, California, USA
- Department of Chemistry, Stanford University, Stanford, California, USA
- Judith Frydman
- Department of Biology, Stanford University, Stanford, California, USA
- Biophysics Program, Stanford University, Stanford, California, USA
25
Aledo JC. Inferring methionine sulfoxidation and serine phosphorylation crosstalk from phylogenetic analyses. BMC Evol Biol 2017; 17:171. [PMID: 28750604 PMCID: PMC5530960 DOI: 10.1186/s12862-017-1017-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/10/2017] [Accepted: 07/19/2017] [Indexed: 11/10/2022]
Abstract
Background The sulfoxidation of methionine residues within the phosphorylation motif of protein kinase substrates may provide a mechanism to couple oxidative signals to changes in protein phosphorylation. Herein, we hypothesize that if the residues within a pair of phosphorylatable-sulfoxidable sites are functionally linked, then they might have been coevolving. To test this hypothesis, a number of site pairs previously detected on human stress-related proteins were analyzed using eukaryote ortholog sequences and a phylogenetic approach.
Results Overall, the results support the conclusion that in the eIF2α protein, serine phosphorylation at position 218 and methionine oxidation at position 222 belong to the same functional network. First, the observed data were much better fitted by Markovian models that assumed coevolution of both sites than by their counterparts assuming independent evolution (p-value = 0.003). Second, this conclusion was robust with respect to the methods used to reconstruct the phylogenetic relationship between the 233 eukaryotic species analyzed. Third, the co-distribution of phosphorylatable and sulfoxidable residues at these positions showed multiple origins throughout the evolution of eukaryotes, which further supports the view of an adaptive value for this co-occurrence. Fourth, the possibility that the coevolution of these two sites might be due to structure-driven compensatory mutations was evaluated; the results suggested that factors other than purely structural ones were behind the observed coevolution. Finally, the relationships detected between other modifiable site pairs from ataxin-2 (S814-M815), ataxin-2-like (S211-M215) and Pumilio homolog 1 (S124-M125) reinforce the view of a role for phosphorylation-sulfoxidation crosstalk.
Conclusions For the four stress-related proteins analyzed herein, their respective pairs of PTM sites (phosphorylatable serine and sulfoxidable methionine) were found to be evolving in a correlated fashion, which suggests a relevant role for methionine sulfoxidation and serine phosphorylation crosstalk in the control of protein translation under stress conditions.
Affiliation(s)
- Juan Carlos Aledo
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, 29071, Málaga, Spain.
26
Bastolla U, Dehouck Y, Echave J. What evolution tells us about protein physics, and protein physics tells us about evolution. Curr Opin Struct Biol 2017; 42:59-66. [DOI: 10.1016/j.sbi.2016.10.020] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Received: 08/23/2016] [Revised: 10/19/2016] [Accepted: 10/24/2016] [Indexed: 12/21/2022]
27
Stetz G, Verkhivker GM. Computational Analysis of Residue Interaction Networks and Coevolutionary Relationships in the Hsp70 Chaperones: A Community-Hopping Model of Allosteric Regulation and Communication. PLoS Comput Biol 2017; 13:e1005299. [PMID: 28095400 PMCID: PMC5240922 DOI: 10.1371/journal.pcbi.1005299] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 07/01/2016] [Accepted: 12/06/2016] [Indexed: 12/28/2022]
Abstract
Allosteric interactions in the Hsp70 proteins are linked with their regulatory mechanisms and cellular functions. Despite significant progress in the structural and functional characterization of the Hsp70 proteins, fundamental questions concerning the modularity of the allosteric interaction networks and the hierarchy of signaling pathways in the Hsp70 chaperones remain largely unexplored and poorly understood. In this work, we propose an integrated computational strategy that combines atomistic and coarse-grained simulations with coevolutionary analysis and network modeling of the residue interactions. A novel aspect of this work is the incorporation of dynamic residue correlations and coevolutionary residue dependencies in the construction of allosteric interaction networks and signaling pathways. We found that functional sites involved in allosteric regulation of Hsp70 may be characterized by structural stability, proximity to global hinge centers, and a local structural environment enriched in highly coevolving flexible residues. These specific characteristics may be necessary for regulation of allosteric structural transitions and could distinguish regulatory sites from nonfunctional conserved residues. The observed confluence of dynamic correlations and coevolutionary residue couplings with global networking features may determine the modular organization of allosteric interactions and dictate the localization of key mediating sites. Community analysis of the residue interaction networks revealed that concerted rearrangements of local interacting modules at the inter-domain interface may be responsible for global structural changes and a population shift in the DnaK chaperone. The inter-domain communities in the Hsp70 structures harbor the majority of regulatory residues involved in allosteric signaling, suggesting that these sites could be integral to the network organization and coordination of structural changes.
Using a network-based formalism of allostery, we introduced a community-hopping model of allosteric communication. Atomistic reconstruction of signaling pathways in the DnaK structures captured a direction-specific mechanism and molecular details of signal transmission that are fully consistent with the mutagenesis experiments. The results of our study reconciled structural and functional experiments from a network-centric perspective by showing that global properties of the residue interaction networks and coevolutionary signatures may be linked with specificity and diversity of allosteric regulation mechanisms. The diversity of allosteric mechanisms in the Hsp70 proteins could range from modulation of the inter-domain interactions and conformational dynamics to fine-tuning of the Hsp70 interactions with co-chaperones. The goal of this study is to present a systematic computational analysis of the dynamic and evolutionary factors underlying allosteric structural transformations of the Hsp70 proteins. We investigated the relationship between functional dynamics, residue coevolution, and network organization of residue interactions in the Hsp70 proteins. The results of this study revealed that conformational dynamics of the Hsp70 proteins may be linked with coevolutionary propensities and mutual information dependencies of the protein residues. Modularity and connectivity of allosteric interactions in the Hsp70 chaperones are coordinated by stable functional sites that feature unique coevolutionary signatures and high network centrality. The emergence of the inter-domain communities that are coordinated by functional centers and include highly coevolving residues could facilitate structural transitions through cooperative reorganization of the local interacting modules. We determined that the differences in the modularity of the residue interactions and organization of coevolutionary networks in DnaK may be associated with variations in their allosteric mechanisms. 
The network signatures of the DnaK structures are characteristic of a population-shift allostery that allows for coordinated structural rearrangements of local communities. A dislocation of mediating centers and insufficient coevolutionary coupling between functional regions may render a reduced cooperativity and promote a limited entropy-driven allostery in the Sse1 chaperone that occurs without structural changes. The results of this study showed that a network-centric framework and a community-hopping model of allosteric communication pathways may provide novel insights into molecular and evolutionary principles of allosteric regulation in the Hsp70 proteins.
Affiliation(s)
- Gabrielle Stetz
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Chapman University School of Pharmacy, Irvine, California, United States of America
28
Nshogozabahizi JC, Dench J, Aris-Brosou S. Widespread Historical Contingency in Influenza Viruses. Genetics 2017; 205:409-420. [PMID: 28049709 PMCID: PMC5223518 DOI: 10.1534/genetics.116.193979] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/18/2016] [Accepted: 11/04/2016] [Indexed: 11/18/2022]
Abstract
In systems biology and genomics, epistasis characterizes the impact that a substitution at a particular location in a genome can have on a substitution at another location. This phenomenon is often implicated in the evolution of drug resistance and invoked to explain why particular "disease-causing" mutations do not have the same outcome in all individuals. Hence, uncovering these mutations and their locations in a genome is a central question in biology. However, epistasis is notoriously difficult to uncover, especially in fast-evolving organisms. Here, we present a novel statistical approach that relies on a model developed in ecology, which we adapt to analyze genetic data in fast-evolving systems such as the influenza A virus. We validate the approach using a two-pronged strategy: extensive simulations demonstrate a low-to-moderate sensitivity with excellent specificity and precision, while analyses of experimentally validated data recover known interactions, including in a eukaryotic system. We further evaluate the ability of our approach to detect correlated evolution during antigenic shifts or at the emergence of drug resistance. We show that in all cases, correlated evolution is prevalent in influenza A viruses, involving many pairs of sites linked together in chains, a hallmark of historical contingency. Strikingly, interacting sites are separated by large physical distances, which entails either long-range conformational changes or functional tradeoffs, for which we find support with the emergence of drug resistance. Our work paves a new way for the unbiased detection of epistasis in a wide range of organisms by performing whole-genome scans.
Affiliation(s)
- Jonathan Dench
  - Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
- Stéphane Aris-Brosou
  - Department of Biology, University of Ottawa, Ontario K1N 6N5, Canada
  - Department of Mathematics and Statistics, University of Ottawa, Ontario K1N 6N5, Canada
29
Subtype-specific structural constraints in the evolution of influenza A virus hemagglutinin genes. Sci Rep 2016; 6:38892. [PMID: 27966593 PMCID: PMC5155281 DOI: 10.1038/srep38892] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 07/15/2016] [Accepted: 11/14/2016] [Indexed: 11/08/2022]
Abstract
The influenza A virus genome consists of eight RNA segments. RNA structures within these segments, and within their complementary RNAs (cRNAs) and protein-coding mRNAs, may play a role in virus replication. Here, conserved putative secondary structures that impose significant evolutionary constraints on the gene segment encoding the surface glycoprotein hemagglutinin (HA) were investigated using available sequence data on tens of thousands of virus strains. Structural constraints were identified by analysis of covariations of nucleotides suggested to be paired by structure prediction algorithms. The significance of covariations was estimated by mutual information calculations and by tracing multiple covariation events during virus evolution. Covariation patterns demonstrated that structured domains in HA RNAs were mostly subtype-specific, whereas some structures were conserved in several subtypes. The influence of RNA folding on virus replication was studied by plaque assays of mutant viruses with disrupted structures. The results suggest that over the whole length of the HA segment there are local structured domains which contribute to the virus fitness but individually are not essential for the virus. The existence of subtype-specific structured regions in the segments of the influenza A virus genome is apparently an important factor in virus evolution and reassortment of its genes.
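Mutual information, one of the two significance measures named in this abstract, can be illustrated for a single pair of alignment columns. This is a minimal sketch under stated assumptions, not the authors' pipeline: `column_mi` is a hypothetical helper, the toy alignment is invented, frequencies are raw counts with no pseudocounts or gap handling, and no correction for shared ancestry is applied (the abstract addresses that separately by tracing covariation events along the phylogeny).

```python
import math
from collections import Counter


def column_mi(seqs, i, j):
    """Plug-in mutual information, in bits, between alignment columns i and j.

    seqs: equal-length aligned sequences (hypothetical toy input).
    Frequencies are raw counts divided by the number of sequences;
    no pseudocounts, gap handling, or phylogenetic weighting.
    """
    n = len(seqs)
    p_i = Counter(s[i] for s in seqs)
    p_j = Counter(s[j] for s in seqs)
    p_ij = Counter((s[i], s[j]) for s in seqs)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        # Only observed pairs contribute; unobserved pairs have p_ab = 0.
        mi += p_ab * math.log2(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


# Toy RNA alignment: columns 0 and 3 always form a base pair
# (G-C, A-U, C-G, U-A), while column 1 never varies.
aln = ["GACC", "AAGU", "CAUG", "UAAA"]
print(column_mi(aln, 0, 3))  # 2.0
print(column_mi(aln, 0, 1))  # 0.0
```

Perfect covariation among four equally frequent states gives log2(4) = 2 bits, while an invariant column gives zero. Raw MI like this also rises with shared ancestry and column variability, which is why covariation studies usually add corrections on top of it.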
30
Abstract
A popular and successful strategy in semi-rational design of protein stability is the use of evolutionary information encapsulated in homologous protein sequences. Consensus design is based on the hypothesis that, at a given position, the consensus amino acid contributes more to the stability of the protein than non-conserved amino acids do. Here, we review the consensus design approach, its theoretical underpinnings, successes, limitations and challenges, and provide a detailed guide to its application in protein engineering.
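The core step of consensus design, picking the most frequent residue at each aligned position, can be sketched in a few lines. This is a minimal illustration, not the review's protocol: the function names `consensus_sequence` and `consensus_mutations` and the toy sequences are hypothetical, the alignment is assumed to be precomputed, gaps are ignored when counting, and no sequence weighting or stability filtering is applied, which a real design workflow may need.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Return the most frequent non-gap residue at each column and its
    frequency among the non-gap residues in that column."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    residues, freqs = [], []
    for col in range(length):
        counts = Counter(s[col] for s in alignment if s[col] != gap)
        if not counts:  # column contains only gaps
            residues.append(gap)
            freqs.append(0.0)
            continue
        aa, n = counts.most_common(1)[0]
        residues.append(aa)
        freqs.append(n / sum(counts.values()))
    return "".join(residues), freqs


def consensus_mutations(target, consensus, gap="-"):
    """List (1-based position, target residue, consensus residue) wherever an
    aligned target sequence departs from the consensus."""
    return [(i + 1, t, c) for i, (t, c) in enumerate(zip(target, consensus))
            if t != c and c != gap]


# Hypothetical toy alignment of four homologs.
homologs = ["MKVLA", "MRVLA", "MKILG", "MKV-A"]
cons, freqs = consensus_sequence(homologs)
print(cons)                                # MKVLA
print(consensus_mutations("MRILG", cons))  # [(2, 'R', 'K'), (3, 'I', 'V'), (5, 'G', 'A')]
```

Returning the per-column frequency alongside the residue lets a designer prioritize back-to-consensus mutations at strongly conserved positions over those where the consensus is weak.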
Affiliation(s)
- Benjamin T Porebski
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia; Medical Research Council Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
- Ashley M Buckle
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
31
Vo TV, Das J, Meyer MJ, Cordero NA, Akturk N, Wei X, Fair BJ, Degatano AG, Fragoza R, Liu LG, Matsuyama A, Trickey M, Horibata S, Grimson A, Yamano H, Yoshida M, Roth FP, Pleiss JA, Xia Y, Yu H. A Proteome-wide Fission Yeast Interactome Reveals Network Evolution Principles from Yeasts to Human. Cell 2016; 164:310-323. [PMID: 26771498 DOI: 10.1016/j.cell.2015.11.037] [Citation(s) in RCA: 71] [Impact Index Per Article: 8.9] [Received: 07/02/2015] [Revised: 10/12/2015] [Accepted: 11/04/2015] [Indexed: 01/01/2023]
Abstract
Here, we present FissionNet, a proteome-wide binary protein interactome for S. pombe, comprising 2,278 high-quality interactions, of which ∼ 50% were previously not reported in any species. FissionNet unravels previously unreported interactions implicated in processes such as gene silencing and pre-mRNA splicing. We developed a rigorous network comparison framework that accounts for assay sensitivity and specificity, revealing extensive species-specific network rewiring between fission yeast, budding yeast, and human. Surprisingly, although genes are better conserved between the yeasts, S. pombe interactions are significantly better conserved in human than in S. cerevisiae. Our framework also reveals that different modes of gene duplication influence the extent to which paralogous proteins are functionally repurposed. Finally, cross-species interactome mapping demonstrates that coevolution of interacting proteins is remarkably prevalent, a result with important implications for studying human disease in model organisms. Overall, FissionNet is a valuable resource for understanding protein functions and their evolution.
Affiliation(s)
- Tommy V Vo
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Jishnu Das
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Michael J Meyer
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY 10065, USA
- Nicolas A Cordero
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Nurten Akturk
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Xiaomu Wei
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Medicine, Weill Cornell College of Medicine, New York, NY 10021, USA
- Benjamin J Fair
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Andrew G Degatano
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Robert Fragoza
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA; Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Lisa G Liu
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
- Akihisa Matsuyama
  - Chemical Genomics Research Group, RIKEN Center for Sustainable Resource Science, Wako, Saitama 351-0198, Japan
- Michelle Trickey
  - University College London Cancer Institute, Paul O'Gorman Building, 72 Huntley Street, London WC1E 6BT, UK
- Sachi Horibata
  - Department of Biomedical Sciences, Baker Institute for Animal Health, Cornell University, Ithaca, NY 14853, USA
- Andrew Grimson
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Hiroyuki Yamano
  - University College London Cancer Institute, Paul O'Gorman Building, 72 Huntley Street, London WC1E 6BT, UK
- Minoru Yoshida
  - Chemical Genomics Research Group, RIKEN Center for Sustainable Resource Science, Wako, Saitama 351-0198, Japan
- Frederick P Roth
  - Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Donnelly Centre and Departments of Molecular Genetics and Computer Science, University of Toronto, Toronto, ON M5S 3E1, Canada; Canadian Institute for Advanced Research, Toronto, ON M5G 1Z8, Canada; Lunenfeld-Tanenbaum Research Institute, Mt. Sinai Hospital, Toronto, ON M5G 1X5, Canada
- Jeffrey A Pleiss
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, USA
- Yu Xia
  - Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC H3A 0C3, Canada
- Haiyuan Yu
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14853, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY 14853, USA
32
Champeimont R, Laine E, Hu SW, Penin F, Carbone A. Coevolution analysis of Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins. Sci Rep 2016; 6:26401. [PMID: 27198619 PMCID: PMC4873791 DOI: 10.1038/srep26401] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 10/02/2015] [Accepted: 05/03/2016] [Indexed: 12/20/2022]
Abstract
A novel computational approach of coevolution analysis allowed us to reconstruct the protein-protein interaction network of the Hepatitis C Virus (HCV) at the residue resolution. For the first time, coevolution analysis of an entire viral genome was realized, based on a limited set of protein sequences with high sequence identity within genotypes. The identified coevolving residues constitute highly relevant predictions of protein-protein interactions for further experimental identification of HCV protein complexes. The method can be used to analyse other viral genomes and to predict the associated protein interaction networks.
Affiliation(s)
- Raphaël Champeimont
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Elodie Laine
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Shuang-Wei Hu
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
- Francois Penin
  - CNRS, UMR5086, Bases Moléculaires et Structurales des Systèmes Infectieux, Institut de Biologie et Chimie des Protéines, 7 Passage du Vercors, Cedex 07, F-69367 Lyon, France
  - LABEX Ecofect, Université de Lyon, Lyon, France
- Alessandra Carbone
  - Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l’Ecole de Médecine, 75006 Paris, France
  - Institut Universitaire de France, 75005, Paris, France
33
Neuwald AF. Gleaning structural and functional information from correlations in protein multiple sequence alignments. Curr Opin Struct Biol 2016; 38:1-8. [PMID: 27179293 DOI: 10.1016/j.sbi.2016.04.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/18/2015] [Revised: 04/28/2016] [Accepted: 04/29/2016] [Indexed: 10/24/2022]
Abstract
The availability of vast amounts of protein sequence data facilitates detection of subtle statistical correlations due to imposed structural and functional constraints. Recent breakthroughs using Direct Coupling Analysis (DCA) and related approaches have tapped into correlations believed to be due to compensatory mutations. This has yielded some remarkable results, including substantially improved prediction of protein intra- and inter-domain 3D contacts, of membrane and globular protein structures, of substrate binding sites, and of protein conformational heterogeneity. A complementary approach is Bayesian Partitioning with Pattern Selection (BPPS), which partitions related proteins into hierarchically-arranged subgroups based on correlated residue patterns. These correlated patterns are presumably due to structural and functional constraints associated with evolutionary divergence rather than to compensatory mutations. Hence joint application of DCA- and BPPS-based approaches should help sort out the structural and functional constraints contributing to sequence correlations.
Affiliation(s)
- Andrew F Neuwald
  - Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, 801 West Baltimore St., BioPark II, Room 617, Baltimore, MD 21201, United States
34
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 158] [Impact Index Per Article: 19.8] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Affiliation(s)
- Jeffrey R Wagner
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Christopher T Lee
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Jacob D Durrant
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Robert D Malmstrom
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Victoria A Feher
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E Amaro
  - Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
35
Bywater RP. Comparison of Algorithms for Prediction of Protein Structural Features from Evolutionary Data. PLoS One 2016; 11:e0150769. [PMID: 26963911 PMCID: PMC4786192 DOI: 10.1371/journal.pone.0150769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/26/2015] [Accepted: 02/17/2016] [Indexed: 11/18/2022]
Abstract
Proteins have many functions, and predicting these is still one of the major challenges in theoretical biophysics and bioinformatics. Foremost amongst these functions is the need to fold correctly, thereby allowing the other genetically dictated tasks that the protein has to carry out to proceed efficiently. In this work, some earlier algorithms for predicting protein domain folds are revisited and compared with more recently developed methods. In dealing with intractable problems such as fold prediction, when different algorithms show convergence onto the same result there is every reason to take all algorithms into account so that a consensus result can be reached. In this work it is shown that the application of different algorithms in protein structure prediction leads to results that do not converge as such but rather collude in a striking and useful way that has never been considered before.
36
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022]
Abstract
Background: Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. Results: In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions: The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.
Affiliation(s)
- Chan-Seok Jeong
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Dongsup Kim
  - Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
37
Goldstone JV, Sundaramoorthy M, Zhao B, Waterman MR, Stegeman JJ, Lamb DC. Genetic and structural analyses of cytochrome P450 hydroxylases in sex hormone biosynthesis: Sequential origin and subsequent coevolution. Mol Phylogenet Evol 2016; 94:676-687. [PMID: 26432395 PMCID: PMC4801120 DOI: 10.1016/j.ympev.2015.09.012] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 02/19/2015] [Revised: 07/27/2015] [Accepted: 09/14/2015] [Indexed: 12/14/2022]
Abstract
Biosynthesis of steroid hormones in vertebrates involves three cytochrome P450 hydroxylases, CYP11A1, CYP17A1 and CYP19A1, which catalyze sequential steps in steroidogenesis. These enzymes are conserved in the vertebrates, but their origin and existence in other chordate subphyla (Tunicata and Cephalochordata) have not been clearly established. In this study, selected protein sequences of CYP11A1, CYP17A1 and CYP19A1 were compiled and analyzed using multiple sequence alignment and phylogenetic analysis. Our analyses show that cephalochordates have sequences orthologous to vertebrate CYP11A1, CYP17A1 or CYP19A1, and that echinoderms and hemichordates possess CYP11-like but not CYP19 genes. While the cephalochordate sequences have low identity with the vertebrate sequences, reflecting evolutionary distance, the data show apparent origin of CYP11 prior to the evolution of CYP19 and possibly CYP17, thus indicating a sequential origin of these functionally related steroidogenic CYPs. Co-occurrence of the three CYPs in early chordates suggests that the three genes may have coevolved thereafter, and that functional conservation should be reflected in functionally important residues in the proteins. CYP19A1 has the largest number of conserved residues while CYP11A1 sequences are less conserved. Structural analyses of human CYP11A1, CYP17A1 and CYP19A1 show that critical substrate binding site residues are highly conserved in each enzyme family. The results emphasize that the steroidogenic pathways producing glucocorticoids and reproductive steroids are several hundred million years old and that the catalytic structural elements of the enzymes have been conserved over the same period of time. Analysis of these elements may help to identify when precursor functions linked to these enzymes first arose.
Affiliation(s)
- Jared V Goldstone
  - Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA
- Bin Zhao
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232-0146, USA
- Michael R Waterman
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN 37232-0146, USA
- John J Stegeman
  - Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA
- David C Lamb
  - Institute of Life Science, Medical School, Swansea University, Singleton Park, Swansea SA2 8PP, UK
38
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.
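Treating a coevolution score matrix as a weighted network and ranking positions by eigenvector centrality can be sketched with plain power iteration. This is a minimal illustration, not the authors' code: `eigenvector_centrality` and `symmetric_matrix` are hypothetical helpers, the toy scores are invented, and the matrix is assumed to be symmetric and non-negative with the entropy and noise corrections described in the abstract already applied (that preprocessing is not reproduced here).

```python
import math


def eigenvector_centrality(scores, tol=1e-10, max_iter=1000):
    """Eigenvector centrality of a symmetric, non-negative weighted network.

    scores[i][j] is an (already corrected, non-negative) coevolution score
    between positions i and j; the diagonal is ignored. Power iteration is
    run on A + I, which shares A's leading eigenvector but avoids the
    oscillation plain power iteration shows on bipartite-like networks.
    Returns a unit-length vector of centralities.
    """
    n = len(scores)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(max_iter):
        y = [x[i] + sum(scores[i][j] * x[j] for j in range(n) if j != i)
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x  # not fully converged within max_iter


def symmetric_matrix(n, edges):
    """Build an n x n symmetric score matrix from (i, j, weight) triples."""
    m = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        m[i][j] = m[j][i] = w
    return m


# Toy network: position 0 has several moderate scores (0.5), while the
# single strongest score (0.9) links positions 5 and 6.
scores = symmetric_matrix(7, [(0, 1, 0.5), (0, 2, 0.5), (0, 3, 0.5),
                              (0, 4, 0.5), (4, 5, 0.05), (5, 6, 0.9)])
cent = eigenvector_centrality(scores)
ranking = sorted(range(len(cent)), key=lambda i: cent[i], reverse=True)
print(ranking[0])  # 0
```

In this toy case the hub with four moderate scores ranks first, while position 5, which carries the single strongest pairwise score, ranks well below it; this mirrors, on invented numbers, the abstract's observation that central positions often have multiple moderate scores rather than one strong one.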
Affiliation(s)
- Daniel J Parente
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
- J Christian J Ray
  - Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
- Liskin Swint-Kruse
  - Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
39
Capitani G, Duarte JM, Baskaran K, Bliven S, Somody JC. Understanding the fabric of protein crystals: computational classification of biological interfaces and crystal contacts. Bioinformatics 2015; 32:481-9. [PMID: 26508758 PMCID: PMC4743631 DOI: 10.1093/bioinformatics/btv622] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 07/17/2015] [Accepted: 10/16/2015] [Indexed: 11/20/2022]
Abstract
Modern structural biology still draws the vast majority of information from crystallography, a technique where the objects being investigated are embedded in a crystal lattice. Given the complexity and variety of those objects, it becomes fundamental to computationally assess which of the interfaces in the lattice are biologically relevant and which are simply crystal contacts. Since the mid-1990s, several approaches have been applied to obtain high-accuracy classification of crystal contacts and biological protein–protein interfaces. This review provides an overview of the concepts and main approaches to protein interface classification: thermodynamic estimation of interface stability, evolutionary approaches based on conservation of interface residues, and co-occurrence of the interface across different crystal forms. Among the three categories, evolutionary approaches offer the strongest promise for improvement, thanks to the incessant growth in sequence knowledge. Importantly, protein interface classification algorithms can also be used on multimeric structures obtained using other high-resolution techniques or for protein assembly design or validation purposes. A key issue linked to protein interface classification is the identification of the biological assembly of a crystal structure and the analysis of its symmetry. Here, we highlight the most important concepts and problems to be overcome in assembly prediction. Over the next few years, tools and concepts of interface classification will probably become more frequently used and integrated in several areas of structural biology and structural bioinformatics. Among the main challenges for the future are better addressing of weak interfaces and the application of interface classification concepts to prediction problems like protein–protein docking. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: guido.capitani@psi.ch
Affiliation(s)
- Guido Capitani
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, OFLC/110, 5232 Villigen PSI; Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
- Jose M Duarte
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, OFLC/110, 5232 Villigen PSI; Department of Biology, ETH Zurich, 8093 Zurich, Switzerland
- Kumaran Baskaran
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, OFLC/110, 5232 Villigen PSI
- Spencer Bliven
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, OFLC/110, 5232 Villigen PSI; Bioinformatics and Systems Biology Program, UC San Diego, La Jolla, CA 92093; National Center for Biotechnology Information, NIH, Bethesda, MD 20894, USA
- Joseph C Somody
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, OFLC/110, 5232 Villigen PSI; Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland