1
Yehorova D, Crean RM, Kasson PM, Kamerlin SCL. Key interaction networks: Identifying evolutionarily conserved non-covalent interaction networks across protein families. Protein Sci 2024; 33:e4911. [PMID: 38358258 PMCID: PMC10868456 DOI: 10.1002/pro.4911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 01/08/2024] [Accepted: 01/10/2024] [Indexed: 02/16/2024]
Abstract
Protein structure (and thus function) is dictated by non-covalent interaction networks. These can be highly evolutionarily conserved across protein families, the members of which can diverge in sequence and evolutionary history. Here we present KIN, a tool to identify and analyze conserved non-covalent interaction networks across evolutionarily related groups of proteins. KIN is available for download under a GNU General Public License, version 2, from https://www.github.com/kamerlinlab/KIN. KIN can operate on experimentally determined structures, predicted structures, or molecular dynamics trajectories, providing insight into both conserved and missing interactions across evolutionarily related proteins. This provides useful insight into protein evolution and also offers a tool that can be exploited for protein engineering efforts. As a showcase system, we demonstrate applications of this tool to understanding the evolutionarily relevant conserved interaction networks across the class A β-lactamases.
Affiliation(s)
- Dariia Yehorova
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Rory M. Crean
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
- Peter M. Kasson
- Department of Molecular Physiology, University of Virginia, Charlottesville, Virginia, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, USA
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Shina C. L. Kamerlin
- School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden
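The core bookkeeping behind a tool like KIN can be sketched in a few lines: count how often a given non-covalent interaction recurs across related structures. This is a minimal illustration, not KIN's code or interface. It assumes each structure's interactions have already been detected and mapped onto a shared alignment numbering. The tuple format, the function names, and the 0.8 cutoff are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical format: (alignment position i, alignment position j, interaction type),
# with residue numbers already mapped onto a shared alignment numbering.
Interaction = Tuple[int, int, str]

def conservation_scores(structures: Iterable[Set[Interaction]]) -> Dict[Interaction, float]:
    """Fraction of structures in which each interaction is observed."""
    structures = list(structures)
    counts: Counter = Counter()
    for interactions in structures:
        # Normalise pair order so (i, j) and (j, i) are the same interaction,
        # and use a set so each structure contributes at most once per interaction.
        counts.update({(min(i, j), max(i, j), kind) for i, j, kind in interactions})
    n = len(structures)
    return {inter: c / n for inter, c in counts.items()}

def conserved(scores: Dict[Interaction, float], threshold: float = 0.8) -> List[Interaction]:
    """Interactions present in at least `threshold` of the structures (cutoff is illustrative)."""
    return sorted(inter for inter, s in scores.items() if s >= threshold)
```

As a toy example, take three structures in which one hydrogen bond appears in all three and one salt bridge appears in two. `conservation_scores` returns 1.0 for the hydrogen bond and about 0.67 for the salt bridge, so only the hydrogen bond passes the 0.8 cutoff.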
2
Kim D, Ha D, Lee K, Lee H, Kim I, Kim S. An evolution-based machine learning to identify cancer type-specific driver mutations. Brief Bioinform 2023; 24:6961611. [PMID: 36575568 DOI: 10.1093/bib/bbac593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Revised: 11/18/2022] [Accepted: 12/03/2022] [Indexed: 12/29/2022]
Abstract
Identifying cancer type-specific driver mutations is crucial for illuminating distinct pathologic mechanisms across various tumors and for providing opportunities for patient-specific treatment. However, although many computational methods have been developed to predict driver mutations in a type-specific manner, these methods still have room for improvement. Here, we devise a novel feature based on sequence co-evolution analysis to identify cancer type-specific driver mutations and construct a machine learning (ML) model with state-of-the-art performance. Specifically, relying on 28 000 tumor samples across 66 cancer types, our ML framework outperformed current leading methods of detecting cancer driver mutations. Interestingly, the cancer mutations identified by the sequence co-evolution feature are frequently observed at interfaces mediating tissue-specific protein-protein interactions that are known to be associated with shaping tissue-specific oncogenesis. Moreover, we provide pre-calculated potential oncogenicity for available human proteins, with prediction scores for all possible residue alterations, through a user-friendly website (http://sbi.postech.ac.kr/w/cancerCE). This work will facilitate the identification of cancer type-specific driver mutations in newly sequenced tumor samples.
Affiliation(s)
- Inhae Kim
- ImmunoBiome Inc., Pohang, South Korea
- Sanguk Kim
- Department of Life Sciences; Artificial Intelligence Graduate Program, Pohang University of Science and Technology, Pohang 790-784, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 120-149, South Korea
3
Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022; 50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]
Abstract
The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.
Affiliation(s)
- Emily N. Kennedy
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Clay A. Foster
- Department of Pediatrics, Section Hematology/Oncology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
- Sarah A. Barr
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
- Robert B. Bourret
- Department of Microbiology & Immunology, University of North Carolina, Chapel Hill, NC, United States of America
4
Akand EH, Maher SJ, Murray JM. Mutational networks of escape from transmitted HIV-1 infection. PLoS One 2020; 15:e0243391. [PMID: 33284837 PMCID: PMC7721145 DOI: 10.1371/journal.pone.0243391] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/17/2020] [Accepted: 11/19/2020] [Indexed: 02/08/2023]
Abstract
Human immunodeficiency virus (HIV) is subject to immune selective pressure soon after it establishes infection at the founder stage. As an individual progresses from the founder to the chronic stage of infection, immune pressure forces a history of mutations that are embedded in envelope sequences. Determining this pathway of coevolving mutations can assist in understanding what is different about the founder virus and the essential pathways it takes to maintain infection. We have combined operations research and bioinformatics methods to extract key networks of mutations that differentiate founder and chronic stages for 156 subtype B and 107 subtype C envelope (gp160) sequences. The chronic networks for both subtypes revealed strikingly different hub-and-spoke topologies compared to the less structured transmission networks. This suggests that the hub nodes are impacted by the immune response and the resulting loss of fitness is compensated by mutations at the spoke positions. The major hubs in the chronic C network occur at positions 12, 137 (within the N136 glycan), and 822, and at position 306 for subtype B. While both founder networks had a more heterogeneous connected network structure, interestingly, founder B subnetworks around positions 640 and 837 preferentially contained CD4 and coreceptor binding domains. Finally, we observed a differential effect of glycosylation between founder and chronic subtype B, where the latter had mutational pathways significantly driven by N-glycosylation. Our study provides insights into the mutational pathways HIV takes to evade the immune response, and presents features more likely to establish founder infection, valuable for effective vaccine design.
Affiliation(s)
- Elma H. Akand
- School of Mathematics and Statistics, UNSW Sydney, Kensington, NSW, Australia
- Stephen J. Maher
- College of Engineering, Mathematical and Physical Sciences, University of Exeter, Exeter, United Kingdom
- John M. Murray
- School of Mathematics and Statistics, UNSW Sydney, Kensington, NSW, Australia
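The hub-and-spoke distinction described in this abstract rests on node degree. A hub is a position connected to many others, while spokes connect mainly to the hub. The minimal sketch below assumes a network is given as a list of covarying position pairs. The study derived its networks with operations research methods that this does not reproduce, and the edges and degree cutoff here are hypothetical.

```python
from collections import defaultdict

def degrees(edges):
    """Degree of each node in an undirected network given as (pos_a, pos_b) pairs."""
    unique = {tuple(sorted(edge)) for edge in edges}  # (a, b) and (b, a) count once
    deg = defaultdict(int)
    for a, b in unique:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

def hubs(edges, min_degree=3):
    """Positions with at least `min_degree` partners (the cutoff is illustrative)."""
    return sorted(node for node, d in degrees(edges).items() if d >= min_degree)
```

Consider the toy edges [(1, 2), (1, 3), (1, 4), (1, 5), (4, 5)]. Position 1 has degree 4 and is the only node reported by `hubs`.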
5
Verkhivker G. Coevolution, Dynamics and Allostery Conspire in Shaping Cooperative Binding and Signal Transmission of the SARS-CoV-2 Spike Protein with Human Angiotensin-Converting Enzyme 2. Int J Mol Sci 2020; 21:ijms21218268. [PMID: 33158276 PMCID: PMC7672574 DOI: 10.3390/ijms21218268] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 10/18/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 02/07/2023]
Abstract
Binding to the host receptor is a critical initial step for the coronavirus SARS-CoV-2 spike protein to enter target cells and trigger virus transmission. A detailed dynamic and energetic view of the binding mechanisms underlying virus entry is still lacking, and a consensus on the molecular origins of the binding preference of SARS-CoV-2 for the angiotensin-converting enzyme 2 (ACE2) host receptor has yet to be established. In this work, we performed a comprehensive computational investigation in which sequence analysis and modeling of coevolutionary networks are combined with atomistic molecular simulations and comparative binding free energy analysis of the SARS-CoV and SARS-CoV-2 spike protein receptor binding domains with the ACE2 host receptor. Unlike other computational studies, we systematically examine the molecular and energetic determinants of the binding mechanisms between SARS-CoV-2 and ACE2 proteins through the lens of coevolution, conformational dynamics, and allosteric interactions that conspire to drive binding interactions and signal transmission. Conformational dynamics analysis revealed important differences in mobility of the binding interfaces for the SARS-CoV-2 spike protein that are not confined to several binding hotspots but are instead broadly distributed across many interface residues. Through coevolutionary network analysis and dynamics-based alanine scanning, we established linkages between the binding energy hotspots and potential regulators and carriers of signal communication in the virus-host receptor complexes. The results of this study detail a binding mechanism in which the energetics of SARS-CoV-2 association with ACE2 may be determined by cumulative changes of a number of residues distributed across the entire binding interface. The central findings of this study are consistent with structural and biochemical data and highlight the drug discovery challenges of inhibiting large and adaptive protein-protein interfaces responsible for virus entry and infection transmission.
Affiliation(s)
- Gennady Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, CA 92866, USA
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, USA
6
Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies have discovered a large number of genetic variants in patients with disease. Predicting the impact of these variants is therefore important for sorting disease-associated variants (DVs) from neutral variants. Current methods for predicting mutational impact depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by current methods. Here, we present a method to find DVs at less-conserved sites by predicting mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants at cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
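Conservation at a site is usually quantified from one column of a multiple sequence alignment. One common measure is the Shannon entropy of that column; it is assumed here for illustration and is not taken from the paper. Low entropy marks a well-conserved site. The less-conserved sites targeted by evolutionary coupling analysis have higher entropy. The function name is hypothetical.

```python
import math
from collections import Counter

def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column, ignoring gaps.

    0.0 means perfectly conserved; larger values mean a more variable site.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    # H = sum over residue types of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())
```

For example, `column_entropy("AAAA")` is zero, `column_entropy("AACC")` is 1.0 bit, and `column_entropy("ACDE")` is 2.0 bits.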
7
Santos YLDL, Chew-Fajardo YL, Brault G, Doucet N. Dissecting the evolvability landscape of the CalB active site toward aromatic substrates. Sci Rep 2019; 9:15588. [PMID: 31666622 PMCID: PMC6821916 DOI: 10.1038/s41598-019-51940-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/18/2019] [Accepted: 10/07/2019] [Indexed: 01/17/2023]
Abstract
A key event in the directed evolution of enzymes is the systematic use of mutagenesis and selection, a process that can give rise to mutant libraries containing millions of protein variants. To this day, the functional analysis and identification of active variants among such high numbers of mutational possibilities is not a trivial task. Here, we describe a combinatorial semi-rational approach to partly overcome this challenge and help design smaller and smarter mutant libraries. By adapting a liquid medium transesterification assay in organic solvent conditions with a combination of virtual docking, iterative saturation mutagenesis, and residue interaction network (RIN) analysis, we engineered lipase B from P. antarctica (CalB) to improve enzyme recognition and activity against the bulky aromatic substrates and flavoring agents methyl cinnamate and methyl salicylate. Substrate-imprinted docking was used to target active-site positions involved in enzyme-substrate and enzyme-product complexes, in addition to identifying 'hot spots' most likely to yield active variants. This iterative semi-rational design strategy allowed selection of CalB variants exhibiting increased activity in just two rounds of site-saturation mutagenesis. Beneficial replacements were observed by screening only 0.308% of the theoretical library size, illustrating how semi-rational approaches with targeted diversity can quickly facilitate the discovery of improved activity variants relevant to a number of biotechnological applications.
Affiliation(s)
- Yossef López de Los Santos
- Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, 531 Boulevard des Prairies, Laval, QC, H7V 1B7, Canada
- Ying Lian Chew-Fajardo
- Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, 531 Boulevard des Prairies, Laval, QC, H7V 1B7, Canada
- Guillaume Brault
- Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, 531 Boulevard des Prairies, Laval, QC, H7V 1B7, Canada
- Nicolas Doucet
- Centre Armand-Frappier Santé Biotechnologie, Institut National de la Recherche Scientifique (INRS), Université du Québec, 531 Boulevard des Prairies, Laval, QC, H7V 1B7, Canada
- PROTEO, the Québec Network for Research on Protein Function, Engineering, and Applications, 1045 Avenue de la Médecine, Université Laval, Quebec City, QC, G1V 0A6, Canada
8
Astl L, Verkhivker GM. Data-driven computational analysis of allosteric proteins by exploring protein dynamics, residue coevolution and residue interaction networks. Biochim Biophys Acta Gen Subj 2019:S0304-4165(19)30179-5. [PMID: 31330173 DOI: 10.1016/j.bbagen.2019.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Revised: 07/15/2019] [Accepted: 07/17/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Computational studies of allosteric interactions have witnessed a recent renaissance fueled by the growing interest in modeling complex molecular assemblies and biological networks. Allosteric interactions in protein structures allow for molecular communication in signal transduction networks. METHODS In this work, we performed a large-scale, comprehensive and multi-faceted analysis of >300 diverse allosteric proteins and complexes with allosteric modulators. By modeling and exploring coarse-grained dynamics, residue coevolution, and residue interaction networks for allosteric proteins, we have determined unifying molecular signatures shared by allosteric systems. RESULTS The results of this study have suggested that allosteric inhibitors and allosteric activators may differentially affect global dynamics and network organization of protein systems, leading to diverse allosteric mechanisms. By using structural and functional data on protein kinases, we present a detailed case study that included atomic-level analysis of coevolutionary networks in kinases bound with allosteric inhibitors and activators. CONCLUSIONS We have found that coevolutionary networks can form direct communication pathways connecting functional regions and can recapitulate key regulatory sites and interactions responsible for allosteric signaling in the studied protein systems. The results of this computational investigation are compared with the experimental studies and reveal molecular signatures of known regulatory hotspots in protein kinases. GENERAL SIGNIFICANCE This study has shown that allosteric inhibitors and allosteric activators can have different effects on residue interaction networks and can exploit distinct regulatory mechanisms, which could open up opportunities for probing allostery and for new drug combinations with a broad range of activities.
Affiliation(s)
- Lindy Astl
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America
- Gennady M Verkhivker
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States of America.
9
A Single Mutation Increases the Thermostability and Activity of Aspergillus terreus Amine Transaminase. Molecules 2019; 24:molecules24071194. [PMID: 30934681 PMCID: PMC6479498 DOI: 10.3390/molecules24071194] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/01/2019] [Revised: 03/21/2019] [Accepted: 03/23/2019] [Indexed: 11/17/2022]
Abstract
Enhancing the thermostability of the (R)-selective amine transaminase (ATA) from Aspergillus terreus (AT-ATA) will expand its application in the asymmetric synthesis of chiral amines. In this study, mutual information and coevolution networks of ATAs were analyzed using the Mutual Information Server to Infer Coevolution (MISTIC). Subsequently, the amino acids most likely to influence the stability and function of the protein were investigated by alanine scanning and saturation mutagenesis. Four stabilized mutants (L118T, L118A, L118I, and L118V) were successfully obtained. The best mutant, L118T, exhibited improved thermal stability, with a 3.7-fold enhancement in its half-life (t1/2) at 40 °C and a 5.3 °C increase in T₅₀¹⁰ compared with the wild-type protein. In differential scanning fluorimetry (DSF) analysis, L118T showed a melting temperature (Tm) of 46.4 °C, a 5.0 °C increase relative to wild-type AT-ATA (41.4 °C). Furthermore, L118T displayed the highest catalytic efficiency among the four stabilized mutants.
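MISTIC scores coevolution between alignment positions with mutual information (MI). The sketch below computes only the raw MI between two columns from plain frequency counts. It omits any sequence weighting, pseudocounts, or background corrections that a server such as MISTIC may apply, so it illustrates the quantity rather than reproducing the server's scores. The function name is hypothetical.

```python
import math
from collections import Counter

def mutual_information(col_i: str, col_j: str) -> float:
    """Raw mutual information (nats) between two aligned columns of equal length.

    Plain frequency counts only: no pseudocounts, weighting, or corrections.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * ln(p(a,b) / (p(a) p(b))), rewritten in terms of raw counts
        mi += (c / n) * math.log((c * n) / (count_i[a] * count_j[b]))
    return mi
```

Two perfectly covarying columns such as "AALL" and "TTVV" give ln 2 ≈ 0.693 nats. The independent pair "AALL" and "TVTV" gives 0.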
10
Beleva Guthrie V, Masica DL, Fraser A, Federico J, Fan Y, Camps M, Karchin R. Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations. Mol Biol Evol 2019. [PMID: 29522102 PMCID: PMC5967520 DOI: 10.1093/molbev/msy036] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
Abstract
The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher-order interactions that enable the evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method, Network Analysis of Protein Adaptation (NAPA), to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences. The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which frequently arise from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure.
Affiliation(s)
- Violeta Beleva Guthrie
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- David L Masica
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Andrew Fraser
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Joseph Federico
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Yunfan Fan
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD
- Manel Camps
- Department of Environmental Toxicology, University of California Santa Cruz, Santa Cruz, CA
- Rachel Karchin
- Department of Biomedical Engineering and Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD; Department of Oncology, Johns Hopkins University Medicine, Baltimore, MD
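Networks of this kind start from pairs of mutations that appear together in the same sequences. The minimal sketch below counts such co-occurrences and keeps pairs seen at least twice. It does not implement NAPA, which also incorporates phylogenetic reconstruction and ancestral sequence inference. The mutation labels, the function name, and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(mutation_sets, min_count=2):
    """Count unordered mutation pairs that occur together in the same sequence.

    mutation_sets: iterable of sets of mutation labels, one set per sequence.
    Returns {(mut_a, mut_b): count} for pairs seen in at least `min_count` sequences.
    """
    counts = Counter()
    for muts in mutation_sets:
        # sorted() gives each unordered pair a single canonical orientation
        counts.update(combinations(sorted(muts), 2))
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

Take the toy sets {A42G, L76N}, {A42G, L76N, R164S}, {R164S} and {A42G}. Only the pair (A42G, L76N) co-occurs twice, so it is the only pair returned.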
11
Astl L, Tse A, Verkhivker GM. Interrogating Regulatory Mechanisms in Signaling Proteins by Allosteric Inhibitors and Activators: A Dynamic View Through the Lens of Residue Interaction Networks. Adv Exp Med Biol 2019; 1163:187-223. [DOI: 10.1007/978-981-13-8719-7_9] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
12
Murray JM, Maher S, Mota T, Suzuki K, Kelleher AD, Center RJ, Purcell D. Differentiating founder and chronic HIV envelope sequences. PLoS One 2017; 12:e0171572. [PMID: 28187204 PMCID: PMC5302377 DOI: 10.1371/journal.pone.0171572] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/21/2016] [Accepted: 01/23/2017] [Indexed: 11/27/2022]
Abstract
Significant progress has been made in characterizing broadly neutralizing antibodies against the HIV envelope glycoprotein Env, but an effective vaccine has proven elusive. Vaccine development would be facilitated if common features of early founder viruses required for transmission could be identified. Here we employ a combination of bioinformatic and operations research methods to determine the most prevalent features that distinguish 78 subtype B and 55 subtype C founder Env sequences from an equal number of chronic sequences. There were a number of equivalent optimal networks (based on the fewest covarying amino acid (AA) pairs or a measure of maximal covariance) that separated founders from chronics: 13 pairs for subtype B and 75 for subtype C. Every subtype B optimal solution contained the founder pairs 178–346 Asn-Val, 232–236 Thr-Ser, 240–340 Lys-Lys, 279–315 Asp-Lys, 291–792 Ala-Ile, 322–347 Asp-Thr, 535–620 Leu-Asp, 742–837 Arg-Phe, and 750–836 Asp-Ile; the most common optimal pairs for subtype C were 644–781 Lys-Ala (74 of 75 networks), 133–287 Ala-Gln (73/75) and 307–337 Ile-Gln (73/75). No pair was present in all optimal subtype C solutions, highlighting the difficulty in targeting transmission with a single vaccine strain. Relative to the size of its domain (0.35% of Env), the α4β7 binding site occurred most frequently among optimal pairs, especially for subtype C: 4.2% of optimal pairs (1.2% for subtype B). Early sequences from 5 subtype B pre-seroconverters each exhibited at least one clone containing one of the optimal features 553–624 (Ser-Asn), 724–747 (Arg-Arg), or 46–293 (Arg-Glu).
Affiliation(s)
- John M. Murray
- School of Mathematics and Statistics, UNSW Sydney, Sydney, New South Wales, Australia
- Stephen Maher
- School of Mathematics and Statistics, UNSW Sydney, Sydney, New South Wales, Australia
- Zuse Institute Berlin, Berlin, Germany
- Talia Mota
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Kazuo Suzuki
- The Kirby Institute, UNSW Sydney, Sydney, New South Wales, Australia
- Rob J. Center
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
- Damian Purcell
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia
13
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022]
Abstract
Background Elucidating the cooperative mechanism of interconnected residues is an important step toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed with respect to protein structure. Results In this study, we built an MRF model whose graphical topology is determined by residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated on three data sets, which annotate catalytic sites, allosteric sites, and comprehensively determined functional sites, respectively. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can represent positional coevolution information more accurately than the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions The results show that adopting a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computational complexity.
Affiliation(s)
- Chan-Seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
- Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
14
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they can also be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss how multiple patterns coexist in evolutionary data and together give rise to emergent protein functions.
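The abstract ranks positions by eigenvector centrality on an unthresholded, weighted coevolution network. A minimal power-iteration sketch, assuming a symmetric matrix of non-negative, already-corrected coevolution scores; the toy weights are invented for illustration and are not data from the study. It shows how a position with several moderate scores can outrank both members of the single strongest pair. `eigenvector_centrality` and `symmetric_matrix` are hypothetical helpers, not the authors' code.

```python
def eigenvector_centrality(weights, max_iter=1000, tol=1e-10):
    """Leading eigenvector of a symmetric, non-negative weight matrix by power
    iteration, scaled so the most central position scores 1.0.

    Iterating with (W + I) keeps the same eigenvectors as W but avoids the
    oscillation that plain power iteration shows on bipartite graphs.
    """
    n = len(weights)
    x = [1.0] * n
    for _ in range(max_iter):
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        top = max(y)  # positive, since every y[i] >= x[i] > 0
        y = [v / top for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x


def symmetric_matrix(n, scored_pairs):
    """Build an n x n symmetric matrix from {(i, j): score} entries."""
    w = [[0.0] * n for _ in range(n)]
    for (i, j), score in scored_pairs.items():
        w[i][j] = w[j][i] = score
    return w


# Invented scores: 0-1 is the strongest pair; position 2 has several moderate scores.
pairs = {(0, 1): 0.9, (1, 2): 0.3, (2, 3): 0.5, (2, 4): 0.5, (2, 5): 0.5, (2, 6): 0.5}
scores = eigenvector_centrality(symmetric_matrix(7, pairs))
top_pair = max(pairs, key=pairs.get)
most_central = max(range(7), key=lambda k: scores[k])
print(top_pair, most_central)  # (0, 1) 2
```

Position 2 never appears in the top pairwise score, yet it is the most central node, which mirrors the abstract's observation that central positions often carry multiple moderate scores.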
Affiliation(s)
- Daniel J Parente
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
- J Christian J Ray
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
15
Tse A, Verkhivker GM. Molecular Determinants Underlying Binding Specificities of the ABL Kinase Inhibitors: Combining Alanine Scanning of Binding Hot Spots with Network Analysis of Residue Interactions and Coevolution. PLoS One 2015; 10:e0130203. [PMID: 26075886 PMCID: PMC4468085 DOI: 10.1371/journal.pone.0130203] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 03/24/2015] [Accepted: 05/17/2015] [Indexed: 12/20/2022]
Abstract
Quantifying binding specificity and drug resistance of protein kinase inhibitors is of fundamental importance and remains highly challenging due to the complex interplay of structural and thermodynamic factors. In this work, molecular simulations and computational alanine scanning are combined with network-based approaches to characterize the molecular determinants underlying the binding specificities of ABL kinase inhibitors. The proposed theoretical framework unveiled a relationship between ligand binding and inhibitor-mediated changes in the residue interaction networks. Using topological parameters, we have described the organization of the residue interaction networks and of the networks of coevolving residues in the ABL kinase structures. This analysis has shown that functionally critical regulatory residues can simultaneously embody a strong coevolutionary signal and high network centrality, with a propensity to be energetic hot spots for drug binding. We have found that selective (Nilotinib) and promiscuous (Bosutinib, Dasatinib) kinase inhibitors can use their energetic hot spots to differentially modulate the stability of the residue interaction networks, thus inhibiting or promoting the conformational equilibrium between inactive and active states. According to our results, Nilotinib binding may induce a significant network-bridging effect and enhance the centrality of the hot spot residues that stabilize the structural environment favored by the specific kinase form. In contrast, Bosutinib and Dasatinib can induce only modest changes in the residue interaction network, in which ligand binding is coupled primarily with the identity of the gatekeeper residue. These factors may promote the structural adaptability of the active kinase states in binding these promiscuous inhibitors. Our results relate ligand-induced changes in the residue interaction networks to drug resistance effects, showing that network robustness may be compromised by targeted mutations of key mediating residues. This study has outlined mechanisms by which inhibitor binding could modulate the resilience and efficiency of allosteric interactions in the kinase structures, while preserving the structural topology required for catalytic activity and regulation.
Affiliation(s)
- Amanda Tse
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Chapman University School of Pharmacy, Irvine, California, United States of America
16
Viswanathan K, Shriver Z, Babcock GJ. Amino acid interaction networks provide a new lens for therapeutic antibody discovery and anti-viral drug optimization. Curr Opin Virol 2015; 11:122-9. [DOI: 10.1016/j.coviro.2015.03.019] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/07/2014] [Revised: 03/16/2015] [Accepted: 03/31/2015] [Indexed: 11/24/2022]
17
Shih ESC, Hwang MJ. NPPD: A Protein-Protein Docking Scoring Function Based on Dyadic Differences in Networks of Hydrophobic and Hydrophilic Amino Acid Residues. BIOLOGY 2015; 4:282-97. [PMID: 25811640 PMCID: PMC4498300 DOI: 10.3390/biology4020282] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/27/2014] [Accepted: 03/16/2015] [Indexed: 11/16/2022]
Abstract
Protein-protein docking (PPD) predictions usually rely on the use of a scoring function to rank docking models generated by exhaustive sampling. To rank good models higher than bad ones, a large number of scoring functions have been developed and evaluated, but the methods used for the computation of PPD predictions remain largely unsatisfactory. Here, we report a network-based PPD scoring function, the NPPD, in which the network consists of two types of network nodes, one for hydrophobic and the other for hydrophilic amino acid residues, and the nodes are connected when the residues they represent are within a certain contact distance. We showed that network parameters that compute dyadic interactions and those that compute heterophilic interactions of the amino acid networks thus constructed allowed NPPD to perform well in a benchmark evaluation of 115 PPD scoring functions, most of which, unlike NPPD, are based on some sort of protein-protein interaction energy. We also showed that NPPD was highly complementary to these energy-based scoring functions, suggesting that the combined use of conventional scoring functions and NPPD might significantly improve the accuracy of current PPD predictions.
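To make "dyadic" and "heterophilic" network parameters concrete, here is a sketch using dyadicity and heterophilicity as defined by Park and Barabási for networks whose nodes carry a binary label, here hydrophobic versus hydrophilic. Whether NPPD uses exactly these formulas is an assumption, and the scoring function itself is not reproduced; the contact edges would come from a distance cutoff as the abstract describes. The function name and toy network are hypothetical.

```python
import math


def dyadicity_heterophilicity(n_nodes, edges, is_hydrophobic):
    """Dyadicity D and heterophilicity H of a binary-labelled network
    (hydrophobic = True, hydrophilic = False); assumes n_nodes >= 2.

    D = m11 / E[m11] and H = m10 / E[m10], where the expectations assume edges
    are placed at random with the network's overall connectance p.
    D > 1: hydrophobic nodes link to each other more often than expected.
    H < 1: hydrophobic-hydrophilic links are rarer than expected.
    """
    m = len(edges)
    n1 = sum(1 for k in range(n_nodes) if is_hydrophobic[k])
    p = 2 * m / (n_nodes * (n_nodes - 1))  # connectance
    m11 = sum(1 for i, j in edges if is_hydrophobic[i] and is_hydrophobic[j])
    m10 = sum(1 for i, j in edges if is_hydrophobic[i] != is_hydrophobic[j])
    expected_m11 = n1 * (n1 - 1) / 2 * p
    expected_m10 = n1 * (n_nodes - n1) * p
    d = m11 / expected_m11 if expected_m11 else math.nan
    h = m10 / expected_m10 if expected_m10 else math.nan
    return d, h


# Toy contact network: residues 0 and 1 hydrophobic, 2 and 3 hydrophilic.
labels = [True, True, False, False]
contacts = [(0, 1), (1, 2), (2, 3)]
print(dyadicity_heterophilicity(4, contacts, labels))  # (2.0, 0.5)
```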
Affiliation(s)
- Edward S C Shih
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan
- Ming-Jing Hwang
- Institute of Biomedical Sciences, Academia Sinica, Nankang, Taipei 115, Taiwan
18
Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014; 15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to. RESULTS To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de. 
CONCLUSIONS Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
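The conn(k) idea can be sketched in the plain Shannon mutual-information setting that the abstract contrasts H2rs with: compute MI for each column pair, decide significance with a randomization (shuffling) test, and count how many significant pairs each column belongs to. This is not H2rs itself (no von Neumann entropy, no similarity matrix A, no normalization of conn(k)); the 200 shuffles, the 0.05 threshold and all function names are assumptions for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Shannon mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    mi = 0.0
    for (a, b), count in Counter(zip(col_a, col_b)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def permutation_p_value(col_a, col_b, n_shuffles=200, seed=0):
    """Randomization test: how often shuffling col_b reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def conn_values(alignment, alpha=0.05):
    """conn(k): number of significantly correlated pairs each column belongs to."""
    columns = list(zip(*alignment))
    conn = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if permutation_p_value(columns[i], columns[j]) < alpha:
            conn[i] += 1
            conn[j] += 1
    return conn


# Toy alignment: columns 0 and 1 covary perfectly, column 2 is invariant.
alignment = ["ADG"] * 10 + ["LKG"] * 10
columns = list(zip(*alignment))
print(round(mutual_information(columns[0], columns[1]), 4))  # 0.6931 (= ln 2)
print(conn_values(alignment))                                # [1, 1, 0]
```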
Affiliation(s)
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany
19
Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
20
Murray JM, Moenne-Loccoz R, Velay A, Habersetzer F, Doffoël M, Gut JP, Fofana I, Zeisel MB, Stoll-Keller F, Baumert TF, Schvoerer E. Genotype 1 hepatitis C virus envelope features that determine antiviral response assessed through optimal covariance networks. PLoS One 2013; 8:e67254. [PMID: 23840641 PMCID: PMC3688619 DOI: 10.1371/journal.pone.0067254] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/19/2013] [Accepted: 05/14/2013] [Indexed: 01/25/2023]
Abstract
The poor response to combined antiviral therapy with pegylated interferon alfa and ribavirin for hepatitis C virus (HCV) infection may be linked to mutations in the viral envelope gene E1E2 (env), which can result in escape from the immune response and more efficient viral entry. Mutations that result in therapy failure most likely require compensatory mutations to achieve sufficient change in envelope structure and function. Compensatory mutations were investigated by determining positions in the E1E2 gene where amino acids (aa) covaried across groups of individuals. We assessed networks of covarying positions in E1E2 sequences that differentiated sustained virological response (SVR) from non-response (NR) in 43 genotype 1a (17 SVR) and 49 genotype 1b (25 SVR) chronically HCV-infected individuals. Binary integer programming over covariance networks was used to extract aa combinations that differed between response groups. Genotype 1a E1E2 sequences exhibited higher degrees of covariance and clustered into 3 main groups, whereas 1b sequences exhibited no clustering. Between 5 and 9 aa pairs were required to separate SVR from NR in each genotype. aa in hypervariable region 1 were 6 times more likely than chance to occur in the optimal networks. The pair 531-626 (EI) appeared frequently in the optimal networks and was present in 6 of 9 NR in one of the 1a clusters. The most frequent pairs representing SVR were 431-481 (EE) and 500-522 (QA) in 1a, and 407-434 (AQ) in 1b. Optimal networks based on covarying aa pairs in the HCV envelope can indicate features associated with failure or success of antiviral therapy.
Affiliation(s)
- John M Murray
- School of Mathematics and Statistics, University of New South Wales, Sydney, NSW, Australia
21
Molecular dynamics simulations and statistical coupling analysis reveal functional coevolution network of oncogenic mutations in the CDKN2A-CDK6 complex. FEBS Lett 2012. [PMID: 23178718 DOI: 10.1016/j.febslet.2012.11.001] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
Abstract
Coevolution between proteins is crucial for understanding protein-protein interactions: simultaneous changes allow a protein complex to maintain its overall structural and functional integrity. In this study, we combined statistical coupling analysis (SCA) with molecular dynamics simulations of the CDK6-CDKN2A protein complex to evaluate coevolution between the two proteins. We reconstructed an inter-protein residue coevolution network consisting of 37 residues and 37 interactions, in which most of the coevolved residue pairs are spatially proximal. When mutations occurred at these residues, the stable local structures were disrupted and the protein interaction was consequently weakened or abolished, with an ensuing increase in the risk of melanoma. Identifying inter-protein coevolved residues in the CDK6-CDKN2A complex can help in designing protein engineering experiments.
22
Jeong CS, Kim D. Reliable and robust detection of coevolving protein residues. Protein Eng Des Sel 2012; 25:705-13. [DOI: 10.1093/protein/gzs081] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
23
Wang C, Huang R, He B, Du Q. Improving the thermostability of alpha-amylase by combinatorial coevolving-site saturation mutagenesis. BMC Bioinformatics 2012; 13:263. [PMID: 23057711 PMCID: PMC3478181 DOI: 10.1186/1471-2105-13-263] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 03/21/2012] [Accepted: 09/11/2012] [Indexed: 11/12/2022]
Abstract
Background The generation of focused mutant libraries at hotspot residues is an important strategy in directed protein evolution. Existing methods, such as combinatorial active-site testing and residual coupling analysis, depend primarily on evolutionarily conserved information to find hotspot residues. Hardly any attention has been paid to another important functional and structural determinant: functionally correlated variation, that is, coevolution. Results In this paper, we propose a new method, named combinatorial coevolving-site saturation mutagenesis (CCSM), in which the functionally correlated variation sites of proteins are chosen as hotspot sites for constructing focused mutant libraries. The CCSM approach was used to improve the thermal stability of α-amylase from Bacillus subtilis CN7 (Amy7C). The results indicate that CCSM can identify novel beneficial mutation sites, and it enhanced the thermal stability of wild-type Amy7C by 8°C ($T_{50}^{30}$), which could not be achieved by conventional rational introduction of single or double point mutations. Conclusions Our method is able to produce more thermostable mutant α-amylases with novel beneficial mutations at new sites. It also verifies that coevolving sites can be used as hotspots for constructing focused mutant libraries in protein engineering. This study sheds new light on active research into molecular coevolution.
Affiliation(s)
- Chenghua Wang
- Nanjing University of Technology, Nanjing, Jiangsu, China
24
Sen L, Fares M, Su YJ, Wang T. Molecular evolution of psbA gene in ferns: unraveling selective pressure and co-evolutionary pattern. BMC Evol Biol 2012; 12:145. [PMID: 22899792 PMCID: PMC3499216 DOI: 10.1186/1471-2148-12-145] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/12/2012] [Accepted: 08/08/2012] [Indexed: 01/20/2023]
Abstract
Background The photosynthetic oxygen-evolving photosystem II (PSII) produces almost all of the oxygen in the atmosphere. This unique biochemical system comprises a functional core complex that is encoded by psbA and other genes. Unraveling the evolutionary dynamics of this gene is of particular interest owing to its direct role in oxygen production. psbA underwent gene duplication in leptosporangiates, in which both copies have been preserved since. Because gene duplication is often followed by the nonfunctionalization of one of the copies and its subsequent erosion, the preservation of both psbA copies points to functional or regulatory specialization events. The aim of this study was to investigate the molecular evolution of psbA among fern lineages. Results We sequenced psbA, which encodes the D1 protein of the PSII core complex, in 20 species representing 8 orders of extant ferns; we then searched for selection and coevolution signatures in psbA across the 11 fern orders. Collectively, our results indicate that: (1) selective constraints on the D1 protein relaxed after the duplication in 4 leptosporangiate orders; (2) a handful of positively selected codons were detected within species with a single psbA copy, but none in duplicated ones; (3) a few sites in the D1 protein were involved in a coevolution process, which may indicate significant functional/structural communication between them. Conclusions The strong competition between ferns and angiosperms for light may have been the main cause of a continuous fixation of adaptive amino acid changes in psbA, in particular after its duplication. Alternatively, a single psbA copy may have undergone bursts of adaptive changes at the molecular level to overcome competition from angiosperms. The strong signature of positive Darwinian selection in a major part of the D1 protein is testament to this. At the same time, species that own two psbA copies show hardly any positive selection signals in the D1 protein coding sequences. In this study, eleven co-evolving sites were detected via different molecules; these sites may be more important than others.
Affiliation(s)
- Lin Sen
- State Key Laboratory of Virology, College of Life Sciences, Wuhan University, Wuhan, China
25
Aguilar D, Oliva B, Marino Buslje C. Mapping the mutual information network of enzymatic families in the protein structure to unveil functional features. PLoS One 2012; 7:e41430. [PMID: 22848494 PMCID: PMC3405127 DOI: 10.1371/journal.pone.0041430] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 03/05/2012] [Accepted: 06/26/2012] [Indexed: 11/24/2022]
Abstract
Amino acids committed to a particular function correlate tightly throughout evolution and tend to form clusters in the 3D structure of the protein. Consequently, a protein can be seen as a network of co-evolving clusters of residues. The goal of this work is two-fold: first, we combined mutual information and structural data to describe the amino acid networks within a protein and their interactions. Second, we investigated how this information can be used to improve methods for predicting functional residues by reducing the search space. As a main result, we found that clusters of co-evolving residues related to the catalytic site of an enzyme have distinguishable topological properties in the network. We also observed that these clusters usually evolve independently, which could be related to a fail-safe mechanism. Finally, we discovered a significant enrichment of functional residues (e.g. metal binding, susceptibility to detrimental mutations) in the clusters, which could be the foundation of new prediction tools.
Affiliation(s)
- Daniel Aguilar
- Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona Biomedical Research Park, Barcelona, Spain
26
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022]
Abstract
Enzymes play a fundamental role in almost all biological processes, and identifying catalytic residues is a crucial step in deciphering their biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrates the microenvironment (ME) and geometrical properties of amino acid residues. First, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed MEscore, by summing the likelihoods of all residue pairs. Second, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, given that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature as an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation, MEDscore achieved robust performance in identifying catalytic residues, with an AUC1.0 of 0.889. At a false positive rate of at most 10%, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore performed competitively with the residue conservation score (e.g. CONscore), the most informative single feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first single structural feature to exhibit such an advantage. More importantly, we found that MEDscore is complementary to CONscore, and significantly improved performance can be achieved by combining CONscore and MEDscore linearly. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
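The abstract names the two ingredients of MEDscore but not their formulas. The sketch below is a hypothetical reading under stated assumptions: a Dscore-like depth that is 1.0 at the structure's centroid and falls linearly to 0.0 at the farthest residue, and an MEscore-like sum over a residue's spatial neighbours of pair likelihoods taken from a lookup table. The linear scaling, the neighbour lists and the likelihood values are all invented; the nonlinear integration into MEDscore is not attempted.

```python
import math


def centrality_depth(coords):
    """Hypothetical Dscore-like value per residue: 1.0 at the centroid of the
    structure, falling linearly to 0.0 for the residue farthest from it."""
    n = len(coords)
    centre = tuple(sum(c[k] for c in coords) / n for k in range(3))
    dists = [math.dist(c, centre) for c in coords]
    d_max = max(dists)
    if d_max == 0:
        return [1.0] * n
    return [1.0 - d / d_max for d in dists]


def microenvironment_score(residue, residue_types, neighbours, pair_likelihood):
    """Hypothetical MEscore: sum of the likelihoods of every
    (residue type, neighbour type) pair around `residue`.
    Pairs missing from the lookup table contribute 0."""
    own = residue_types[residue]
    return sum(
        pair_likelihood.get((own, residue_types[j]), 0.0)
        for j in neighbours[residue]
    )


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (-2.0, 0.0, 0.0),
          (0.0, 4.0, 0.0), (0.0, -4.0, 0.0)]
print(centrality_depth(coords))  # [1.0, 0.5, 0.5, 0.0, 0.0]

residue_types = "HDSGA"
neighbours = {0: [1, 2, 3]}
pair_likelihood = {("H", "D"): 0.5, ("H", "S"): 0.25}  # invented values
print(microenvironment_score(0, residue_types, neighbours, pair_likelihood))  # 0.75
```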
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
27
Feng X, Sanchis J, Reetz MT, Rabitz H. Enhancing the efficiency of directed evolution in focused enzyme libraries by the adaptive substituent reordering algorithm. Chemistry 2012; 18:5646-54. [PMID: 22434591 DOI: 10.1002/chem.201103811] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 12/05/2011] [Indexed: 11/11/2022]
Abstract
Directed evolution is a broadly successful protein engineering strategy for enhancing the stereoselectivity, activity, and thermostability of enzymes. To increase the efficiency of directed evolution based on iterative saturation mutagenesis, the adaptive substituent reordering algorithm (ASRA) is introduced here as an alternative to traditional quantitative structure-activity relationship (QSAR) methods for identifying potential protein mutants with desired properties from minimal sampling of focused libraries. ASRA works by identifying the underlying regularity of the protein property landscape, which allows it to make predictions without explicit knowledge of the structure-property relationships. In a proof-of-principle study, ASRA identified all or most of the best enantioselective mutants among the synthesized variants of the epoxide hydrolase from Aspergillus niger, in the absence of peptide seeds with high E-values. ASRA even revealed a laboratory error solely from irregularities in the reordered E-value landscape.
Affiliation(s)
- Xiaojiang Feng
- Department of Chemistry, Princeton University, New Jersey 08544, USA
28
Cheng CP, Lee PF, Liu WC, Wu IC, Chin CY, Chang TT, Tseng VS. Analysis of precore/core covariances associated with viral kinetics and genotypes in hepatitis B e antigen-positive chronic hepatitis B patients. PLoS One 2012; 7:e32553. [PMID: 22384271 PMCID: PMC3288105 DOI: 10.1371/journal.pone.0032553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/19/2011] [Accepted: 02/01/2012] [Indexed: 12/17/2022]
Abstract
Hepatitis B virus (HBV) is one of the most common DNA viruses that can cause aggressive hepatitis, cirrhosis and hepatocellular carcinoma. Although many people are persistently infected with HBV, the kinetics in serum levels of viral loads and the host immune responses vary from person to person. HBV precore/core open reading frame (ORF) encoding proteins, hepatitis B e antigen (HBeAg) and core antigen (HBcAg), are two indicators of active viral replication. The aim of this study was to discover a variety of amino acid covariances in responses to viral kinetics, seroconversion and genotypes during the course of HBV infection. A one year follow-up study was conducted with a total number of 1,694 clones from 23 HBeAg-positive chronic hepatitis B patients. Serum alanine aminotransferase, HBV DNA and HBeAg levels were measured monthly as criteria for clustering patients into several different subgroups. Monthly derived multiple precore/core ORFs were directly sequenced and translated into amino acid sequences. For each subgroup, time-dependent covariances were identified from their time-varying sequences over the entire follow-up period. The fluctuating, wavering, HBeAg-nonseroconversion and genotype C subgroups showed greater degrees of covariances than the stationary, declining, HBeAg-seroconversion and genotype B. Referring to literature, mutation hotspots within our identified covariances were associated with the infection process. Remarkably, hotspots were predominant in genotype C. Moreover, covariances were also identified at early stage (spanning from baseline to a peak of serum HBV DNA) in order to determine the intersections with aforementioned time-dependent covariances. Preserved covariances, namely representative covariances, of each subgroup are visually presented using a tree-based structure. Our results suggested that identified covariances were strongly associated with viral kinetics, seroconversion and genotypes. 
Moreover, representative covariances may help clinicians prescribe a suitable treatment for patients, even those without obvious symptoms at the early stage of HBV infection.
Affiliation(s)
- Chun-Pei Cheng
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Pei-Fen Lee
- Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
- Wen-Chun Liu
- Department of Biotechnology, Ming Dao University, Changhua, Taiwan
- I-Chin Wu
- Department of Internal Medicine, National Cheng Kung University Hospital, Tainan, Taiwan
- Graduate Institute of Clinical Medicine, National Cheng Kung University, Tainan, Taiwan
- Infectious Disease and Signaling Research Center, National Cheng Kung University, Tainan, Taiwan
- Chu-Yu Chin
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Ting-Tsung Chang
- Department of Internal Medicine, National Cheng Kung University Hospital, Tainan, Taiwan
- Institute of Basic Medical Sciences, National Cheng Kung University, Tainan, Taiwan
- Infectious Disease and Signaling Research Center, National Cheng Kung University, Tainan, Taiwan
- Vincent S. Tseng
- Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
- Institute of Medical Informatics, National Cheng Kung University, Tainan, Taiwan
29
Network models of TEM β-lactamase mutations coevolving under antibiotic selection show modular structure and anticipate evolutionary trajectories. PLoS Comput Biol 2011; 7:e1002184. [PMID: 21966264 PMCID: PMC3178621 DOI: 10.1371/journal.pcbi.1002184] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 02/02/2011] [Accepted: 07/19/2011] [Indexed: 01/13/2023]
Abstract
Understanding how novel functions evolve (genetic adaptation) is a critical goal of evolutionary biology. Among asexual organisms, genetic adaptation involves multiple mutations that frequently interact in a non-linear fashion (epistasis). Non-linear interactions pose a formidable challenge for the computational prediction of mutation effects. Here we use the recent evolution of β-lactamase under antibiotic selection as a model for genetic adaptation. We build a network of coevolving residues (possible functional interactions), in which nodes are mutant residue positions and links represent two positions found mutated together in the same sequence. Most often these pairs occur in the setting of more complex mutants. Focusing on extended-spectrum resistant sequences, we use network-theoretical tools to identify triple mutant trajectories of likely special significance for adaptation. We extrapolate evolutionary paths (n = 3) that increase resistance and that are longer than the units used to build the network (n = 2). These paths consist of a limited number of residue positions and are enriched for known triple mutant combinations that increase cefotaxime resistance. We find that the pairs of residues used to build the network frequently decrease resistance compared to their corresponding singlets. This is a surprising result, given that their coevolution suggests a selective advantage. Thus, β-lactamase adaptation is highly epistatic. Our method can identify triplets that increase resistance despite the underlying rugged fitness landscape and has the unique ability to make predictions by placing each mutant residue position in its functional context. Our approach requires only sequence information, sufficient genetic diversity, and discrete selective pressures. Thus, it can be used to analyze recent evolutionary events, where coevolution analysis methods that use phylogeny or statistical coupling are not possible. 
Improving our ability to assess evolutionary trajectories will help predict the evolution of clinically relevant genes and aid in protein design.
Understanding how new biological activities evolve on the molecular level has critical implications for biotechnology and for human health. Here we collect a database of mutations that contribute to the evolution of β-lactamase resistance to inhibitors and to new β-lactam antibiotics in bacterial pathogens, such as Escherichia coli. We compiled a database of TEM β-lactamase sequences evolved under antibiotic pressure and identified functional interactions between individual residue positions. We visualized these complex molecular interactions as a network and used network theory to derive information regarding the origin of individual mutations and their contribution to the observed resistance. Our approach should help interpret sequence databases for clinically relevant proteins undergoing high mutation rates and under selective (drug, immune) pressure, such as surface proteins of pathogens (particularly of RNA viruses such as HIV) or targets for chemotherapy in microbial pathogens or tumor cells. Notably, our approach only requires sequence data; detailed phylogenetic or tertiary structure information for the target gene is not necessary. Our analysis of how individual mutations work together to produce new biological activities should help anticipate evolution driven by a variety of clinically relevant selections such as drug resistance, virulence, and immunity.
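The network construction described in this abstract can be sketched in a few lines of Python. In that network, nodes are mutant residue positions and a link joins two positions found mutated together in the same sequence. This is a minimal illustration under stated assumptions, not the authors' code. Each sequence is assumed to arrive as a set of mutated positions, the function names are hypothetical, and the position numbers are made up. Edge weights count co-occurrences, and node degree gives a first look at candidate hub positions. The triple-mutant trajectory analysis is not reproduced.

```python
from collections import Counter
from itertools import combinations


def build_comutation_network(mutant_sequences):
    """Edge weights for a co-mutation network (hypothetical helper).

    Each item in `mutant_sequences` is an iterable of residue positions found
    mutated together in one sequence. Edge (i, j), with i < j, counts how many
    sequences carry mutations at both positions.
    """
    edges = Counter()
    for positions in mutant_sequences:
        # Deduplicate and sort so every unordered pair is counted once per sequence.
        for i, j in combinations(sorted(set(positions)), 2):
            edges[(i, j)] += 1
    return edges


def degrees(edges):
    """Number of distinct neighbours per position; high-degree nodes are hub candidates."""
    deg = Counter()
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


# Made-up mutant sets for illustration only (not data from the study).
mutants = [{104, 164, 238}, {238, 240}, {104, 238}, {69, 238}]
network = build_comutation_network(mutants)
print(network[(104, 238)])    # 2: positions 104 and 238 co-occur in two sequences
print(degrees(network)[238])  # 4: linked to 69, 104, 164 and 240
```

Counting pairs per sequence matches the abstract's point that most pairs occur inside more complex mutants. A triple mutant contributes three links at once.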
30
Kumar S, Dudley JT, Filipski A, Liu L. Phylomedicine: an evolutionary telescope to explore and diagnose the universe of disease mutations. Trends Genet 2011; 27:377-86. [PMID: 21764165 PMCID: PMC3272884 DOI: 10.1016/j.tig.2011.06.004] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.7] [Received: 04/18/2011] [Revised: 06/10/2011] [Accepted: 06/13/2011] [Indexed: 12/30/2022]
Abstract
Modern technologies have made the sequencing of personal genomes routine. They have revealed thousands of nonsynonymous (amino acid altering) single nucleotide variants (nSNVs) of protein-coding DNA per genome. What do these variants foretell about an individual's predisposition to diseases? The experimental technologies required to carry out such evaluations at a genomic scale are not yet available. Fortunately, the process of natural selection has lent us an almost infinite set of tests in nature. During long-term evolution, new mutations and existing variations have been evaluated for their biological consequences in countless species, and outcomes are readily revealed by multispecies genome comparisons. We review studies that have investigated evolutionary characteristics and in silico functional diagnoses of nSNVs found in thousands of disease-associated genes. We conclude that the patterns of long-term evolutionary conservation and permissible sequence divergence are essential and instructive modalities for functional assessment of human genetic variations.
Affiliation(s)
- Sudhir Kumar
- School of Life Sciences, Arizona State University, Tempe, AZ 85287-4501, USA.
31
Tungtur S, Parente DJ, Swint-Kruse L. Functionally important positions can comprise the majority of a protein's architecture. Proteins 2011; 79:1589-608. [PMID: 21374721 DOI: 10.1002/prot.22985] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Received: 06/14/2010] [Revised: 12/08/2010] [Accepted: 12/15/2010] [Indexed: 01/13/2023]
Abstract
Concomitant with the genomic era, many bioinformatics programs have been developed to identify functionally important positions from sequence alignments of protein families. To evaluate these analyses, many have used the LacI/GalR family and determined whether positions predicted to be "important" are validated by published experiments. However, we previously noted that predictions do not identify all of the experimentally important positions present in the linker regions of these homologs. In an attempt to reconcile these differences, we corrected and expanded the LacI/GalR sequence set commonly used in sequence/function analyses. Next, a variety of analyses were carried out (1) for the entire LacI/GalR sequence set and (2) for a subset of homologs with functionally important "YxPxxxAxxL" motifs in their linkers. This strategy was devised to determine whether predictions could be improved by knowledge-based sequence sorting and, for some analyses, did increase the number of linker positions identified. However, two functionally important linker positions were not reliably identified by any analysis. Finally, we compared the new predictions to all known experimental data for E. coli LacI and three homologous linkers. From these, we estimate that >50% of positions are important to the functions of the LacI/GalR homologs. As a corollary, neutral positions might occur less frequently and might be easier to detect in sequence analyses. Although analyses have successfully guided mutations that partially exchange protein functions, a better experimental understanding of the sequence/function relationships in protein families would be helpful for uncovering the remaining rules used by nature to evolve new protein functions.
Affiliation(s)
- Sudheer Tungtur
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, MSN 3030, Kansas City, Kansas 66160, USA
32
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022]
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
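The abstract compares three per-column conservation measures: Kullback-Leibler (KL), Shannon entropy and maximal frequency. Below is a minimal sketch of all three for a single alignment column. It assumes gaps are ignored and that the caller supplies the background distribution. The function names and the gap handling are illustrative assumptions, and the pC, pMI and Cls scores are not reproduced.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column):
    """Relative frequency of each standard residue in one MSA column; gaps and other symbols are ignored."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column contains no standard residues")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def shannon_entropy(freqs):
    """H = -sum p log2 p (bits); lower values mean a more conserved column."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def kl_conservation(freqs, background):
    """KL divergence D(p || q) in bits; higher values mean a stronger departure from the background q."""
    return sum(p * math.log2(p / background[aa]) for aa, p in freqs.items() if p > 0)


def max_frequency(freqs):
    """Frequency of the most common residue in the column."""
    return max(freqs.values())


# Uniform background, assumed here only for illustration.
uniform = {aa: 1 / 20 for aa in AMINO_ACIDS}
conserved = column_frequencies("AAAAAAAAAA")
mixed = column_frequencies("AALL--")
# For the fully conserved column, KL = log2(20) under this background.
scores = [kl_conservation(conserved, uniform), kl_conservation(mixed, uniform)]
```

With a uniform background (q = 1/20 for every residue), D(p || q) = log2(20) - H, so KL and Shannon entropy rank columns identically. The KL measure carries different information only when the background is non-uniform, for example database-wide amino acid frequencies.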
33
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6:e1000923. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 09/25/2009] [Accepted: 08/09/2010] [Indexed: 11/18/2022]
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus its vulnerability to mutation. In this work, we put forward the hypothesis that, in addition to conservation, co-evolution of residues in a protein influences the likelihood that a residue is functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Our analysis results are available at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects.
Although previous research has revealed important aspects that influence or predict the chance that a mutation causes disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that certain positions in a protein sometimes mutate in an apparently correlated fashion, and we analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On the one hand, our data further support the concept that correlated positions are associated not only with protein contacts but also with functional sites and/or disease positions (as introduced by others). On the other hand, this could help improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
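The claim that correlated positions are disease-associated more often than expected by chance is an enrichment question. One common way to quantify it is a one-sided hypergeometric test. The abstract does not say which test the authors used, so this choice is an assumption for illustration, as are the function name and the toy numbers below.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """One-sided enrichment p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: all positions considered; K: disease-associated positions;
    n: positions flagged as correlated; k: positions that are both.
    """
    upper = min(K, n)
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers, not taken from the study: 10 positions, 5 disease-associated,
# 5 correlated, and all 5 correlated positions are disease-associated.
p = hypergeom_upper_tail(5, 10, 5, 5)  # equals 1/252
```

In this toy case, randomly drawing 5 of 10 positions and having all 5 fall among the 5 disease positions has probability 1/252. A small upper-tail probability is what "more likely than expected by chance" would look like under this test.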
Affiliation(s)
- Andreas Kowarsch
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Angelika Fuchs
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Dmitrij Frishman
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
34
Jeong CS, Kim D. Linear predictive coding representation of correlated mutation for protein sequence alignment. BMC Bioinformatics 2010; 11 Suppl 2:S2. [PMID: 20406500 PMCID: PMC3165164 DOI: 10.1186/1471-2105-11-s2-s2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Background Although both conservation and correlated mutation (CM) are important information reflecting different sorts of context in a multiple sequence alignment, most alignment methods use sequence profiles that represent only conservation. There is as yet no general way to represent correlated mutation and incorporate it into sequence alignment. Methods We develop a novel method, CM profile, to represent correlated mutation as a spectral feature derived using linear predictive coding, in which correlated mutations among different positions are represented by a fixed number of values. We combine CM profile with a conventional sequence profile to improve alignment quality. Results For distantly related protein pairs, using CM profile improves profile-profile alignment with or without predicted secondary structure. In particular, at the superfamily level, combining CM profile with sequence profile improves profile-profile alignment by 9.5%, whereas predicted secondary structure does so by 6.0%. More significantly, using both of them improves profile-profile alignment by 13.9%. We also exemplify the effectiveness of CM profile by demonstrating that the resulting alignment preserves shared coevolution and contacts. Conclusions In this work, we introduce a novel method, CM profile, which represents correlated mutation information in a parallel form, and apply it to the protein sequence alignment problem. When combined with a conventional sequence profile, CM profile improves alignment quality significantly more than predicted secondary structure information does, which should be beneficial for target-template alignment in protein structure prediction. Because of its generality, CM profile can be used in other bioinformatics applications in the same way as sequence profiles.
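Linear predictive coding turns a variable-length numeric signal into a fixed number of coefficients, which is the property the CM profile relies on. The abstract does not say how correlated-mutation scores are arranged into a signal. The sketch below therefore shows only the generic autocorrelation method with the Levinson-Durbin recursion, not the authors' encoding. Feeding in, say, one position's CM scores against all other positions is a hypothetical choice, and the function names are illustrative.

```python
def autocorrelation(signal, max_lag):
    """r[m] = sum_t x[t] * x[t + m] for lags m = 0..max_lag."""
    n = len(signal)
    return [sum(signal[t] * signal[t + m] for t in range(n - m)) for m in range(max_lag + 1)]


def lpc(signal, order):
    """Linear prediction coefficients via the Levinson-Durbin recursion.

    Model: x[t] is approximated by sum_{i=1..order} a[i] * x[t - i].
    Returns (a[1..order], final prediction error). The output length is
    always `order`, whatever the length of `signal`.
    """
    r = autocorrelation(signal, order)
    if r[0] == 0:
        return [0.0] * order, 0.0  # all-zero signal: nothing to predict
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for the step from order m-1 to order m.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        updated = a[:]
        updated[m] = k
        for i in range(1, m):
            updated[i] = a[i] - k * a[m - i]
        a = updated
        err *= 1 - k * k
    return a[1:], err


# Hypothetical input: any numeric profile, e.g. one position's CM scores.
coeffs, residual = lpc([1.0, 2.0, 3.0], 2)
print([round(c, 4) for c in coeffs])  # [0.6667, -0.1667]
```

Because the coefficient vector always has length `order`, profiles built from proteins of different lengths can be compared column by column, much like a conventional sequence profile.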
Affiliation(s)
- Chan-seok Jeong
- Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea
35
Chakrabarti S, Panchenko AR. Structural and functional roles of coevolved sites in proteins. PLoS One 2010; 5:e8591. [PMID: 20066038 PMCID: PMC2797611 DOI: 10.1371/journal.pone.0008591] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Received: 08/12/2009] [Accepted: 10/19/2009] [Indexed: 01/09/2023]
Abstract
BACKGROUND Understanding the residue covariations between multiple positions in protein families is crucial and can be helpful for designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite significant efforts in the field, there is still controversy regarding the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal and the role of coevolution in functional diversification. METHODOLOGY In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We carried out a large-scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of the 'small-world' type: they have clustering coefficients higher than those of random networks and mean shortest path lengths similar to or lower than those of random and regular networks. We also found that, altogether, 11% of functionally important sites coevolve with at least one other site. Active sites coevolve with other sites more frequently (15%) than protein (11%) and ligand (9%) binding sites do. Metal binding and active sites are also found to coevolve more frequently with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found in very close spatial proximity (≤5 Å), pointing to the possible preservation of salt bridges in evolution.
CONCLUSION Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
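The small-world criterion in this abstract rests on two graph statistics, the average clustering coefficient and the mean shortest path length, each compared with random and regular reference networks. The sketch below computes both for an undirected network stored as an adjacency dictionary. It does not reproduce the comparison with randomized networks. Nodes with fewer than two neighbours get a clustering coefficient of 0, which is one common convention. The toy graph and function names are hypothetical.

```python
from collections import deque


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def average_shortest_path(adj):
    """Mean BFS distance over all ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


# Toy network: a triangle A-B-C with a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(round(average_clustering(adj), 3))     # 0.583
print(round(average_shortest_path(adj), 3))  # 1.333
```

A network would be called small-world if its clustering clearly exceeds that of random graphs with the same numbers of nodes and edges while its mean path length stays comparable to theirs.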
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
36
Comparing the functional roles of nonconserved sequence positions in homologous transcription repressors: implications for sequence/function analyses. J Mol Biol 2009; 395:785-802. [PMID: 19818797 DOI: 10.1016/j.jmb.2009.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 08/03/2009] [Revised: 10/01/2009] [Accepted: 10/02/2009] [Indexed: 11/21/2022]
Abstract
The explosion of protein sequences deduced from genetic code has led to both a problem and a potential resource: Efficient data use requires interpreting the functional impact of sequence change without experimentally characterizing each protein variant. Several groups have hypothesized that interpretation could be aided by analyzing the sequences of naturally occurring homologues. To that end, myriad sequence/function analyses have been developed to predict which conserved, semi-conserved, and nonconserved positions are functionally important. These positions must be discriminated from the nonconserved positions that are functionally silent. However, the assumptions that underlie sequence analyses are based on experimental results that are sparse and usually designed to address different questions. Here, we use three homologues from a test family common to bioinformatics, the LacI/GalR transcription repressors, to test a common assumption: If a position is functionally important for one family member, it has similar importance in all homologues. We generated experimental sequence/function information for each nonconserved position in the 18 amino acids that link the DNA-binding and regulatory domains of three LacI/GalR homologues. We find that the functional importance of each position is preserved among the three linkers, albeit to different degrees. We also find that every linker position contributes to function, which has twofold implications. (1) Since the linker positions range from highly conserved to semi-conserved to nonconserved and contribute to affinity, selectivity, and allosteric response, we assert that sequence/function analyses must identify positions in the LacI/GalR linkers to be qualified as "successful". Many analyses overlook this region since most of the residues do not directly contact ligand. (2) No position in the LacI/GalR linker is functionally silent.
This finding is inconsistent with another underlying principle of many analyses: Using sequence sets to discriminate important from non-contributing positions obligates silent positions, which denotes that most homologues tolerate a variety of amino acid substitutions at the position without functional change. Instead, additional combinatorial mutants in the LacI/GalR linkers show that particular substitutions can be silent in a context-dependent manner. Thus, specific permutations of sequence change (rather than change at silent positions) would facilitate neutral drift during evolution. Finally, the combinatorial mutants also reveal functional synergy between semi- and nonconserved positions. Such functional relationships would be missed by analyses that rely primarily upon co-evolution.
37
Reetz M, Soni P, Acevedo J, Sanchis J. Creation of an Amino Acid Network of Structurally Coupled Residues in the Directed Evolution of a Thermostable Enzyme. Angew Chem Int Ed Engl 2009; 48:8268-72. [DOI: 10.1002/anie.200904209] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
38
Reetz M, Soni P, Acevedo J, Sanchis J. Creation of an Amino Acid Network of Structurally Coupled Residues in the Directed Evolution of a Thermostable Enzyme. Angew Chem 2009. [DOI: 10.1002/ange.200904209] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/08/2022]
39
Lee BC, Kim D. A new method for revealing correlated mutations under the structural and functional constraints in proteins. Bioinformatics 2009; 25:2506-13. [PMID: 19628501 DOI: 10.1093/bioinformatics/btp455] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION Diverse studies have shown that correlated mutation (CM) is an important molecular evolutionary process alongside conservation. However, attempts to find the residue pairs that co-evolve under structural and/or functional constraints are complicated by the fact that a large portion of the covariance signals found in multiple sequence alignments arise from correlations due to common ancestry and stochastic noise. RESULTS Assuming that the background noise can be estimated from the coevolutionary relationships among residues, we propose a new measure of background noise called the normalized coevolutionary pattern similarity (NCPS) score. By subtracting NCPS scores from raw CM scores and combining the results with an entropy factor, we show that these new scores effectively reduce the background noise. To test the effectiveness of this method in detecting residue pairs coevolving under structural constraints, we evaluated it on two independent test sets, which showed that this new method performs better than the most accurate method currently available. In addition, we applied our method to double mutant cycle experiments and protein-protein interactions. Although more rigorous tests are required, we obtained promising results indicating that our method tended to explain those data better than other methods. These results suggest that the new noise-reduced CM scores developed in this study can be a valuable tool for the study of correlated mutations under structural and/or functional constraints in proteins. AVAILABILITY http://pbil.kaist.ac.kr
Affiliation(s)
- Byung-Chul Lee
- Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Korea
40
Buslje CM, Santos J, Delfino JM, Nielsen M. Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics 2009; 25:1125-31. [PMID: 19276150 DOI: 10.1093/bioinformatics/btp135] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION Mutual information (MI) theory is often applied to predict positional correlations in a multiple sequence alignment (MSA) in order to analyze the positions that are structurally or functionally important in a given fold or protein family. Accurate identification of coevolving positions in protein sequences is difficult due to the high background signal imposed by phylogeny and noise. Several methods have been proposed using MI to identify coevolving amino acids in protein families. RESULTS After evaluating two current methods, we demonstrate how the use of sequence-weighting techniques to reduce sequence redundancy, and of low-count corrections to account for the small number of observations in limited-size sequence families, can significantly improve the predictive performance of MI. The evaluation is made on large sets of both in silico-generated alignments and biological sequence data. The methods included in the analysis are the APC (average product correction) and RCW (row-column weighting) methods. The best performing method was APC including sequence-weighting and low-count corrections. The use of sequence permutations to calculate an MI rescaling is shown to significantly improve the prediction accuracy and allows for direct comparison of information values across protein families. Finally, we demonstrate that a lower bound of 400 sequences sharing <62% identity is needed in an MSA in order to achieve meaningful predictive performance. With our contribution, we achieve a noteworthy improvement on the current procedures to determine coevolution and residue contacts, and we believe that this will have potential impacts on the understanding of protein structure, function and folding.
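Below is a bare-bones sketch of MI with the average product correction (APC) named in this abstract. It assumes an ungapped toy alignment and plain frequency counts. The sequence weighting, low-count corrections and permutation-based rescaling that the authors found important are deliberately omitted, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """MI (in nats) between two alignment columns given as equal-length strings."""
    n = len(col_i)
    pi, pj = Counter(col_i), Counter(col_j)
    pij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log((c / n) / ((pi[a] / n) * (pj[b] / n)))
        for (a, b), c in pij.items()
    )


def mi_apc(msa):
    """Pairwise MI with the average product correction.

    msa: list of aligned sequences of equal length.
    Returns {(i, j): MI(i, j) - MI(i, .) * MI(j, .) / MI(., .)} for i < j,
    where MI(i, .) is column i's mean MI and MI(., .) the overall mean MI.
    """
    length = len(msa[0])
    if length < 2:
        raise ValueError("need at least two columns")
    columns = ["".join(seq[k] for seq in msa) for k in range(length)]
    mi = {(i, j): mutual_information(columns[i], columns[j])
          for i, j in combinations(range(length), 2)}
    row_mean = [0.0] * length
    for (i, j), v in mi.items():
        row_mean[i] += v
        row_mean[j] += v
    row_mean = [s / (length - 1) for s in row_mean]
    overall = sum(mi.values()) / len(mi)
    if overall == 0:
        return dict(mi)  # no signal at all; nothing to correct
    return {(i, j): v - row_mean[i] * row_mean[j] / overall for (i, j), v in mi.items()}


# Toy alignment: columns 0 and 1 are perfectly coupled, column 2 is independent.
msa = ["AKL", "DEL", "AKV", "DEV"]
scores = mi_apc(msa)
print(round(scores[(0, 1)], 4))  # 0.1733, i.e. ln 2 / 4
```

APC subtracts an estimate of the background coupling that each column shares with all others. In the toy alignment, the raw MI of the coupled pair is ln 2 and APC removes three quarters of it as estimated background. With real alignments that background comes largely from phylogeny, which is why the authors pair APC with the corrections omitted here.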
Affiliation(s)
- Cristina Marino Buslje
- Department of Biological Chemistry and Institute of Biochemistry and Biophysics (IQUIFIB), School of Pharmacy and Biochemistry, University of Buenos Aires, Junín 956, 1113 Buenos Aires, Argentina.
41
Aurora R, Donlin MJ, Cannon NA, Tavis JE. Genome-wide hepatitis C virus amino acid covariance networks can predict response to antiviral therapy in humans. J Clin Invest 2008; 119:225-36. [PMID: 19104147 DOI: 10.1172/jci37085] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 08/08/2008] [Accepted: 10/22/2008] [Indexed: 12/17/2022]
Abstract
Hepatitis C virus (HCV) is a common RNA virus that causes hepatitis and liver cancer. Infection is treated with IFN-alpha and ribavirin, but this expensive and physically demanding therapy fails in half of patients. The genomic sequences of independent HCV isolates differ by approximately 10%, but the effects of this variation on the response to therapy are unknown. To address this question, we analyzed amino acid covariance within the full viral coding region of pretherapy HCV sequences from 94 participants in the Viral Resistance to Antiviral Therapy of Chronic Hepatitis C (Virahep-C) clinical study. Covarying positions were common and linked together into networks that differed by response to therapy. There were 3-fold more hydrophobic amino acid pairs in HCV from nonresponding patients, and these hydrophobic interactions were predicted to contribute to failure of therapy by stabilizing viral protein complexes. Using our analysis to detect patterns within the networks, we could predict the outcome of therapy with greater than 95% coverage and 100% accuracy, raising the possibility of a prognostic test to reduce therapeutic failures. Furthermore, the hub positions in the networks are attractive antiviral targets because of their genetic linkage with many other positions that we predict would suppress evolution of resistant variants. Finally, covariance network analysis could be applicable to any virus with sufficient genetic variation, including most human RNA viruses.
Affiliation(s)
- Rajeev Aurora
- Department of Molecular Microbiology and Immunology, Saint Louis University School of Medicine, St. Louis, MO 63104, USA.
42
Perez-Jimenez R, Wiita AP, Rodriguez-Larrea D, Kosuri P, Gavira JA, Sanchez-Ruiz JM, Fernandez JM. Force-clamp spectroscopy detects residue co-evolution in enzyme catalysis. J Biol Chem 2008; 283:27121-9. [PMID: 18687682 DOI: 10.1074/jbc.m803746200] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
Understanding how the catalytic mechanisms of enzymes are optimized through evolution remains a major challenge in molecular biology. The concept of co-evolution implies that compensatory mutations occur to preserve the structure and function of proteins. We have combined statistical analysis of protein sequences with the sensitivity of single molecule force-clamp spectroscopy to probe how catalysis is affected by structurally distant correlated mutations in Escherichia coli thioredoxin. Our findings show that evolutionarily anti-correlated mutations have an inhibitory effect on enzyme catalysis, whereas positively correlated mutations rescue the catalytic activity. We interpret these results in terms of an evolutionary tuning of both the enzyme-substrate binding process and the chemistry of the active site. Our results constitute a direct observation of distant residue co-evolution in enzyme catalysis.
Affiliation(s)
- Raul Perez-Jimenez
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA