1
|
Algorithmically-guided discovery of viral epitopes via linguistic parsing: Problem formulation and solving by soft computing. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109509]
|
2
|
Lee MS, Tuohy PJ, Kim CY, Lichauco K, Parrish HL, Van Doorslaer K, Kuhns MS. Enhancing and inhibitory motifs regulate CD4 activity. eLife 2022; 11:79508. [PMID: 35861317 PMCID: PMC9333989 DOI: 10.7554/elife.79508] [Received: 04/15/2022] [Accepted: 07/20/2022]
Abstract
CD4+ T cells use T cell receptor (TCR)–CD3 complexes, and CD4, to respond to peptide antigens within MHCII molecules (pMHCII). We report here that, through ~435 million years of evolution in jawed vertebrates, purifying selection has shaped motifs in the extracellular, transmembrane, and intracellular domains of eutherian CD4 that enhance pMHCII responses, and covary with residues in an intracellular motif that inhibits responses. Importantly, while CD4 interactions with the Src kinase, Lck, are viewed as key to pMHCII responses, our data indicate that CD4–Lck interactions derive their importance from the counterbalancing activity of the inhibitory motif, as well as motifs that direct CD4–Lck pairs to specific membrane compartments. These results have implications for the evolution and function of complex transmembrane receptors and for biomimetic engineering.
Affiliation(s)
- Mark S Lee
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| | - Peter J Tuohy
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| | - Caleb Y Kim
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| | - Katrina Lichauco
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| | - Heather L Parrish
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| | - Koenraad Van Doorslaer
- School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, United States
| | - Michael S Kuhns
- Department of Immunobiology, University of Arizona College of Medicine, Tucson, United States
| |
|
3
|
Gou X, Feng X, Shi H, Guo T, Xie R, Liu Y, Wang Q, Li H, Yang B, Chen L, Lu Y. PPVED: A machine learning tool for predicting the effect of single amino acid substitution on protein function in plants. PLANT BIOTECHNOLOGY JOURNAL 2022; 20:1417-1431. [PMID: 35398963 PMCID: PMC9241370 DOI: 10.1111/pbi.13823] [Received: 10/16/2021] [Accepted: 04/03/2022]
Abstract
Single amino acid substitution (SAAS) produces the most common variant of protein function change under physiological conditions. As the number of SAAS events in plants has increased exponentially, an effective prediction tool is required to help identify and distinguish functional SAASs from the whole genome as either potentially causal traits or as variants. Here, we constructed a plant SAAS database that stores 12 865 SAASs in 6172 proteins and developed a tool called Plant Protein Variation Effect Detector (PPVED) that predicts the effect of SAASs on protein function in plants. PPVED achieved an 87% predictive accuracy when applied to plant SAASs, an accuracy that was much higher than those from six human database software: SIFT, PROVEAN, PANTHER-PSEP, PhD-SNP, PolyPhen-2, and MutPred2. The predictive effect of six SAASs from three proteins in Arabidopsis and maize was validated with wet lab experiments, of which five substitution sites were accurately predicted. PPVED could facilitate the identification and characterization of genetic variants that explain observed phenotype variations in plants, contributing to solutions for challenges in functional genomics and systems biology. PPVED can be accessed under a CC-BY (4.0) license via http://www.ppved.org.cn.
Affiliation(s)
- Xiangjian Gou
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
| | - Xuanjun Feng
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
| | - Haoran Shi
- Chengdu Academy of Agricultural and Forestry Sciences, Wenjiang, Sichuan, China
| | - Tingting Guo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, Hubei, China
| | - Rongqian Xie
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
| | - Yaxi Liu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Triticeae Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
| | - Qi Wang
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
| | - Hongxiang Li
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
| | - Banglie Yang
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
| | - Lixue Chen
- College of Information Engineering, Sichuan Agricultural University, Ya’an, Sichuan, China
| | - Yanli Lu
- State Key Laboratory of Crop Gene Exploration and Utilization in Southwest China, Wenjiang, Sichuan, China
- Maize Research Institute, Sichuan Agricultural University, Wenjiang, Sichuan, China
| |
|
4
|
Lara Ortiz MT, Martinell García V, Del Rio G. Saturation Mutagenesis of the Transmembrane Region of HokC in Escherichia coli Reveals Its High Tolerance to Mutations. Int J Mol Sci 2021; 22:ijms221910359. [PMID: 34638709 PMCID: PMC8509063 DOI: 10.3390/ijms221910359] [Received: 09/03/2021] [Revised: 09/20/2021] [Accepted: 09/22/2021]
Abstract
Cells adapt to different stress conditions, such as the presence of antibiotics. This adaptation is sometimes achieved by changing relevant protein positions, the mutability of which is limited by structural constraints. Understanding the basis of these constraints represents an important challenge for both basic science and potential biotechnological applications. To study these constraints, we performed a systematic saturation mutagenesis of the transmembrane region of HokC, a toxin used by Escherichia coli to control its own population, and observed that 92% of single-point mutations are tolerated and that all the non-tolerated mutations have compensatory mutations that reverse their effect. We provide experimental evidence that HokC accumulates multiple compensatory mutations that are found as correlated mutations in the HokC family multiple sequence alignment. In agreement with these observations, transmembrane proteins show a higher probability of presenting correlated mutations and are less densely packed locally than globular proteins; previous mutagenesis results on transmembrane proteins further support our observations on the high tolerance to mutations of transmembrane regions of proteins. Thus, our experimental results reveal the high tolerance of the HokC transmembrane region to loss-of-function mutations, which is associated with low sequence conservation and a high rate of correlated mutations in the HokC family sequence alignment, features shared with other transmembrane proteins.
|
5
|
Morán-Torres R, Castillo González DA, Durán-Pastén ML, Aguilar-Maldonado B, Castro-Obregón S, Del Rio G. Selective Moonlighting Cell-Penetrating Peptides. Pharmaceutics 2021; 13:1119. [PMID: 34452080 PMCID: PMC8400200 DOI: 10.3390/pharmaceutics13081119] [Received: 04/28/2021] [Revised: 07/06/2021] [Accepted: 07/19/2021]
Abstract
Cell-penetrating peptides (CPPs) are molecules capable of passing through biological membranes. This capacity has been used to deliver impermeable molecules into cells, such as drugs and DNA probes, among others. However, the internalization of these peptides lacks specificity: CPPs are internalized indiscriminately by different cell types. Two major approaches have been described to address this problem: (i) targeting, in which a receptor-recognizing sequence is added to a CPP, and (ii) activation, where a non-active form of the CPP is activated once it interacts with cell target components. These strategies result in multifunctional peptides (i.e., penetration and target recognition) that increase the CPP's length, the cost of synthesis and the likelihood of being degraded or becoming antigenic. In this work we describe the use of machine-learning methods to design short selective CPPs; the reduction in size is accomplished by embedding two or more activities within a single CPP domain, hence we refer to these as moonlighting CPPs. We provide experimental evidence that these designed moonlighting peptides penetrate selectively into targeted cells and discuss areas of opportunity to improve the design of these peptides.
Affiliation(s)
- Rafael Morán-Torres
- Department of Biochemistry and Structural Biology, Institute of Cellular Physiology, National Autonomous University of Mexico, UNAM, Mexico City 04510, Mexico; (R.M.-T.); (D.A.C.G.)
| | - David A. Castillo González
- Department of Biochemistry and Structural Biology, Institute of Cellular Physiology, National Autonomous University of Mexico, UNAM, Mexico City 04510, Mexico; (R.M.-T.); (D.A.C.G.)
| | - Maria Luisa Durán-Pastén
- Laboratorio Nacional de Canalopatias, National Autonomous University of Mexico, UNAM, Mexico City 04510, Mexico;
| | - Beatriz Aguilar-Maldonado
- Department of Neurodevelopment and Physiology, Institute of Cellular Physiology, National Autonomous University of Mexico, Mexico City 04510, Mexico; (B.A.-M.); (S.C.-O.)
| | - Susana Castro-Obregón
- Department of Neurodevelopment and Physiology, Institute of Cellular Physiology, National Autonomous University of Mexico, Mexico City 04510, Mexico; (B.A.-M.); (S.C.-O.)
| | - Gabriel Del Rio
- Department of Biochemistry and Structural Biology, Institute of Cellular Physiology, National Autonomous University of Mexico, UNAM, Mexico City 04510, Mexico; (R.M.-T.); (D.A.C.G.)
| |
|
6
|
What's in a mass? Biochem Soc Trans 2021; 49:1027-1037. [PMID: 33929513 DOI: 10.1042/bst20210288] [Received: 02/19/2021] [Revised: 03/27/2021] [Accepted: 03/30/2021]
Abstract
This short essay aims to make the reader reflect on the concept of biological mass and on the added value that the determination of this molecular property of a protein brings to the interpretation of evolutionary and translational snake venomics research. Starting from the premise that the amino acid sequence is the most distinctive primary molecular characteristic of any protein, the thesis underlying the first part of this essay is that the isotopic distribution of a protein's molecular mass serves to unambiguously differentiate it from any other protein in an organism's proteome. In the second part of the essay, we discuss examples of collaborative projects among our laboratories, where mass profiling of snake venom PLA2 across conspecific populations played a key role in revealing dispersal routes that determined the current phylogeographic pattern of the species.
|
7
|
Werner M, Gapsys V, de Groot BL. One Plus One Makes Three: Triangular Coupling of Correlated Amino Acid Mutations. J Phys Chem Lett 2021; 12:3195-3201. [PMID: 33760609 PMCID: PMC8041375 DOI: 10.1021/acs.jpclett.1c00380] [Received: 02/02/2021] [Accepted: 03/17/2021]
Abstract
Correlated mutations have played a pivotal role in the recent success in protein fold prediction. Understanding nonadditive effects of mutations is crucial for altering protein structure, as mutations of multiple residues may change protein stability or binding affinity in a manner unforeseen by the investigation of single mutants. While the couplings between amino acids can be inferred from homologous protein sequences, the physical mechanisms underlying these correlations remain elusive. In this work we demonstrate that calculations based on the first principles of statistical mechanics are capable of capturing the effects of nonadditivities in protein mutations. The identified thermodynamic couplings cover the short-range as well as previously unknown long-range correlations. We further explore a set of mutations in staphylococcal nuclease to unravel an intricate interaction pathway underlying the correlations between amino acid mutations.
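The nonadditivity described in this abstract can be illustrated with a standard double-mutant-cycle calculation. This is a minimal sketch and not the authors' method: the function name and the numerical values are hypothetical, and the abstract itself gives no formula. The coupling is taken here as the difference between the free energy change of the double mutant and the sum of the two single-mutant changes, so a value of zero means the mutations are additive.

```python
def coupling_energy(ddg_a, ddg_b, ddg_ab):
    """Return the nonadditive coupling of two mutations.

    ddg_a, ddg_b: free energy changes of the single mutants.
    ddg_ab: free energy change of the double mutant.
    All three values must use the same units and reference state.
    """
    return ddg_ab - (ddg_a + ddg_b)


# Hypothetical values in kJ/mol, chosen for illustration only
# (they are not taken from the paper).
single_a = 2.0
single_b = 3.0
double_ab = 8.0

coupling = coupling_energy(single_a, single_b, double_ab)
```

With these illustrative numbers the double mutant is more destabilizing than the two single mutants combined, which is the kind of effect that single-mutant studies alone would miss.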
Affiliation(s)
- Martin Werner
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Vytautas Gapsys
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| | - Bert L. de Groot
- Computational Biomolecular Dynamics Group, Max-Planck Institute for Biophysical Chemistry, Am Fassberg 11, 37077 Göttingen, Germany
| |
|
8
|
Timonina D, Sharapova Y, Švedas V, Suplatov D. Bioinformatic analysis of subfamily-specific regions in 3D-structures of homologs to study functional diversity and conformational plasticity in protein superfamilies. Comput Struct Biotechnol J 2021; 19:1302-1311. [PMID: 33738079 PMCID: PMC7933735 DOI: 10.1016/j.csbj.2021.02.005] [Received: 11/08/2020] [Revised: 02/08/2021] [Accepted: 02/09/2021]
Abstract
Local 3D-structural differences in homologous proteins contribute to functional diversity observed in a superfamily, but so far received little attention as bioinformatic analysis was usually carried out at the level of amino acid sequences. We have developed Zebra3D - the first-of-its-kind bioinformatic software for systematic analysis of 3D-alignments of protein families using machine learning. The new tool identifies subfamily-specific regions (SSRs) - patterns of local 3D-structure (i.e. single residues, loops, or secondary structure fragments) that are spatially equivalent within families/subfamilies, but are different among them, and thus can be associated with functional diversity and function-related conformational plasticity. Bioinformatic analysis of protein superfamilies by Zebra3D can be used to study 3D-determinants of catalytic activity and specific accommodation of ligands, help to prepare focused libraries for directed evolution or assist development of chimeric enzymes with novel properties by exchange of equivalent regions between homologs, and to characterize plasticity in binding sites. A companion Mustguseal web-server is available to automatically construct a 3D-alignment of functionally diverse proteins, thus reducing the minimal input required to operate Zebra3D to a single PDB code. The Zebra3D + Mustguseal combined approach provides the opportunity to systematically explore the value of SSRs in superfamilies and to use this information for protein design and drug discovery. The software is available open-access at https://biokinet.belozersky.msu.ru/Zebra3D.
Affiliation(s)
- Daria Timonina
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
| | - Yana Sharapova
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
| | - Vytas Švedas
- Lomonosov Moscow State University, Faculty of Bioengineering and Bioinformatics, Lenin Hills 1-73, Moscow 119234, Russia
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
| | - Dmitry Suplatov
- Lomonosov Moscow State University, Belozersky Institute of Physicochemical Biology, Lenin Hills 1-73, Moscow 119234, Russia
- Corresponding author.
| |
|
9
|
Miller M, Vitale D, Kahn PC, Rost B, Bromberg Y. funtrp: identifying protein positions for variation driven functional tuning. Nucleic Acids Res 2020; 47:e142. [PMID: 31584091 PMCID: PMC6868392 DOI: 10.1093/nar/gkz818] [Received: 06/25/2019] [Revised: 09/05/2019] [Accepted: 09/12/2019]
Abstract
Evaluating the impact of non-synonymous genetic variants is essential for uncovering disease associations and mechanisms of evolution. An in-depth understanding of sequence changes is also fundamental for synthetic protein design and stability assessments. However, the variant effect predictor performance gain observed in recent years has not kept up with the increased complexity of new methods. One likely reason for this might be that most approaches use similar sets of gene and protein features for modeling variant effects, often emphasizing sequence conservation. While high levels of conservation highlight residues essential for protein activity, much of the variation observable in vivo is arguably weaker in its impact, thus requiring evaluation at a higher level of resolution. Here, we describe function Neutral/Toggle/Rheostat predictor (funtrp), a novel computational method that categorizes protein positions based on the position-specific expected range of mutational impacts: Neutral (weak/no effects), Rheostat (function-tuning positions), or Toggle (on/off switches). We show that position types do not correlate strongly with familiar protein features such as conservation or protein disorder. We also find that position type distribution varies across different protein functions. Finally, we demonstrate that position types can improve performance of existing variant effect predictors and suggest a way forward for the development of new ones.
Affiliation(s)
- Maximilian Miller
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA
| | - Daniel Vitale
- Columbian College of Arts and Sciences Data Science Program Corcoran Hall, 725 21st Street NW, Washington, DC 20052, USA
| | - Peter C Kahn
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA
| | - Burkhard Rost
- Department for Bioinformatics and Computational Biology, Technische Universität München, Boltzmannstr. 3, 85748 Garching/Munich, Germany; Institute for Advanced Study at Technische Universität München (TUM-IAS), Lichtenbergstraße 2a, 85748 Garching/Munich, Germany
| | - Yana Bromberg
- Department of Biochemistry and Microbiology, Rutgers University, 76 Lipman Dr, New Brunswick, NJ 08901, USA; Institute for Advanced Study at Technische Universität München (TUM-IAS), Lichtenbergstraße 2a, 85748 Garching/Munich, Germany; Department of Genetics, Rutgers University, Human Genetics Institute, Life Sciences Building, 145 Bevier Road, Piscataway, NJ 08854, USA
| |
|
10
|
Abstract
A Monte Carlo simulation based sequence design method is proposed to explore the effect of correlated pair mutations in proteins. In the designed sequences, the most correlated residue pairs are identified and mutated with all possible amino acid pairs except those already present. The cumulative correlated pair mutations generated an array of mutated sequences. Results show a significant increase in the probability of misfolding for correlated pair mutations as compared to that of the random pair mutations. The pair mutations of correlated residues that are in contact record a higher probability of misfolding as compared to the correlated residues that are not in contact. The probability of misfolding increases on pair mutation of nonlocally correlated residue pairs as compared to that of the locally correlated residue pairs. The choice of a compact or expanded conformation does not depend on the type of correlated pair mutations. Pair mutation of the most correlated residue pairs at the surface with hydrophobic amino acids results in higher misfolding probability as compared to that in the core. An exactly opposite behavior is observed on pair mutation with hydrophilic and charged amino acid pairs. The neutral amino acid pairs do not differentiate between core and surface sites. This study may be used for targeted mutation experiments to predict complex mutation patterns, reengineer the existing proteins, and design new proteins with reduced misfolding propensity.
Affiliation(s)
- Adesh Kumar
- Department of Chemistry, University of Delhi, Delhi 110007, India
| | - Parbati Biswas
- Department of Chemistry, University of Delhi, Delhi 110007, India
| |
|
11
|
Subramanian K, Mitusińska K, Raedts J, Almourfi F, Joosten HJ, Hendriks S, Sedelnikova SE, Kengen SWM, Hagen WR, Góra A, Martins Dos Santos VAP, Baker PJ, van der Oost J, Schaap PJ. Distant Non-Obvious Mutations Influence the Activity of a Hyperthermophilic Pyrococcus furiosus Phosphoglucose Isomerase. Biomolecules 2019; 9:biom9060212. [PMID: 31159273 PMCID: PMC6627849 DOI: 10.3390/biom9060212] [Received: 03/01/2019] [Revised: 05/20/2019] [Accepted: 05/28/2019]
Abstract
The cupin-type phosphoglucose isomerase (PfPGI) from the hyperthermophilic archaeon Pyrococcus furiosus catalyzes the reversible isomerization of glucose-6-phosphate to fructose-6-phosphate. We investigated PfPGI using protein-engineering bioinformatics tools to select functionally-important residues based on correlated mutation analyses. A pair of amino acids in the periphery of PfPGI was found to be the dominant co-evolving mutation. The position of these selected residues was found to be non-obvious to conventional protein engineering methods. We designed a small smart library of variants by substituting the co-evolved pair and screened their biochemical activity, which revealed their functional relevance. Four mutants were further selected from the library for purification, measurement of their specific activity, crystal structure determination, and metal cofactor coordination analysis. Though the mutant structures and metal cofactor coordination were strikingly similar, variations in their activity correlated with their fine-tuned dynamics and solvent access regulation. Alternative, small smart libraries for enzyme optimization are suggested by our approach, which is able to identify non-obvious yet beneficial mutations.
Affiliation(s)
- Kalyanasundaram Subramanian
- Laboratory of Systems and Synthetic Biology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Karolina Mitusińska
- Biotechnology Center, Silesian University of Technology, ul. Krzywoustego 8, 44-100 Gliwice, Poland.
- Faculty of Chemistry, Silesian University of Technology, ul. Strzody 9, 44-100 Gliwice, Poland.
| | - John Raedts
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Feras Almourfi
- Saudi Human Genome Project, National Center of Genome Technology, King Abdulaziz City for Science and Technology (KACST), Riyadh 11442, Saudi Arabia.
| | - Henk-Jan Joosten
- Bio-Prodict, Nieuwe Marktstraat 54E, 6511 AA Nijmegen, The Netherlands.
| | - Sjon Hendriks
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Svetlana E Sedelnikova
- The Krebs Institute for Biomolecular Research, Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield S10 2TN, UK.
| | - Servé W M Kengen
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Wilfred R Hagen
- Department of Biotechnology, Delft University of Technology, Van der Maasweg 9, 2629 HZ Delft, The Netherlands.
| | - Artur Góra
- Biotechnology Center, Silesian University of Technology, ul. Krzywoustego 8, 44-100 Gliwice, Poland.
| | - Vitor A P Martins Dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Patrick J Baker
- The Krebs Institute for Biomolecular Research, Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield S10 2TN, UK.
| | - John van der Oost
- Laboratory of Microbiology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| | - Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Stippeneng 4, 6708 WE Wageningen, The Netherlands.
| |
|
12
|
Suplatov DA, Kopylov KE, Popova NN, Voevodin VV, Švedas VK. Mustguseal: a server for multiple structure-guided sequence alignment of protein families. Bioinformatics 2019; 34:1583-1585. [PMID: 29309510 DOI: 10.1093/bioinformatics/btx831] [Received: 10/18/2017] [Accepted: 12/21/2017]
Abstract
Motivation: Comparative analysis of homologous proteins in a functionally diverse superfamily is a valuable tool for studying structure-function relationships, but represents a methodological challenge. Results: The Mustguseal web-server can automatically build large structure-guided sequence alignments of functionally diverse protein families that include thousands of proteins, based on all available information about their structures and sequences in public databases. Superimposition of protein structures is implemented to compare evolutionarily distant relatives, whereas alignment of sequences is used to compare close homologues. The final alignment can be downloaded for local use or operated on-line with the built-in interactive tools, and further submitted to the integrated sister web-servers of Mustguseal to analyze conserved, subfamily-specific and co-evolving residues when studying protein function and regulation, designing improved enzyme variants for practical applications, and designing selective ligands to modulate functional properties of proteins. Availability and implementation: Freely available on the web at https://biokinet.belozersky.msu.ru/mustguseal. Contact: vytas@belozersky.msu.ru. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | | | - Nina N Popova
- Faculty of Computational Mathematics and Cybernetics
| | - Vladimir V Voevodin
- Faculty of Computational Mathematics and Cybernetics; Research Computing Center of the Lomonosov Moscow State University, Moscow 119991, Russia
| | - Vytas K Švedas
- Belozersky Institute of Physicochemical Biology; Faculty of Bioengineering and Bioinformatics
| |
|
13
|
Quantifying correlations between mutational sites in the catalytic subunit of γ-secretase. J Mol Graph Model 2019; 88:221-227. [PMID: 30772652 DOI: 10.1016/j.jmgm.2019.02.002] [Received: 10/10/2018] [Revised: 01/17/2019] [Accepted: 02/01/2019]
Abstract
Presenilin 1 (PS1) is the catalytic subunit of the γ-secretase complex, which is involved in the generation of amyloid-β peptides (Aβ). Single point mutations in PS1 alter the cleavage pattern of the amyloid precursor protein (APP) and lead to the formation of aberrant Aβ peptides. To date, more than two hundred mutations distributed among almost a third of PS1's amino acids have been associated with the development of Alzheimer's disease (AD). Nevertheless, the mechanism by which mutations far from the catalytic site alter γ-secretase's cleavage pattern remains unclear. In this work we analyzed correlated motions between amino acids in the wild type (WT) enzyme and 13 γ-secretase mutant models employing a multi-scale molecular dynamics approach. The effect of the protonation state of the key catalytic residue Asp385 on the correlation networks was also evaluated. We observed that the strength and number of correlations are strongly affected in all mutant models in both protonation states. The biggest changes were observed in mutants I83T, W165G, H214Y and L435F; the last of these has been shown to drastically reduce γ-secretase activity. Finally, we classified the studied mutations according to their correlation networks with amino acids at: (1) the interfaces with the other γ-secretase components, (2) the catalytic site, (3) the substrate entry site and (4) the substrate recognition site. Overall, this work provides insight into the allosteric communication networks of PS1.
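The abstract does not state which correlation measure was used. One common choice for correlated motions in molecular dynamics is the normalized cross-correlation of atomic displacements, sketched below under that assumption. The function name and the toy trajectories are hypothetical; a real analysis would read coordinates from a simulation trajectory after structural alignment.

```python
from math import sqrt


def cross_correlation(traj_i, traj_j):
    """Normalized cross-correlation of displacements for two atoms.

    traj_i, traj_j: equal-length lists of (x, y, z) positions, one per frame.
    Returns a value in [-1, 1]: 1 for fully correlated motion and
    -1 for fully anticorrelated motion. Raises ZeroDivisionError if
    either atom never moves from its mean position.
    """
    n = len(traj_i)
    if n == 0 or n != len(traj_j):
        raise ValueError("trajectories must be non-empty and of equal length")

    # Mean position of each atom over all frames.
    mean_i = [sum(p[k] for p in traj_i) / n for k in range(3)]
    mean_j = [sum(p[k] for p in traj_j) / n for k in range(3)]

    dot = var_i = var_j = 0.0
    for p, q in zip(traj_i, traj_j):
        d_i = [p[k] - mean_i[k] for k in range(3)]
        d_j = [q[k] - mean_j[k] for k in range(3)]
        dot += sum(a * b for a, b in zip(d_i, d_j))
        var_i += sum(a * a for a in d_i)
        var_j += sum(b * b for b in d_j)

    return dot / sqrt(var_i * var_j)


# Toy trajectories (hypothetical): two atoms moving together along x,
# and a third moving in the opposite direction.
atom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
atom_b = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]
atom_c = [(2.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]

together = cross_correlation(atom_a, atom_b)
opposed = cross_correlation(atom_a, atom_c)
```

Computing this value for every residue pair gives a correlation matrix, and comparing matrices between wild type and mutant models is one way to quantify the changes in strength and number of correlations that the abstract reports.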
|
14
|
Plekhanova E, Nuzhdin SV, Utkin LV, Samsonova MG. Prediction of deleterious mutations in coding regions of mammals with transfer learning. Evol Appl 2019; 12:18-28. [PMID: 30622632 PMCID: PMC6304693 DOI: 10.1111/eva.12607] [Received: 11/02/2017] [Accepted: 01/16/2018]
Abstract
The genomes of mammals contain thousands of deleterious mutations. It is important to be able to recognize them with high precision. In conservation biology, the small size of fragmented populations results in accumulation of damaging variants. Preserving animals with less damaged genomes could optimize conservation efforts. In breeding of farm animals, trade-offs between farm performance and general fitness might be better avoided if deleterious mutations are well classified. In humans, the problem of such a precise classification has been successfully solved, in large part due to large databases of disease-causing mutations. However, this kind of information is very limited for other mammals. Here, we propose to make better use of the information available on human mutations to enable classification of damaging mutations in other mammalian species. Specifically, we apply transfer learning: machine learning methods that improve results on a small dataset for a focal problem (recognizing damaging mutations in our companion and farm animals) by using the much larger datasets available for a related problem (recognizing damaging mutations in humans). We validate our tools using mouse and dog annotated datasets and obtain significantly better results in comparison to the SIFT classifier. Then, we apply them to predict deleterious mutations in a cattle genome-wide dataset.
Affiliation(s)
- Elena Plekhanova
- Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
| | - Sergey V. Nuzhdin
- Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- Program Molecular and Computation Biology, Dornsife College of Letters, Arts, and Sciences, University of Southern California, Los Angeles, CA, USA
| | - Lev V. Utkin
- Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
| | - Maria G. Samsonova
- Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
|
15
|
Abstract
The comparative study of homologous proteins can provide abundant information about the functional and structural constraints on protein evolution. For example, an amino acid substitution that is deleterious may become permissive in the presence of another substitution at a second site of the protein. A popular approach for detecting coevolving residues is by looking for correlated substitution events on branches of the molecular phylogeny relating the protein-coding sequences. Here we describe a machine learning method (Bayesian graphical models) implemented in the open-source phylogenetic software package HyPhy, http://hyphy.org, for extracting a network of coevolving residues from a sequence alignment.
|
16
|
Perlaza-Jiménez L, Walther D. A genome-wide scan for correlated mutations detects macromolecular and chromatin interactions in Arabidopsis thaliana. Nucleic Acids Res 2018; 46:8114-8132. [PMID: 29986106 PMCID: PMC6144803 DOI: 10.1093/nar/gky576] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/07/2018] [Accepted: 06/14/2018] [Indexed: 01/05/2023]
Abstract
The concept of exploiting correlated mutations has been introduced and applied successfully to identify interactions within and between biological macromolecules. Its rationale lies in the preservation of physical interactions via compensatory mutations. With the massive increase of available sequence information, approaches based on correlated mutations have regained considerable attention. We analyzed a set of 10 707 430 single nucleotide polymorphisms detected in 1135 accessions of the plant Arabidopsis thaliana. To measure their covariance and to reveal the global genome-wide sequence correlation structure of the Arabidopsis genome, the adjusted mutual information has been estimated for each possible pair of polymorphic sites. We developed a series of filtering steps to account for genetic linkage and lineage relations between Arabidopsis accessions, as well as transitive covariance as possible confounding factors. We show that upon appropriate filtering, correlated mutations prove indeed informative with regard to molecular interactions, and furthermore, appear to reflect on chromosomal interactions. Our study demonstrates that the concept of correlated mutations can also be applied successfully to within-species sequence variation and establishes a promising approach to help unravel the complex molecular interactions in A. thaliana and other species with broad sequence information.
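As a rough illustration of the covariance measure named in this abstract, the sketch below computes plain mutual information between two polymorphic sites across accessions. It is not the adjusted mutual information used in the study (no correction for chance agreement is applied), and the function name and input layout are hypothetical.

```python
import math
from collections import Counter

def mutual_information(site_a, site_b):
    """Mutual information (in bits) between two aligned sites.

    site_a, site_b: sequences of alleles, one entry per accession,
    in the same accession order.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and equally long")

    count_a = Counter(site_a)                 # marginal allele counts, site A
    count_b = Counter(site_b)                 # marginal allele counts, site B
    count_ab = Counter(zip(site_a, site_b))   # joint allele-pair counts

    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi
```

Two perfectly coupled biallelic sites with balanced alleles give 1 bit, and independent sites give 0. In a genome-wide scan the raw value still needs the filtering the abstract describes (linkage, lineage structure, transitive covariance) before it can be interpreted.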
Affiliation(s)
- Laura Perlaza-Jiménez
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
| | - Dirk Walther
- Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
|
17
|
Filip R, Leluk J. Comparative studies on variability, phylogenesis, and correlated mutations of neuraminidases from influenza virus type A. BIO-ALGORITHMS AND MED-SYSTEMS 2018. [DOI: 10.1515/bams-2017-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Neuraminidase (NA) is an important protein for the replication cycle of influenza A viruses. NA is an enzyme that cleaves the sialic acid receptors; this process plays a significant role in the viral life cycle. Blocking NA with a specific inhibitor is an effective way to treat the flu. However, some strains show resistance to current drugs. Therefore, NA is a focus of intense research into new antiviral drugs and into the functions of new mutations. This research focuses on determining the variability profile, performing a phylogenetic analysis, and finding the correlated mutations within a set of 149 NA sequences belonging to various strains of influenza A virus. In this study, we used our original programs (Corm, Consensus Constructor, and SSSSg) as well as other bioinformatics software. NA proteins are characterized by various levels of variability in different regions, which are presented in detail with the aid of ConSurf. The use of four independent methods to create the phylogenetic trees gave some new data on the evolutionary relationships within the NA protein family. The search for correlated mutations shows several potentially important correlated positions that were not previously reported to be significant. Such an approach is potentially important and gives new information regarding the NA proteins of influenza A virus.
|
18
|
Kovalev MS, Igolkina AA, Samsonova MG, Nuzhdin SV. A Pipeline for Classifying Deleterious Coding Mutations in Agricultural Plants. FRONTIERS IN PLANT SCIENCE 2018; 9:1734. [PMID: 30546376 PMCID: PMC6279870 DOI: 10.3389/fpls.2018.01734] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/18/2018] [Accepted: 11/08/2018] [Indexed: 05/18/2023]
Abstract
The impact of deleterious variation on both plant fitness and crop productivity is not completely understood and is actively debated. Deleterious mutations in plants have been predicted solely using sequence conservation methods rather than function-based classifiers, owing to the lack of well-annotated mutational datasets in these organisms. Here, we developed a machine learning classifier based on a dataset of deleterious and neutral mutations in Arabidopsis thaliana by extracting 18 informative features that discriminate deleterious mutations from neutral ones, including 9 novel features not used in previous studies. We examined linear SVM, Gaussian SVM, and Random Forest classifiers, with the latter performing best. Random Forest classifiers exhibited a markedly higher accuracy than the popular PolyPhen-2 tool on the Arabidopsis dataset. Additionally, we tested whether the Random Forest, trained on the Arabidopsis dataset, accurately predicts deleterious mutations in Oryza sativa and Pisum sativum and observed satisfactory accuracy (87% and 93%, respectively), higher than that obtained with PolyPhen-2. Application of transfer learning in classifiers did not improve their performance. To additionally test the performance of the Random Forest classifier across different angiosperm species, we applied it to annotate deleterious mutations in Cicer arietinum and validated them using population frequency data. Overall, we devised a classifier with the potential to improve the annotation of putative functional mutations in QTL and GWAS hit regions, as well as the evolutionary analysis of the proliferation of deleterious mutations during plant domestication, thus optimizing breeding improvement and the development of new cultivars.
Affiliation(s)
- Maxim S. Kovalev
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
| | - Anna A. Igolkina
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- *Correspondence: Anna A. Igolkina, Maria G. Samsonova
| | - Maria G. Samsonova
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- *Correspondence: Anna A. Igolkina, Maria G. Samsonova
| | - Sergey V. Nuzhdin
- Department of Applied Mathematics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
- Program Molecular & Computational Biology, Dornsife College of Letters Arts and Science, University of Southern California, Los Angeles, CA, United States
|
19
|
van den Bergh T, Tamo G, Nobili A, Tao Y, Tan T, Bornscheuer UT, Kuipers RKP, Vroling B, de Jong RM, Subramanian K, Schaap PJ, Desmet T, Nidetzky B, Vriend G, Joosten HJ. CorNet: Assigning function to networks of co-evolving residues by automated literature mining. PLoS One 2017; 12:e0176427. [PMID: 28545124 PMCID: PMC5436653 DOI: 10.1371/journal.pone.0176427] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/15/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022]
Abstract
CorNet is a web-based tool for the analysis of co-evolving residue positions in protein super-family sequence alignments. CorNet projects external information, such as mutation data extracted from the literature, onto interactively displayed groups of co-evolving residue positions to shed light on the functions associated with these groups and the residues in them. We used CorNet to analyse six enzyme super-families and found that groups of strongly co-evolving residues tend to consist of residues involved in the same function, such as activity, specificity, co-factor binding, or enantioselectivity. This finding makes it possible to assign a function to residues for which no data are yet available in the literature. A mutant library was designed to mutate residues observed in a group of co-evolving residues predicted to be involved in enantioselectivity, but for which no literature data are available yet. The resulting set of mutations indeed showed many instances of increased enantioselectivity.
Affiliation(s)
- Tom van den Bergh
- Bio-Prodict, Nijmegen, The Netherlands
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
| | | | - Alberto Nobili
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
| | - Yifeng Tao
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
- Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
| | - Tianwei Tan
- Beijing Key Lab of Bioprocess, Beijing University of Chemical Technology, Chaoyang, Beijing, China
| | - Uwe T. Bornscheuer
- Institute of Biochemistry, Department of Biotechnology & Enzyme Catalysis, Greifswald University, Greifswald, Germany
| | | | | | | | | | - Peter J. Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
| | - Tom Desmet
- Centre for Industrial Biotechnology and Biocatalysis, Ghent University, Ghent, Belgium
| | - Bernd Nidetzky
- Institute of Biotechnology and Biochemical Engineering, Graz University of Technology, Graz, Austria
| | | | - Henk-Jan Joosten
- Bio-Prodict, Nijmegen, The Netherlands
- CMBI, Radboudumc, Nijmegen, The Netherlands
|
20
|
Molecular Evolutionary Constraints that Determine the Avirulence State of Clostridium botulinum C2 Toxin. J Mol Evol 2017; 84:174-186. [DOI: 10.1007/s00239-017-9791-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/01/2016] [Accepted: 03/30/2017] [Indexed: 10/19/2022]
|
21
|
Buchholz PCF, Vogel C, Reusch W, Pohl M, Rother D, Spieß AC, Pleiss J. BioCatNet: A Database System for the Integration of Enzyme Sequences and Biocatalytic Experiments. Chembiochem 2016; 17:2093-2098. [DOI: 10.1002/cbic.201600462] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/22/2016] [Indexed: 12/12/2022]
Affiliation(s)
- Patrick C. F. Buchholz
- Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
| | - Constantin Vogel
- Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
| | - Waldemar Reusch
- Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
| | - Martina Pohl
- IBG-1: Biotechnology; Forschungszentrum Jülich GmbH; 52425 Jülich Germany
| | - Dörte Rother
- IBG-1: Biotechnology; Forschungszentrum Jülich GmbH; 52425 Jülich Germany
| | - Antje C. Spieß
- Institute of Biochemical Engineering; Technical University of Braunschweig; Rebenring 56 38106 Braunschweig Germany
- RWTH Aachen University; AVT.EPT; Worringerweg 1 52074 Aachen Germany
| | - Jürgen Pleiss
- Institute of Technical Biochemistry; University of Stuttgart; Allmandring 31 70569 Stuttgart Germany
|
22
|
Riera C, Padilla N, de la Cruz X. The Complementarity Between Protein-Specific and General Pathogenicity Predictors for Amino Acid Substitutions. Hum Mutat 2016; 37:1013-24. [DOI: 10.1002/humu.23048] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 03/30/2016] [Revised: 06/30/2016] [Accepted: 07/06/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Casandra Riera
- Research Unit in Translational Bioinformatics; Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona; Barcelona Spain
| | - Natàlia Padilla
- Research Unit in Translational Bioinformatics; Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona; Barcelona Spain
| | - Xavier de la Cruz
- Research Unit in Translational Bioinformatics; Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona; Barcelona Spain
- ICREA; Barcelona Spain
|
23
|
Structural constraints-based evaluation of immunogenic avirulent toxins from Clostridium botulinum C2 and C3 toxins as subunit vaccines. INFECTION GENETICS AND EVOLUTION 2016; 44:17-27. [PMID: 27320793 DOI: 10.1016/j.meegid.2016.06.029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/21/2015] [Revised: 05/26/2016] [Accepted: 06/13/2016] [Indexed: 12/11/2022]
Abstract
Clostridium botulinum (group-III) is an anaerobic bacterium producing C2 and C3 toxins in addition to botulinum neurotoxins in avian and mammalian cells. C2 and C3 toxins are members of the bacterial ADP-ribosyltransferase superfamily, which modify eukaryotic cell surface proteins by an ADP-ribosylation reaction. Herein, mutant proteins lacking catalytic and pore-forming function derived from C2 (C2I and C2II) and C3 toxins were computationally evaluated to understand their structure-function integrity. We chose a range of structural constraints, including local structural environment, folding process, backbone conformation, conformational dynamic sub-space, NAD-binding specificity and antigenic determinants, for screening suitable avirulent toxins. A total of 20 avirulent mutants were identified out of 23 mutants, which were experimentally produced by site-directed mutagenesis. No changes were observed in secondary structural elements, in particular α-helices and β-sheets, or in the folding rate of all-β classes. Structural stability was maintained by reordered hydrophobic and hydrogen bonding patterns. Molecular dynamics studies suggested that coupled mutations may restrain the binding affinity to NAD(+) or the protein substrate upon structural destabilization. The avirulent toxins of this study have a stable energetic backbone conformation with a common blueprint of the folding process. Molecular docking studies revealed that avirulent mutants formed more favorable hydrogen bonds with the side chains of amino acids near the conserved NAD-binding core, despite their restrained NAD-binding specificity. Thus, structural constraints in the avirulent toxins would determine their immunogenic nature for the prioritization of protein-based subunit vaccines/immunogens for avian and veterinary animals infected with C. botulinum.
|
24
|
Khwaja A, Galilee M, Marx A, Alian A. Structure of FIV capsid C-terminal domain demonstrates lentiviral evasion of genetic fragility by coevolved substitutions. Sci Rep 2016; 6:24957. [PMID: 27102180 PMCID: PMC4840305 DOI: 10.1038/srep24957] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/02/2015] [Accepted: 04/08/2016] [Indexed: 12/22/2022]
Abstract
Viruses use a strategy of high mutational rates to adapt to environmental and therapeutic pressures, circumventing the deleterious effects of random single-point mutations by coevolved compensatory mutations, which restore protein fold, function or interactions damaged by the initial ones. This mechanism has been identified as contributing to drug resistance in the HIV-1 Gag polyprotein and especially its capsid proteolytic product, which forms the viral capsid core and plays multifaceted roles in the viral life cycle. Here, we determined the X-ray crystal structure of the C-terminal domain of the feline immunodeficiency virus (FIV) capsid and, through interspecies analysis, elucidate the structural basis of co-evolutionarily and spatially correlated substitutions in capsid sequences, which, when uncoupled and individually substituted into the HIV-1 capsid, impair virion assembly and infectivity. The ability to circumvent the deleterious effects of single amino acid substitutions by cooperative secondary substitutions allows mutational flexibility that may afford viruses an important survival advantage. The potential of such interspecies structural analysis for preempting viral resistance by identifying such alternative but functionally equivalent patterns is discussed.
Affiliation(s)
- Aya Khwaja
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa 320003, Israel
| | - Meytal Galilee
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa 320003, Israel
| | - Ailie Marx
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa 320003, Israel
| | - Akram Alian
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa 320003, Israel
|
25
|
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022]
Abstract
Background: Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. Results: In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions: The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.
Affiliation(s)
- Chan-Seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
|
26
|
Prathiviraj R, Prisilla A, Chellapandi P. Structure–function discrepancy in Clostridium botulinum C3 toxin for its rational prioritization as a subunit vaccine. J Biomol Struct Dyn 2015; 34:1317-29. [DOI: 10.1080/07391102.2015.1078745] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 12/18/2022]
|
27
|
Abstract
Mutations in the GBA1 gene are associated with increased risk of Parkinson's disease, and the protein produced by the gene, glucocerebrosidase, interacts with α-synuclein, the protein at the center of the disease etiology. One possibility is that the mutations disrupt a beneficial interaction between the proteins, and a beneficial interaction would imply that the proteins have coevolved. To explore this possibility, a correlated mutation analysis has been performed for all 72 vertebrate species where complete sequences of α-synuclein and glucocerebrosidase are known. The most highly correlated pair of residue variations is α-synuclein A53T and glucocerebrosidase G115E. Intriguingly, the A53T mutation is a Parkinson's disease risk factor in humans, suggesting the pathology associated with this mutation and interaction with glucocerebrosidase might be connected. Correlations with β-synuclein are also evaluated. To assess the impact of lowered species number on accuracy, intra and inter-chain correlations are also calculated for hemoglobin, using mutual information Z-value and direct coupling analyses.
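The abstract mentions a mutual information Z-value without defining it. One common convention, assumed here and not confirmed by the source, standardizes each residue pair's score against the distribution of scores over all pairs, so that strongly covarying pairs stand out as large positive values. The function name and the dictionary layout are hypothetical.

```python
import statistics

def z_scores(pair_scores):
    """Standardize pairwise covariation scores.

    pair_scores: dict mapping a (position_i, position_j) tuple to a raw
    score such as mutual information.
    Returns a dict with the same keys and Z-values
    (score minus mean, divided by the population standard deviation).
    """
    values = list(pair_scores.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0.0:
        raise ValueError("all scores are identical; Z-values are undefined")
    return {pair: (score - mean) / sd for pair, score in pair_scores.items()}
```

With few species, as the abstract cautions, the score distribution itself is noisy, so a high Z-value is best read as a ranking aid and not as a significance test.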
Affiliation(s)
- James M. Gruschus
- Laboratory of Structural Biophysics, NHLBI, NIH, Bethesda, Maryland, United States of America
|
28
|
The Patterns of Coevolution in Clade B HIV Envelope's N-Glycosylation Sites. PLoS One 2015; 10:e0128664. [PMID: 26110648 PMCID: PMC4482261 DOI: 10.1371/journal.pone.0128664] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/02/2014] [Accepted: 04/29/2015] [Indexed: 11/19/2022]
Abstract
The co-evolution of the potential N-glycosylation sites of HIV Clade B gp120 was mapped onto the coevolution network of the protein structure using mean field direct coupling analysis (mfDCA). This was possible for 327 positions with suitable entropy and gap content. Indications of pressure to preserve the evolving glycan shield are seen as well as strong dependencies between the majority of the potential N-glycosylation sites and the rest of the structure. These findings indicate that although mainly an adaptation against antibody neutralization, the evolving glycan shield is structurally related to the core polypeptide, which, thus, is also under pressure to reflect the changes in the N-glycosylation. The map we propose fills the gap in previous attempts to tease out sequon evolution by providing a more general molecular context. Thus, it will help design strategies guiding HIV gp120 evolution in a rational way.
|
29
|
Tse A, Verkhivker GM. Molecular Determinants Underlying Binding Specificities of the ABL Kinase Inhibitors: Combining Alanine Scanning of Binding Hot Spots with Network Analysis of Residue Interactions and Coevolution. PLoS One 2015; 10:e0130203. [PMID: 26075886 PMCID: PMC4468085 DOI: 10.1371/journal.pone.0130203] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 03/24/2015] [Accepted: 05/17/2015] [Indexed: 12/20/2022]
Abstract
Quantifying binding specificity and drug resistance of protein kinase inhibitors is of fundamental importance and remains highly challenging due to complex interplay of structural and thermodynamic factors. In this work, molecular simulations and computational alanine scanning are combined with the network-based approaches to characterize molecular determinants underlying binding specificities of the ABL kinase inhibitors. The proposed theoretical framework unveiled a relationship between ligand binding and inhibitor-mediated changes in the residue interaction networks. By using topological parameters, we have described the organization of the residue interaction networks and networks of coevolving residues in the ABL kinase structures. This analysis has shown that functionally critical regulatory residues can simultaneously embody strong coevolutionary signal and high network centrality with a propensity to be energetic hot spots for drug binding. We have found that selective (Nilotinib) and promiscuous (Bosutinib, Dasatinib) kinase inhibitors can use their energetic hot spots to differentially modulate stability of the residue interaction networks, thus inhibiting or promoting conformational equilibrium between inactive and active states. According to our results, Nilotinib binding may induce a significant network-bridging effect and enhance centrality of the hot spot residues that stabilize structural environment favored by the specific kinase form. In contrast, Bosutinib and Dasatinib can incur modest changes in the residue interaction network in which ligand binding is primarily coupled only with the identity of the gate-keeper residue. These factors may promote structural adaptability of the active kinase states in binding with these promiscuous inhibitors. 
Our results have related ligand-induced changes in the residue interaction networks with drug resistance effects, showing that network robustness may be compromised by targeted mutations of key mediating residues. This study has outlined mechanisms by which inhibitor binding could modulate resilience and efficiency of allosteric interactions in the kinase structures, while preserving structural topology required for catalytic activity and regulation.
Affiliation(s)
- Amanda Tse
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
| | - Gennady M. Verkhivker
- Graduate Program in Computational and Data Sciences, Schmid College of Science and Technology, Chapman University, Orange, California, United States of America
- Chapman University School of Pharmacy, Irvine, California, United States of America
|
30
|
Suplatov D, Voevodin V, Švedas V. Robust enzyme design: bioinformatic tools for improved protein stability. Biotechnol J 2014; 10:344-55. [PMID: 25524647 DOI: 10.1002/biot.201400150] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 06/30/2014] [Revised: 09/30/2014] [Accepted: 11/04/2014] [Indexed: 01/22/2023]
Abstract
The ability of proteins and enzymes to maintain a functionally active conformation under adverse environmental conditions is an important feature of biocatalysts, vaccines, and biopharmaceutical proteins. From an evolutionary perspective, robust stability of proteins improves their biological fitness and allows for further optimization. Viewed from an industrial perspective, enzyme stability is crucial for the practical application of enzymes under the required reaction conditions. In this review, we analyze bioinformatic-driven strategies that are used to predict structural changes that can be applied to wild type proteins in order to produce more stable variants. The most commonly employed techniques can be classified into stochastic approaches, empirical or systematic rational design strategies, and design of chimeric proteins. We conclude that bioinformatic analysis can be efficiently used to study large protein superfamilies systematically as well as to predict particular structural changes which increase enzyme stability. Evolution has created a diversity of protein properties that are encoded in genomic sequences and structural data. Bioinformatics has the power to uncover this evolutionary code and provide a reproducible selection of hotspots - key residues to be mutated in order to produce more stable and functionally diverse proteins and enzymes. Further development of systematic bioinformatic procedures is needed to organize and analyze sequences and structures of proteins within large superfamilies and to link them to function, as well as to provide knowledge-based predictions for experimental evaluation.
Affiliation(s)
- Dmitry Suplatov
- Belozersky Institute of Physicochemical Biology and Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia
|
31
|
Xie L, Ge X, Tan H, Xie L, Zhang Y, Hart T, Yang X, Bourne PE. Towards structural systems pharmacology to study complex diseases and personalized medicine. PLoS Comput Biol 2014; 10:e1003554. [PMID: 24830652 PMCID: PMC4022462 DOI: 10.1371/journal.pcbi.1003554] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 12/11/2022]
Abstract
Genome-Wide Association Studies (GWAS), whole genome sequencing, and high-throughput omics techniques have generated vast amounts of genotypic and molecular phenotypic data. However, these data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, which continues along a one-drug-one-target-one-disease paradigm. As a partial consequence, both the cost to launch a new drug and the attrition rate are increasing. Systems pharmacology and pharmacogenomics are emerging to exploit the available data and potentially reverse this trend, but, as we argue here, more is needed. To understand the impact of genetic, epigenetic, and environmental factors on drug action, we must study the structural energetics and dynamics of molecular interactions in the context of the whole human genome and interactome. Such an approach requires an integrative modeling framework for drug action that leverages advances in data-driven statistical modeling and mechanism-based multiscale modeling and transforms heterogeneous data from GWAS, high-throughput sequencing, structural genomics, functional genomics, and chemical genomics into unified knowledge. This is not a small task, but, as reviewed here, progress is being made towards the final goal of personalized medicines for the treatment of complex diseases.
Affiliation(s)
- Lei Xie
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Ph.D. Program in Computer Science, Biology, and Biochemistry, The Graduate Center, The City University of New York, New York, New York, United States of America
- Xiaoxia Ge
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Hepan Tan
- Department of Computer Science, Hunter College, The City University of New York, New York, New York, United States of America
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- Yinliang Zhang
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
- Thomas Hart
- Department of Biological Sciences, Hunter College, The City University of New York, New York, New York, United States of America
- Xiaowei Yang
- School of Public Health, Hunter College, The City University of New York, New York, New York, United States of America
- Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California, United States of America
|
32
|
Yourshaw M, Taylor SP, Rao AR, Martín MG, Nelson SF. Rich annotation of DNA sequencing variants by leveraging the Ensembl Variant Effect Predictor with plugins. Brief Bioinform 2014; 16:255-64. [PMID: 24626529 DOI: 10.1093/bib/bbu008] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 12/24/2022] Open
Abstract
High-throughput DNA sequencing has become a mainstay for the discovery of genomic variants that may cause disease or affect phenotype. A next-generation sequencing pipeline typically identifies thousands of variants in each sample. A particular challenge is the annotation of each variant in a way that is useful to downstream consumers of the data, such as clinical sequencing centers or researchers. These users may require that all data storage and analysis remain on secure local servers to protect patient confidentiality or intellectual property, may have unique and changing needs to draw on a variety of annotation data sets and may prefer not to rely on closed-source applications beyond their control. Here we describe scalable methods for using the plugin capability of the Ensembl Variant Effect Predictor to enrich its basic set of variant annotations with additional data on genes, function, conservation, expression, diseases, pathways and protein structure, and describe an extensible framework for easily adding additional custom data sets.
|
33
|
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022] Open
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
- Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
|
34
|
Riera C, Lois S, de la Cruz X. Prediction of pathological mutations in proteins: the challenge of integrating sequence conservation and structure stability principles. Wiley Interdiscip Rev Comput Mol Sci 2013. [DOI: 10.1002/wcms.1170] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Affiliation(s)
- Casandra Riera
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain
- Sergio Lois
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain
- Xavier de la Cruz
- Laboratory of Translational Bioinformatics in Neuroscience, VHIR, Barcelona, Spain
- Institució Catalana per la Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
35
|
Hecht M, Bromberg Y, Rost B. News from the protein mutability landscape. J Mol Biol 2013; 425:3937-48. [PMID: 23896297 DOI: 10.1016/j.jmb.2013.07.028] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 04/30/2013] [Revised: 07/08/2013] [Accepted: 07/19/2013] [Indexed: 12/16/2022]
Abstract
Some mutations of protein residues matter more than others, and these are often conserved evolutionarily. The explosion of deep sequencing and genotyping increasingly requires the distinction between effect and neutral variants. The simplest approach predicts all mutations of conserved residues to have an effect; however, this works poorly, at best. Many computational tools that are optimized to predict the impact of point mutations provide more detail. Here, we expand the perspective from the view of single variants to the level of sketching the entire mutability landscape. This landscape is defined by the impact of substituting every residue at each position in a protein by each of the 19 non-native amino acids. We review some of the powerful conclusions about protein function, stability and their robustness to mutation that can be drawn from such an analysis. Large-scale experimental and computational mutagenesis experiments are increasingly furthering our understanding of protein function and of the genotype-phenotype associations. We also discuss how these can be used to improve predictions of protein function and pathogenicity of missense variants.
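The landscape described in this abstract covers every position crossed with each of the 19 non-native amino acids. As a rough illustration only, the sketch below enumerates those single substitutions for a short sequence; the function name, the `M1A`-style labels, and the toy sequence are assumptions made here, and no effect prediction is attempted.

```python
# Illustrative sketch: enumerate the single-substitution space of a protein.
# The names and the toy sequence are hypothetical, not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(sequence):
    """Yield labels such as 'M1A' for every non-native substitution."""
    for position, native in enumerate(sequence, start=1):
        for candidate in AMINO_ACIDS:
            if candidate != native:
                yield f"{native}{position}{candidate}"


variants = list(single_substitutions("MKV"))
print(len(variants))   # 3 positions x 19 substitutions
print(variants[:3])
```

A landscape in this sense would attach a predicted or measured effect to each of these labels; the enumeration only shows how quickly the space grows (19 variants per residue).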
Affiliation(s)
- Maximilian Hecht
- Department of Bioinformatics and Computational Biology I12, Technische Universität München, Boltzmannstrasse 3, 85748 Garching, Germany.
|
36
|
Xie L, Ng C, Ali T, Valencia R, Ferreira BL, Xue V, Tanweer M, Zhou D, Haddad GG, Bourne PE, Xie L. Multiscale modeling of the causal functional roles of nsSNPs in a genome-wide association study: application to hypoxia. BMC Genomics 2013; 14 Suppl 3:S9. [PMID: 23819581 PMCID: PMC3665574 DOI: 10.1186/1471-2164-14-s3-s9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND: It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions.
RESULTS: To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs, from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies.
CONCLUSIONS: Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
Affiliation(s)
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
|
37
|
Nguyen H, Luu TD, Poch O, Thompson JD. Knowledge discovery in variant databases using inductive logic programming. Bioinform Biol Insights 2013; 7:119-31. [PMID: 23589683 PMCID: PMC3615990 DOI: 10.4137/bbi.s11184] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
Abstract
Understanding the effects of genetic variation on the phenotype of an individual is a major goal of biomedical research, especially for the development of diagnostics and effective therapeutic solutions. In this work, we describe the use of a recent knowledge discovery from database (KDD) approach using inductive logic programming (ILP) to automatically extract knowledge about human monogenic diseases. We extracted background knowledge from MSV3d, a database of all human missense variants mapped to 3D protein structure. In this study, we identified 8,117 mutations in 805 proteins with known three-dimensional structures that were known to be involved in human monogenic disease. Our results help to improve our understanding of the relationships between structural, functional or evolutionary features and deleterious mutations. Our inferred rules can also be applied to predict the impact of any single amino acid replacement on the function of a protein. The interpretable rules are available at http://decrypthon.igbmc.fr/kd4v/.
Affiliation(s)
- Hoan Nguyen
- Laboratoire de Bioinformatique et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire Illkirch, France
|
38
|
Kolaczkowski M, Sroda-Pomianek K, Kolaczkowska A, Michalak K. A conserved interdomain communication pathway of pseudosymmetrically distributed residues affects substrate specificity of the fungal multidrug transporter Cdr1p. Biochim Biophys Acta Biomembr 2012; 1828:479-90. [PMID: 23122779 DOI: 10.1016/j.bbamem.2012.10.024] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 05/07/2012] [Revised: 09/19/2012] [Accepted: 10/21/2012] [Indexed: 11/19/2022]
Abstract
Understanding the communication pathways between remote sites in proteins is of key importance for understanding their function and mechanism of action. These remain largely unexplored among the pleiotropic drug resistance (PDR) representatives of the ubiquitous superfamily of ATP-binding cassette (ABC) transporters. To identify functionally coupled residues important for the polyspecific transport by the fungal ABC multidrug transporter Cdr1p, a new selection strategy, towards increased resistance to a preferred substrate of the homologous Snq2p, was applied to a library of randomly generated mutants. The single amino acid substitutions were located pseudosymmetrically in each domain of the internally duplicated protein: in the H-loop of the N-terminal nucleotide binding domain (NBD1) (C363R) and in the C-terminal NBD2 region preceding Walker A (V885G). The central regions of the first transmembrane helices 1 and 7 of both transmembrane domains were also affected by the G521S/D and A1208V substitutions, respectively. Although the mutants were expressed at a similar level and located correctly to the plasma membrane, they selectively affected transport of multiple drugs, including azole antifungals. The synergistic effects of combined mutations on drug resistance, drug-dependent ATPase activity and transport support the view, inferred from the statistical coupling analysis (SCA) of amino acid coevolution and mutational analysis of other ABC transporter families, that these residues are an important part of the conserved, allosterically coupled interdomain communication network. Our results shed new light on the communication between the pseudosymmetrically arranged domains in a fungal PDR ABC transporter and reveal its profound influence on substrate specificity.
Affiliation(s)
- Marcin Kolaczkowski
- Department of Biophysics, Wroclaw Medical University, PL-50-368 Wroclaw, Poland.
|
39
|
Steiner K, Schwab H. Recent advances in rational approaches for enzyme engineering. Comput Struct Biotechnol J 2012; 2:e201209010. [PMID: 24688651 PMCID: PMC3962183 DOI: 10.5936/csbj.201209010] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 08/01/2012] [Revised: 10/16/2012] [Accepted: 10/18/2012] [Indexed: 11/29/2022] Open
Abstract
Enzymes are an attractive alternative in the asymmetric syntheses of chiral building blocks. To meet the requirements of industrial biotechnology and to introduce new functionalities, the enzymes need to be optimized by protein engineering. This article specifically reviews rational approaches for enzyme engineering and de novo enzyme design involving structure-based approaches developed in recent years for improvement of the enzymes’ performance, broadened substrate range, and creation of novel functionalities to obtain products with high added value for industrial applications.
Affiliation(s)
- Kerstin Steiner
- ACIB GmbH (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria
- Helmut Schwab
- ACIB GmbH (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria
- Institute of Molecular Biotechnology, TU Graz, 8010 Graz, Austria
|
40
|
Jeong CS, Kim D. Reliable and robust detection of coevolving protein residues. Protein Eng Des Sel 2012; 25:705-13. [DOI: 10.1093/protein/gzs081] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
|
41
|
FunSAV: predicting the functional effect of single amino acid variants using a two-stage random forest model. PLoS One 2012; 7:e43847. [PMID: 22937107 PMCID: PMC3427247 DOI: 10.1371/journal.pone.0043847] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 04/13/2012] [Accepted: 07/26/2012] [Indexed: 11/26/2022] Open
Abstract
Single amino acid variants (SAVs) are the most abundant form of known genetic variations associated with human disease. Successful prediction of the functional impact of SAVs from sequences can thus lead to an improved understanding of the underlying mechanisms of why a SAV may be associated with a certain disease. In this work, we constructed a high-quality structural dataset that contained 679 high-quality protein structures with 2,048 SAVs by collecting the human genetic variant data from multiple resources and dividing them into two categories, i.e., disease-associated and neutral variants. We built a two-stage random forest (RF) model, termed FunSAV, to predict the functional effect of SAVs by combining sequence, structure and residue-contact network features with additional features that were not explored in previous studies. Importantly, a two-step feature selection procedure was proposed to select the most important and informative features that contribute to the prediction of disease association of SAVs. In cross-validation experiments on the benchmark dataset, FunSAV achieved a good prediction performance with an area under the curve (AUC) of 0.882, which is competitive with and in some cases better than other existing tools including SIFT, SNAP, Polyphen2, PANTHER, nsSNPAnalyzer and PhD-SNP. The source code of FunSAV and the datasets can be downloaded at http://sunflower.kuicr.kyoto-u.ac.jp/sjn/FunSAV.
|
42
|
Kalinina OV, Oberwinkler H, Glass B, Kräusslich HG, Russell RB, Briggs JAG. Computational identification of novel amino-acid interactions in HIV Gag via correlated evolution. PLoS One 2012; 7:e42468. [PMID: 22879995 PMCID: PMC3411748 DOI: 10.1371/journal.pone.0042468] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/04/2012] [Accepted: 07/09/2012] [Indexed: 12/31/2022] Open
Abstract
Pairs of amino acid positions that evolve in a correlated manner are proposed to play important roles in protein structure or function. Methods to detect them might fare better with families for which sequences of thousands of closely related homologs are available than families with only a few distant relatives. We applied co-evolution analysis to thousands of sequences of HIV Gag, finding that the most significantly co-evolving positions are proximal in the quaternary structures of the viral capsid. A reduction in infectivity caused by mutating one member of a significant pair could be rescued by a compensatory mutation of the other.
Affiliation(s)
- Olga V. Kalinina
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- Heike Oberwinkler
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Bärbel Glass
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Hans-Georg Kräusslich
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- Department of Infectious Diseases, Virology, Universitätsklinikum Heidelberg, Heidelberg, Germany
- Robert B. Russell
- CellNetworks, Bioquant, University of Heidelberg, Heidelberg, Germany
- John A. G. Briggs
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
43
|
Aguilar D, Oliva B, Marino Buslje C. Mapping the mutual information network of enzymatic families in the protein structure to unveil functional features. PLoS One 2012; 7:e41430. [PMID: 22848494 PMCID: PMC3405127 DOI: 10.1371/journal.pone.0041430] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 03/05/2012] [Accepted: 06/26/2012] [Indexed: 11/24/2022] Open
Abstract
Amino acids committed to a particular function correlate tightly along evolution and tend to form clusters in the 3D structure of the protein. Consequently, a protein can be seen as a network of co-evolving clusters of residues. The goal of this work is two-fold: first, we have combined mutual information and structural data to describe the amino acid networks within a protein and their interactions. Second, we have investigated how this information can be used to improve methods of prediction of functional residues by reducing the search space. As a main result, we found that clusters of co-evolving residues related to the catalytic site of an enzyme have distinguishable topological properties in the network. We also observed that these clusters usually evolve independently, which could be related to a fail-safe mechanism. Finally, we discovered a significant enrichment of functional residues (e.g. metal binding, susceptibility to detrimental mutations) in the clusters, which could be the foundation of new prediction tools.
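For readers unfamiliar with the measure named in this abstract, the sketch below computes plain Shannon mutual information between two columns of a multiple-sequence alignment. It is a minimal illustration only: the toy alignment and function name are assumptions made here, and the abstract does not say which corrections or normalizations (for example for phylogenetic bias or gaps) the authors apply, so none are included.

```python
# Illustrative sketch: uncorrected mutual information between alignment columns.
# The toy alignment is hypothetical; real analyses typically add corrections.
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Return the mutual information (in bits) of two equal-length columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * math.log2(joint * n / (count_a[a] * count_b[b]))
    return mi


msa = ["ARND", "ARNE", "GKND", "GKNE"]  # four toy sequences
columns = list(zip(*msa))               # transpose rows into columns

mi_coupled = mutual_information(columns[0], columns[1])      # A/G tracks R/K
mi_conserved = mutual_information(columns[0], columns[2])    # column 3 is invariant
mi_independent = mutual_information(columns[0], columns[3])  # D/E varies freely
print(mi_coupled, mi_conserved, mi_independent)
```

Columns that always change together score highest, while a fully conserved column carries no mutual information with any other column, which is why conservation and co-evolution are treated as complementary signals.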
Affiliation(s)
- Daniel Aguilar
- Structural Bioinformatics Group, Departament de Ciencies Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona Biomedical Research Park, Barcelona, Spain.
|
44
|
Han L, Zhang YJ, Song J, Liu MS, Zhang Z. Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues. PLoS One 2012; 7:e41370. [PMID: 22829945 PMCID: PMC3400608 DOI: 10.1371/journal.pone.0041370] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 03/22/2012] [Accepted: 06/20/2012] [Indexed: 11/18/2022] Open
Abstract
Enzymes play a fundamental role in almost all biological processes and identification of catalytic residues is a crucial step for deciphering the biological functions and understanding the underlying catalytic mechanisms. In this work, we developed a novel structural feature called MEDscore to identify catalytic residues, which integrated the microenvironment (ME) and geometrical properties of amino acid residues. Firstly, we converted a residue's ME into a series of spatially neighboring residue pairs, whose likelihood of being located in a catalytic ME was deduced from a benchmark enzyme dataset. We then calculated an ME-based score, termed as MEscore, by summing up the likelihood of all residue pairs. Secondly, we defined a parameter called Dscore to measure the relative distance of a residue to the center of the protein, provided that catalytic residues are typically located in the center of the protein structure. Finally, we defined the MEDscore feature based on an effective nonlinear integration of MEscore and Dscore. When evaluated on a well-prepared benchmark dataset using five-fold cross-validation tests, MEDscore achieved a robust performance in identifying catalytic residues with an AUC1.0 of 0.889. At a ≤ 10% false positive rate control, MEDscore correctly identified approximately 70% of the catalytic residues. Remarkably, MEDscore achieved a competitive performance compared with the residue conservation score (e.g. CONscore), the most informative singular feature predominantly employed to identify catalytic residues. To the best of our knowledge, MEDscore is the first singular structural feature exhibiting such an advantage. More importantly, we found that MEDscore is complementary with CONscore and a significantly improved performance can be achieved by combining CONscore with MEDscore in a linear manner. As an implementation of this work, MEDscore has been made freely accessible at http://protein.cau.edu.cn/mepi/.
Affiliation(s)
- Lei Han
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
- Yong-Jun Zhang
- State Key Laboratory for Biology of Plant Diseases and Insect Pests, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing, People's Republic of China
- Jiangning Song
- National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, People's Republic of China
- Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Ming S. Liu
- CSIRO - Mathematics, Informatics and Statistics, Clayton, Victoria, Australia
- Ziding Zhang
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, People's Republic of China
|
45
|
Dietrich S, Borst N, Schlee S, Schneider D, Janda JO, Sterner R, Merkl R. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments. Biochemistry 2012; 51:5633-41. [PMID: 22737967 DOI: 10.1021/bi300747r] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k(cat)/K(M)) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T(M)(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT(M)(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k(cat)/K(M) values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T(M)(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
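The abstract defines conn(k) as the number of significant correlations in which position k is involved. Under that definition alone, the count can be sketched as below. The pair list is hypothetical, and how H2r decides which correlations are significant is not described here, so the sketch takes the significant pairs as given input.

```python
# Illustrative sketch: count significant correlation pairs per position.
# The input pairs are hypothetical; the significance test itself is not modeled.
from collections import Counter


def conn_values(significant_pairs):
    """Map each position k to the number of distinct significant pairs containing k."""
    # Treat (i, j) and (j, i) as the same pair and ignore duplicates.
    unique_pairs = {tuple(sorted(pair)) for pair in significant_pairs}
    counts = Counter()
    for i, j in unique_pairs:
        counts[i] += 1
        counts[j] += 1
    return counts


pairs = [(10, 42), (42, 97), (42, 10), (97, 150)]
conn = conn_values(pairs)
print(conn[42], conn[10], conn[5])
```

Positions absent from every pair get a count of zero, which corresponds to the control residues with conn(k) values of zero described in the abstract.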
Affiliation(s)
- Susanne Dietrich
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany
|
46
|
Park K, Kim D. Structure-based rebuilding of coevolutionary information reveals functional modules in rhodopsin structure. Biochim Biophys Acta Proteins Proteom 2012; 1824:1484-9. [PMID: 22684088 DOI: 10.1016/j.bbapap.2012.05.015] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/08/2012] [Revised: 05/01/2012] [Accepted: 05/31/2012] [Indexed: 10/28/2022]
Abstract
Correlated mutation analysis (CMA) has been used to investigate protein functional sites. However, CMA has suffered from low signal-to-noise ratio caused by meaningless phylogenetic signals or structural constraints. We present a new method, Structure-based Correlated Mutation Analysis (SCMA), which encodes coevolution scores into a protein structure network. A path-based network model is adapted to describe information transfer between residues, and the statistical significance is estimated by network shuffling. This model intrinsically assumes that residues in physical contact have a more reliable coevolution score than distant residues, and that coevolution in distant residues likely arises from a series of contacting and coevolving residues. In addition, coevolutionary coupling is statistically controlled to remove the structural effects. When applied to the rhodopsin structure, the SCMA method identified a much higher percentage of functional residues than the typical coevolution score (61% vs. 22%). In addition, statistically significant residues are used to construct the coevolved residue-residue subnetwork. The network has one highly connected node (retinal bound Lys296), indicating that Lys296 can induce and regulate most other coevolved residues in a variety of locations. The coevolved network consists of a few modular clusters which have distinct functional roles. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
Affiliation(s)
- Keunwan Park
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea.
|
47
|
Stothard P, Choi JW, Basu U, Sumner-Thomson JM, Meng Y, Liao X, Moore SS. Whole genome resequencing of black Angus and Holstein cattle for SNP and CNV discovery. BMC Genomics 2011; 12:559. [PMID: 22085807 PMCID: PMC3229636 DOI: 10.1186/1471-2164-12-559] [Citation(s) in RCA: 123] [Impact Index Per Article: 9.5] [Received: 05/24/2011] [Accepted: 11/15/2011] [Indexed: 01/27/2023] Open
Abstract
Background: One of the goals of livestock genomics research is to identify the genetic differences responsible for variation in phenotypic traits, particularly those of economic importance. Characterizing the genetic variation in livestock species is an important step towards linking genes or genomic regions with phenotypes. The completion of the bovine genome sequence and recent advances in DNA sequencing technology allow for in-depth characterization of the genetic variations present in cattle. Here we describe the whole-genome resequencing of two Bos taurus bulls from distinct breeds for the purpose of identifying and annotating novel forms of genetic variation in cattle.
Results: The genomes of a Black Angus bull and a Holstein bull were sequenced to 22-fold and 19-fold coverage, respectively, using the ABI SOLiD system. Comparisons of the sequences with the Btau4.0 reference assembly yielded 7 million single nucleotide polymorphisms (SNPs), 24% of which were identified in both animals. Of the total SNPs found in Holstein, Black Angus, and in both animals, 81%, 81%, and 75% respectively are novel. In-depth annotations of the data identified more than 16 thousand distinct non-synonymous SNPs (85% novel) between the two datasets. Alignments between the SNP-altered proteins and orthologues from numerous species indicate that many of the SNPs alter well-conserved amino acids. Several SNPs predicted to create or remove stop codons were also found. A comparison between the sequencing SNPs and genotyping results from the BovineHD high-density genotyping chip indicates a detection rate of 91% for homozygous SNPs and 81% for heterozygous SNPs. The false positive rate is estimated to be about 2% for both the Black Angus and Holstein SNP sets, based on follow-up genotyping of 422 and 427 SNPs, respectively. Comparisons of read depth between the two bulls along the reference assembly identified 790 putative copy-number variations (CNVs). Ten randomly selected CNVs, five genic and five non-genic, were successfully validated using quantitative real-time PCR. The CNVs are enriched for immune system genes and include genes that may contribute to lactation capacity. The majority of the CNVs (69%) were detected as regions with higher abundance in the Holstein bull.
Conclusions: Substantial genetic differences exist between the Black Angus and Holstein animals sequenced in this work and the Hereford reference sequence, and some of this variation is predicted to affect evolutionarily conserved amino acids or gene copy number. The deeply annotated SNPs and CNVs identified in this resequencing study can serve as useful genetic tools, and as candidates in searches for phenotype-altering DNA differences.
Affiliation(s)
- Paul Stothard
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2P5, Canada
|
48
|
Ackerman SH, Gatti DL. The contribution of coevolving residues to the stability of KDO8P synthase. PLoS One 2011; 6:e17459. [PMID: 21408011 PMCID: PMC3052366 DOI: 10.1371/journal.pone.0017459] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/06/2010] [Accepted: 02/03/2011] [Indexed: 12/03/2022] Open
Abstract
Background: The evolutionary tree of 3-deoxy-D-manno-octulosonate 8-phosphate (KDO8P) synthase (KDO8PS), a bacterial enzyme that catalyzes a key step in the biosynthesis of bacterial endotoxin, is evenly divided between metal and non-metal forms, both having similar structures, but diverging in various degrees in amino acid sequence. Mutagenesis, crystallographic and computational studies have established that only a few residues determine whether or not KDO8PS requires a metal for function. The remaining divergence in the amino acid sequence of KDO8PSs is apparently unrelated to the underlying catalytic mechanism.
Methodology/Principal Findings: The multiple alignment of all known KDO8PS sequences reveals that several residue pairs coevolved, an indication of their possible linkage to a structural constraint. In this study we investigated by computational means the contribution of coevolving residues to the stability of KDO8PS. We found that about 1/4 of all strongly coevolving pairs probably originated from cycles of mutation (decreasing stability) and suppression (restoring it), while the remaining pairs are best explained by a succession of neutral or nearly neutral covarions.
Conclusions/Significance: Both sequence conservation and coevolution are involved in the preservation of the core structure of KDO8PS, but the contribution of coevolving residues is, in proportion, smaller. This is because small stability gains or losses associated with selection of certain residues in some regions of the stability landscape of KDO8PS are easily offset by a large number of possible changes in other regions. While this effect increases the tolerance of KDO8PS to deleterious mutations, it also decreases the probability that specific pairs of residues could have a strong contribution to the thermodynamic stability of the protein.
Affiliation(s)
- Sharon H. Ackerman
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Domenico L. Gatti
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Cardiovascular Research Institute, Wayne State University School of Medicine, Detroit, Michigan, United States of America
|
49
|
Abstract
Domain Interaction MAp (DIMA, available at http://webclu.bio.wzw.tum.de/dima) is a database of predicted and known interactions between protein domains. It integrates 5807 structurally known interactions imported from the iPfam and 3did databases and 46 900 domain interactions predicted by four computational methods: domain phylogenetic profiling, the domain pair exclusion algorithm, correlated mutations, and domain interaction prediction in a discriminative way. Additionally, predictions are filtered to exclude those domain pairs that are reported as non-interacting by the Negatome database. The DIMA Web site allows users to calculate domain interaction networks either for a domain of interest or for entire organisms, and to explore them interactively using the Flash-based Cytoscape Web software.
Affiliation(s)
- Qibin Luo
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|