1
Galpern EA, Jaafari H, Bueno C, Wolynes PG, Ferreiro DU. Reassessing the exon-foldon correspondence using frustration analysis. Proc Natl Acad Sci U S A 2024; 121:e2400151121. [PMID: 38954548 PMCID: PMC11252736 DOI: 10.1073/pnas.2400151121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 05/31/2024] [Indexed: 07/04/2024] Open
Abstract
Protein folding and evolution are intimately linked phenomena. Here, we revisit the concept of exons as potential protein folding modules across a set of 38 abundant and conserved protein families. Taking advantage of genomic exon-intron organization and extensive protein sequence data, we explore exon boundary conservation and assess the foldon-like behavior of exons using energy landscape theoretic measurements. We found deviations in the exon size distribution from exponential decay indicating selection in evolution. We show that when taken together there is a pronounced tendency to independent foldability for segments corresponding to the more conserved exons, supporting the idea of exon-foldon correspondence. While 45% of the families follow this general trend when analyzed individually, there are some families for which other stronger functional determinants, such as preserving frustrated active sites, may be acting. We further develop a systematic partitioning of protein domains using exon boundary hotspots, showing that minimal common exons correspond with uninterrupted alpha and/or beta elements for the majority of the families but not for all of them.
Affiliation(s)
- Ezequiel A. Galpern
  - Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
  - Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas - Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
- Hana Jaafari
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Applied Physics Graduate Program, Smalley-Curl Institute, Rice University, Houston, TX 77005
- Carlos Bueno
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Peter G. Wolynes
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
  - Department of Physics, Rice University, Houston, TX 77005
- Diego U. Ferreiro
  - Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
  - Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales, Consejo Nacional de Investigaciones Científicas y Técnicas - Universidad de Buenos Aires, Buenos Aires C1428EGA, Argentina
2
Zhao K, Zhao P, Wang S, Xia Y, Zhang G. FoldPAthreader: predicting protein folding pathway using a novel folding force field model derived from known protein universe. Genome Biol 2024; 25:152. [PMID: 38862984 PMCID: PMC11167914 DOI: 10.1186/s13059-024-03291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
Protein folding has become a tractable problem with the significant advances in deep learning-driven protein structure prediction. Here we propose FoldPAthreader, a protein folding pathway prediction method that uses a novel folding force field model by exploring the intrinsic relationship between protein evolution and folding from the known protein universe. Further, the folding force field is used to guide Monte Carlo conformational sampling, driving the protein chain to fold into its native state by exploring potential intermediates. On 30 example targets, FoldPAthreader predicts folding pathways consistent with biological experimental data for 70% of the proteins.
Affiliation(s)
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Pengxin Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Suhui Wang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
3
Huang Z, Cui X, Xia Y, Zhao K, Zhang G. Pathfinder: Protein folding pathway prediction based on conformational sampling. PLoS Comput Biol 2023; 19:e1011438. [PMID: 37695768 PMCID: PMC10513300 DOI: 10.1371/journal.pcbi.1011438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Revised: 09/21/2023] [Accepted: 08/17/2023] [Indexed: 09/13/2023] Open
Abstract
The study of protein folding mechanisms is a challenge in molecular biology, which is of great significance for revealing the movement rules of biological macromolecules, understanding the pathogenic mechanism of folding diseases, and designing protein engineering materials. Based on the hypothesis that the conformational sampling trajectory contains information about the folding pathway, we propose a protein folding pathway prediction algorithm named Pathfinder. Firstly, Pathfinder performs large-scale sampling of the conformational space and clusters the decoys obtained in the sampling. The heterogeneous conformations obtained by clustering are named seed states. Then, a resampling algorithm that is not constrained by the local energy basin is designed to obtain the transition probabilities of seed states. Finally, protein folding pathways are inferred from the maximum transition probabilities of seed states. The proposed Pathfinder is tested on our developed test set (34 proteins). For 11 widely studied proteins, we correctly predicted their folding pathways and specifically analyzed 5 of them. For 13 proteins, we predicted their folding pathways to be further verified by biological experiments. For 6 proteins, we analyzed the reasons for the low prediction accuracy. For the other 4 proteins without biological experiment results, potential folding pathways were predicted to provide new insights into protein folding mechanisms. The results reveal that structural analogs may have different folding pathways to express different biological functions, homologous proteins may contain common folding pathways, and α-helices may be more prone to early protein folding than β-strands.
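The final step described in this abstract, inferring a pathway from the maximum transition probabilities between seed states, can be illustrated with a minimal sketch. This is not the Pathfinder implementation: the greedy walk, the function name `infer_pathway`, the state labels, and the probabilities are all hypothetical and chosen only to show the idea.

```python
def infer_pathway(transitions, start, target):
    """Greedy walk over seed states (illustrative assumption, not Pathfinder's code).

    transitions maps each state to a dict of {next_state: probability}.
    From the current state, move to the not-yet-visited state with the
    highest transition probability until the target is reached.
    """
    path = [start]
    visited = {start}
    current = start
    while current != target:
        candidates = {
            state: prob
            for state, prob in transitions.get(current, {}).items()
            if state not in visited
        }
        if not candidates:
            return None  # dead end: no unvisited state is reachable
        current = max(candidates, key=candidates.get)
        visited.add(current)
        path.append(current)
    return path


# Hypothetical transition probabilities between made-up seed states.
transitions = {
    "unfolded": {"helix_first": 0.6, "sheet_first": 0.3, "native": 0.1},
    "helix_first": {"sheet_first": 0.2, "native": 0.7, "unfolded": 0.1},
    "sheet_first": {"helix_first": 0.5, "native": 0.4, "unfolded": 0.1},
}

pathway = infer_pathway(transitions, "unfolded", "native")
print(pathway)
```

A greedy walk is the simplest reading of "maximum transition probabilities"; a global search for the most probable path would be an equally plausible reading of the abstract.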
Affiliation(s)
- Zhaohong Huang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Xinyue Cui
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Yuhao Xia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, China
4
Smets D, Tsirigotaki A, Smit JH, Krishnamurthy S, Portaliou AG, Vorobieva A, Vranken W, Karamanou S, Economou A. Evolutionary adaptation of the protein folding pathway for secretability. EMBO J 2022; 41:e111344. [PMID: 36031863 PMCID: PMC9713715 DOI: 10.15252/embj.2022111344] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2022] [Revised: 07/14/2022] [Accepted: 08/02/2022] [Indexed: 01/15/2023] Open
Abstract
Secretory preproteins of the Sec pathway are targeted post-translationally and cross cellular membranes through translocases. During cytoplasmic transit, mature domains remain non-folded for translocase recognition/translocation. After translocation and signal peptide cleavage, mature domains fold to native states in the bacterial periplasm or traffic further. We sought the structural basis for delayed mature domain folding and how signal peptides regulate it. We compared how evolution diversified a periplasmic peptidyl-prolyl isomerase PpiA mature domain from its structural cytoplasmic PpiB twin. Global and local hydrogen-deuterium exchange mass spectrometry showed that PpiA is a slower folder. We defined at near-residue resolution hierarchical folding initiated by similar foldons in the twins, at different order and rates. PpiA folding is delayed by less hydrophobic native contacts, frustrated residues and a β-turn in the earliest foldon and by signal peptide-mediated disruption of foldon hierarchy. When selected PpiA residues and/or its signal peptide were grafted onto PpiB, they converted it into a slow folder with enhanced in vivo secretion. These structural adaptations in a secretory protein facilitate trafficking.
Affiliation(s)
- Dries Smets
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Alexandra Tsirigotaki
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Jochem H Smit
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Srinath Krishnamurthy
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Athina G Portaliou
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Anastassia Vorobieva
  - Structural Biology Brussels, Vrije Universiteit Brussel and Center for Structural Biology, Brussels, Belgium
  - VIB‐VUB Center for Structural Biology, VIB, Brussels, Belgium
- Wim Vranken
  - Structural Biology Brussels, Vrije Universiteit Brussel and Center for Structural Biology, Brussels, Belgium
  - VIB‐VUB Center for Structural Biology, VIB, Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Free University of Brussels, Brussels, Belgium
- Spyridoula Karamanou
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
- Anastassios Economou
  - Department of Microbiology and Immunology, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, KU Leuven, Leuven, Belgium
5
Tran MH, Schoeder CT, Schey KL, Meiler J. Computational Structure Prediction for Antibody-Antigen Complexes From Hydrogen-Deuterium Exchange Mass Spectrometry: Challenges and Outlook. Front Immunol 2022; 13:859964. [PMID: 35720345 PMCID: PMC9204306 DOI: 10.3389/fimmu.2022.859964] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/22/2022] [Accepted: 04/22/2022] [Indexed: 11/21/2022] Open
Abstract
Although computational structure prediction has had great successes in recent years, it regularly fails to predict the interactions of large protein complexes with residue-level accuracy, or even the correct orientation of the protein partners. The performance of computational docking can be notably enhanced by incorporating experimental data from structural biology techniques. A rapid method to probe protein-protein interactions is hydrogen-deuterium exchange mass spectrometry (HDX-MS). HDX-MS has been increasingly used for epitope-mapping of antibodies (Abs) to their respective antigens (Ags) in the past few years. In this paper, we review the current state of HDX-MS in studying protein interactions, specifically Ab-Ag interactions, and how it has been used to inform computational structure prediction calculations. Particularly, we address the limitations of HDX-MS in epitope mapping and techniques and protocols applied to overcome these barriers. Furthermore, we explore computational methods that leverage HDX-MS to aid structure prediction, including the computational simulation of HDX-MS data and the combination of HDX-MS and protein docking. We point out challenges in interpreting and incorporating HDX-MS data into Ab-Ag complex docking and highlight the opportunities they provide to build towards a more optimized hybrid method, allowing for more reliable, high throughput epitope identification.
Affiliation(s)
- Minh H. Tran
  - Chemical and Physical Biology Program, Vanderbilt University, Nashville, TN, United States
  - Center of Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, TN, United States
- Clara T. Schoeder
  - Center of Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Institute for Drug Discovery, University Leipzig Medical School, Leipzig, Germany
- Kevin L. Schey
  - Mass Spectrometry Research Center, Department of Biochemistry, Vanderbilt University, Nashville, TN, United States
- Jens Meiler
  - Center of Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Institute for Drug Discovery, University Leipzig Medical School, Leipzig, Germany
6
Devaurs D, Antunes DA, Borysik AJ. Computational Modeling of Molecular Structures Guided by Hydrogen-Exchange Data. J Am Soc Mass Spectrom 2022; 33:215-237. [PMID: 35077179 DOI: 10.1021/jasms.1c00328] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Data produced by hydrogen-exchange monitoring experiments have been used in structural studies of molecules for several decades. Despite uncertainties about the structural determinants of hydrogen exchange itself, such data have successfully helped guide the structural modeling of challenging molecular systems, such as membrane proteins or large macromolecular complexes. As hydrogen-exchange monitoring provides information on the dynamics of molecules in solution, it can complement other experimental techniques in so-called integrative modeling approaches. However, hydrogen-exchange data have often only been used to qualitatively assess molecular structures produced by computational modeling tools. In this paper, we look beyond qualitative approaches and survey the various paradigms under which hydrogen-exchange data have been used to quantitatively guide the computational modeling of molecular structures. Although numerous prediction models have been proposed to link molecular structure and hydrogen exchange, none of them has been widely accepted by the structural biology community. Here, we present as many hydrogen-exchange prediction models as we could find in the literature, with the aim of providing the first exhaustive list of its kind. From purely structure-based models to so-called fractional-population models or knowledge-based models, the field is quite vast. We aspire for this paper to become a resource for practitioners to gain a broader perspective on the field and guide research toward the definition of better prediction models. This will eventually improve synergies between hydrogen-exchange monitoring and molecular modeling.
Affiliation(s)
- Didier Devaurs
  - MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh EH4 2XU, U.K.
- Dinler A Antunes
  - Department of Biology and Biochemistry, University of Houston, Houston, Texas 77005, United States
- Antoni J Borysik
  - Department of Chemistry, King's College London, London SE1 1DB, U.K.
7
Abstract
Motivation. Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modeling driven by deep learning have achieved unprecedented success at predicting a protein’s crystal structure, but it is not clear if these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state. Results. In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods to experimental data about protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than a trivial classifier using sequence-agnostic features like chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding. Availability. The data underlying this article are available in GitHub at https://github.com/oxpig/structure-vs-folding/. Supplementary information. Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Outeiral
  - Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
- Daniel A Nissley
  - Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
8
Abstract
Knowledge of protein structure is crucial to our understanding of biological function and is routinely used in drug discovery. High-resolution techniques to determine the three-dimensional atomic coordinates of proteins are available. However, such methods are frequently limited by experimental challenges such as sample quantity, target size, and efficiency. Structural mass spectrometry (MS) is a technique in which structural features of proteins are elucidated quickly and relatively easily. Computational techniques that convert sparse MS data into protein models that demonstrate agreement with the data are needed. This review features cutting-edge computational methods that predict protein structure from MS data such as chemical cross-linking, hydrogen-deuterium exchange, hydroxyl radical protein footprinting, limited proteolysis, ion mobility, and surface-induced dissociation. Additionally, we address future directions for protein structure prediction with sparse MS data. Expected final online publication date for the Annual Review of Physical Chemistry, Volume 73 is April 2022. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.
Affiliation(s)
- Sarah E Biehn
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, USA
9
Grau I, Nowé A, Vranken W. Interpreting a black box predictor to gain insights into early folding mechanisms. Comput Struct Biotechnol J 2021; 19:4919-4930. [PMID: 34527196 PMCID: PMC8433119 DOI: 10.1016/j.csbj.2021.08.041] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 11/21/2022] Open
Abstract
Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are only available for a few proteins, which has previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow a direct extraction of the 'early folding rules' embedded in the protein sequence, whilst such interpretation is essential to improve our understanding of how the folding process works. We here apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets: a default set composed of natural proteins, a scrambled set composed of the scrambled default set sequences, and a set of de novo designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online via http://xefoldmine.bio2byte.be/, as a resource to help understand and steer early protein folding.
Affiliation(s)
- Isel Grau
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Ann Nowé
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, 1050 Brussels, Belgium
- Wim Vranken
  - Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, 1050 Brussels, Belgium
  - Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium
  - VIB Structural Biology Research Centre, Brussels 1050, Belgium
10
Marzolf DR, Seffernick JT, Lindert S. Protein Structure Prediction from NMR Hydrogen-Deuterium Exchange Data. J Chem Theory Comput 2021; 17:2619-2629. [PMID: 33780620 DOI: 10.1021/acs.jctc.1c00077] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/17/2022]
Abstract
Amide hydrogen-deuterium exchange (HDX) has long been used to determine regional flexibility and binding sites in proteins; however, the data are too sparse for full structural characterization. Experiments that measure HDX rates, such as HDX-NMR, have far higher throughput compared to structure determination via X-ray crystallography, cryo-EM, or a full suite of NMR experiments. Data from HDX-NMR experiments encode information on the protein structure, making HDX a prime candidate to be supplemented by computational algorithms for protein structure prediction. We have developed a methodology to incorporate HDX-NMR data into ab initio protein structure prediction using the Rosetta software framework to predict structures based on experimental agreement. To demonstrate the efficacy of our algorithm, we examined 38 proteins with HDX-NMR data available, comparing the predicted model with and without the incorporation of HDX data into scoring. The root-mean-square deviation (rmsd, a measure of the average atomic distance between superimposed models) of the predicted model improved by 1.42 Å on average after incorporating the HDX-NMR data into scoring. The average rmsd improvement for the proteins where the selected model rmsd changed after incorporating HDX data was 3.63 Å, including one improvement of more than 11 Å and seven proteins improving by greater than 4 Å, with 12/15 proteins improving overall. Additionally, for independent verification, two proteins that were not part of the original benchmark were scored including HDX data, with a dramatic improvement of the selected model rmsd of nearly 9 Å for one of the proteins. Moreover, we have developed a confidence metric allowing us to successfully identify near-native models in the absence of a native structure. Improvement in model selection with a strong confidence measure demonstrates that protein structure prediction with HDX-NMR is a powerful tool which can be performed with minimal additional computational strain and expense.
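The rmsd quoted in this abstract is the square root of the mean squared distance between matched atoms of two superimposed models. The sketch below shows only that arithmetic; it assumes the two coordinate lists are already superimposed and atom-matched, it does not perform the superposition itself, and the function name and coordinates are illustrative rather than taken from the paper or from Rosetta.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Å: one atom coincides, the other is displaced by 2 Å.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]

print(f"rmsd = {rmsd(model, native):.3f} Å")
```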
Affiliation(s)
- Daniel R Marzolf
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Justin T Seffernick
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States
11
Turina P, Fariselli P, Capriotti E. ThermoScan: Semi-automatic Identification of Protein Stability Data From PubMed. Front Mol Biosci 2021; 8:620475. [PMID: 33842537 PMCID: PMC8027235 DOI: 10.3389/fmolb.2021.620475] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/23/2020] [Accepted: 02/18/2021] [Indexed: 11/13/2022] Open
Abstract
In recent years, the increasing number of DNA sequencing and protein mutagenesis studies has generated a large amount of variation data published in the biomedical literature. The collection of such data has been essential for the development and assessment of tools predicting the impact of protein variants at functional and structural levels. Nevertheless, the collection of manually curated data from literature is a highly time-consuming and costly process that requires domain experts. In particular, the development of methods for predicting the effect of amino acid variants on protein stability relies on the thermodynamic data extracted from literature. In the past, such data were deposited in the ProTherm database, which, however, has not been maintained since 2013. To facilitate the collection of protein thermodynamic data from literature, we developed the semi-automatic tool ThermoScan. ThermoScan is a text mining approach for the identification of relevant thermodynamic data on protein stability from full-text articles. The method relies on a regular expression searching for groups of words, including the most common conceptual words appearing in experimental studies on protein stability, several thermodynamic variables, and their units of measure. ThermoScan analyzes full-text articles from the PubMed Central Open Access subset and calculates an empirical score that allows the identification of manuscripts reporting thermodynamic data on protein stability. The method was optimized on a set of publications included in the ProTherm database, and tested on a new curated set of articles, manually selected for presence of thermodynamic data. The results show that ThermoScan returns accurate predictions and outperforms recently developed text-mining algorithms based on the analysis of publication abstracts. Availability: The ThermoScan server is freely accessible online at https://folding.biofold.org/thermoscan. The ThermoScan python code and the Google Chrome extension for submitting visualized PMC web pages to the ThermoScan server are available at https://github.com/biofold/ThermoScan.
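The regular-expression approach described in this abstract (conceptual words, thermodynamic variables, and units of measure feeding a score) can be sketched as follows. The patterns, the category names, and the scoring rule here are hypothetical illustrations; they are not ThermoScan's actual regular expression or score, which are available in the repository linked above.

```python
import re

# Hypothetical term groups, loosely following the three kinds of words the
# abstract mentions; the real ThermoScan vocabulary is not reproduced here.
PATTERNS = {
    "concept": re.compile(
        r"\b(?:unfolding|denaturation|thermal stability|melting temperature)\b",
        re.IGNORECASE,
    ),
    "variable": re.compile(r"ΔΔG|ΔG|ΔH|\bTm\b"),
    "unit": re.compile(r"(?:kcal|kJ)\s*/\s*mol|°C"),
}


def score_text(text):
    """Count matches per term group and return (total, per-group counts).

    The total is an assumed stand-in for an empirical score: it is simply
    the number of matches summed over all groups.
    """
    counts = {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}
    return sum(counts.values()), counts


# Made-up sentence in the style of a protein stability study.
example = "Thermal denaturation gave a Tm of 54.2 °C and a ΔG of 3.1 kcal/mol."
total, counts = score_text(example)
print(total, counts)
```

In practice a threshold on such a score, tuned on articles known to report stability data, would separate relevant from irrelevant manuscripts; the abstract does not state the threshold, so none is assumed here.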
Affiliation(s)
- Paola Turina
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
- Piero Fariselli
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Emidio Capriotti
  - Department of Pharmacy and Biotechnology (FaBiT), University of Bologna, Bologna, Italy
12
Hou Q, Pucci F, Ancien F, Kwasigroch JM, Bourgeas R, Rooman M. SWOTein: a structure-based approach to predict stability Strengths and Weaknesses of prOTEINs. Bioinformatics 2021; 37:1963-1971. [PMID: 33471089 DOI: 10.1093/bioinformatics/btab034] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/12/2020] [Revised: 12/05/2020] [Accepted: 01/15/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Although structured proteins adopt their lowest free energy conformation in physiological conditions, the individual residues are generally not in their lowest free energy conformation. Residues that are stability weaknesses are often involved in functional regions, whereas stability strengths ensure local structural stability. The detection of strengths and weaknesses provides key information to guide protein engineering experiments aiming to modulate folding and various functional processes. RESULTS We developed the SWOTein predictor which identifies strong and weak residues in proteins on the basis of three types of statistical energy functions describing local interactions along the chain, hydrophobic forces and tertiary interactions. The large-scale analysis of the different types of strengths and weaknesses demonstrated their complementarity and the enhancement of the information they provide. Moreover, a good average correlation was observed between predicted and experimental strengths and weaknesses obtained from native hydrogen exchange data. SWOTein application to three test cases further showed its suitability to predict and interpret strong and weak residues in the context of folding, conformational changes and protein-protein binding. In summary, SWOTein is both fast and accurate and can be applied at small and large scale to analyze and modulate folding and molecular recognition processes. AVAILABILITY The SWOTein webserver provides the list of predicted strengths and weaknesses and a protein structure visualization tool that facilitates the interpretation of the predictions. It is freely available for academic use at http://babylone.ulb.ac.be/SWOTein/.
Affiliation(s)
- Qingzhen Hou
  - Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Shandong 250002, P. R. China
  - National Institute of Health Data Science of China, Shandong University, Shandong 250002, P. R. China
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
- François Ancien
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
- Jean-Marc Kwasigroch
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Raphaël Bourgeas
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels 1050, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, Boulevard du Triomphe, 1050 Brussels, Belgium
13
Pantoja-Uceda D, Oroz J, Fernández C, de Alba E, Giraldo R, Laurents DV. Conformational Priming of RepA-WH1 for Functional Amyloid Conversion Detected by NMR Spectroscopy. Structure 2020; 28:336-347.e4. [PMID: 31918960 DOI: 10.1016/j.str.2019.12.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 05/02/2019] [Revised: 10/03/2019] [Accepted: 12/16/2019] [Indexed: 12/21/2022]
Abstract
How proteins with a stable globular fold acquire the amyloid state is still largely unknown. RepA, a versatile plasmidic DNA binding protein from Pseudomonas savastanoi, is functional as a transcriptional repressor or as an initiator or inhibitor of DNA replication, the latter via assembly of an amyloidogenic oligomer. Its N-terminal domain (WH1) is responsible for discrimination between these functional abilities by undergoing insufficiently understood structural changes. RepA-WH1 is a stable dimer whose conformational dynamics had not been explored. Here, we have studied it through NMR {1H}-15N relaxation and H/D exchange kinetics measurements. The N- and the C-terminal α-helices, and the internal amyloidogenic loop, are partially unfolded in solution. S4-indigo, a small inhibitor of RepA-WH1 amyloidogenesis, binds to and tethers the N-terminal α-helix to a β-hairpin that is involved in dimerization, thus providing evidence for a priming role of fraying ends and dimerization switches in the amyloidogenesis of folded proteins.
Affiliation(s)
- David Pantoja-Uceda
- Instituto de Química Física "Rocasolano", Consejo Superior de Investigaciones Científicas, c/ Serrano 119, Madrid 28006, Spain
- Javier Oroz
- Instituto de Química Física "Rocasolano", Consejo Superior de Investigaciones Científicas, c/ Serrano 119, Madrid 28006, Spain
- Cristina Fernández
- Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, c/ Ramiro de Maeztu 9, Madrid 28040, Spain
- Eva de Alba
- Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, c/ Ramiro de Maeztu 9, Madrid 28040, Spain
- Rafael Giraldo
- Centro de Investigaciones Biológicas, Consejo Superior de Investigaciones Científicas, c/ Ramiro de Maeztu 9, Madrid 28040, Spain
- Douglas V Laurents
- Instituto de Química Física "Rocasolano", Consejo Superior de Investigaciones Científicas, c/ Serrano 119, Madrid 28006, Spain

14
Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022] Open
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany; Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany; Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA
- Dirk Labudde
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany

15
Raimondi D, Tanyalcin I, Ferté J, Gazzo A, Orlando G, Lenaerts T, Rooman M, Vranken W. DEOGEN2: prediction and interactive visualization of single amino acid variant deleteriousness in human proteins. Nucleic Acids Res 2017; 45:W201-W206. [PMID: 28498993 PMCID: PMC5570203 DOI: 10.1093/nar/gkx390] [Citation(s) in RCA: 91] [Impact Index Per Article: 18.2] [Received: 01/31/2017] [Accepted: 04/26/2017] [Indexed: 12/22/2022] Open
Abstract
High-throughput sequencing methods are generating enormous amounts of genomic data, giving unprecedented insights into human genetic variation and its relation to disease. An individual human genome contains millions of Single Nucleotide Variants: to discriminate the deleterious from the benign ones, a variety of methods have been developed that predict whether a protein-coding variant likely affects the carrier individual's health. We present such a method, DEOGEN2, which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates. This extensive contextual information is non-linearly mapped into one single deleteriousness score for each variant. Since for the non-expert user it is sometimes still difficult to assess what this score means, how it relates to the encoded protein, and where it originates from, we developed an interactive online framework (http://deogen2.mutaframe.com/) to better present the DEOGEN2 deleteriousness predictions of all possible variants in all human proteins. The prediction is visualized so both expert and non-expert users can gain insights into the meaning, protein context and origins of each prediction.
Affiliation(s)
- Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, 1050 Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Ibrahim Tanyalcin
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Julien Ferté
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; 3BIO-BioInfo Group, Université Libre De Bruxelles, AV Fr. Roosevelt 50, CP 165/61, Brussels 1050, Belgium
- Andrea Gazzo
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, 1050 Brussels, Belgium
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, 1050 Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Machine Learning Group, Université Libre de Bruxelles, Boulevard du Triomphe, CP 212, 1050 Brussels, Belgium; Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium
- Marianne Rooman
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; 3BIO-BioInfo Group, Université Libre De Bruxelles, AV Fr. Roosevelt 50, CP 165/61, Brussels 1050, Belgium
- Wim Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, BC building, 6th floor, CP 263, 1050 Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Artificial Intelligence Lab, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050, Belgium

16
Auto-encoding NMR chemical shifts from their native vector space to a residue-level biophysical index. Nat Commun 2019; 10:2511. [PMID: 31175284 PMCID: PMC6555786 DOI: 10.1038/s41467-019-10322-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/12/2018] [Accepted: 05/01/2019] [Indexed: 11/26/2022] Open
Abstract
Chemical shifts (CS) are determined from NMR experiments and represent the resonance frequency of the spin of atoms in a magnetic field. They contain a mixture of information, encompassing the in-solution conformations a protein adopts, as well as the movements it performs. Due to their intrinsically multi-faceted nature, CS are difficult to interpret and visualize. Classical approaches for the analysis of CS aim to extract specific protein-related properties, thus discarding a large amount of information that cannot be directly linked to structural features of the protein. Here we propose an autoencoder-based method, called ShiftCrypt, that provides a way to analyze, compare and interpret CS in their native, multidimensional space. We show that ShiftCrypt conserves information about the most common structural features. In addition, it can be used to identify hidden similarities between diverse proteins and peptides, and differences between the same protein in two different binding states. NMR chemical shift information is highly valuable in the investigation of small molecule and protein structure. Here, the authors developed a neural network approach to unify protein chemical shifts and their changes in response to changes in protein sequence, structure, and dimerization interactions.

17
Bittrich S, Kaden M, Leberecht C, Kaiser F, Villmann T, Labudde D. Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Min 2019; 12:1. [PMID: 30627219 PMCID: PMC6321665 DOI: 10.1186/s13040-018-0188-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/06/2018] [Accepted: 11/20/2018] [Indexed: 01/09/2023] Open
Abstract
Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models. Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/. Conclusions: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Marika Kaden
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Christoph Leberecht
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Florian Kaiser
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Thomas Villmann
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Dirk Labudde
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany

18
Bittrich S, Schroeder M, Labudde D. Characterizing the relation of functional and Early Folding Residues in protein structures using the example of aminoacyl-tRNA synthetases. PLoS One 2018; 13:e0206369. [PMID: 30376559 PMCID: PMC6207335 DOI: 10.1371/journal.pone.0206369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2018] [Accepted: 10/11/2018] [Indexed: 01/10/2023] Open
Abstract
Proteins are chains of amino acids which adopt a three-dimensional structure and are then able to catalyze chemical reactions or propagate signals in organisms. Without external influence, many proteins fold into their native structure, and a small number of Early Folding Residues (EFR) have previously been shown to initiate the formation of secondary structure elements and guide their respective assembly. Using the two diverse superfamilies of aminoacyl-tRNA synthetases (aaRS), it is shown that the position of EFR is preserved over the course of evolution even when the corresponding sequence conservation is small. Folding initiation sites are positioned in the center of secondary structure elements, independent of aaRS class. In class I, the predicted position of EFR resembles an ancient structural packing motif present in many seemingly unrelated proteins. Furthermore, it is shown that EFR and functionally relevant residues in aaRS are almost entirely disjoint sets of residues. The Start2Fold database is used to investigate whether this separation of EFR and functional residues can be observed for other proteins. EFR are found to constitute crucial connectors of protein regions which are distant at sequence level. Especially, these residues exhibit a high number of non-covalent residue-residue contacts such as hydrogen bonds and hydrophobic interactions. This tendency also manifests as energetically stable local regions, as substantiated by a knowledge-based potential. Despite profound differences regarding how EFR and functional residues are embedded in protein structures, a strict separation of structurally and functionally relevant residues cannot be observed for a more general collection of proteins.
Affiliation(s)
- Sebastian Bittrich
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
- Michael Schroeder
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
- Dirk Labudde
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany

19
Wang B, Perez-Rathke A, Li R, Liang J. A General Method for Predicting Amino Acid Residues Experiencing Hydrogen Exchange. ... IEEE-EMBS International Conference on Biomedical and Health Informatics 2018; 2018:341-344. [PMID: 29780972 PMCID: PMC5957487 DOI: 10.1109/bhi.2018.8333438] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Information on protein hydrogen exchange can help delineate key regions involved in protein-protein interactions and provides important insight towards determining functional roles of genetic variants and their possible mechanisms in disease processes. Previous studies have shown that the degree of hydrogen exchange is affected by hydrogen bond formations, solvent accessibility, proximity to other residues, and experimental conditions. However, a general predictive method for identifying residues capable of hydrogen exchange transferable to a broad set of proteins is lacking. We have developed a machine learning method based on random forest that can predict whether a residue experiences hydrogen exchange. Using data from the Start2Fold database, which contains information on 13,306 residues (3,790 of which experience hydrogen exchange and 9,516 which do not exchange), our method achieves good performance. Specifically, we achieve an overall out-of-bag (OOB) error, an unbiased estimate of the test set error, of 20.3 percent. Using a randomly selected test data set consisting of 500 residues experiencing hydrogen exchange and 500 which do not, our method achieves an accuracy of 0.79, a recall of 0.74, a precision of 0.82, and an F1 score of 0.78.
Affiliation(s)
- Boshen Wang
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
- Alan Perez-Rathke
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA
- Renhao Li
- Aflac Cancer and Blood Disorders Center, Department of Pediatrics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Jie Liang
- Bioinformatics Program, Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA

20
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 2017; 7:8826. [PMID: 28821744 PMCID: PMC5562875 DOI: 10.1038/s41598-017-08366-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/28/2017] [Accepted: 07/10/2017] [Indexed: 11/23/2022] Open
Abstract
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.

21
Pancsa R, Raimondi D, Cilia E, Vranken WF. Early Folding Events, Local Interactions, and Conservation of Protein Backbone Rigidity. Biophys J 2016; 110:572-583. [PMID: 26840723 DOI: 10.1016/j.bpj.2015.12.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/27/2015] [Revised: 12/21/2015] [Accepted: 12/29/2015] [Indexed: 01/20/2023] Open
Abstract
Protein folding is in its early stages largely determined by the protein sequence and complex local interactions between amino acids, resulting in lower energy conformations that provide the context for further folding into the native state. We compiled a comprehensive data set of early folding residues based on pulsed labeling hydrogen deuterium exchange experiments. These early folding residues have corresponding higher backbone rigidity as predicted by DynaMine from sequence, an effect also present when accounting for the secondary structures in the folded protein. We then show that the amino acids involved in early folding events are not more conserved than others, but rather, early folding fragments and the secondary structure elements they are part of show a clear trend toward conserving a rigid backbone. We therefore propose that backbone rigidity is a fundamental physical feature conserved by proteins that can provide important insights into their folding mechanisms and stability.
Affiliation(s)
- Rita Pancsa
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Daniele Raimondi
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Elisa Cilia
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium
- Wim F Vranken
- Structural Biology Brussels, Vrije Universiteit Brussel, Brussels, Belgium

22
Rigden DJ, Fernández-Suárez XM, Galperin MY. The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection. Nucleic Acids Res 2016; 44:D1-6. [PMID: 26740669 PMCID: PMC4702933 DOI: 10.1093/nar/gkv1356] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 11/22/2015] [Accepted: 11/23/2015] [Indexed: 01/21/2023] Open
Abstract
The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and Swiss Institute for Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/). The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases.
Affiliation(s)
- Daniel J Rigden
- Institute of Integrative Biology, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
- Michael Y Galperin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA