1
Chalopin Y, Sparfel J. Energy Bilocalization Effect and the Emergence of Molecular Functions in Proteins. Front Mol Biosci 2022; 8:736376. [PMID: 35004841 PMCID: PMC8733615 DOI: 10.3389/fmolb.2021.736376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2021] [Accepted: 10/20/2021] [Indexed: 11/13/2022] Open
Abstract
Proteins are among the most complex molecular structures, which have evolved to develop broad functions, such as energy conversion and transport, information storage and processing, communication, and regulation of chemical reactions. However, the mechanisms by which these dynamical entities coordinate themselves to perform biological tasks remain hotly debated. Here, a physical theory is presented to explain how functional dynamical behavior may emerge in complex macromolecules, thanks to an effect that we term bilocalization of thermal vibrations. More specifically, our approach allows us to understand how structural irregularities lead to a partitioning of the energy of the vibrations into two distinct sets of molecular domains, corresponding to slow and fast motions. This shape-encoded spectral allocation, associated with the genetic sequence, provides close access to a wide reservoir of dynamical patterns, and eventually allows the emergence of biological functions by natural selection. To illustrate our approach, the spike protein structure of SARS-CoV-2 is considered.
Affiliation(s)
- Yann Chalopin
- Laboratoire EM2C-CNRS and CentraleSupélec, University of Paris-Saclay, Gif-sur-Yvette, France
- Julien Sparfel
- Laboratoire EM2C-CNRS and CentraleSupélec, University of Paris-Saclay, Gif-sur-Yvette, France
2
Grau I, Nowé A, Vranken W. Interpreting a black box predictor to gain insights into early folding mechanisms. Comput Struct Biotechnol J 2021; 19:4919-4930. [PMID: 34527196 PMCID: PMC8433119 DOI: 10.1016/j.csbj.2021.08.041] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/15/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 11/21/2022] Open
Abstract
Protein folding and function are closely connected, but the exact mechanisms by which proteins fold remain elusive. Early folding residues (EFRs) are amino acids within a particular protein that induce the very first stages of the folding process. High-resolution EFR data are only available for few proteins, which has previously enabled the training of a protein sequence-based machine learning 'black box' predictor (EFoldMine). Such a black box approach does not allow a direct extraction of the 'early folding rules' embedded in the protein sequence, whilst such interpretation is essential to improve our understanding of how the folding process works. We here apply and investigate a novel 'grey box' approach to the prediction of EFRs from protein sequence to gain mechanistic residue-level insights into the sequence determinants of EFRs in proteins. We interpret the rule set for three datasets, a default set comprised of natural proteins, a scrambled set comprised of the scrambled default set sequences, and a set of de novo designed proteins. Finally, we relate these data to the secondary structure adopted in the folded protein and provide all information online via http://xefoldmine.bio2byte.be/, as a resource to help understand and steer early protein folding.
Affiliation(s)
- Isel Grau
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Ann Nowé
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, 1050 Brussels, Belgium
- Wim Vranken
- Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; Interuniversity Institute of Bioinformatics in Brussels, ULB/VUB, Triomflaan, 1050 Brussels, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Brussels 1050, Belgium; VIB Structural Biology Research Centre, Brussels 1050, Belgium
3
James EI, Murphree TA, Vorauer C, Engen JR, Guttman M. Advances in Hydrogen/Deuterium Exchange Mass Spectrometry and the Pursuit of Challenging Biological Systems. Chem Rev 2021; 122:7562-7623. [PMID: 34493042 PMCID: PMC9053315 DOI: 10.1021/acs.chemrev.1c00279] [Citation(s) in RCA: 102] [Impact Index Per Article: 34.0] [Indexed: 12/24/2022]
Abstract
Solution-phase hydrogen/deuterium exchange (HDX) coupled to mass spectrometry (MS) is a widespread tool for structural analysis across academia and the biopharmaceutical industry. By monitoring the exchangeability of backbone amide protons, HDX-MS can reveal information about higher-order structure and dynamics throughout a protein, can track protein folding pathways, map interaction sites, and assess conformational states of protein samples. The combination of the versatility of the hydrogen/deuterium exchange reaction with the sensitivity of mass spectrometry has enabled the study of extremely challenging protein systems, some of which cannot be suitably studied using other techniques. Improvements over the past three decades have continually increased throughput, robustness, and expanded the limits of what is feasible for HDX-MS investigations. To provide an overview for researchers seeking to utilize and derive the most from HDX-MS for protein structural analysis, we summarize the fundamental principles, basic methodology, strengths and weaknesses, and the established applications of HDX-MS while highlighting new developments and applications.
Affiliation(s)
- Ellie I James
- Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
- Taylor A Murphree
- Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
- Clint Vorauer
- Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
- John R Engen
- Department of Chemistry & Chemical Biology, Northeastern University, Boston, Massachusetts 02115, United States
- Miklos Guttman
- Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195, United States
4
Danchin A. Biological innovation in the functional landscape of a model regulator, or the lactose operon repressor. C R Biol 2021; 344:111-126. [PMID: 34213850 DOI: 10.5802/crbiol.52] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Accepted: 06/01/2021] [Indexed: 12/24/2022]
Abstract
The operon model was proposed six decades ago. And yet, despite all this time, the lactose operon repressor, LacI, remains a subject of major interest. While it is well established that LacI can exist in two functional forms, one that renders the operon inactive via binding of LacI to DNA and another, bound to an inducer that does not allow repression, how it switches from one to the other is still not well understood. The construction of a library of several tens of thousands of LacI mutants has revealed some unexpected features. In particular, the transition implemented in some of them reveals a new type of transcription regulation: band-pass (OFF/ON/OFF) and band-stop (ON/OFF/ON) filters. This makes it natural to think that it is the network of hydrogen bonds associated with the water bound to the molecule that allows the remote interconnection between the binding site to an inducer molecule and the one that binds it to the DNA.
Affiliation(s)
- Antoine Danchin
- Kodikos Labs, Institut Cochin, 24 rue du Faubourg Saint-Jacques 75014 Paris, France
5
Kagami L, Roca-Martínez J, Gavaldá-García J, Ramasamy P, Feenstra KA, Vranken WF. Online biophysical predictions for SARS-CoV-2 proteins. BMC Mol Cell Biol 2021; 22:23. [PMID: 33892639 PMCID: PMC8062939 DOI: 10.1186/s12860-021-00362-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/07/2021] [Accepted: 04/01/2021] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: The SARS-CoV-2 virus, the causative agent of COVID-19, consists of an assembly of proteins that determine its infectious and immunological behavior, as well as its response to therapeutics. Major structural biology efforts on these proteins have already provided essential insights into the mode of action of the virus, as well as avenues for structure-based drug design. However, not all of the SARS-CoV-2 proteins, or regions thereof, have a well-defined three-dimensional structure, and as such might exhibit ambiguous, dynamic behaviour that is not evident from static structure representations, nor from molecular dynamics simulations using these structures. MAIN: We present a website (https://bio2byte.be/sars2/) that provides protein sequence-based predictions of the backbone and side-chain dynamics and conformational propensities of these proteins, as well as derived early folding, disorder, β-sheet aggregation, protein-protein interaction and epitope propensities. These predictions attempt to capture the inherent biophysical propensities encoded in the sequence, rather than context-dependent behaviour such as the final folded state. In addition, we provide the biophysical variation that is observed in homologous proteins, which gives an indication of the limits of their functionally relevant biophysical behaviour. CONCLUSION: The https://bio2byte.be/sars2/ website provides a range of protein sequence-based predictions for 27 SARS-CoV-2 proteins, enabling researchers to form hypotheses about their possible functional modes of action.
Affiliation(s)
- Luciano Kagami
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Joel Roca-Martínez
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- Jose Gavaldá-García
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- Pathmanaban Ramasamy
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
- VIB-UGent Center for Medical Biotechnology, VIB, 9000, Ghent, Belgium
- Department of Biomolecular Medicine, Faculty of Health Sciences and Medicine, Ghent University, 9000, Ghent, Belgium
- K Anton Feenstra
- IBIVU - Center for Integrative Bioinformatics, Department of Computer Science, Vrije Universiteit Amsterdam, Amsterdam, 1081HV, The Netherlands
- AIMMS - Amsterdam Institute for Molecules Medicines and Systems, Vrije Universiteit Amsterdam, Amsterdam, 1081HV, The Netherlands
- Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Triomflaan, 1050, Brussels, Belgium
- Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, 1050, Brussels, Belgium
- VIB Structural Biology Research Centre, Pleinlaan 2, 1050, Brussels, Belgium
6
Waman VP, Sen N, Varadi M, Daina A, Wodak SJ, Zoete V, Velankar S, Orengo C. The impact of structural bioinformatics tools and resources on SARS-CoV-2 research and therapeutic strategies. Brief Bioinform 2021; 22:742-768. [PMID: 33348379 PMCID: PMC7799268 DOI: 10.1093/bib/bbaa362] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 09/10/2020] [Revised: 11/06/2020] [Accepted: 11/09/2020] [Indexed: 01/18/2023] Open
Abstract
SARS-CoV-2 is the causative agent of COVID-19, the ongoing global pandemic. It has posed a worldwide challenge to human health as no effective treatment is currently available to combat the disease. Its severity has led to unprecedented collaborative initiatives for therapeutic solutions against COVID-19. Studies resorting to structure-based drug design for COVID-19 are plethoric and show good promise. Structural biology provides key insights into 3D structures, critical residues/mutations in SARS-CoV-2 proteins, implicated in infectivity, molecular recognition and susceptibility to a broad range of host species. The detailed understanding of viral proteins and their complexes with host receptors and candidate epitope/lead compounds is the key to developing a structure-guided therapeutic design. Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at an unprecedented speed and deposited in the Protein Data Bank. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, impact of variants/mutations and molecular therapeutics. Here, we provide an overview of ongoing efforts on developing structural bioinformatics tools and resources for COVID-19 research. We also discuss the impact of these resources and structure-based studies, to understand various aspects of SARS-CoV-2 infection and therapeutic development. These include (i) understanding differences between SARS-CoV-2 and SARS-CoV, leading to increased infectivity of SARS-CoV-2, (ii) deciphering key residues in the SARS-CoV-2 involved in receptor-antibody recognition, (iii) analysis of variants in host proteins that affect host susceptibility to infection and (iv) analyses facilitating structure-based drug and vaccine design against SARS-CoV-2.
Affiliation(s)
- Antoine Daina
- Molecular Modeling Group at SIB, Swiss Institute of Bioinformatics
- Vincent Zoete
- Department of Fundamental Oncology at the University of Lausanne and Group leader at SIB
7
Chalopin Y. The physical origin of rate promoting vibrations in enzymes revealed by structural rigidity. Sci Rep 2020; 10:17465. [PMID: 33060716 PMCID: PMC7566648 DOI: 10.1038/s41598-020-74439-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/13/2020] [Accepted: 09/30/2020] [Indexed: 02/07/2023] Open
Abstract
Enzymes are the most efficient catalysts known to date. However, decades of research have failed to fully explain the catalytic power of enzymes, and most of the current attempts to uncloak the details of atomic motions at active sites remain incomplete. Here, a straightforward manner for understanding the interplay between the complex or irregular enzyme topology and dynamical effects at catalytic sites is introduced, by revealing how fast localized vibrations form spontaneously in the stiffest parts of the scaffold. While shedding light on a physical mechanism that allowed the selection of the picosecond (ps) timescale to increase the catalytic proficiency, this approach exposes the functional importance of localized motions as a by-product of the stability-function tradeoff in enzyme evolution. From this framework of analysis—directly accessible from available diffraction data—experimental strategies for engineering the catalytic rate in enzymatic proteins are proposed.
Affiliation(s)
- Yann Chalopin
- Laboratoire EM2C, CNRS & CentraleSupelec, University of Paris-Saclay, 91190, Gif-sur-Yvette, France
8
Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022] Open
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany; Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany; Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA
- Dirk Labudde
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany
9
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Protein structures are highly dynamic macromolecules. This dynamics is often analysed through experimental and/or computational methods only for an isolated or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe dynamics of local protein conformations using different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category, while their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether exchange can occur among these PBs. Interestingly, the results are considerably more complex than a simple regular/rigid and coil/flexible exchange. Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Tarun Jairaj Narwani
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
- Pierrick Craveur
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Nicolas K Shinada
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Discngine, SAS, Paris, France
- Aline Floch
- Laboratoire D'Excellence GR-Ex, Paris, France; Etablissement Français du Sang Ile de France, Créteil, France; IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est- Créteil Univ, Créteil, France; UPEC, Université Paris Est-Créteil, Créteil, France
- Hubert Santuz
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
- Akhila Melarkode Vattekatte
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Joseph Rebehmed
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
- Catherine Etchebest
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Alexandre G de Brevern
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
10
Auto-encoding NMR chemical shifts from their native vector space to a residue-level biophysical index. Nat Commun 2019; 10:2511. [PMID: 31175284 PMCID: PMC6555786 DOI: 10.1038/s41467-019-10322-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/12/2018] [Accepted: 05/01/2019] [Indexed: 11/26/2022] Open
Abstract
Chemical shifts (CS) are determined from NMR experiments and represent the resonance frequency of the spin of atoms in a magnetic field. They contain a mixture of information, encompassing the in-solution conformations a protein adopts, as well as the movements it performs. Due to their intrinsically multi-faceted nature, CS are difficult to interpret and visualize. Classical approaches for the analysis of CS aim to extract specific protein-related properties, thus discarding a large amount of information that cannot be directly linked to structural features of the protein. Here we propose an autoencoder-based method, called ShiftCrypt, that provides a way to analyze, compare and interpret CS in their native, multidimensional space. We show that ShiftCrypt conserves information about the most common structural features. In addition, it can be used to identify hidden similarities between diverse proteins and peptides, and differences between the same protein in two different binding states. NMR chemical shift information is highly valuable in the investigation of small molecule and protein structure. Here, the authors developed a neural network approach to unify protein chemical shifts and their changes in response to changes in protein sequence, structure, and dimerization interactions.
11
Bittrich S, Kaden M, Leberecht C, Kaiser F, Villmann T, Labudde D. Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Min 2019; 12:1. [PMID: 30627219 PMCID: PMC6321665 DOI: 10.1186/s13040-018-0188-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/06/2018] [Accepted: 11/20/2018] [Indexed: 01/09/2023] Open
Abstract
Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models. Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/. Conclusions: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results. Electronic supplementary material: The online version of this article (10.1186/s13040-018-0188-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Marika Kaden
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Christoph Leberecht
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Florian Kaiser
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany; Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Thomas Villmann
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Dirk Labudde
- University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
12
Bittrich S, Schroeder M, Labudde D. Characterizing the relation of functional and Early Folding Residues in protein structures using the example of aminoacyl-tRNA synthetases. PLoS One 2018; 13:e0206369. [PMID: 30376559 PMCID: PMC6207335 DOI: 10.1371/journal.pone.0206369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2018] [Accepted: 10/11/2018] [Indexed: 01/10/2023] Open
Abstract
Proteins are chains of amino acids which adopt a three-dimensional structure and are then able to catalyze chemical reactions or propagate signals in organisms. Without external influence, many proteins fold into their native structure, and a small number of Early Folding Residues (EFR) have previously been shown to initiate the formation of secondary structure elements and guide their respective assembly. Using the two diverse superfamilies of aminoacyl-tRNA synthetases (aaRS), it is shown that the position of EFR is preserved over the course of evolution even when the corresponding sequence conservation is small. Folding initiation sites are positioned in the center of secondary structure elements, independent of aaRS class. In class I, the predicted position of EFR resembles an ancient structural packing motif present in many seemingly unrelated proteins. Furthermore, it is shown that EFR and functionally relevant residues in aaRS are almost entirely disjoint sets of residues. The Start2Fold database is used to investigate whether this separation of EFR and functional residues can be observed for other proteins. EFR are found to constitute crucial connectors of protein regions which are distant at sequence level. Especially, these residues exhibit a high number of non-covalent residue-residue contacts such as hydrogen bonds and hydrophobic interactions. This tendency also manifests as energetically stable local regions, as substantiated by a knowledge-based potential. Despite profound differences regarding how EFR and functional residues are embedded in protein structures, a strict separation of structurally and functionally relevant residues cannot be observed for a more general collection of proteins.
Collapse
Affiliation(s)
- Sebastian Bittrich
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
| | - Michael Schroeder
- Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
| | - Dirk Labudde
- Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
| |
|
13
|
Conformational and dynamical basis for cross-reactivity observed between anti HIV-1 protease antibody with protease and an epitope peptide from it. Int J Biol Macromol 2018; 118:1696-1707. [PMID: 29990556 DOI: 10.1016/j.ijbiomac.2018.07.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/02/2018] [Revised: 07/03/2018] [Accepted: 07/04/2018] [Indexed: 11/23/2022]
Abstract
F11.2.32 is a monoclonal antibody raised against HIV-1 protease that inhibits protease activity. While the structure of the epitope peptide in complex with the antibody is known, how the protease interacts with the antibody is not. In this study, we model the conformational features of the free and bound epitope peptide and the protease-antibody interactions. We find through our simulations that the free epitope peptide P36-46 samples conformations akin to the bound conformation of the peptide in complex with the Ab, with a β-turn conformation sampled by the 38LPGR41 sequence, highlighting the role of inherent conformational preferences of the peptide. Further, to determine the interactions present between the protease and antibody, we docked the protease, in the conformation observed in the crystal structure, onto the antibody and simulated the dynamics of the complex in explicit water. We have identified the key residues involved in hydrogen-bond interactions and salt bridges in the Ag-Ab complex and examined the role of CDR flexibility in binding different conformations of the same epitope sequence in peptide and protein antigens. Thus, our results provide the basis for understanding the cross-reactivity of the antibody with the protease and the epitope peptide derived from it.
|
14
|
Orlando G, Raimondi D, Khan T, Lenaerts T, Vranken WF. SVM-dependent pairwise HMM: an application to protein pairwise alignments. Bioinformatics 2018; 33:3902-3908. [PMID: 28666322 DOI: 10.1093/bioinformatics/btx391] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/29/2016] [Accepted: 06/12/2017] [Indexed: 12/27/2022] Open
Abstract
Motivation: Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years, many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes, and that it would be an advantage to have more customizable approaches capable of dealing with different alignment definitions. Results: Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets users decide on the optimal features for aligning their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. Availability and implementation: A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gabriele Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB; Structural Machine Learning Group, Université Libre de Bruxelles
| | - Daniele Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB; Structural Machine Learning Group, Université Libre de Bruxelles
| | - Taushif Khan
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Machine Learning Group, Université Libre de Bruxelles; Artificial Intelligence Lab, Vrije Universiteit Brussel, 1050 Brussels, Belgium
| | - Wim F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2; Structural Biology Research Center, VIB
| |
|
15
|
Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins. Sci Rep 2017; 7:8826. [PMID: 28821744 PMCID: PMC5562875 DOI: 10.1038/s41598-017-08366-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/28/2017] [Accepted: 07/10/2017] [Indexed: 11/23/2022] Open
Abstract
Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. Here we present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen-deuterium exchange (HDX) NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.
|
16
|
Gupta S, Sasidhar YU. Impact of Turn Propensity on the Folding Rates of Z34C Protein: Implications for the Folding of Helix-Turn-Helix Motif. J Phys Chem B 2017; 121:1268-1283. [PMID: 28094941 DOI: 10.1021/acs.jpcb.6b12219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
The rate-limiting step for the folding of the helix-turn-helix (HTH) protein Z34C involves the β-turn region 20DPNL23. This reverse turn has been observed to be part of the transition state in the folding process of Z34C, influencing its folding rates. Molecular dynamics simulations were performed on this turn peptide and its two mutants, D20A and P21A, to study turn formation using the GROMOS54A7 force field. We find that this region has a turn propensity of its own, and that the highest turn propensity is observed for the wild type, which correlates well with available experimental results. We also find that a slight unfavorable change in the ΔG of turn folding causes a drastic change in the folding rates of the HTH motif, and a mechanistic interpretation is given. Implications of these observations for the folding of the HTH protein Z34C are discussed.
Affiliation(s)
- Shubhangi Gupta
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India
| | - Yellamraju U Sasidhar
- Department of Chemistry, Indian Institute of Technology Bombay, Powai, Mumbai 400 076, India
| |
|
17
|
Orlando G, Raimondi D, Vranken WF. Observation selection bias in contact prediction and its implications for structural bioinformatics. Sci Rep 2016; 6:36679. [PMID: 27857150 PMCID: PMC5114557 DOI: 10.1038/srep36679] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/27/2016] [Accepted: 10/18/2016] [Indexed: 01/14/2023] Open
Abstract
Next Generation Sequencing is dramatically increasing the number of known protein sequences, with related experimentally determined protein structures lagging behind. Structural bioinformatics is attempting to close this gap by developing approaches that predict structure-level characteristics for uncharacterized protein sequences, with most of the developed methods relying heavily on evolutionary information collected from homologous sequences. Here we show that there is a substantial observational selection bias in this approach: the predictions are validated on proteins with known structures from the PDB, but for exactly those proteins significantly more homologs are available than for less studied sequences randomly extracted from Uniprot. Structural bioinformatics methods that were developed this way are thus likely to have over-estimated performance; we demonstrate this for two contact prediction methods, whose performance drops by up to 60% when a more realistic amount of evolutionary information is taken into account. We provide a bias-free dataset, called NOUMENON, for the validation of contact prediction methods.
Affiliation(s)
- G Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium; Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| | - D Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium; Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| | - W F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium; Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium; Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| |
|
18
|
Mapping the Geometric Evolution of Protein Folding Motor. PLoS One 2016; 11:e0163993. [PMID: 27716851 PMCID: PMC5055333 DOI: 10.1371/journal.pone.0163993] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/22/2016] [Accepted: 09/19/2016] [Indexed: 11/19/2022] Open
Abstract
A polypeptide chain has an invariant main chain and a variant side-chain sequence. How the side-chain sequence determines the fold in terms of its chemical constitution has been scrutinized extensively and verified periodically. However, a focused investigation of the directive effect of side-chain geometry may provide important insights, supplementing existing algorithms in mapping the geometrical evolution of protein chains and their structural preferences. Geometrically, the folding of a protein structure may be envisaged as the evolution of its geometric variables: the ϕ and ψ dihedral angles of the polypeptide main chain, directed by χ1 and χ2 of the side chain. In this work, the protein molecule is metaphorically modelled as a machine with 4 rotors, ϕ, ψ, χ1 and χ2, whose evolution to the functional fold is directed by combinations of its rotor directions. We observe that differential rotor motions lead to different secondary structure formations and that the combinatorial pattern is unique and consistent for a particular secondary structure type. Further, we found that the combination of rotor geometries of each amino acid is unique, which partly explains how different amino acid sequence combinations have unique structural evolution and functional adaptation. Quantification of these amino acid rotor preferences resulted in the generation of 3 substitution matrices, which were later plugged into the BLAST tool to evaluate their efficiency in aligning sequences. We employed BLOSUM62 and PAM30 as standards for primary evaluation. Generation of substitution matrices is a logical extension of the conceptual framework we attempted to build during the development of this work. Optimization of the matrices following conventional routines and possible application to biologically relevant data sets are beyond the scope of this manuscript, though they are part of the larger project design.
|