1. Dubois C, Lahfa M, Pissarra J, de Guillen K, Barthe P, Kroj T, Roumestand C, Padilla A. Combining High-Pressure NMR and Geometrical Sampling to Obtain a Full Topological Description of Protein Folding Landscapes: Application to the Folding of Two MAX Effectors from Magnaporthe oryzae. Int J Mol Sci 2022; 23:ijms23105461. [PMID: 35628267 PMCID: PMC9141691 DOI: 10.3390/ijms23105461] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/13/2022] [Revised: 05/10/2022] [Accepted: 05/12/2022] [Indexed: 11/16/2022] Open
Abstract
Despite advances in experimental and computational methods, the mechanisms by which an unstructured polypeptide chain regains its unique three-dimensional structure remain one of the main puzzling questions in biology. Single-molecule techniques, ultra-fast perturbation and detection approaches and improvements in all-atom and coarse-grained simulation methods have greatly deepened our understanding of protein folding and the effects of environmental factors on the folding landscape. However, a major challenge remains the detailed characterization of the protein folding landscape. Here, we used high hydrostatic pressure 2D NMR spectroscopy to obtain high-resolution experimental structural information in a site-specific manner across the polypeptide sequence and along the folding reaction coordinate. We used this residue-specific information to constrain Cyana3 calculations, in order to obtain a topological description of the entire folding landscape. This approach was used to describe the conformers populating the folding landscape of two small globular proteins, AVR-Pia and AVR-Pib, that belong to the structurally conserved but sequence-unrelated MAX effectors superfamily. Comparing the two folding landscapes, we found that, in spite of their divergent sequences, the folding pathway of these two proteins involves a similar, inescapable, folding intermediate, even if, statistically, the routes used are different.
Affiliation(s)
- Cécile Dubois
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- Mounia Lahfa
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- Joana Pissarra
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- Karine de Guillen
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- Philippe Barthe
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- Thomas Kroj
  - PHIM Plant Health Institute, University of Montpellier, INRAE, CIRAD, Institut Agro, IRD, 34000 Montpellier, France
- Christian Roumestand
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
- André Padilla
  - Centre de Biologie Structurale, University of Montpellier, INSERM U1054, CNRS UMR 5048, 34000 Montpellier, France
2. Chen Z, Zhang N, Chu HY, Yu Y, Zhang ZK, Zhang G, Zhang BT. Connective Tissue Growth Factor: From Molecular Understandings to Drug Discovery. Front Cell Dev Biol 2020; 8:593269. [PMID: 33195264 PMCID: PMC7658337 DOI: 10.3389/fcell.2020.593269] [Citation(s) in RCA: 73] [Impact Index Per Article: 18.3] [Received: 08/20/2020] [Accepted: 10/09/2020] [Indexed: 01/18/2023] Open
Abstract
Connective tissue growth factor (CTGF) is a key signaling and regulatory molecule involved in different biological processes, such as cell proliferation, angiogenesis, and wound healing, as well as multiple pathologies, such as tumor development and tissue fibrosis. Although the underlying mechanisms of CTGF remain incompletely understood, a commonly accepted theory is that the interactions between different protein domains in CTGF and various other regulatory proteins and ligands contribute to its variety of functions. Here, we highlight the structure of each domain of CTGF and its biological functions under physiological conditions. We further summarize the main diseases that are deeply influenced by CTGF domains and the potential targets of these diseases. Finally, we address the advantages and disadvantages of current drugs targeting CTGF and provide a perspective for the drug discovery of the next generation of CTGF inhibitors based on aptamers.
Affiliation(s)
- Zihao Chen
  - School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Ning Zhang
  - School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Hang Yin Chu
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Yuanyuan Yu
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Zong-Kang Zhang
  - School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
- Ge Zhang
  - Law Sau Fai Institute for Advancing Translational Medicine in Bone and Joint Diseases, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
- Bao-Ting Zhang
  - School of Chinese Medicine, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China
3. Adhikari B. A fully open-source framework for deep learning protein real-valued distances. Sci Rep 2020; 10:13374. [PMID: 32770096 PMCID: PMC7414848 DOI: 10.1038/s41598-020-70181-0] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/01/2020] [Accepted: 07/23/2020] [Indexed: 11/12/2022] Open
Abstract
As deep learning algorithms drive the progress in protein structure prediction, a lot remains to be studied at this merging superhighway of deep learning and protein structure prediction. Recent findings show that inter-residue distance prediction, a more granular version of the well-known contact prediction problem, is a key to predicting accurate models. However, deep learning methods that predict these distances are still in the early stages of their development. To advance these methods and develop other novel methods, a need exists for a small and representative dataset packaged for faster development and testing. In this work, we introduce protein distance net (PDNET), a framework that consists of one such representative dataset along with the scripts for training and testing deep learning methods. The framework also includes all the scripts that were used to curate the dataset, and generate the input features and distance maps. Deep learning models can also be trained and tested in a web browser using free platforms such as Google Colab. We discuss how PDNET can be used to predict contacts, distance intervals, and real-valued distances.
Affiliation(s)
- Badri Adhikari
  - Department of Computer Science, University of Missouri-St. Louis, St. Louis, MO, 63132, USA
4. Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022] Open
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Affiliation(s)
- Sebastian Bittrich
  - University of Applied Sciences Mittweida, Mittweida, 09648, Germany
  - Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA
- Dirk Labudde
  - University of Applied Sciences Mittweida, Mittweida, 09648, Germany
5. Hou J, Wu T, Cao R, Cheng J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins 2019; 87:1165-1178. [PMID: 30985027 PMCID: PMC6800999 DOI: 10.1002/prot.25697] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/16/2019] [Revised: 04/04/2019] [Accepted: 04/12/2019] [Indexed: 12/28/2022]
Abstract
Predicting residue‐residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
Affiliation(s)
- Jie Hou
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
- Renzhi Cao
  - Department of Computer Science, Pacific Lutheran University, Tacoma, Washington
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
6. Bittrich S, Kaden M, Leberecht C, Kaiser F, Villmann T, Labudde D. Application of an interpretable classification model on Early Folding Residues during protein folding. BioData Min 2019; 12:1. [PMID: 30627219 PMCID: PMC6321665 DOI: 10.1186/s13040-018-0188-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/06/2018] [Accepted: 11/20/2018] [Indexed: 01/09/2023] Open
Abstract
Background: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models.
Results: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/.
Conclusions: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
Electronic supplementary material: The online version of this article (10.1186/s13040-018-0188-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sebastian Bittrich
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
  - Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Marika Kaden
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Christoph Leberecht
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
  - Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Florian Kaiser
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
  - Biotechnology Center (BIOTEC) TU Dresden, Tatzberg 47/49, Dresden, 01307 Germany
- Thomas Villmann
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
- Dirk Labudde
  - University of Applied Sciences Mittweida, Technikumplatz 17, Mittweida, 09648 Germany
7. Bittrich S, Schroeder M, Labudde D. Characterizing the relation of functional and Early Folding Residues in protein structures using the example of aminoacyl-tRNA synthetases. PLoS One 2018; 13:e0206369. [PMID: 30376559 PMCID: PMC6207335 DOI: 10.1371/journal.pone.0206369] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/31/2018] [Accepted: 10/11/2018] [Indexed: 01/10/2023] Open
Abstract
Proteins are chains of amino acids which adopt a three-dimensional structure and are then able to catalyze chemical reactions or propagate signals in organisms. Without external influence, many proteins fold into their native structure, and a small number of Early Folding Residues (EFR) have previously been shown to initiate the formation of secondary structure elements and guide their respective assembly. Using the two diverse superfamilies of aminoacyl-tRNA synthetases (aaRS), it is shown that the position of EFR is preserved over the course of evolution even when the corresponding sequence conservation is small. Folding initiation sites are positioned in the center of secondary structure elements, independent of aaRS class. In class I, the predicted position of EFR resembles an ancient structural packing motif present in many seemingly unrelated proteins. Furthermore, it is shown that EFR and functionally relevant residues in aaRS are almost entirely disjoint sets of residues. The Start2Fold database is used to investigate whether this separation of EFR and functional residues can be observed for other proteins. EFR are found to constitute crucial connectors of protein regions which are distant at sequence level. Especially, these residues exhibit a high number of non-covalent residue-residue contacts such as hydrogen bonds and hydrophobic interactions. This tendency also manifests as energetically stable local regions, as substantiated by a knowledge-based potential. Despite profound differences regarding how EFR and functional residues are embedded in protein structures, a strict separation of structurally and functionally relevant residues cannot be observed for a more general collection of proteins.
Affiliation(s)
- Sebastian Bittrich
  - Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany
  - Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
- Michael Schroeder
  - Biotechnology Center (BIOTEC), Technische Universität Dresden, Dresden, Saxony, Germany
- Dirk Labudde
  - Applied Computer Sciences & Biosciences, University of Applied Sciences Mittweida, Mittweida, Saxony, Germany