1
|
Todd BP, Downard KM. Structural Phylogenetics with Protein Mass Spectrometry: A Proof-of-Concept. Protein J 2024; 43:997-1008. [PMID: 39078529 DOI: 10.1007/s10930-024-10227-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/18/2024] [Indexed: 07/31/2024]
Abstract
It is demonstrated, for the first time, that a mass spectrometry approach (known as phylonumerics) can be successfully implemented for structural phylogenetics investigations to chart the evolution of a protein's structure and function. Illustrated for the compact globular protein myoglobin, peptide masses produced from the proteolytic digestion of the protein across animal species generate trees congruent to the sequence tree counterparts. Single point mutations calculated during the same mass tree building step can be followed along interconnected branches of the tree and represent a viable structural metric. A mass tree built for 15 diverse animal species, easily resolve the birds from mammal species, and the ruminant mammals from the remainder of the animals. Mutations within helix-spanning peptide segments alter both the mass and structure of the protein in these segments. Greater evolution is found in the B-helix over the A, E, F, G and H helices. A further mass tree study, of six more closely related primate species, resolves gorilla from the other primates based on a P22S mutation within the B-helix. The remaining five primates are resolved into two groups based on whether they contain a glycine or serine at position 23 in the same helix. The orangutan is resolved from the gibbon and siamang by its G-helix C110S mutation, while homo sapiens are resolved from chimpanzee based on the Q116H mutation. All are associated with structural perturbations in such helices. These structure altering mutations can be tracked along interconnecting branches of a mass tree, to follow the protein's structure and evolution, and ultimately the evolution of the species in which the proteins are expressed. Those that have the greatest impact on a protein's structure, its function, and ultimately the evolution of the species, can be selectively tracked or monitored.
Collapse
Affiliation(s)
- Benjamin P Todd
- Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia
| | - Kevin M Downard
- Infectious Disease Responses Laboratory, Prince of Wales Clinical Research Sciences, Sydney, NSW, Australia.
| |
Collapse
|
2
|
Divergence Entropy-Based Evaluation of Hydrophobic Core in Aggressive and Resistant Forms of Transthyretin. ENTROPY 2021; 23:e23040458. [PMID: 33924717 PMCID: PMC8070611 DOI: 10.3390/e23040458] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 04/01/2021] [Accepted: 04/09/2021] [Indexed: 12/19/2022]
Abstract
The two forms of transthyretin differing slightly in the tertiary structure, despite the presence of five mutations, show radically different properties in terms of susceptibility to the amyloid transformation process. These two forms of transthyretin are the object of analysis. The search for the sources of these differences was carried out by means of a comparative analysis of the structure of these molecules in their native and early intermediate stage forms in the folding process. The criterion for assessing the degree of similarity and differences is the status of the hydrophobic core. The comparison of the level of arrangement of the hydrophobic core and its initial stages is possible thanks to the application of divergence entropy for the early intermediate stage and for the final forms. It was shown that the minimal differences observed in the structure of the hydrophobic core of the forms available in PDB, turned out to be significantly different in the early stage (ES) structure in folding process. The determined values of divergence entropy for both ES forms indicate the presence of the seed of hydrophobic core only in the form resistant to amyloid transformation. In the form of aggressively undergoing amyloid transformation, the structure lacking such a seed is revealed, being a stretched one with a high content of β-type structure. In the discussed case, the active presence of water in the structural transformation of proteins expressed in the fuzzy oil drop model (FOD) is of decisive importance for the generation of the final protein structure. It has been shown that the resistant form tends to generate a centric hydrophobic core with the possibility of creating a globular structure, i.e., a spherical micelle-like form. The aggressively transforming form reveals in the structure of its early intermediate, a tendency to form the ribbon-like micelle as observed in amyloid.
Collapse
|
3
|
Durairaj J, Akdel M, de Ridder D, van Dijk ADJ. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 2021; 36:i718-i725. [PMID: 33381814 DOI: 10.1093/bioinformatics/btaa839] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/15/2020] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION As the number of experimentally solved protein structures rises, it becomes increasingly appealing to use structural information for predictive tasks involving proteins. Due to the large variation in protein sizes, folds and topologies, an attractive approach is to embed protein structures into fixed-length vectors, which can be used in machine learning algorithms aimed at predicting and understanding functional and physical properties. Many existing embedding approaches are alignment based, which is both time-consuming and ineffective for distantly related proteins. On the other hand, library- or model-based approaches depend on a small library of fragments or require the use of a trained model, both of which may not generalize well. RESULTS We present Geometricus, a novel and universally applicable approach to embedding proteins in a fixed-dimensional space. The approach is fast, accurate, and interpretable. Geometricus uses a set of 3D moment invariants to discretize fragments of protein structures into shape-mers, which are then counted to describe the full structure as a vector of counts. We demonstrate the applicability of this approach in various tasks, ranging from fast structure similarity search, unsupervised clustering and structure classification across proteins from different superfamilies as well as within the same family. AVAILABILITY AND IMPLEMENTATION Python code available at https://git.wur.nl/durai001/geometricus.
Collapse
Affiliation(s)
| | - Mehmet Akdel
- Bioinformatics Group, Department of Plant Sciences
| | | | - Aalt D J van Dijk
- Bioinformatics Group, Department of Plant Sciences.,Mathematical and Statistical Methods - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen 6700AP, The Netherlands
| |
Collapse
|
4
|
Nojoomi S, Koehl P. A weighted string kernel for protein fold recognition. BMC Bioinformatics 2017; 18:378. [PMID: 28841820 PMCID: PMC5574112 DOI: 10.1186/s12859-017-1795-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2017] [Accepted: 08/15/2017] [Indexed: 11/10/2022] Open
Abstract
Background Alignment-free methods for comparing protein sequences have proved to be viable alternatives to approaches that first rely on an alignment of the sequences to be compared. Much work however need to be done before those methods provide reliable fold recognition for proteins whose sequences share little similarity. We have recently proposed an alignment-free method based on the concept of string kernels, SeqKernel (Nojoomi and Koehl, BMC Bioinformatics, 2017, 18:137). In this previous study, we have shown that while Seqkernel performs better than standard alignment-based methods, its applications are potentially limited, because of biases due mostly to sequence length effects. Methods In this study, we propose improvements to SeqKernel that follows two directions. First, we developed a weighted version of the kernel, WSeqKernel. Second, we expand the concept of string kernels into a novel framework for deriving information on amino acids from protein sequences. Results Using a dataset that only contains remote homologs, we have shown that WSeqKernel performs remarkably well in fold recognition experiments. We have shown that with the appropriate weighting scheme, we can remove the length effects on the kernel values. WSeqKernel, just like any alignment-based sequence comparison method, depends on a substitution matrix. We have shown that this matrix can be optimized so that sequence similarity scores correlate well with structure similarity scores. Starting from no information on amino acid similarity, we have shown that we can derive a scoring matrix that echoes the physico-chemical properties of amino acids. Conclusion We have made progress in characterizing and parametrizing string kernels as alignment-based methods for comparing protein sequences, and we have shown that they provide a framework for extracting sequence information from structure. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1795-5) contains supplementary material, which is available to authorized users.
Collapse
|
5
|
Levy Y. Protein Assembly and Building Blocks: Beyond the Limits of the LEGO Brick Metaphor. Biochemistry 2017; 56:5040-5048. [PMID: 28809494 DOI: 10.1021/acs.biochem.7b00666] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
Proteins, like other biomolecules, have a modular and hierarchical structure. Various building blocks are used to construct proteins of high structural complexity and diverse functionality. In multidomain proteins, for example, domains are fused to each other in different combinations to achieve different functions. Although the LEGO brick metaphor is justified as a means of simplifying the complexity of three-dimensional protein structures, several fundamental properties (such as allostery or the induced-fit mechanism) make deviation from it necessary to respect the plasticity, softness, and cross-talk that are essential to protein function. In this work, we illustrate recently reported protein behavior in multidomain proteins that deviates from the LEGO brick analogy. While earlier studies showed that a protein domain is often unaffected by being fused to another domain or becomes more stable following the formation of a new interface between the tethered domains, destabilization due to tethering has been reported for several systems. We illustrate that tethering may sometimes result in a multidomain protein behaving as "less than the sum of its parts". We survey these cases for which structure additivity does not guarantee thermodynamic additivity. Protein destabilization due to fusion to other domains may be linked in some cases to biological function and should be taken into account when designing large assemblies.
Collapse
Affiliation(s)
- Yaakov Levy
- Department of Structural Biology, Weizmann Institute of Science , Rehovot 76100, Israel
| |
Collapse
|
6
|
Nojoomi S, Koehl P. String kernels for protein sequence comparisons: improved fold recognition. BMC Bioinformatics 2017; 18:137. [PMID: 28245816 PMCID: PMC5331664 DOI: 10.1186/s12859-017-1560-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2016] [Accepted: 02/23/2017] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND The amino acid sequence of a protein is the blueprint from which its structure and ultimately function can be derived. Therefore, sequence comparison methods remain essential for the determination of similarity between proteins. Traditional approaches for comparing two protein sequences begin with strings of letters (amino acids) that represent the sequences, before generating textual alignments between these strings and providing scores for each alignment. When the similitude between the two protein sequences to be compared is low however, the quality of the corresponding sequence alignment is usually poor, leading to poor performance for the recognition of similarity. RESULTS In this study, we develop an alignment free alternative to these methods that is based on the concept of string kernels. Starting from recently proposed kernels on the discrete space of protein sequences (Shen et al, Found. Comput. Math., 2013,14:951-984), we introduce our own version, SeqKernel. Its implementation depends on two parameters, a coefficient that tunes the substitution matrix and the maximum length of k-mers that it includes. We provide an exhaustive analysis of the impacts of these two parameters on the performance of SeqKernel for fold recognition. We show that with the right choice of parameters, use of the SeqKernel similarity measure improves fold recognition compared to the use of traditional alignment-based methods. We illustrate the application of SeqKernel to inferring phylogeny on RNA polymerases and show that it performs as well as methods based on multiple sequence alignments. CONCLUSION We have presented and characterized a new alignment free method based on a mathematical kernel for scoring the similarity of protein sequences. We discuss possible improvements of this method, as well as an extension of its applications to other modeling methods that rely on sequence comparison.
Collapse
Affiliation(s)
- Saghi Nojoomi
- Biotechnology program, University of California, Davis, 1, Shields Avenue, Davis, CA, 95616 USA
| | - Patrice Koehl
- Department of Computer Science and Genome Center, 1, Shields Avenue, Davis, CA, 95616 USA
| |
Collapse
|
7
|
Pandini A, Fornili A. Using Local States To Drive the Sampling of Global Conformations in Proteins. J Chem Theory Comput 2016; 12:1368-79. [PMID: 26808351 PMCID: PMC5356493 DOI: 10.1021/acs.jctc.5b00992] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
![]()
Conformational
changes associated with protein function often occur
beyond the time scale currently accessible to unbiased molecular dynamics
(MD) simulations, so that different approaches have been developed
to accelerate their sampling. Here we investigate how the knowledge
of backbone conformations preferentially adopted by protein fragments,
as contained in precalculated libraries known as structural alphabets
(SA), can be used to explore the landscape of protein conformations
in MD simulations. We find that (a) enhancing the sampling of native
local states in both metadynamics and steered MD simulations allows
the recovery of global folded states in small proteins; (b) folded
states can still be recovered when the amount of information on the
native local states is reduced by using a low-resolution version of
the SA, where states are clustered into macrostates; and (c) sequences
of SA states derived from collections of structural motifs can be
used to sample alternative conformations of preselected protein regions.
The present findings have potential impact on several applications,
ranging from protein model refinement to protein folding and design.
Collapse
Affiliation(s)
- Alessandro Pandini
- Department of Computer Science, College of Engineering, Design and Physical Sciences and Synthetic Biology Theme, Institute of Environment, Health and Societies, Brunel University London , Uxbridge UB8 3PH, United Kingdom
| | - Arianna Fornili
- School of Biological and Chemical Sciences, Queen Mary University of London , Mile End Road, London E1 4NS, United Kingdom
| |
Collapse
|
8
|
Craveur P, Joseph AP, Esque J, Narwani TJ, Noël F, Shinada N, Goguet M, Leonard S, Poulain P, Bertrand O, Faure G, Rebehmed J, Ghozlane A, Swapna LS, Bhaskara RM, Barnoud J, Téletchéa S, Jallu V, Cerny J, Schneider B, Etchebest C, Srinivasan N, Gelly JC, de Brevern AG. Protein flexibility in the light of structural alphabets. Front Mol Biosci 2015; 2:20. [PMID: 26075209 PMCID: PMC4445325 DOI: 10.3389/fmolb.2015.00020] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2015] [Accepted: 04/30/2015] [Indexed: 01/01/2023] Open
Abstract
Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. More precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SAs description. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., Molecular Dynamic simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large set of structures in complexes, to identify order/disorder transition. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases.
Collapse
Affiliation(s)
- Pierrick Craveur
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Agnel P Joseph
- Rutherford Appleton Laboratory, Science and Technology Facilities Council Didcot, UK
| | - Jeremy Esque
- Institut National de la Santé et de la Recherche Médicale U964,7 UMR Centre National de la Recherche Scientifique 7104, IGBMC, Université de Strasbourg Illkirch, France
| | - Tarun J Narwani
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Floriane Noël
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Nicolas Shinada
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Matthieu Goguet
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Sylvain Leonard
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Pierre Poulain
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France ; Ets Poulain Pointe-Noire, Congo
| | - Olivier Bertrand
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Guilhem Faure
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health Bethesda, MD, USA
| | - Joseph Rebehmed
- Centre National de la Recherche Scientifique UMR7590, Sorbonne Universités, Université Pierre et Marie Curie - MNHN - IRD - IUC Paris, France
| | | | - Lakshmipuram S Swapna
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore Bangalore, India ; Hospital for Sick Children, and Departments of Biochemistry and Molecular Genetics, University of Toronto Toronto, ON, Canada
| | - Ramachandra M Bhaskara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore Bangalore, India ; Department of Theoretical Biophysics, Max Planck Institute of Biophysics Frankfurt, Germany
| | - Jonathan Barnoud
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France ; Laboratoire de Physique, École Normale Supérieure de Lyon, Université de Lyon, Centre National de la Recherche Scientifique UMR 5672 Lyon, France
| | - Stéphane Téletchéa
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France ; Faculté des Sciences et Techniques, Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines, Centre National de la Recherche Scientifique UMR 6286, Université Nantes Nantes, France
| | - Vincent Jallu
- Platelet Unit, Institut National de la Transfusion Sanguine Paris, France
| | - Jiri Cerny
- Institute of Biotechnology, The Czech Academy of Sciences Prague, Czech Republic
| | - Bohdan Schneider
- Institute of Biotechnology, The Czech Academy of Sciences Prague, Czech Republic
| | - Catherine Etchebest
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | | | - Jean-Christophe Gelly
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| | - Alexandre G de Brevern
- Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France ; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France ; Institut National de la Transfusion Sanguine, DSIMB Paris, France ; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
| |
Collapse
|
9
|
Li J, Koehl P. 3D representations of amino acids-applications to protein sequence comparison and classification. Comput Struct Biotechnol J 2014; 11:47-58. [PMID: 25379143 PMCID: PMC4212284 DOI: 10.1016/j.csbj.2014.09.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open
Abstract
The amino acid sequence of a protein is the key to understanding its structure and ultimately its function in the cell. This paper addresses the fundamental issue of encoding amino acids in ways that the representation of such a protein sequence facilitates the decoding of its information content. We show that a feature-based representation in a three-dimensional (3D) space derived from amino acid substitution matrices provides an adequate representation that can be used for direct comparison of protein sequences based on geometry. We measure the performance of such a representation in the context of the protein structural fold prediction problem. We compare the results of classifying different sets of proteins belonging to distinct structural folds against classifications of the same proteins obtained from sequence alone or directly from structural information. We find that sequence alone performs poorly as a structure classifier. We show in contrast that the use of the three dimensional representation of the sequences significantly improves the classification accuracy. We conclude with a discussion of the current limitations of such a representation and with a description of potential improvements.
Collapse
Affiliation(s)
- Jie Li
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, United States
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, One Shields Ave, Davis, CA 95616, United States
| |
Collapse
|
10
|
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
|
11
|
Mao W, Cong P, Wang Z, Lu L, Zhu Z, Li T. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data. PLoS One 2013; 8:e83532. [PMID: 24376713 PMCID: PMC3871590 DOI: 10.1371/journal.pone.0083532] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2013] [Accepted: 11/04/2013] [Indexed: 11/28/2022] Open
Abstract
Shape string is structural sequence and is an extremely important structure representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts give a strong correlation with the local protein structure, and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. The NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as a classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight states accuracy) and 87.8% for S3 (the three states accuracy). This is higher than only using chemical shifts or sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and their combination prominently improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Collapse
Affiliation(s)
- Wusong Mao
- Department of Chemistry, Tongji University, Shanghai, China
| | - Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
| | - Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Longjian Lu
- Department of Chemistry, Tongji University, Shanghai, China
| | - Zhongliang Zhu
- Department of Chemistry, Tongji University, Shanghai, China
| | - Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
| |
Collapse
|
12
|
Shen Y, Picord G, Guyon F, Tuffery P. Detecting protein candidate fragments using a structural alphabet profile comparison approach. PLoS One 2013; 8:e80493. [PMID: 24303019 PMCID: PMC3841190 DOI: 10.1371/journal.pone.0080493] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2013] [Accepted: 10/03/2013] [Indexed: 01/28/2023] Open
Abstract
Predicting accurate fragments from sequence has recently become a critical step for protein structure modeling, as protein fragment assembly techniques are presently among the most efficient approaches for de novo prediction. A key step in these approaches is, given the sequence of a protein to model, the identification of relevant fragments - candidate fragments - from a collection of the available 3D structures. These fragments can then be assembled to produce a model of the complete structure of the protein of interest. The search for candidate fragments is classically achieved by considering local sequence similarity using profile comparison, or threading approaches. In the present study, we introduce a new profile comparison approach that, instead of using amino acid profiles, is based on the use of predicted structural alphabet profiles, where structural alphabet profiles contain information related to the 3D local shapes associated with the sequences. We show that structural alphabet profile-profile comparison can be used efficiently to retrieve accurate structural fragments, and we introduce a fully new protocol for the detection of candidate fragments. It identifies fragments specific of each position of the sequence and of size varying between 6 and 27 amino-acids. We find it outperforms present state of the art approaches in terms (i) of the accuracy of the fragments identified, (ii) the rate of true positives identified, while having a high coverage score. We illustrate the relevance of the approach on complete target sets of the two previous Critical Assessment of Techniques for Protein Structure Prediction (CASP) rounds 9 and 10. A web server for the approach is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/SAFrag.
Collapse
Affiliation(s)
- Yimin Shen
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - Géraldine Picord
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - Frédéric Guyon
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
| | - Pierre Tuffery
- INSERM, U973, MTi, Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, Paris, France
- RPBS, Paris, France
- * E-mail:
| |
Collapse
|
13
|
Sunami T, Kono H. Local conformational changes in the DNA interfaces of proteins. PLoS One 2013; 8:e56080. [PMID: 23418514 PMCID: PMC3571985 DOI: 10.1371/journal.pone.0056080] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2012] [Accepted: 01/03/2013] [Indexed: 11/18/2022] Open
Abstract
When a protein binds to DNA, a conformational change is often induced so that the protein will fit into the DNA structure. Therefore, quantitative analyses were conducted to understand the conformational changes in proteins. The results showed that conformational changes in DNA interfaces are more frequent than in non-interfaces, and DNA interfaces have more conformational variations in the DNA-free form. As expected, the former indicates that interaction with DNA has some influence on protein structure. The latter suggests that the intrinsic conformational flexibility of DNA interfaces is important for adjusting their conformation for DNA. The amino acid propensities of the conformationally changed regions in DNA interfaces indicate that hydrophilic residues are preferred over the amino acids that appear in the conformationally unchanged regions. This trend is true for disordered regions, suggesting again that intrinsic flexibility is of importance not only for DNA binding but also for interactions with other molecules. These results demonstrate that fragments destined to be DNA interfaces have an intrinsic flexibility and are composed of amino acids with the capability of binding to DNA. This information suggests that the prediction of DNA binding sites may be improved by the integration of amino acid preference for DNA and one for disordered regions.
Collapse
Affiliation(s)
- Tomoko Sunami
- Molecular Modeling and Simulation Group, Quantum Beam Science Directorate, Japan Atomic Energy Agency, Kizugawa, Kyoto, Japan
| | - Hidetoshi Kono
- Molecular Modeling and Simulation Group, Quantum Beam Science Directorate, Japan Atomic Energy Agency, Kizugawa, Kyoto, Japan
- * E-mail:
| |
Collapse
|
14
|
Chellapa GD, Rose GD. Reducing the dimensionality of the protein-folding search problem. Protein Sci 2012; 21:1231-40. [PMID: 22692765 DOI: 10.1002/pro.2106] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2012] [Revised: 06/04/2012] [Accepted: 06/05/2012] [Indexed: 11/10/2022]
Abstract
How does a folding protein negotiate a vast, featureless conformational landscape and adopt its native structure in biological real time? Motivated by this search problem, we developed a novel algorithm to compare protein structures. Procedures to identify structural analogs are typically conducted in three-dimensional space: the tertiary structure of a target protein is matched against each candidate in a database of structures, and goodness of fit is evaluated by a distance-based measure, such as the root-mean-square distance between target and candidate. This is an expensive approach because three-dimensional space is complex. Here, we transform the problem into a simpler one-dimensional procedure. Specifically, we identify and label the 11 most populated residue basins in a database of high-resolution protein structures. Using this 11-letter alphabet, any protein's three-dimensional structure can be transformed into a one-dimensional string by mapping each residue onto its corresponding basin. Similarity between the resultant basin strings can then be evaluated by conventional sequence-based comparison. The disorder → order folding transition is abridged on both sides. At the onset, folding conditions necessitate formation of hydrogen-bonded scaffold elements on which proteins are assembled, severely restricting the magnitude of accessible conformational space. Near the end, chain topology is established prior to emergence of the close-packed native state. At this latter stage of folding, the chain remains molten, and residues populate natural basins that are approximated by the 11 basins derived here. In essence, our algorithm reduces the protein-folding search problem to mapping the amino acid sequence onto a restricted basin string.
Collapse
Affiliation(s)
- George D Chellapa
- TC Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
| | | |
Collapse
|
15
|
Kawabata T. Build-up algorithm for atomic correspondence between chemical structures. J Chem Inf Model 2011; 51:1775-87. [PMID: 21736325 DOI: 10.1021/ci2001023] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Determining a one-to-one atom correspondence between two chemical compounds is important to measure molecular similarities and to find compounds with similar biological activities. This calculation can be formalized as the maximum common substructure (MCS) problem, which is well-studied and has been shown to be NP-complete. Although many rigorous and heuristic algorithms have been developed, none of these algorithms is sufficiently fast and accurate. We developed a new program, called "kcombu" using a build-up algorithm, which is a type of the greedy heuristic algorithms. The program can search connected and disconnected MCSs as well as topologically constrained disconnected MCS (TD-MCS), which is introduced in this study. To evaluate the performance of our program, we prepared two correct standards: the exact correspondences generated by the maximum clique algorithms and the 3D correspondences obtained from superimposed 3D structure of the molecules in a complex 3D structure with the same protein. For the five sets of molecules taken from the protein structure database, the agreement value between the build-up and the exact correspondences for the connected MCS is sufficiently high, but the computation time of the build-up algorithm is much smaller than that of the exact algorithm. The comparison between the build-up and the 3D correspondences shows that the TD-MCS has the best agreement value among the other types of MCS. Additionally, we observed a strong correlation between the molecular similarity and the agreement with the correct and 3D correspondences; more similar molecule pairs are more correctly matched. Molecular pairs with more than 40% Tanimoto similarities can be correctly matched for more than half of the atoms with the 3D correspondences.
Collapse
Affiliation(s)
- Takeshi Kawabata
- Institute of Protein Science, Osaka University, Osaka 565-0871 Japan.
| |
Collapse
|
16
|
Verschueren E, Vanhee P, van der Sloot AM, Serrano L, Rousseau F, Schymkowitz J. Protein design with fragment databases. Curr Opin Struct Biol 2011; 21:452-9. [PMID: 21684149 DOI: 10.1016/j.sbi.2011.05.002] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2011] [Accepted: 05/25/2011] [Indexed: 11/25/2022]
Abstract
Structure-based computational methods are popular tools for designing proteins and interactions between proteins because they provide the necessary insight and details required for rational engineering. Here, we first argue that large-scale databases of fragments contain a discrete but complete set of building blocks that can be used to design structures. We show that these structural alphabets can be saturated to provide conformational ensembles that sample the native structure space around energetic minima. Second, we show that catalogs of interaction patterns hold the key to overcome the lack of scaffolds when computationally designing protein interactions. Finally, we illustrate the power of database-driven computational protein design methods by recent successful applications and discuss what challenges remain to push this field forward.
Collapse
Affiliation(s)
- Erik Verschueren
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG) and UPF, Barcelona, Spain
| | | | | | | | | | | |
Collapse
|
17
|
Gamliel R, Kedem K, Kolodny R, Keasar C. A library of protein surface patches discriminates between native structures and decoys generated by structure prediction servers. BMC STRUCTURAL BIOLOGY 2011; 11:20. [PMID: 21542935 PMCID: PMC3114701 DOI: 10.1186/1472-6807-11-20] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/03/2010] [Accepted: 05/04/2011] [Indexed: 11/10/2022]
Abstract
Background Protein surfaces serve as an interface with the molecular environment and are thus tightly bound to protein function. On the surface, geometric and chemical complementarity to other molecules provides interaction specificity for ligand binding, docking of bio-macromolecules, and enzymatic catalysis. As of today, there is no accepted general scheme to represent protein surfaces. Furthermore, most of the research on protein surface focuses on regions of specific interest such as interaction, ligand binding, and docking sites. We present a first step toward a general purpose representation of protein surfaces: a novel surface patch library that represents most surface patches (~98%) in a data set regardless of their functional roles. Results Surface patches, in this work, are small fractions of the protein surface. Using a measure of inter-patch distance, we clustered patches extracted from a data set of high quality, non-redundant, proteins. The surface patch library is the collection of all the cluster centroids; thus, each of the data set patches is close to one of the elements in the library. We demonstrate the biological significance of our method through the ability of the library to capture surface characteristics of native protein structures as opposed to those of decoy sets generated by state-of-the-art protein structure prediction methods. The patches of the decoys are significantly less compatible with the library than their corresponding native structures, allowing us to reliably distinguish native models from models generated by servers. This trend, however, does not extend to the decoys themselves, as their similarity to the native structures does not correlate with compatibility with the library. Conclusions We expect that this high-quality, generic surface patch library will add a new perspective to the description of protein structures and improve our ability to predict them. In particular, we expect that it will help improve the prediction of surface features that are apparently neglected by current techniques. The surface patch libraries are publicly available at http://www.cs.bgu.ac.il/~keasar/patchLibrary.
Collapse
Affiliation(s)
- Roi Gamliel
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | | | | | | |
Collapse
|
18
|
Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, Schymkowitz J. BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res 2010; 39:D435-42. [PMID: 20972210 PMCID: PMC3013806 DOI: 10.1093/nar/gkq972] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online (http://brix.crg.es) and can be accessed through a user-friendly web-interface. For high-throughput queries a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments.
Collapse
Affiliation(s)
- Peter Vanhee
- VIB SWITCH Laboratory, Flanders Institute of Biotechnology, Free University of Brussels, Pleinlaan 2, 1050 Brussels, Belgium
| | | | | | | | | | | | | |
Collapse
|
19
|
Pandini A, Fornili A, Kleinjung J. Structural alphabets derived from attractors in conformational space. BMC Bioinformatics 2010; 11:97. [PMID: 20170534 PMCID: PMC2838871 DOI: 10.1186/1471-2105-11-97] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2009] [Accepted: 02/20/2010] [Indexed: 11/20/2022] Open
Abstract
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
Collapse
Affiliation(s)
- Alessandro Pandini
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, UK
| | | | | |
Collapse
|