1
Kranjc A, Narwani TJ, Abby SS, de Brevern AG. Structural Space of the Duffy Antigen/Receptor for Chemokines' Intrinsically Disordered Ectodomain 1 Explored by Temperature Replica-Exchange Molecular Dynamics Simulations. Int J Mol Sci 2023; 24:13280. [PMID: 37686086 PMCID: PMC10488288 DOI: 10.3390/ijms241713280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 08/18/2023] [Accepted: 08/22/2023] [Indexed: 09/10/2023] Open
Abstract
Plasmodium vivax malaria affects 14 million people each year. Its invasion requires interactions between the parasitic Duffy-binding protein (PvDBP) and the N-terminal extracellular domain (ECD1) of the host's Duffy antigen/receptor for chemokines (DARC). ECD1 is highly flexible and intrinsically disordered, and can therefore adopt different conformations. We computationally modeled the challenging ECD1 local structure. With T-REMD simulations, we sampled its dynamic behavior and collected its most representative conformations. Our results suggest that most of the DARC ECD1 domain remains in a disordered state during the simulated time. Globular local conformations are found in the analyzed local free-energy minima. These globular conformations share an α-helix spanning residues Ser18 to Ser29, and in many cases they comprise an antiparallel β-sheet whose β-strands are formed around residues Leu10 and Ala49. The formation of a parallel β-sheet is almost negligible. So far, progress in understanding the mechanisms forming the basis of the P. vivax malaria infection of reticulocytes has been hampered by experimental difficulties, along with a lack of DARC structural information. Our collection of the most probable ECD1 structural conformations will help to advance modeling of the DARC structure and to explore DARC-ECD1 interactions with a range of physiological and pathological ligands.
Affiliation(s)
- Agata Kranjc
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, F-75014 Paris, France;
- Institut National de la Transfusion Sanguine (INTS), F-75015 Paris, France
- Institute of Neuroscience and Medicine (INM-9)/Institute for Advanced Simulation (IAS-5), Forschungszentrum Jülich, D-52425 Jülich, Germany
- Tarun Jairaj Narwani
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, F-75014 Paris, France;
- Institut National de la Transfusion Sanguine (INTS), F-75015 Paris, France
- Sophie S. Abby
- University Grenoble Alpes, CNRS, UMR 5525, VetAgro Sup, Grenoble INP, TIMC, F-38000 Grenoble, France;
- Alexandre G. de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, BIGR, UMR_S1134, DSIMB Team, Inserm, F-75014 Paris, France;
- Institut National de la Transfusion Sanguine (INTS), F-75015 Paris, France
2
de Brevern AG. An agnostic analysis of the human AlphaFold2 proteome using local protein conformations. Biochimie 2023; 207:11-19. [PMID: 36417962 DOI: 10.1016/j.biochi.2022.11.009] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2022] [Revised: 10/14/2022] [Accepted: 11/17/2022] [Indexed: 11/21/2022]
Abstract
Knowledge of the 3D structure of proteins is a valuable asset for understanding their precise biological mechanisms. However, the cost of producing 3D structures and experimental difficulties limit their availability. Proposing 3D structural models is consequently an appealing alternative. The release of the AlphaFold Deep Learning approach has revolutionized the field. The recent near-complete human proteome proposal makes it possible to analyse large amounts of data and evaluate the results of the approach in greater depth. The 3D human proteome was thus analysed in light of the classic secondary structures and many less-used protein local conformations (PolyProline II helices, types of γ-turns, β-turns and β-bulges, curvature of the helices, and a structural alphabet). Without questioning the global quality of the approach, this analysis highlights certain local conformations that may be poorly predicted and could therefore be better addressed.
Affiliation(s)
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Bioinformatics team, F-75014, Paris, France.
3
de Brevern AG. A Perspective on the (Rise and Fall of) Protein β-Turns. Int J Mol Sci 2022; 23:12314. [PMID: 36293166 PMCID: PMC9604201 DOI: 10.3390/ijms232012314] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/17/2022] [Revised: 10/07/2022] [Accepted: 10/13/2022] [Indexed: 11/21/2022] Open
Abstract
The β-turn is the third defined secondary structure after the α-helix and the β-sheet. The β-turns were described more than 50 years ago and account for more than 20% of protein residues. Nonetheless, they are often overlooked or even misunderstood. This poor knowledge of these local protein conformations is due to various factors, which I discuss here. For example, confusion still exists about the assignment of these local protein structures, their overlaps with other structures, the potential absence of a stabilizing hydrogen bond, the numerous types of β-turns and the difficulty software has in assigning or visualizing them. I also propose some ideas to potentially/partially remedy this and present why β-turns can still be helpful, even in the AlphaFold 2 era.
Affiliation(s)
- Alexandre G de Brevern
- Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM UMR_S 1134, BIGR, DSIMB Team, F-75014 Paris, France
4
Yang J, Cheng WX, Zhao XF, Wu G, Sheng ST, Hu Q, Ge H, Qin Q, Jin X, Zhang L, Zhang P. Comprehensive folding variations for protein folding. Proteins 2022; 90:1851-1872. [DOI: 10.1002/prot.26381] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/12/2022] [Accepted: 04/22/2022] [Indexed: 11/12/2022]
Affiliation(s)
- Jiaan Yang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Shenzhen Guangdong China
- Micro Biotech, Ltd. Shanghai China
- Wen Xiang Cheng
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Shenzhen Guangdong China
- Gang Wu
- School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology Wuhan China
- Shi Tong Sheng
- Shenzhen Hua Ying Kang Gene Technology Co., Ltd Shenzhen Guangdong China
- Qiyue Hu
- Shanghai Hengrui Pharmaceutical Co. Ltd. Shanghai China
- Hu Ge
- Shanghai Hengrui Pharmaceutical Co. Ltd. Shanghai China
- Qianshan Qin
- Shanghai Hengrui Pharmaceutical Co. Ltd. Shanghai China
- Xinshen Jin
- Shanghai Hengrui Pharmaceutical Co. Ltd. Shanghai China
- Peng Zhang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Shenzhen Guangdong China
5
de Brevern AG. Impact of protein dynamics on secondary structure prediction. Biochimie 2020; 179:14-22. [PMID: 32946990 DOI: 10.1016/j.biochi.2020.09.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/21/2020] [Revised: 09/04/2020] [Accepted: 09/10/2020] [Indexed: 02/08/2023]
Abstract
Protein 3D structures support their biological functions. As the number of protein structures is negligible relative to the number of available protein sequences, prediction methodologies relying only on protein sequences are essential tools. In this field, protein secondary structure prediction (PSSP) is a mature area and is considered to have reached a plateau. Nonetheless, proteins are highly dynamic macromolecules, a property that could impact PSSP methods. Indeed, in a previous study, the stability of local protein conformations was evaluated, demonstrating that some regions easily changed to another type of secondary structure. The protein sequences of this dataset were submitted to PSSP methods and their results compared to molecular dynamics to investigate the potential impact of dynamics on the quality of the secondary structure prediction. Interestingly, a direct link is observed between the quality of the prediction and the stability of the assignment to the secondary structure state: the more stable a local protein conformation is, the better the prediction will be. Taking the secondary structure assignment not from the crystallized structures but from the conformations observed during the dynamics slightly increases the quality of the secondary structure prediction. These results show that evaluation of PSSP methods can be done differently, and also that the notion of dynamics can be included in the development of PSSP methods and of other approaches such as de novo approaches.
Affiliation(s)
- Alexandre G de Brevern
- Biologie Intégrée Du Globule Rouge UMR_S1134, Inserm, Université de Paris, Univ. de la Réunion, Univ. des Antilles, F-75739, Paris, France; Laboratoire D'Excellence GR-Ex, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; IBL, F-75015, Paris, France.
6
Shapovalov M, Dunbrack RL, Vucetic S. Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction. PLoS One 2020; 15:e0232528. [PMID: 32374785 PMCID: PMC7202669 DOI: 10.1371/journal.pone.0232528] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/24/2019] [Accepted: 04/16/2020] [Indexed: 11/30/2022] Open
Abstract
Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
Affiliation(s)
- Maxim Shapovalov
- Fox Chase Cancer Center, Philadelphia, PA, United States of America
- Temple University, Philadelphia, PA, United States of America
7
Akhila MV, Narwani TJ, Floch A, Maljković M, Bisoo S, Shinada NK, Kranjc A, Gelly JC, Srinivasan N, Mitić N, de Brevern AG. A structural entropy index to analyse local conformations in intrinsically disordered proteins. J Struct Biol 2020; 210:107464. [DOI: 10.1016/j.jsb.2020.107464] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 07/13/2019] [Revised: 01/06/2020] [Accepted: 01/15/2020] [Indexed: 10/25/2022]
8
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Protein structures are highly dynamic macromolecules. These dynamics are often analysed through experimental and/or computational methods only for an isolated or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category although their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of the protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange. Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Tarun Jairaj Narwani
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
- Pierrick Craveur
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
- Nicolas K Shinada
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Discngine, SAS, Paris, France
- Aline Floch
- Laboratoire D'Excellence GR-Ex, Paris, France; Etablissement Français du Sang Ile de France, Créteil, France; IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est-Créteil Univ, Créteil, France; UPEC, Université Paris Est-Créteil, Créteil, France
- Hubert Santuz
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
- Akhila Melarkode Vattekatte
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Joseph Rebehmed
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
- Catherine Etchebest
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
- Alexandre G de Brevern
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
9
Chu H, Liu H. TetraBASE: A Side Chain-Independent Statistical Energy for Designing Realistically Packed Protein Backbones. J Chem Inf Model 2018; 58:430-442. [PMID: 29314837 DOI: 10.1021/acs.jcim.7b00677] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
To construct backbone structures of high designability is a primary aspect of computational protein design. We report here a side chain-independent statistical energy that aims at realistic modeling of through-space packing of polypeptide backbones. To mitigate the lack of explicit amino acid side chains, the model treats the interbackbone site packing as being dependent on peptide local conformation. In addition, new variables suitable for statistical analysis, one for relative orientation and another for distance, have been introduced to represent the intersite geometry based on the asymmetrical tetrahedron organization of distinct chemical groups surrounding the Cα-carbon atoms. The resulting tetrahedron-based backbone statistical energy (tetraBASE) model has been used to optimize the tertiary organizations of secondary structure elements (SSEs) of designated types with Monte Carlo simulated annealing, starting from artificial initial configurations. The tetraBASE minimum energy structures can reproduce SSE packing frequently observed in native proteins with atomic root-mean-square deviations of 1-2 Å. The model has also been tested by examining the stability of native SSE arrangements under tetraBASE. The results suggest that the tetraBASE model can be used to effectively represent interbackbone packing when designing backbone structures without explicitly knowing side chain types.
Affiliation(s)
- Huanyu Chu
- School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui China; Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui China
- Haiyan Liu
- School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui China; Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui China; Collaborative Innovation Center of Chemistry for Life Sciences, 230027 Hefei, Anhui China
10
Extension of the classical classification of β-turns. Sci Rep 2016; 6:33191. [PMID: 27627963 PMCID: PMC5024104 DOI: 10.1038/srep33191] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 04/13/2016] [Accepted: 08/22/2016] [Indexed: 11/29/2022] Open
Abstract
The functional properties of a protein primarily depend on its three-dimensional (3D) structure. These properties have classically been assigned, visualized and analysed on the basis of protein secondary structures. The β-turn is the third most important secondary structure after helices and β-strands. β-turns have been classified according to the values of the dihedral angles φ and ψ of the central residue. Conventionally, eight different types of β-turns have been defined, whereas those that cannot be defined are classified as type IV β-turns. This classification remains the most widely used. Nonetheless, the miscellaneous type IV β-turns represent 1/3rd of β-turn residues. An unsupervised specific clustering approach was designed to search for recurrent new turns in the type IV category. The classical rules of β-turn type assignment were central to the approach. The four most frequently occurring clusters defined the new β-turn types. Unexpectedly, these types, designated IV1, IV2, IV3 and IV4, represent half of the type IV β-turns and occur more frequently than many of the previously established types. These types show convincing particularities, in terms of both structures and sequences that allow for the classical β-turn classification to be extended for the first time in 25 years.
11
Noël F, Malpertuy A, de Brevern AG. Global analysis of VHHs framework regions with a structural alphabet. Biochimie 2016; 131:11-19. [PMID: 27613403 DOI: 10.1016/j.biochi.2016.09.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/01/2016] [Revised: 09/05/2016] [Accepted: 09/05/2016] [Indexed: 02/08/2023]
Abstract
VHHs are the antigen-binding domains of camelid heavy chain antibodies (HCAbs). They have many interesting biotechnological and biomedical properties due to their small size, high solubility and stability, and high affinity and specificity for their antigens. HCAbs and classical IgGs are evolutionarily related and share a common fold. VHHs are composed of regions considered as constant, called the frameworks (FRs), connected by Complementarity Determining Regions (CDRs), highly variable regions that provide interaction with the epitope. Until now, no systematic structural analysis had been performed on VHHs despite a significant number of available structures. This work is the first study to analyse the structural diversity of the FRs of VHHs. Using a structural alphabet that allows approximating the local conformation, we show that each of the four FRs does not have a unique structure but exhibits many structural variant patterns. Moreover, no direct simple link between the local conformational change and amino acid composition can be detected. These results indicate that long-range interactions affect the local conformation of FRs and impact the building of structural models.
Affiliation(s)
- Floriane Noël
- INSERM, U 1134, DSIMB, F-75739 Paris, France; Univ Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France
- Alexandre G de Brevern
- INSERM, U 1134, DSIMB, F-75739 Paris, France; Univ Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France.
12
Zhang Y, Sagui C. Secondary structure assignment for conformationally irregular peptides: comparison between DSSP, STRIDE and KAKSI. J Mol Graph Model 2014; 55:72-84. [PMID: 25424660 DOI: 10.1016/j.jmgm.2014.10.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/07/2014] [Accepted: 10/08/2014] [Indexed: 11/25/2022]
Abstract
Secondary structure assignment codes were built to explore the regularities associated with the periodic motifs of proteins, such as those in backbone dihedral angles or in hydrogen bonds between backbone atoms. Precise structure assignment is challenging because real-life secondary structures are susceptible to bending, twist, fraying and other deformations that can distance them from their geometrical prototypes. Although results from codes such as DSSP and STRIDE converge in well-ordered structures, the agreement between the secondary structure assignments is known to deteriorate as the conformations become more distorted. Conformationally irregular peptides therefore offer a great opportunity to explore the differences between these codes. This is especially important for unfolded proteins and intrinsically disordered proteins, which are known to exhibit residual and/or transient secondary structure whose characterization is challenging. In this work, we have carried out Molecular Dynamics simulations of (relatively) disordered peptides, specifically gp41(659-671) (ELLELDKWASLWN), the homopeptide polyasparagine (N18), and polyasparagine dimers. We have analyzed the resulting conformations with DSSP and STRIDE, based on hydrogen-bond patterns (and dihedral angles for STRIDE), and KAKSI, based on α-Carbon distances; and carefully characterized the differences in structural assignments. The full-sequence Segment Overlap (SOV) scores, which quantify the agreement between two secondary structure assignments, vary from 70% for gp41(659-671) (STRIDE as reference) to 49% for N18 (DSSP as reference). Major differences are observed in turns, in the distinction between α and 3₁₀ helices, and in short parallel-sheet segments.
Affiliation(s)
- Yuan Zhang
- Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States
- Celeste Sagui
- Department of Physics, North Carolina State University, Raleigh, NC 27695, United States; Center for High Performance Simulations (CHiPS), North Carolina State University, Raleigh, NC 27695, United States.
13
Schneider B, Černý J, Svozil D, Čech P, Gelly JC, de Brevern AG. Bioinformatic analysis of the protein/DNA interface. Nucleic Acids Res 2014; 42:3381-94. [PMID: 24335080 PMCID: PMC3950675 DOI: 10.1093/nar/gkt1273] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 04/30/2013] [Revised: 11/14/2013] [Accepted: 11/14/2013] [Indexed: 01/04/2023] Open
Abstract
To investigate the principles driving recognition between proteins and DNA, we analyzed more than a thousand crystal structures of protein/DNA complexes. We classified protein and DNA conformations by structural alphabets, protein blocks [de Brevern, Etchebest and Hazout (2000) (Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks. Prots. Struct. Funct. Genet., 41:271-287)] and dinucleotide conformers [Svozil, Kalina, Omelka and Schneider (2008) (DNA conformations and their sequence preferences. Nucleic Acids Res., 36:3690-3706)], respectively. Assembling the mutually interacting protein blocks and dinucleotide conformers into 'interaction matrices' revealed their correlations and conformer preferences at the interface relative to their occurrence outside the interface. The analyzed data demonstrated important differences between complexes of various types of proteins such as transcription factors and nucleases, distinct interaction patterns for the DNA minor groove relative to the major groove and phosphate, and the importance of water-mediated contacts. Water molecules mediate proportionally the largest number of contacts in the minor groove and form the largest proportion of contacts in complexes of transcription factors. The generally known induction of A-DNA forms by complexation was more accurately attributed to A-like and intermediate A/B conformers that are rare in naked DNA molecules.
Affiliation(s)
- Bohdan Schneider
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Jiří Černý
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Daniel Svozil
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Petr Čech
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Jean-Christophe Gelly
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Alexandre G. de Brevern
- Institute of Biotechnology AS CR, Videnska 1083, CZ-142 20 Prague, Czech Republic, Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic, INSERM, U665, DSIMB, F-75739 Paris, France, University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739 Paris, France, Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France and Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
|
14
|
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. Advances in Protein Chemistry and Structural Biology 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
|
15
|
Bhaskara RM, de Brevern AG, Srinivasan N. Understanding the role of domain–domain linkers in the spatial orientation of domains in multi-domain proteins. J Biomol Struct Dyn 2013; 31:1467-80. [DOI: 10.1080/07391102.2012.743438] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 10/27/2022]
|
16
|
Craveur P, Joseph AP, Rebehmed J, de Brevern AG. β-Bulges: extensive structural analyses of β-sheets irregularities. Protein Sci 2013; 22:1366-78. [PMID: 23904395 DOI: 10.1002/pro.2324] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/29/2013] [Revised: 07/19/2013] [Accepted: 07/22/2013] [Indexed: 12/30/2022]
Abstract
β-Sheets are quite frequent in protein structures and are stabilized by regular main-chain hydrogen bond patterns. Irregularities in β-sheets, named β-bulges, are distorted regions between two consecutive hydrogen bonds. They disrupt the classical alternation of side chain direction and can alter the directionality of β-strands. They are implicated in protein-protein interactions and are introduced to avoid β-strand aggregation. Five different types of β-bulges are defined. Previous studies on β-bulges were performed on a limited number of protein structures or one specific family. These studies evoked a potential conservation during evolution. In this work, we analyze the β-bulge distribution and conservation in terms of local backbone conformations and amino acid composition. Our dataset consists of 66 times more β-bulges than the last systematic study (Chan et al. Protein Science 1993, 2:1574-1590). Novel amino acid preferences are underlined and local structure conformations are highlighted by the use of a structural alphabet. We observed that β-bulges are preferably localized at the N- and C-termini of β-strands, but contrary to the earlier studies, no significant conservation of β-bulges was observed among structural homologues. Displacement of β-bulges along the sequence was also investigated by Molecular Dynamics simulations.
Affiliation(s)
- Pierrick Craveur
- INSERM, U665, DSIMB, F-75739, Paris, France; University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire d'Excellence GR-Ex, F-75739, Paris, France
|
17
|
Species specific amino acid sequence–protein local structure relationships: An analysis in the light of a structural alphabet. J Theor Biol 2011; 276:209-17. [DOI: 10.1016/j.jtbi.2011.01.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/09/2010] [Revised: 01/28/2011] [Accepted: 01/31/2011] [Indexed: 11/24/2022]
|
18
|
Mansiaux Y, Joseph AP, Gelly JC, de Brevern AG. Assignment of PolyProline II conformation and analysis of sequence-structure relationship. PLoS One 2011; 6:e18401. [PMID: 21483785 PMCID: PMC3069088 DOI: 10.1371/journal.pone.0018401] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 08/12/2010] [Accepted: 03/07/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns, and the rest is associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. The PolyProline II (PPII) helix is yet another interesting repetitive structure, which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could have an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Few constraints were added to the assignment, and only PPII helices of at least 2 residues in length are defined. CONCLUSIONS/SIGNIFICANCE The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acids as PPII. Sequence-structure relationships associated with PPII, defined by the different SSAMs, underline a few striking differences. A specific study of amino acid preferences in their N- and C-cap regions was carried out, as well as of their solvent accessibility and contact patterns. Thus the assignment of PPII can be coupled with DSSP, which opens a simple way for further analysis in this field.
Affiliation(s)
- Yohann Mansiaux
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Jean-Christophe Gelly
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
19
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022] Open
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One of the kinds of differences between 3-D structures of homologous proteins is rigid body displacement. A glaring example is equivalent regions of homologous proteins that adopt an α-helical conformation with different spatial orientations and therefore do not superimpose well. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, and the most common example is topologically equivalent loops of two homologues with different conformations. In the current study, we present a refined view of the “structurally variable” regions, which may contain local similarity obscured in global alignment of homologous protein structures. As a structural alphabet is able to describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through Protein Blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to a varied extent but do not preserve local structure.
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
- Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
20
|
Joseph AP, Agarwal G, Mahajan S, Gelly JC, Swapna LS, Offmann B, Cadet F, Bornot A, Tyagi M, Valadié H, Schneider B, Etchebest C, Srinivasan N, De Brevern AG. A short survey on protein blocks. Biophys Rev 2010; 2:137-147. [PMID: 21731588 DOI: 10.1007/s12551-010-0036-1] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Indexed: 11/24/2022] Open
Abstract
Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations such as uncertainties in the assignment of boundaries of helical and β-strand regions. Further, on an average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus different research teams have focused on abstracting conformation of protein backbone in the localized short stretches. Using different geometric measures, local stretches in protein structures are clustered in a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structures prototypes are named as "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate the protein structure, but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
Affiliation(s)
- Agnel Praveen Joseph
- DSIMB, Dynamique des Structures et Interactions des Macromolécules Biologiques, INSERM U665, Université Paris-Diderot - Paris VII, INTS, 6 rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|
21
|
Influence of assignment on the prediction of transmembrane helices in protein structures. Amino Acids 2010; 39:1241-54. [DOI: 10.1007/s00726-010-0559-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 10/22/2009] [Accepted: 03/08/2010] [Indexed: 02/01/2023]
|
22
|
Pandini A, Fornili A, Kleinjung J. Structural alphabets derived from attractors in conformational space. BMC Bioinformatics 2010; 11:97. [PMID: 20170534 PMCID: PMC2838871 DOI: 10.1186/1471-2105-11-97] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 08/24/2009] [Accepted: 02/20/2010] [Indexed: 11/20/2022] Open
Abstract
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
Affiliation(s)
- Alessandro Pandini
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, UK
|
23
|
Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009; 18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, the secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both structure and sequence points of view, how the divergence between different SSAMs affects boundary definitions of loops connecting regular secondary structures. The analysis underlines that no clear consensus between the different SSAMs can be easily found. Because the assignments greatly influence the loop boundary definitions, important variations are indeed observed, that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field, as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) it is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain stable.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
24
|
Helles G, Fonseca R. Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks. BMC Bioinformatics 2009; 10:338. [PMID: 19835576 PMCID: PMC2771020 DOI: 10.1186/1471-2105-10-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/06/2009] [Accepted: 10/16/2009] [Indexed: 11/10/2022] Open
Abstract
Background Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent and helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and they do not have any apparent recurrent patterns, which limits the overall accuracy of protein structure prediction methods. Luckily, previous work has indicated that coil segments are in fact not completely random in structure and flanking residues do seem to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While attempts to predict dihedral angles of coil segments have been made previously, none have, to our knowledge, presented comparable results for the probability distribution of dihedral angles. Results In this paper we develop an artificial neural network that uses an input-window of amino acids to predict a dihedral angle probability distribution for the middle residue in the input-window. The trained neural network shows a significant improvement (4-68%) in predicting the most probable bin (covering a 30° × 30° area of the dihedral angle space) for all amino acids in the data set compared to baseline statistics. An accuracy comparable to that of secondary structure prediction (≈ 80%) is achieved by observing the 20 bins with highest output values. Conclusion Many different protein structure prediction methods exist and each uses different tools and auxiliary predictions to help determine the native structure. In this work the sequence is used to predict local context dependent dihedral angle propensities in coil-regions. This predicted distribution can potentially improve tertiary structure prediction methods that are based on sampling the backbone dihedral angles of individual amino acids. The predicted distribution may also help predict local structure fragments used in fragment assembly methods.
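The 30° × 30° binning of dihedral angle space and the "20 bins with highest output values" criterion mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the [-180°, 180°) angle convention, and the row-major bin numbering are assumptions made here for illustration only.

```python
BIN_WIDTH = 30                      # degrees per bin side, as stated in the abstract
BINS_PER_AXIS = 360 // BIN_WIDTH    # 12 bins along phi and 12 along psi


def angle_bin(angle_deg):
    """Map an angle in degrees to a bin index 0..11 (assumed convention:
    bin 0 starts at -180 degrees; -180 and +180 fall in the same bin)."""
    wrapped = (angle_deg + 180.0) % 360.0
    return int(wrapped // BIN_WIDTH)


def dihedral_bin(phi, psi):
    """Flatten a (phi, psi) pair into one of 12 * 12 = 144 bins (row-major, hypothetical numbering)."""
    return angle_bin(phi) * BINS_PER_AXIS + angle_bin(psi)


def top_k_hit(scores, true_bin, k=20):
    """Return True if the observed bin is among the k highest-scoring bins."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_bin in ranked[:k]


if __name__ == "__main__":
    # Typical alpha-helical angles, used only as a toy input.
    print(dihedral_bin(-60.0, -45.0))
```

With this numbering, a prediction counts as correct under the top-20 criterion when `top_k_hit(scores, dihedral_bin(phi, psi))` is true for the observed angles.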
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Department of Computer Science, Universitetsparken 1, 2100 Copenhagen, Denmark.
|
25
|
Bornot A, Etchebest C, de Brevern AG. A new prediction strategy for long local protein structures using an original description. Proteins 2009; 76:570-87. [PMID: 19241475 DOI: 10.1002/prot.22370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
A relevant and accurate description of three-dimensional (3D) protein structures can be achieved by characterizing recurrent local structures. In a previous study, we developed a library of 120 3D structural prototypes encompassing all known 11-residue-long local protein structures and ensuring a good quality of structural approximation. A local structure prediction method was also proposed. Here, overlapping properties of local protein structures in global ones are taken into account to characterize frequent local networks. At the same time, we propose a new long local structure prediction strategy which involves the use of evolutionary information coupled with Support Vector Machines (SVMs). Our prediction is evaluated by a stringent geometrical assessment. Every local structure prediction with a Cα RMSD less than 2.5 Å from the true local structure is considered correct. A global prediction rate of 63.1% is then reached, corresponding to an improvement of 7.7 points compared with the previous strategy. In the same way, the prediction of 88.33% of the 120 structural classes is improved, with an 8.65% mean gain. 85.33% of proteins have better prediction results, with a 9.43% average gain. An analysis of prediction rate per local network also supports the global improvement and gives insights into the potential of our method for predicting super local structures. Moreover, a confidence index for the direct estimation of prediction quality is proposed. Finally, our method is shown to be very competitive with cutting-edge strategies encompassing three categories of local structure predictions.
Affiliation(s)
- Aurélie Bornot
- INSERM UMR-S, Université Paris Diderot, Institut National de la Transfusion Sanguine, France.
|
26
|
Tyagi M, Bornot A, Offmann B, de Brevern AG. Protein short loop prediction in terms of a structural alphabet. Comput Biol Chem 2009; 33:329-33. [PMID: 19625218 DOI: 10.1016/j.compbiolchem.2009.06.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 12/12/2008] [Revised: 06/17/2009] [Accepted: 06/17/2009] [Indexed: 11/20/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play crucial biological roles. To bypass the limitation of secondary structure description, we previously defined a structural alphabet composed of 16 structural prototypes, called Protein Blocks (PBs). It leads to an accurate description of every region of 3D protein backbones and has been used in local structure prediction. In the present study, we used our structural alphabet to predict the loops connecting two repetitive structures. We showed the benefit of taking the flanking regions into account, which improves the prediction rate by up to 19.8%, but we also underline the sensitivity of such an approach. This research can be used to propose different structures for the loops and to probe and sample their flexibility. It is a useful tool for ab initio loop prediction and leads to insights into flexible docking approaches.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
27
|
Abstract
The Duffy Antigen/Receptor for Chemokines (DARC) is a seven-segment transmembrane protein. It was first discovered as a blood group antigen and was the first specific gene locus assigned to a specific autosome in man. It became more famous as an erythrocyte receptor for malaria parasites (Plasmodium vivax and Plasmodium knowlesi), and finally for chemokines. DARC is an unorthodox chemokine receptor as (i) it binds chemokines of both CC and CXC classes and (ii) it lacks the Asp-Arg-Tyr consensus motif in its second cytoplasmic loop and hence cannot couple to G proteins and activate their signaling pathways. DARC has also been associated with cancer progression, numerous inflammatory diseases, and possibly AIDS. In this review, we summarize important biological data on DARC. Then we focus on recent developments in the elaboration and analysis of structural models of DARC. We underline the difficulty of proposing pertinent structural models of transmembrane proteins using comparative modeling and other dedicated approaches such as Protein Blocks. The chosen structural models encompass most of the biochemical data known to date. Finally, we present recent developments in protein-protein docking between DARC structural models and CXCL-8 structures. We propose a hierarchical search based on separate rigid and flexible docking.
Affiliation(s)
- Alexandre G de Brevern
- INSERM UMR-S 665, Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine, 6, rue Alexandre Cabanel, 75739 Paris 15, France.
|
28
|
Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009; 91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 07/24/2008] [Accepted: 04/13/2009] [Indexed: 11/18/2022]
Abstract
Three-dimensional structures of proteins are the support of their biological functions. Their folds are maintained by inter-residue interactions, which are one of the main focuses for understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for obtaining an accurate structure and function determination. In previous studies, we characterized protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is thus a relevant tool in the context of protein folding research, particularly regarding the hierarchical model proposed by George Rose. Here, we accurately analyze the PUs at different levels of cutting, using a non-redundant protein databank. The distribution of PU sizes, the number of PUs, and their accessibility are screened to determine their common and different features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.
Affiliation(s)
- Guilhem Faure
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), DSIMB, Université Paris Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France
|
29
|
Benros C, de Brevern AG, Hazout S. Analyzing the sequence–structure relationship of a library of local structural prototypes. J Theor Biol 2009; 256:215-26. [DOI: 10.1016/j.jtbi.2008.08.032] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/23/2008] [Revised: 08/23/2008] [Accepted: 08/31/2008] [Indexed: 10/21/2022]
|
30
|
Le Q, Pollastri G, Koehl P. Structural alphabets for protein structure classification: a comparison study. J Mol Biol 2008; 387:431-50. [PMID: 19135454 DOI: 10.1016/j.jmb.2008.12.044] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 07/21/2008] [Revised: 12/16/2008] [Accepted: 12/17/2008] [Indexed: 11/26/2022]
Abstract
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all alpha-proteins, while relatively poorer for all beta-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Unsurprisingly, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.
Affiliation(s)
- Quan Le
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Dublin, Ireland.
|
31
|
Yang J. Comprehensive description of protein structures using protein folding shape code. Proteins 2008; 71:1497-518. [DOI: 10.1002/prot.21932] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
|
32
|
De Brevern AG, Etchebest C, Benros C, Hazout S. "Pinning strategy": a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci 2007; 32:51-70. [PMID: 17426380 DOI: 10.1007/s12038-007-0006-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments of 5 Cα atoms in length, called protein blocks (PBs). It allows an efficient approximation of the 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), are able to cover more than 90% of the 3D structures. PBs are highly conditioned by the presence of a limited number of transitions between them. In this study, we propose a new method called "pinning strategy" that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of the SWs instead of the PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized and the prediction accuracy reaches 43.6%.
Affiliation(s)
- A G De Brevern
- INSERM, U726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), Université Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France.
|
33
|
Pollastri G, Martin AJM, Mooney C, Vullo A. Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information. BMC Bioinformatics 2007; 8:201. [PMID: 17570843 PMCID: PMC1913928 DOI: 10.1186/1471-2105-8-201] [Citation(s) in RCA: 85] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2007] [Accepted: 06/14/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. RESULTS: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. CONCLUSION: The predictive systems are publicly available at http://distill.ucd.ie.
Affiliation(s)
- Gianluca Pollastri
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
- Alberto JM Martin
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
- Catherine Mooney
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
- Alessandro Vullo
- Complex and Adaptive Systems Laboratory, School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland
|
34
|
Etchebest C, Benros C, Bornot A, Camproux AC, de Brevern AG. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. Eur Biophys J 2007; 36:1059-69. [PMID: 17565494 DOI: 10.1007/s00249-007-0188-5] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/13/2007] [Revised: 05/05/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
The protein sequence world is considerably larger than the structure world. Consequently, numerous unrelated sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. Grouping the 20 amino acids into a smaller number of representative residues with similar features simplifies the sequence world. This clustering defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (five residues long) called Protein Blocks (PBs). This alphabet allows the 3D structure to be translated into a 1D sequence of PBs. Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., cysteine, histidine, threonine and serine for the alpha-helix N-cap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or hydrophobic aromatic residues Trp with Phe and Tyr. We highlight interesting alternative associations. This underlines the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance adaptation to environment) while largely preserving the 3D fold.
Affiliation(s)
- C Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Denis DIDEROT, Paris 7, case 7113, 2, place Jussieu, 75251, Paris, France
|
35
|
Thukral L, Shenoy SR, Bhushan K, Jayaram B. ProRegIn: A regularity index for the selection of native-like tertiary structures of proteins. J Biosci 2007; 32:71-81. [PMID: 17426381 DOI: 10.1007/s12038-007-0007-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Automated protein tertiary structure prediction from sequence information alone remains an elusive goal for computational prescriptions. Dividing the problem into three stages viz. secondary structure prediction, generation of plausible main chain loop dihedrals and side chain dihedral optimization, considerable progress has been achieved in our laboratory (http://www.scfbio-iitd.res.in/bhageerath/index.jsp) and elsewhere for proteins with fewer than 100 amino acids. As a part of our ongoing efforts in this direction and to facilitate tertiary structure selection/rejection in containing the combinatorial explosion of trial structures for a specified amino acid sequence, we describe here a web-enabled tool ProRegIn (Protein Regularity Index) developed based on the regularity in the Phi, Psi dihedral angles of the amino acids that constitute loop regions. We have analysed the dihedrals in loop regions in a non-redundant dataset of 7351 proteins drawn from the Protein Data Bank and categorized them as helix-like or sheet-like (regular) or irregular. We noticed that the regularity thus defined exceeds 86% for Phi barring glycine and 70% for Psi for all the amino acid side chains including glycine, compelling us to reexamine the conventional view that loops are structurally irregular regions. The regularity index is presented here as a simple tool that finds its application in protein structure analysis as a discriminatory scoring function for rapid screening before the more compute-intensive atomic level energy calculations are undertaken. The tool is made freely accessible over the internet at www.scfbio-iitd.res.in/software/proregin.jsp.
Affiliation(s)
- Lipi Thukral
- Department of Chemistry and Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi 110 016, India
|
36
|
Discovering structural motifs using a structural alphabet: application to magnesium-binding sites. BMC Bioinformatics 2007; 8:106. [PMID: 17389049 PMCID: PMC1851716 DOI: 10.1186/1471-2105-8-106] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2006] [Accepted: 03/28/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: For many metalloproteins, sequence motifs characteristic of metal-binding sites have not been found or are so short that they would not be expected to be metal-specific. Striking examples of such metalloproteins are those containing Mg2+, one of the most versatile metal cofactors in cellular biochemistry. Even when Mg2+-proteins share insufficient sequence homology to identify Mg2+-specific sequence motifs, they may still share similarity in the Mg2+-binding site structure. However, no structural motifs characteristic of Mg2+-binding sites have been reported. Thus, our aims are (i) to develop a general method for discovering structural patterns/motifs characteristic of ligand-binding sites, given the 3D protein structures, and (ii) to apply it to Mg2+-proteins sharing <30% sequence identity. Our motif discovery method employs structural alphabet encoding to convert 3D structures to the corresponding 1D structural letter sequences, where the Mg2+-structural motifs are identified as recurring structural patterns. RESULTS: The structural alphabet-based motif discovery method has revealed the structural preference of Mg2+-binding sites for certain local/secondary structures: compared to all residues in the Mg2+-proteins, both first and second-shell Mg2+-ligands prefer loops to helices. Even when the Mg2+-proteins share no significant sequence homology, some of them share a similar Mg2+-binding site structure: 4 Mg2+-structural motifs, comprising 21% of the binding sites, were found. In particular, one of the Mg2+-structural motifs found maps to a specific functional group, namely, hydrolases. Furthermore, 2 of the motifs were not found in non-metalloproteins or in Ca2+-binding proteins. The structural motifs discovered thus capture some essential biochemical and/or evolutionary properties, and hence may be useful for discovering proteins where Mg2+ plays an important biological role.
CONCLUSION: The structural motif discovery method presented herein is general and can be applied to any set of proteins with known 3D structures. This new method is timely considering the increasing number of structures for proteins with unknown function that are being solved from structural genomics incentives. For such proteins, which share no significant sequence homology to proteins of known function, the presence of a structural motif that maps to a specific protein function in the structure would suggest likely active/binding sites and a particular biological function.
|
37
|
Tyagi M, Gowri VS, Srinivasan N, de Brevern AG, Offmann B. A substitution matrix for structural alphabet based on structural alignment of homologous proteins and its applications. Proteins 2006; 65:32-9. [PMID: 16894618 DOI: 10.1002/prot.21087] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Analysis of protein structures based on backbone structural patterns known as structural alphabets has been shown to be very useful. Among them, a set of 16 pentapeptide structural motifs known as protein blocks (PBs) has been identified, from which backbone models of most protein structures can be built. PBs allow simplification of 3D space into 1D space in the form of a sequence of PBs. Here, for the first time, substitution probabilities of PBs in a large number of aligned homologous protein structures have been studied and are expressed as a simplified 16 x 16 substitution matrix. The matrix was validated by benchmarking how well it can align sequences of PBs, much like amino acid alignment, to identify structurally equivalent regions in closely or distantly related proteins using a dynamic programming approach. The alignment results obtained are very comparable to well-established structure comparison methods like DALI and STAMP. Other interesting applications of the matrix have been investigated. We first show that, in variable regions between two superimposed homologous proteins, one can distinguish between local conformational differences and rigid-body displacement of a conserved motif by comparing the PBs and their substitution scores. Second, we demonstrate, with the example of aspartic proteinases, that PBs can be efficiently used to detect the lobe/domain flexibility in multidomain proteins. Lastly, using protein kinase as an example, we identify regions of conformational variations and rigid body movements in the enzyme as it changes from an inactive state to the active state.
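The alignment idea in this abstract (aligning PB sequences by dynamic programming with a substitution score) can be sketched as a standard global alignment in the Needleman-Wunsch style. This is only an illustration under stated assumptions: the abstract does not say whether the alignment is global or local, and `toy_score`, its values, and the gap penalty are placeholders invented for the example, not the published 16 x 16 substitution matrix.

```python
from typing import Callable, List

# The 16 Protein Blocks are conventionally labelled with the letters a to p.
PB_LETTERS = "abcdefghijklmnop"


def toy_score(x: str, y: str) -> float:
    """Placeholder substitution score (assumption, NOT the published matrix)."""
    return 2.0 if x == y else -1.0


def global_alignment_score(
    s: str,
    t: str,
    score: Callable[[str, str], float] = toy_score,
    gap: float = -2.0,  # hypothetical linear gap penalty
) -> float:
    """Best global alignment score of two PB strings (Needleman-Wunsch)."""
    n, m = len(s), len(t)
    # dp[i][j] = best score aligning the prefixes s[:i] and t[:j]
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # align two PBs
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    return dp[n][m]
```

Swapping `toy_score` for a lookup into a real 16 x 16 matrix would leave the recurrence unchanged; only the scoring function differs.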
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
38
|
Tyagi M, Sharma P, Swamy CS, Cadet F, Srinivasan N, de Brevern AG, Offmann B. Protein Block Expert (PBE): a web-based protein structure analysis server using a structural alphabet. Nucleic Acids Res 2006; 34:W119-23. [PMID: 16844973 PMCID: PMC1538797 DOI: 10.1093/nar/gkl199] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Encoding protein 3D structures into 1D strings using short structural prototypes or structural alphabets opens a new front for structure comparison and analysis. Using the well-documented 16 motifs of Protein Blocks (PBs) as a structural alphabet, we have developed a methodology to compare protein structures that are encoded as sequences of PBs by aligning them using dynamic programming with a substitution matrix for PBs. This methodology is implemented in the applications available in the Protein Block Expert (PBE) server. PBE addresses common issues in the field of protein structure analysis such as comparison of protein structures and identification of protein structures in structural databanks that resemble a given structure. PBE-T provides a facility to transform any PDB file into sequences of PBs. PBE-ALIGNc performs comparison of two protein structures based on the alignment of their corresponding PB sequences. PBE-ALIGNm is a facility for mining the SCOP database for similar structures based on the alignment of PBs. In addition, PBE provides an interface to a database (PBE-SAdb) of preprocessed PB sequences from SCOP culled at 95% and of all-against-all pairwise PB alignments at family and superfamily levels. The PBE server is freely available.
Affiliation(s)
- M. Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Bioinformatics Team, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
- P. Sharma
- Laboratoire de Biochimie et Génétique Moléculaire, Bioinformatics Team, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
- C. S. Swamy
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- F. Cadet
- Laboratoire de Biochimie et Génétique Moléculaire, Bioinformatics Team, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
- N. Srinivasan
- Laboratoire de Biochimie et Génétique Moléculaire, Bioinformatics Team, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
- A. G. de Brevern
- INSERM, U726, Equipe de Bioinformatique et Génomique Moléculaire (EBGM), Université Paris 7—Denis Diderot, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France
- B. Offmann
- Laboratoire de Biochimie et Génétique Moléculaire, Bioinformatics Team, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
- To whom correspondence should be addressed. Tel: +262 262 93 8641; Fax: +262 262 93 8237
|
39
|
Bornot A, de Brevern AG. Protein beta-turn assignments. Bioinformation 2006; 1:153-5. [PMID: 17597878 PMCID: PMC1891681 DOI: 10.6026/97320630001153] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2006] [Revised: 05/24/2006] [Accepted: 05/24/2006] [Indexed: 11/27/2022] Open
Abstract
A classical way to analyze protein 3D structures or models is to investigate their secondary structures. Secondary structure predictions are also widely used to help build new 3D models, and hundreds of prediction methods have been proposed. Nonetheless, before prediction, secondary structure assignment is required, even though it is not trivial. Numerous but diverging assignment methods have therefore been developed. β-turns constitute the third most important secondary structures. However, no analysis comparing the β-turn distributions obtained with different secondary structure assignment methods has ever been done. We propose in this paper to analyze and evaluate the results of such a comparison. We highlight some important divergences that could have significant consequences for the analysis and prediction of β-turns.
Affiliation(s)
- Alexandre G de Brevern
- Corresponding author. Phone: +33 1 44 27 77 31; Fax: +33 1 43 26 38 30
|
40
|
Benros C, de Brevern AG, Etchebest C, Hazout S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins 2006; 62:865-80. [PMID: 16385557 DOI: 10.1002/prot.20815] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
Affiliation(s)
- Cristina Benros
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris 7, Paris, France.
|
41
|
Fuchs PFJ, Alix AJP. High accuracy prediction of beta-turns and their types using propensities and multiple alignments. Proteins 2005; 59:828-39. [PMID: 15822097 DOI: 10.1002/prot.20461] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
We have developed a method that predicts both the presence and the type of beta-turns, using a straightforward approach based on propensities and multiple alignments. The propensities were calculated classically, but the way to use them for prediction was completely new: starting from a tetrapeptide sequence on which one wants to evaluate the presence of a beta-turn, the propensity for a given residue is modified by taking into account all the residues present in the multiple alignment at this position. The evaluation of a score is then done by weighting these propensities using position-specific scoring matrices generated by PSI-BLAST. The introduction of secondary structure information predicted by PSIPRED or SSPRO2, as well as taking into account the flanking residues around the tetrapeptide, improved the accuracy greatly. The latter, evaluated on a database of 426 reference proteins (previously used in other studies) by sevenfold cross-validation, gave very good results with a Matthews Correlation Coefficient (MCC) of 0.42 and an overall prediction accuracy of 74.8%; this places our method among the best ones. A jackknife test was also done, which gave results within the same range. This shows that it is possible to reach neural-network accuracy with considerably less computational cost and complexity. Furthermore, propensities remain excellent descriptors of amino acid tendencies to belong to beta-turns, which can be useful for peptide or protein engineering and design. For beta-turn type prediction, we reached the best accuracy ever published in terms of MCC (except for the irregular type IV) in the range of 0.25-0.30 for types I, II, and I' and 0.13-0.15 for types VIII, II', and IV. To our knowledge, our method is the only one available on the Web that predicts types I' and II'. The accuracy evaluated on two larger databases of 547 and 823 proteins was not improved significantly. All of this was implemented into a Web server called COUDES (French acronym for: Chercher Ou Une Deviation Existe Surement), which is available at the following URL: http://bioserv.rpbs.jussieu.fr/Coudes/index.html within the new bioinformatics platform RPBS.
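The two headline metrics quoted in this abstract, the Matthews Correlation Coefficient and overall accuracy, both derive from a binary confusion matrix (turn versus non-turn). The sketch below uses the standard textbook definitions; the function names are chosen for this example, and the counts in the usage note are made up and do not come from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of residues whose turn / non-turn state is predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)
```

With made-up counts `tp=30, tn=45, fp=15, fn=10`, accuracy is 0.75 while MCC is about 0.49. MCC penalizes errors on the minority class more strongly, which is why it is commonly reported alongside accuracy for unbalanced problems such as turn prediction.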
Affiliation(s)
- Patrick F J Fuchs
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM EMI 0346, Université Paris 7, Paris, France.
|
42
|
Etchebest C, Benros C, Hazout S, de Brevern AG. A structural alphabet for local protein structures: improved prediction methods. Proteins 2005; 59:810-27. [PMID: 15822101 DOI: 10.1002/prot.20458] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
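The abstract names the entropy index Neq without giving its formula. In the Protein Blocks literature it is commonly defined as the exponential of the Shannon entropy of the 16 PB probabilities at a position, Neq = exp(-Σ p_x ln p_x), so that it ranges from 1 (a single PB) to 16 (all PBs equally likely). The sketch below assumes that definition; the function name and the renormalization of the input are choices made for this example.

```python
import math
from typing import Sequence


def neq(probs: Sequence[float]) -> float:
    """Equivalent number of PBs: exp of the Shannon entropy of the distribution.

    `probs` holds the (predicted or observed) weights of the 16 PBs at one
    position; it is renormalized here so that raw counts are accepted too.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = 0.0
    for p in probs:
        if p > 0:  # zero-probability terms contribute nothing to the entropy
            q = p / total
            entropy -= q * math.log(q)
    return math.exp(entropy)
```

A position where one PB carries all the probability gives a Neq of 1, and a uniform distribution over the 16 PBs gives 16; low values therefore flag confident local predictions.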
Affiliation(s)
- Catherine Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris, France
|
43
|
Kruus E, Thumfort P, Tang C, Wingreen NS. Gibbs sampling and helix-cap motifs. Nucleic Acids Res 2005; 33:5343-53. [PMID: 16174845 PMCID: PMC1234247 DOI: 10.1093/nar/gki842] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2005] [Revised: 08/08/2005] [Accepted: 08/30/2005] [Indexed: 11/25/2022] Open
Abstract
Protein backbones have characteristic secondary structures, including alpha-helices and beta-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of alpha-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of +/-1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
Affiliation(s)
- Erik Kruus
- NEC Laboratories America, Inc. 4 Independence Way, Princeton, NJ 08544, USA.
|
44
|
Martin J, Letellier G, Marin A, Taly JF, de Brevern AG, Gibrat JF. Protein secondary structure assignment revisited: a detailed analysis of different assignment methods. BMC Struct Biol 2005; 5:17. [PMID: 16164759 PMCID: PMC1249586 DOI: 10.1186/1472-6807-5-17] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/26/2005] [Accepted: 09/15/2005] [Indexed: 11/28/2022]
Abstract
Background: A number of methods are now available to perform automatic assignment of periodic secondary structures from atomic coordinates, based on different characteristics of the secondary structures. In general these methods exhibit a broad consensus as to the location of most helix and strand core segments in protein structures. However, the termini of the segments are often ill-defined and it is difficult to decide unambiguously which residues at the edge of the segments have to be included. In addition, there is a "twilight zone" where secondary structure segments depart significantly from the idealized models of Pauling and Corey. For these segments, one has to decide whether the observed structural variations are merely distortions or whether they constitute a break in the secondary structure. Methods: To address these problems, we have developed a method for secondary structure assignment, called KAKSI. Assignments made by KAKSI are compared with assignments given by DSSP, STRIDE, XTLSSTR, PSEA and SECSTR, as well as secondary structures found in PDB files, on 4 datasets (X-ray structures with different resolution range, NMR structures). Results: A detailed comparison of KAKSI assignments with those of STRIDE and PSEA reveals that KAKSI assigns slightly longer helices and strands than STRIDE in cases of one-to-one correspondence between the segments. However, KAKSI also tends to favor the assignment of several short helices when STRIDE and PSEA assign longer, kinked helices. Helices assigned by KAKSI have geometrical characteristics close to those described in the PDB. They are more linear than helices assigned by other methods. The same tendency to split long segments is observed for strands, although less systematically. We present a number of cases of secondary structure assignments that illustrate this behavior. Conclusion: Our method provides valuable assignments which favor the regularity of secondary structure segments.
Affiliation(s)
- Juliette Martin
- INRA, Unité Mathématiques Informatique et Génome, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
- Guillaume Letellier
- INRA, Unité Mathématiques Informatique et Génome, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
- Antoine Marin
- INRA, Unité Mathématiques Informatique et Génome, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
- Jean-François Taly
- INRA, Unité Mathématiques Informatique et Génome, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
- Alexandre G de Brevern
- INSERM U726, Equipe de Bioinformatique Génomique et Moléculaire, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris cedex 05, France
- Jean-François Gibrat
- INRA, Unité Mathématiques Informatique et Génome, Domaine de Vilvert, 78352 Jouy en Josas Cedex, France
|
45
|
de Brevern AG, Wong H, Tournamille C, Colin Y, Le Van Kim C, Etchebest C. A structural model of a seven-transmembrane helix receptor: The Duffy antigen/receptor for chemokine (DARC). Biochim Biophys Acta Gen Subj 2005; 1724:288-306. [PMID: 16046070 DOI: 10.1016/j.bbagen.2005.05.016] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2005] [Revised: 05/13/2005] [Accepted: 05/16/2005] [Indexed: 01/28/2023]
Abstract
The Duffy antigen/receptor for chemokine (DARC) is an erythrocyte receptor for malaria parasites (Plasmodium vivax and Plasmodium knowlesi) and for chemokines. In contrast to other chemokine receptors, DARC is a promiscuous receptor that binds chemokines of both the CC and CXC classes. The four extracellular domains (ECDs) of DARC are essential for its interaction with chemokines, whilst the first (ECD1) is sufficient for the interaction with the malaria erythrocyte-binding protein. In this study, we build and analyze structural models of DARC. The construction of the 3D models is based on a comparative modeling process and on the use of many procedures to predict transmembrane segments and to detect distantly homologous proteins with known structures. Threading, ab initio, secondary structure and Protein Blocks approaches are used to build a very large number of models. The conformational exploration of the ECDs is performed with simulated annealing. The second and fourth ECDs are strongly constrained. In contrast, ECD1 is highly flexible but appears to be composed of three consecutive regions: a small beta-sheet, a linker region and a structured loop. The chosen structural models encompass most of the biochemical features and reflect the known experimental data. They may be used to analyze functional interaction properties.
Affiliation(s)
- A G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM U 726, Université Denis DIDEROT-Paris 7, case 7113, 2, place Jussieu, 75251 Paris, France.
|
46
|
de Brevern AG. New assessment of a structural alphabet. In Silico Biol 2005; 5:283-9. [PMID: 15996119 PMCID: PMC2001288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
A statistical analysis of the Protein Data Bank (PDB) structures has led us to define a set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each one defined by the (phi, psi) dihedral angles of 5 consecutive residues. Here, we analyze the effect of the enlargement of the PDB on the PBs' definition. The results highlight the quality of the 3D approximation ensured by the PBs. The PBs could be of great interest in ab initio modeling.
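As an illustration of how a prototype defined by the (phi, psi) angles of 5 consecutive residues can be matched against a backbone, here is a minimal Python sketch. It makes several assumptions that are not stated in the abstract: the window is described by eight dihedrals ordered psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i), phi(i+1), psi(i+1), phi(i+2); the closest prototype is chosen by an angular root-mean-square deviation; and the two prototypes `H` and `E` are hypothetical toy values, not the 16 published PBs. The function names and the `Z` label for unassignable termini are likewise illustrative.

```python
import math

def angle_diff(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def rmsda(v, w):
    """Root-mean-square deviation between two equal-length dihedral vectors."""
    return math.sqrt(sum(angle_diff(a, b) ** 2 for a, b in zip(v, w)) / len(v))

def window_dihedrals(phi, psi, i):
    """Eight dihedrals for the 5-residue window centred on residue i.

    Assumed layout: psi(i-2), phi(i-1), psi(i-1), phi(i), psi(i),
    phi(i+1), psi(i+1), phi(i+2).
    """
    return [psi[i - 2], phi[i - 1], psi[i - 1], phi[i],
            psi[i], phi[i + 1], psi[i + 1], phi[i + 2]]

def assign_blocks(phi, psi, prototypes):
    """Label each residue with the nearest prototype name.

    The first and last two residues lack a full window and get 'Z'.
    """
    n = len(phi)
    labels = []
    for i in range(n):
        if i < 2 or i > n - 3:
            labels.append("Z")
            continue
        v = window_dihedrals(phi, psi, i)
        labels.append(min(prototypes, key=lambda name: rmsda(v, prototypes[name])))
    return "".join(labels)

# HYPOTHETICAL toy prototypes (alternating psi, phi); NOT the real 16 PBs.
TOY_PROTOTYPES = {
    "H": [-47.0, -57.0] * 4,   # helix-like angles
    "E": [135.0, -120.0] * 4,  # strand-like angles
}

if __name__ == "__main__":
    # Seven residues with helix-like backbone angles.
    print(assign_blocks([-60.0] * 7, [-45.0] * 7, TOY_PROTOTYPES))
```

With the real alphabet, `TOY_PROTOTYPES` would be replaced by the 16 reference dihedral vectors taken from the original publications; the wrapping in `angle_diff` matters because dihedrals near +180 and -180 degrees are close to each other.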
Affiliation(s)
- Alexandre G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM U 726, Université Denis DIDEROT - Paris 7, case 7113, 2, place Jussieu, 75251 Paris, France.
|
47
|
de Brevern AG, Benros C, Gautier R, Valadié H, Hazout S, Etchebest C. Local backbone structure prediction of proteins. In Silico Biol 2004; 4:381-6. [PMID: 15724288 PMCID: PMC1995003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
A statistical analysis of the PDB structures has led us to define a new set of small 3D structural prototypes called Protein Blocks (PBs). This structural alphabet includes 16 PBs, each defined by the (phi, psi) dihedral angles of 5 consecutive residues. The amino acid distributions observed in sequence windows encompassing these PBs are used in a Bayesian approach to predict the local 3D structure of proteins from their sequences alone. LocPred is a software tool that lets users submit a protein sequence and obtain a prediction in terms of PBs. The prediction results are given both textually and graphically.
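The idea of predicting a local structure label from the amino acids in a surrounding sequence window can be sketched with a simple naive Bayes classifier. This is not LocPred's implementation: the independence between window positions, the pseudocount smoothing, the function names and the toy training data with labels `m` and `d` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def train(examples):
    """Count label frequencies and per-position amino-acid frequencies.

    examples: list of (window, label) pairs with windows of equal length.
    """
    prior = Counter(label for _, label in examples)
    counts = defaultdict(Counter)  # (label, position) -> amino-acid counts
    for window, label in examples:
        for pos, aa in enumerate(window):
            counts[(label, pos)][aa] += 1
    return prior, counts

def log_posterior(window, label, model, pseudo=1.0):
    """Unnormalised log P(label | window), assuming independent positions."""
    prior, counts = model
    score = math.log(prior[label] / sum(prior.values()))
    for pos, aa in enumerate(window):
        c = counts[(label, pos)]
        # Pseudocounts avoid log(0) for amino acids unseen at this position.
        score += math.log((c[aa] + pseudo) /
                          (sum(c.values()) + pseudo * len(AMINO_ACIDS)))
    return score

def predict(window, model):
    """Return the label with the highest posterior score."""
    prior, _ = model
    return max(prior, key=lambda label: log_posterior(window, label, model))

if __name__ == "__main__":
    # HYPOTHETICAL toy training set: 3-residue windows with made-up labels.
    toy = [("AAA", "m"), ("AAL", "m"), ("VVV", "d"), ("VIV", "d")]
    model = train(toy)
    print(predict("AAA", model), predict("VVV", model))
```

A realistic model would be trained on windows extracted from PDB structures with their assigned PBs, and would use longer windows than this toy example.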
Affiliation(s)
- Alexandre G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E03-46, Université Denis Diderot - Paris 7, 75251 Paris, France.
|