1
Zhao K, Zhao P, Wang S, Xia Y, Zhang G. FoldPAthreader: predicting protein folding pathway using a novel folding force field model derived from known protein universe. Genome Biol 2024; 25:152. [PMID: 38862984 PMCID: PMC11167914 DOI: 10.1186/s13059-024-03291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
Protein folding has become a tractable problem with the significant advances in deep learning-driven protein structure prediction. Here we propose FoldPAthreader, a protein folding pathway prediction method that uses a novel folding force field model derived by exploring the intrinsic relationship between protein evolution and folding in the known protein universe. The folding force field is then used to guide Monte Carlo conformational sampling, driving the protein chain to fold into its native state by exploring potential intermediates. On 30 example targets, FoldPAthreader predicts folding pathways consistent with biological experimental data for 70% of the proteins.
Affiliation(s)
- Kailong Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
- Pengxin Zhao
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
- Suhui Wang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
- Yuhao Xia
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China
- Guijun Zhang
- College of Information Engineering, Zhejiang University of Technology, HangZhou, 310023, China.
2
Lu Y, Li W, Li Y, Zhai W, Zhou X, Wu Z, Jiang S, Liu T, Wang H, Hu R, Zhou Y, Zou J, Hu P, Guan G, Xu Q, Canário AVM, Chen L. Population genomics of an icefish reveals mechanisms of glacier-driven adaptive radiation in Antarctic notothenioids. BMC Biol 2022; 20:231. [PMID: 36224580 PMCID: PMC9560024 DOI: 10.1186/s12915-022-01432-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2022] [Accepted: 10/03/2022] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND Antarctica harbors the bulk of the species diversity of the dominant teleost fish suborder, Notothenioidei. However, the forces that shape their evolution are still under debate. RESULTS We sequenced the genome of an icefish, Chionodraco hamatus, and used population genomics and demographic modelling of sequenced genomes of 52 C. hamatus individuals collected mainly from two East Antarctic regions to investigate the factors driving speciation. Results revealed that four icefish populations with clear reproductive separation were established 15 to 50 kya (kilo years ago) during the last glacial maxima (LGM). Selection sweeps in genes involving immune responses, cardiovascular development, and photoperception occurred differentially among the populations and were correlated with population-specific microbial communities and acquisition of distinct morphological features in the icefish taxa. Population and species-specific antifreeze glycoprotein gene expansion and glacial cycle-paced duplication/degeneration of the zona pellucida protein gene families indicated fluctuating thermal environments and periodic influence of glacial cycles on notothenioid divergence. CONCLUSIONS We revealed several lines of genomic evidence indicating differential adaptation of C. hamatus populations and notothenioid species divergence in the extreme and unique marine environment. We conclude that geographic separation and adaptation to heterogeneous pathogen, oxygen, and light conditions of local habitats, periodically shaped by the glacial cycles, were the key drivers propelling species diversity in Antarctica.
Affiliation(s)
- Ying Lu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Wenhao Li
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Yalin Li
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Wanying Zhai
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Xuming Zhou
- Institute of Zoology, Chinese Academy of Science, Beijing, China
- Zhichao Wu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Shouwen Jiang
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Taigang Liu
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- College of Information Technology, Shanghai Ocean University, Shanghai, China
- Huamin Wang
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Ruiqin Hu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Yan Zhou
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Jun Zou
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Peng Hu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Guijun Guan
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China
- Qianghua Xu
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China.
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China.
- Adelino V M Canário
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China.
- Centre of Marine Sciences (CCMAR-CIMAR LA), University of Algarve, Faro, Portugal.
- Liangbiao Chen
- Key Laboratory of Exploration and Utilization of Aquatic Genetic Resources (Ministry of Education), Shanghai Ocean University, Shanghai, China.
- International Research Center for Marine Biosciences (Ministry of Science and Technology), Shanghai Ocean University, Shanghai, China.
3
Maljković MM, Mitić NS, de Brevern AG. Prediction of structural alphabet protein blocks using data mining. Biochimie 2022; 197:74-85. [PMID: 35143919 DOI: 10.1016/j.biochi.2022.01.019] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/20/2021] [Revised: 01/22/2022] [Accepted: 01/31/2022] [Indexed: 11/17/2022]
Abstract
3D protein structures determine proteins' biological functions. The 3D structure of the protein backbone can be approximated using the prototypes of local protein conformations. Sets of these prototypes are called structural alphabets (SAs). Amongst several approaches to the prediction of 3D structures from amino acid sequences, one approach is based on the prediction of SA prototypes for a given amino acid sequence. Protein Blocks (PBs) is the best-known SA; it is composed of 16 prototypes of five consecutive amino acids, which were identified as optimal prototypes considering the ability to correctly approximate the local structure and the prediction accuracy of prototypes from an amino acid sequence. We developed models for PBs prediction from sequence information using different data mining approaches and machine learning algorithms. Besides the amino acid sequences, the results of the following tools were used to train the models: the Spider3 predictor of protein structure properties, several predictors of the protein's intrinsically disordered regions, and a tool for finding repeats in amino acid sequences. The highest accuracy of the constructed models is 80%, which is a significant improvement compared to the previous best available prediction, whose accuracy was 61%. Analyzing the models constructed by applying different algorithms, we noticed that the significance of input attributes differs among the models constructed by the different algorithms. Using the information about amino acids belonging to intrinsically disordered regions and repeats improves the precision of prediction for some PBs using the CART classification algorithm, while this is not the case with the C5.0 classification algorithm. Improved prediction approaches can have interesting applications in protein structural model approaches or computational protein design.
Affiliation(s)
- Mirjana M Maljković
- Faculty of Mathematics, University of Belgrade, Studentski Trg 16, 11000, Belgrade, Serbia.
- Nenad S Mitić
- Faculty of Mathematics, University of Belgrade, Studentski Trg 16, 11000, Belgrade, Serbia
- Alexandre G de Brevern
- Université de Paris, INSERM UMR_S 1134, DSIMB, Université de la Réunion, INTS6, Rue Alexandre Cabanel, 75015, Paris, France
4
Dabravolski SA, Isayenkov SV. Evolution of the Cytokinin Dehydrogenase (CKX) Domain. J Mol Evol 2021; 89:665-677. [PMID: 34757471 DOI: 10.1007/s00239-021-10035-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/31/2021] [Accepted: 10/30/2021] [Indexed: 01/05/2023]
Abstract
The plant hormones cytokinins are important regulators of plant development, response to environmental stresses and interplay with other plant hormones. Cytokinin dehydrogenases (CKXs) are proteins responsible for the irreversible breakdown of cytokinins to adenine and aldehyde. Even though plant CKXs have been extensively studied, homologous proteins from other taxa remain mainly uncharacterised. Here we present our study on the molecular evolution and divergence of the CKX from bacteria, fungi, amoeba and viridiplantae. Although CKXs are present in eukaryotes and prokaryotes, they are missing in algae and metazoan taxa. The prevalent domain architecture consists of the FAD-binding and cytokinin binding domains, whereas some bacteria appear to have only cytokinin binding domain proteins. CKXs play an important role in various aspects of plant life, including control of plant development, response to biotic and abiotic stress, and nutrition. Results of our study suggested that CKX originates from the FAD-linked C-terminal oxidase and has a defence-oriented function. The obtained results significantly extend the current understanding of the structure-function relationship of cytokinin dehydrogenases to homologues from other taxa and provide a baseline for their future functional characterization.
Affiliation(s)
- Siarhei A Dabravolski
- Department of Clinical Diagnostics, Vitebsk State Academy of Veterinary Medicine [UO VGAVM], Dovatora str. 7/11, 21002, Vitebsk, Belarus
- Stanislav V Isayenkov
- International Research Centre for Environmental Membrane Biology, Foshan University, Foshan, China.
- Department of Plant Food Products and Biofortification, Institute of Food Biotechnology and Genomics, NAS of Ukraine, Osipovskogo str., 2a, Kyiv-123, Kyiv, 04123, Ukraine.
5
Milchevskaya V, Nikitin AM, Lukshin SA, Filatov IV, Kravatsky YV, Tumanyan VG, Esipova NG, Milchevskiy YV. Structural coordinates: A novel approach to predict protein backbone conformation. PLoS One 2021; 16:e0239793. [PMID: 34014953 PMCID: PMC8136669 DOI: 10.1371/journal.pone.0239793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2020] [Accepted: 04/14/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation Local protein structure is usually described via classifying each peptide to a unique class from a set of pre-defined structures. These classifications may differ in the number of structural classes, the length of peptides, or class attribution criteria. Most methods that predict the local structure of a protein from its sequence first rely on some classification and only then proceed to the 3D conformation assessment. However, most classification methods rely on homologous proteins’ existence, unavoidably lose information by attributing a peptide to a single class or suffer from a suboptimal choice of the representative classes. Results To alleviate the above challenges, we propose a method that constructs a peptide’s structural representation from the sequence, reflecting its similarity to several basic representative structures. For 5-mer peptides and 16 representative structures, we achieved the Q16 classification accuracy of 67.9%, which is higher than what is currently reported in the literature. Our prediction method does not utilize information about protein homologues but relies only on the amino acids’ physicochemical properties and the resolved structures’ statistics. We also show that the 3D coordinates of a peptide can be uniquely recovered from its structural coordinates, and show the required conditions under various geometric constraints.
Affiliation(s)
- Vladislava Milchevskaya
- Institute of Medical Statistics and Bioinformatics, Faculty of Medicine, University of Cologne, Cologne, Germany
- * E-mail: (VM); (YVM)
- Ivan V. Filatov
- Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Yury V. Milchevskiy
- Engelhardt Institute of Molecular Biology, Moscow, Russia
- * E-mail: (VM); (YVM)
6
Dabravolski SA. Evolutionary aspects of the Viridiplantae nitroreductases. J Genet Eng Biotechnol 2020; 18:60. [PMID: 33025290 PMCID: PMC7538488 DOI: 10.1186/s43141-020-00073-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/10/2020] [Accepted: 09/14/2020] [Indexed: 11/10/2022]
Abstract
Background Nitroreductases are a family of evolutionarily related proteins catalyzing the reduction of nitro-substituted compounds. Nitroreductases are widespread enzymes, but nearly all modern research and practical application have been concentrated on the bacterial proteins, mainly nitroreductases of Escherichia coli. The main aim of this study is to describe the phylogenetic distribution of the nitroreductases in the photosynthetic eukaryotes (Viridiplantae) to highlight their structural similarity and areas for future research and application. Results This study suggests that homologs of nitroreductase proteins are also widely present in Viridiplantae. Maximum likelihood phylogenetic tree reconstruction and comparison of the structural models suggest a close evolutionary relationship between cyanobacterial and Viridiplantae nitroreductases. Conclusions This study provides the first attempt to understand the evolution of the nitroreductase protein family in Viridiplantae. Our phylogeny estimation and preservation of the chloroplasts/mitochondrial localization indicate the evolutionary origin of the plant nitroreductases from the cyanobacterial endosymbiont. The high level of similarity at the structural level suggests that the functions are also conserved. Directions for future research and industrial application of the Viridiplantae nitroreductases are discussed.
Affiliation(s)
- Siarhei A Dabravolski
- Department of Clinical Diagnostics, Vitebsk State Academy of Veterinary Medicine [UO VGAVM], 7/11 Dovatora St., 210026, Vitebsk, Belarus.
7
Wen Z, He J, Huang SY. Topology-independent and global protein structure alignment through an FFT-based algorithm. Bioinformatics 2020; 36:478-486. [PMID: 31384919 DOI: 10.1093/bioinformatics/btz609] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/18/2019] [Revised: 07/22/2019] [Accepted: 08/02/2019] [Indexed: 12/12/2022] Open
Abstract
MOTIVATION Protein structure alignment is one of the fundamental problems in computational structure biology. A variety of algorithms have been developed to address this important issue in the past decade. However, due to their heuristic nature, current structure alignment methods may suffer from suboptimal alignment and/or over-fragmentation and thus lead to a biologically wrong alignment in some cases. To overcome these limitations, we have developed an accurate topology-independent and global structure alignment method through an FFT-based exhaustive search algorithm, which is referred to as FTAlign. RESULTS Our FTAlign algorithm was extensively tested on six commonly used datasets and compared with seven state-of-the-art structure alignment approaches, TMalign, DeepAlign, Kpax, 3DCOMB, MICAN, SPalignNS and CLICK. It was shown that FTAlign outperformed the other methods in reproducing manually curated alignments and obtained a high success rate of 96.7 and 90.0% on two gold-standard benchmarks, MALIDUP and MALISAM, respectively. Moreover, FTAlign also achieved the overall best performance in terms of biologically meaningful structure overlap (SO) and TMscore on both the sequential alignment test sets including MALIDUP, MALISAM and 64 difficult cases from HOMSTRAD, and the non-sequential sets including MALIDUP-NS, MALISAM-NS, 199 topology-different cases, where FTAlign especially showed more advantage for non-sequential alignment. Despite its global search feature, FTAlign is also computationally efficient and can normally complete a pairwise alignment within one second. AVAILABILITY AND IMPLEMENTATION http://huanglab.phys.hust.edu.cn/ftalign/.
Affiliation(s)
- Zeyu Wen
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Jiahua He
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
- Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, People's Republic of China
8
Faure G, Joseph AP, Craveur P, Narwani TJ, Srinivasan N, Gelly JC, Rebehmed J, de Brevern AG. iPBAvizu: a PyMOL plugin for an efficient 3D protein structure superimposition approach. Source Code for Biology and Medicine 2019; 14:5. [PMID: 31700529 PMCID: PMC6825713 DOI: 10.1186/s13029-019-0075-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 07/26/2018] [Accepted: 10/14/2019] [Indexed: 11/10/2022]
Abstract
Background Protein 3D structure is the support of its function. Comparison of 3D protein structures provides insight on their evolution and their functional specificities and can be done efficiently via protein structure superimposition analysis. Multiple approaches have been developed to perform such a task and are often based on structural superimposition deduced from sequence alignment, which does not take into account structural features. Our methodology is based on the use of a Structural Alphabet (SA), i.e. a library of 3D local protein prototypes able to approximate protein backbone. The interest of an SA is to translate 3D structures into 1D sequences. Results We used Protein blocks (PB), a widely used SA consisting of 16 prototypes, each representing a conformation of the pentapeptide skeleton defined in terms of dihedral angles. Proteins are described using PB from which we have previously developed a sequence alignment procedure based on dynamic programming with a dedicated PB Substitution Matrix. We improved the procedure with a specific two-step search: (i) very similar regions are selected using very high weights and aligned, and (ii) the alignment is completed (if possible) with less stringent parameters. Our approach, iPBA, has been shown to perform better than other available tools in benchmark tests. To facilitate the usage of iPBA, we designed and implemented iPBAvizu, a plugin for PyMOL that allows users to run iPBA in an easy way and analyse protein superimpositions. Conclusions iPBAvizu is an implementation of iPBA within the well-known and widely used PyMOL software. iPBAvizu enables users to generate iPBA alignments, create and interactively explore structural superimpositions, and assess the quality of the protein alignments.
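The dynamic-programming alignment of PB sequences mentioned in this abstract can be illustrated with a minimal sketch. It is not the iPBA implementation: the function names `toy_score` and `align_pb`, the gap penalty, and the match/mismatch scores are hypothetical placeholders standing in for the dedicated PB Substitution Matrix, and the two-step anchor search is omitted. It only shows a plain global alignment over strings of PB letters.

```python
def toy_score(a, b):
    """Hypothetical substitution score (NOT the published PB matrix)."""
    return 2 if a == b else -1


def align_pb(seq1, seq2, gap=-2, score=toy_score):
    """Globally align two PB strings; return (score, aligned1, aligned2)."""
    n, m = len(seq1), len(seq2)
    # dp[i][j] holds the best score for aligning seq1[:i] with seq2[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]),  # pair two PBs
                dp[i - 1][j] + gap,  # gap in seq2
                dp[i][j - 1] + gap,  # gap in seq1
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq1[i - 1], seq2[j - 1]):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out1)), "".join(reversed(out2))
```

For example, `align_pb("dfklm", "dfkm")` pairs the four shared PBs and places a single gap opposite `l`. A real PB alignment would replace `toy_score` with a lookup in the 16 × 16 substitution matrix.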
Affiliation(s)
- Guilhem Faure
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France
- Agnel Praveen Joseph
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France; Birkbeck College, University of London, London, UK
- Pierrick Craveur
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France; Molecular Graphics Laboratory, Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037 USA
- Tarun J Narwani
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France
- Jean-Christophe Gelly
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France
- Joseph Rebehmed
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France; Department of Computer Science and Mathematics, Lebanese American University, Beirut, Lebanon
- Alexandre G de Brevern
- INSERM, U 1134, DSIMB, Univ Paris, Univ de la Réunion, Univ des Antilles, F-75739 Paris, France; INSERM UMR_S 1134, DSIMB, Université de Paris, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, F-75739, Paris cedex 15, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France
9
Vetrivel I, de Brevern AG, Cadet F, Srinivasan N, Offmann B. Structural variations within proteins can be as large as variations observed across their homologues. Biochimie 2019; 167:162-170. [PMID: 31560932 DOI: 10.1016/j.biochi.2019.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2019] [Accepted: 09/18/2019] [Indexed: 10/26/2022]
Abstract
Understanding the structural plasticity of proteins is key to understanding the intricacies of their functions and mechanistic basis. In the current study, we analyzed the available multiple crystal structures of the same protein for the structural differences. For this purpose we used a previously established abstraction of protein structures referred to as Protein Blocks (PBs). We also characterized the nature of the structural variations for a few proteins using molecular dynamics simulations. In both cases, the structural variations were summarized in the form of substitution matrices of PBs. We show that certain conformational states are preferably replaced by other specific conformational states. Interestingly, these structural variations are highly similar to those previously observed across structures of homologous proteins (r² = 0.923) or across the ensemble of conformations from NMR data (r² = 0.919). Thus our study quantitatively shows that overall trends of structural changes in a given protein are nearly identical to the trends of structural differences that occur in the topologically equivalent positions in homologous proteins. Specific case studies are used to illustrate the nature of these structural variations.
Affiliation(s)
- Iyanar Vetrivel
- Université de Nantes, UFIP UMR 6286 CNRS, UFR Sciences et Techniques, 2 Chemin de La Houssinière, Nantes, France
- Alexandre G de Brevern
- INSERM UMR_S 1134, DSIMB Team, Laboratory of Excellence, GR-Ex, Univ Paris Diderot, Univ Sorbonne Paris Cité, INTS, 6 Rue Alexandre Cabanel, Paris, France
- Frédéric Cadet
- University of Paris, UMR_S1134, BIGR, Inserm, F-75015, Paris, France; DSIMB, UMR_S1134, BIGR, Inserm, Laboratory of Excellence GR-Ex, Faculty of Sciences and Technology, University of La Reunion, F-97715, Saint-Denis, France; PEACCEL, Protein Engineering Accelerator, 6 Square Albin Cachot, Box 42, 75013, Paris, France
- Bernard Offmann
- Université de Nantes, UFIP UMR 6286 CNRS, UFR Sciences et Techniques, 2 Chemin de La Houssinière, Nantes, France.
10
Van de Weyer AL, Monteiro F, Furzer OJ, Nishimura MT, Cevik V, Witek K, Jones JDG, Dangl JL, Weigel D, Bemm F. A Species-Wide Inventory of NLR Genes and Alleles in Arabidopsis thaliana. Cell 2019; 178:1260-1272.e14. [PMID: 31442410 PMCID: PMC6709784 DOI: 10.1016/j.cell.2019.07.038] [Citation(s) in RCA: 200] [Impact Index Per Article: 40.0] [Received: 03/05/2019] [Revised: 06/13/2019] [Accepted: 07/19/2019] [Indexed: 12/18/2022]
Abstract
Infectious disease is both a major force of selection in nature and a prime cause of yield loss in agriculture. In plants, disease resistance is often conferred by nucleotide-binding leucine-rich repeat (NLR) proteins, intracellular immune receptors that recognize pathogen proteins and their effects on the host. Consistent with extensive balancing and positive selection, NLRs are encoded by one of the most variable gene families in plants, but the true extent of intraspecific NLR diversity has been unclear. Here, we define a nearly complete species-wide pan-NLRome in Arabidopsis thaliana based on sequence enrichment and long-read sequencing. The pan-NLRome largely saturates with approximately 40 well-chosen wild strains, with half of the pan-NLRome being present in most accessions. We chart NLR architectural diversity, identify new architectures, and quantify selective forces that act on specific NLRs and NLR domains. Our study provides a blueprint for defining pan-NLRomes.
Affiliation(s)
- Anna-Lena Van de Weyer
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
- Freddy Monteiro
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599-3280, USA; Center for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, 08193 Barcelona, Spain
- Oliver J Furzer
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599-3280, USA; The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Marc T Nishimura
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA
- Volkan Cevik
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK; Milner Centre for Evolution & Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
- Kamil Witek
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK
- Jonathan D G Jones
- The Sainsbury Laboratory, University of East Anglia, Norwich Research Park, Norwich NR4 7UH, UK.
- Jeffery L Dangl
- Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA.
- Detlef Weigel
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany.
- Felix Bemm
- Department of Molecular Biology, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
11
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/07/2023]
Abstract
Proteins are highly dynamic macromolecules. Their dynamics are often analysed through experimental and/or computational methods for only an isolated protein or a limited number of proteins. Here, we explore large-scale protein dynamics simulations to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category although their dynamics differ. Second, we used a structural alphabet that is able to approximate every part of protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are considerably more complex than a simple regular/rigid and coil/flexible exchange. Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein DataBank; RMSf, root mean square fluctuations. Communicated by Ramaswamy H. Sarma.
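As an illustration of the local-entropy measure listed in the abbreviations (Neq), the sketch below computes a per-position Neq from PB strings, one string per simulation frame. It assumes the definition commonly used in the Protein Blocks literature, Neq = exp(-Σ f_x ln f_x), where f_x is the observed frequency of block x at that position. The function name `neq_per_position` and the toy frames are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def neq_per_position(pb_frames):
    """Return one Neq value per residue position.

    pb_frames: list of equal-length PB strings, one per trajectory frame
    (hypothetical input format chosen for this sketch).
    """
    if len({len(frame) for frame in pb_frames}) != 1:
        raise ValueError("need at least one frame, all of the same length")
    neq_values = []
    for column in zip(*pb_frames):  # all frames at one residue position
        total = len(column)
        counts = Counter(column)
        # Shannon entropy of the PB distribution at this position.
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        neq_values.append(math.exp(entropy))
    return neq_values


# Toy data: position 0 never changes, position 2 visits four blocks equally.
frames = ["mmd", "mmk", "mmf", "mmb"]
print(neq_per_position(frames))
```

A position that keeps a single block in every frame gives Neq = 1 (rigid), while a position that visits k blocks with equal frequency gives Neq = k.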
Affiliation(s)
- Tarun Jairaj Narwani
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Pierrick Craveur
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nicolas K Shinada
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Discngine, SAS, Paris, France
| | - Aline Floch
- Laboratoire D'Excellence GR-Ex, Paris, France; Etablissement Français du Sang Ile de France, Créteil, France; IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est- Créteil Univ, Créteil, France; UPEC, Université Paris Est-Créteil, Créteil, France
| | - Hubert Santuz
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Akhila Melarkode Vattekatte
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | | | - Joseph Rebehmed
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
| | - Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
| | - Catherine Etchebest
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | - Alexandre G de Brevern
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France; Laboratoire D'Excellence GR-Ex, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France; IBL, Paris, France
|
12
|
A minimum set of stable blocks for rational design of polypeptide chains. Biochimie 2019; 160:88-92. [DOI: 10.1016/j.biochi.2019.02.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/01/2018] [Accepted: 02/13/2019] [Indexed: 12/30/2022]
|
13
|
Vetrivel I, Mahajan S, Tyagi M, Hoffmann L, Sanejouand YH, Srinivasan N, de Brevern AG, Cadet F, Offmann B. Knowledge-based prediction of protein backbone conformation using a structural alphabet. PLoS One 2017; 12:e0186215. [PMID: 29161266 PMCID: PMC5697859 DOI: 10.1371/journal.pone.0186215] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/30/2017] [Accepted: 09/27/2017] [Indexed: 01/19/2023] Open
Abstract
Libraries of structural prototypes that abstract protein local structures are known as structural alphabets and have proven to be very useful in various aspects of protein structure analysis and prediction. One such library, Protein Blocks, is composed of 16 standard structural prototypes, each 5 residues long. Analysing a protein in this way involves describing its structure as a string of Protein Blocks. Predicting the local structure of a protein in terms of Protein Blocks is the general objective of this work. A new approach, PB-kPRED, is proposed towards this aim. It involves (i) organizing the structural knowledge in the form of a database of pentapeptide fragments extracted from all protein structures in the PDB and (ii) applying a knowledge-based algorithm that does not rely on any secondary structure predictions and/or sequence alignment profiles to scan this database and predict the most probable backbone conformations for the protein local structures. PB-kPRED does, however, preferentially use structural information from homologues when it is available. The predictions were evaluated rigorously on 15,544 query proteins representing a non-redundant subset of the PDB filtered at a 30% sequence identity cut-off. We have shown that the kPRED method was able to achieve mean accuracies ranging from 40.8% to 66.3% depending on the availability of homologues. The impact of the different strategies for scanning the database on the prediction was evaluated and is discussed. Our results highlight the usefulness of the method in the context of proteins without any known structural homologues. A scoring function that gives a good estimate of the accuracy of prediction was further developed (R² of 0.82). An online version of the tool is provided freely for non-commercial usage at http://www.bo-protscience.fr/kpred/.
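The two steps described in this abstract (a pentapeptide fragment database, then a lookup that picks the most probable conformation) can be illustrated with a minimal sketch. This is not the PB-kPRED algorithm itself: it assumes, for illustration only, that each 5-residue window is keyed by its amino acids and votes for the PB letter of its central residue. The function names and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_fragment_db(entries):
    """Index every pentapeptide by the PB letter seen at its central residue.

    entries: iterable of (amino_acid_sequence, pb_string) pairs of equal length.
    """
    db = defaultdict(Counter)
    for seq, pbs in entries:
        if len(seq) != len(pbs):
            raise ValueError("sequence and PB string must have equal length")
        for i in range(len(seq) - 4):
            db[seq[i:i + 5]][pbs[i + 2]] += 1
    return db


def predict_pbs(seq, db, unknown="-"):
    """Predict one PB letter per residue; positions with no evidence stay unknown."""
    prediction = [unknown] * len(seq)
    for i in range(len(seq) - 4):
        counts = db.get(seq[i:i + 5])
        if counts:
            # Most frequently observed PB for this exact pentapeptide.
            prediction[i + 2] = counts.most_common(1)[0][0]
    return "".join(prediction)


# Toy "database" (invented sequences; Z marks unassigned chain ends).
db = build_fragment_db([
    ("ACDEFGH", "ZZmmmZZ"),
    ("WACDEFG", "ZZmmmZZ"),
    ("ACDEFKL", "ZZdmmZZ"),
])
print(predict_pbs("ACDEFG", db))
```

The two terminal residues on each side remain unassigned because no full 5-residue window is centred on them.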
Affiliation(s)
- Iyanar Vetrivel
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
| | - Swapnil Mahajan
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence, GR-Ex, Université de La Réunion, Faculty of Sciences and Technology, Saint Denis Cedex, La Réunion, France
| | - Manoj Tyagi
- Université de La Réunion, Saint Denis Cedex, La Réunion, France
| | - Lionel Hoffmann
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
| | - Yves-Henri Sanejouand
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
| | | | - Alexandre G. de Brevern
- INSERM UMR_S 1134, DSIMB team, Laboratory of Excellence, GR-Ex, Univ Paris Diderot, Univ Sorbonne Paris Cité, INTS, rue Alexandre Cabanel, Paris, France
| | - Frédéric Cadet
- DSIMB, INSERM, UMR S-1134, Laboratory of Excellence, GR-Ex, Université de La Réunion, Faculty of Sciences and Technology, Saint Denis Cedex, La Réunion, France
- PEACCEL SAS, Paris, France
| | - Bernard Offmann
- Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines (UFIP), UMR 6286 CNRS, UFR Sciences et Techniques, 2, chemin de la Houssinière, France
|
14
|
Sarkar B, Kulharia M, Mantha AK. Understanding human thiol dioxygenase enzymes: structure to function, and biology to pathology. Int J Exp Pathol 2017; 98:52-66. [PMID: 28439920 DOI: 10.1111/iep.12222] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 10/06/2016] [Accepted: 01/18/2017] [Indexed: 12/15/2022] Open
Abstract
Amino acid metabolism is a significant metabolic activity in humans, especially that of the sulphur-containing amino acids methionine and cysteine (Cys). Cys is cytotoxic and neurotoxic in nature; hence, mammalian cells maintain a constant intracellular level of Cys. Metabolism of Cys is mainly regulated by two thiol dioxygenases: cysteine dioxygenase (CDO) and 2-aminoethanethiol dioxygenase (ADO). CDO and ADO are the only human thiol dioxygenases reported with a role in Cys metabolism and localized to mitochondria. This metabolic pathway is important in various human disorders, as it is responsible for the synthesis of the antioxidant glutathione and also for the synthesis of hypotaurine and taurine. CDO is the more extensively studied protein, and its high-resolution crystallographic structures have been solved. Compared with CDO, ADO is less studied, even though it has a key role in cysteamine metabolism. To further understand ADO's structure and function, its three-dimensional structure has been predicted with the I-TASSER and SWISS-MODEL servers and validated with PROCHECK software. A structural superimposition approach using the iPBA web server further confirmed near-identical structures (including active sites) for the predicted protein models of ADO as compared to CDO. In addition, protein-protein interactions and their association with pathophysiology are crucial in understanding protein functions. The interacting partner profiles of both ADO and CDO have been presented using the STRING database. In this study, we have predicted a 3D model structure for ADO and summarized the biological roles and the pathological consequences associated with the altered expression and functioning of ADO and CDO in cancer, neurodegenerative disorders and other human diseases.
Affiliation(s)
- Bibekananda Sarkar
- Center for Animal Sciences, School of Basic and Applied Sciences, Central University of Punjab, Bathinda, Punjab, India
| | - Mahesh Kulharia
- Center for Computational Sciences, School of Basic and Applied Sciences, Central University of Punjab, Bathinda, Punjab, India
| | - Anil K Mantha
- Center for Animal Sciences, School of Basic and Applied Sciences, Central University of Punjab, Bathinda, Punjab, India
|
15
|
Noël F, Malpertuy A, de Brevern AG. Global analysis of VHHs framework regions with a structural alphabet. Biochimie 2016; 131:11-19. [PMID: 27613403 DOI: 10.1016/j.biochi.2016.09.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 08/01/2016] [Revised: 09/05/2016] [Accepted: 09/05/2016] [Indexed: 02/08/2023]
Abstract
VHHs are the antigen-binding domains of camelid heavy chain antibodies (HCAbs). They have many interesting biotechnological and biomedical properties due to their small size, high solubility and stability, and high affinity and specificity for their antigens. HCAbs and classical IgGs are evolutionarily related and share a common fold. VHHs are composed of regions considered as constant, called the frameworks (FRs), connected by Complementarity Determining Regions (CDRs), highly variable regions that provide the interaction with the epitope. Until now, no systematic structural analysis had been performed on VHH structures despite the significant number of structures available. This work is the first study to analyse the structural diversity of the FRs of VHHs. Using a structural alphabet that allows approximating the local conformation, we show that none of the four FRs has a unique structure; each exhibits many structural variant patterns. Moreover, no direct simple link between the local conformational change and amino acid composition can be detected. These results indicate that long-range interactions affect the local conformation of FRs and impact the building of structural models.
Affiliation(s)
- Floriane Noël
- INSERM, U 1134, DSIMB, F-75739 Paris, France; Univ Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France
| | | | - Alexandre G de Brevern
- INSERM, U 1134, DSIMB, F-75739 Paris, France; Univ Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France; Laboratoire d'Excellence GR-Ex, F-75739 Paris, France.
|
16
|
iDPF-PseRAAAC: A Web-Server for Identifying the Defensin Peptide Family and Subfamily Using Pseudo Reduced Amino Acid Alphabet Composition. PLoS One 2015; 10:e0145541. [PMID: 26713618 PMCID: PMC4694767 DOI: 10.1371/journal.pone.0145541] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 09/17/2015] [Accepted: 12/04/2015] [Indexed: 11/29/2022] Open
Abstract
Defensins, one of the most abundant classes of antimicrobial peptides, are an essential part of the innate immunity that has evolved in most living organisms, from lower organisms to humans. To identify specific defensins as interesting antifungal leads, in this study we constructed a more rigorous benchmark dataset and developed the iDPF-PseRAAAC server to predict the defensin family and subfamily. Using reduced dipeptide compositions, the overall accuracy of the proposed method increased to 95.10% for the defensin family and 98.39% for the vertebrate subfamily, which is higher than the accuracy of other methods. The jackknife test shows an improvement of more than 4% compared with the previous method. A free online server was further established for the convenience of most experimental scientists at http://wlxy.imu.edu.cn/college/biostation/fuwu/iDPF-PseRAAAC/index.asp. A friendly guide is provided to describe how to use the web server. We anticipate that iDPF-PseRAAAC may become a useful high-throughput tool for both basic research and drug design.
|
17
|
Joo H, Chavan AG, Fraga KJ, Tsai J. An amino acid code for irregular and mixed protein packing. Proteins 2015; 83:2147-61. [PMID: 26370334 DOI: 10.1002/prot.24929] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/04/2015] [Revised: 09/01/2015] [Accepted: 09/02/2015] [Indexed: 11/10/2022]
Abstract
To advance our understanding of protein tertiary structure, the development of the knob-socket model is completed with an analysis of the packing within irregular coil and turn secondary structure as well as between mixed secondary structures. The knob-socket model simplifies packing based on repeated patterns of two motifs: a three-residue socket for packing within secondary (2°) structure and a four-residue knob-socket for tertiary (3°) packing. For coil and turn secondary structure, knob-sockets allow identification of a correlation between amino acid composition and tertiary arrangements in space. Coil contributes almost as much as α-helices to tertiary packing. In irregular sockets, Gly, Pro, Asp, and Ser are favored, while in irregular knobs, the preference order is Arg, Asp, Pro, Asn, Thr, Leu, and Gly. Cys, His, Met, and Trp are not favored in either. In mixed packing, the knob amino acid preferences are a function of the socket that they are packing into, whereas the amino acid composition of the sockets does not depend on the secondary structure of the knob. A unique motif of a coil knob with an XYZ β-sheet socket may potentially function to inhibit β-sheet extension. In addition, analysis of the preferred crossing angles for strands within a β-sheet and for mixed α-helix/β-sheet packing identifies canonical packing patterns useful in protein design. Lastly, the knob-socket model abstracts the complexity of protein tertiary structure into an intuitive packing surface topology map.
Affiliation(s)
- Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, California, 95211
| | - Archana G Chavan
- Department of Chemistry, University of the Pacific, Stockton, California, 95211
| | - Keith J Fraga
- Department of Chemistry, University of the Pacific, Stockton, California, 95211
| | - Jerry Tsai
- Department of Chemistry, University of the Pacific, Stockton, California, 95211
|
18
|
Mahajan S, de Brevern AG, Sanejouand YH, Srinivasan N, Offmann B. Use of a structural alphabet to find compatible folds for amino acid sequences. Protein Sci 2014; 24:145-53. [PMID: 25297700 DOI: 10.1002/pro.2581] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 07/15/2014] [Accepted: 10/06/2014] [Indexed: 01/01/2023]
Abstract
The structural annotation of proteins with no detectable homologs of known 3D structure identified using sequence-search methods is a major challenge today. We propose an original method that computes the conditional probabilities for the amino-acid sequence of a protein to fit to known protein 3D structures using a structural alphabet, known as "Protein Blocks" (PBs). PBs constitute a library of 16 local structural prototypes that approximate every part of protein backbone structures. It is used to encode 3D protein structures into 1D PB sequences and to capture sequence to structure relationships. Our method relies on amino acid occurrence matrices, one for each PB, to score global and local threading of query amino acid sequences to protein folds encoded into PB sequences. It does not use any information from residue contacts or sequence-search methods, nor does it explicitly incorporate the hydrophobic effect. The performance of the method was assessed with independent test datasets derived from SCOP 1.75A. With a Z-score cutoff that achieved 95% specificity (i.e., less than 5% false positives), global and local threading showed sensitivity of 64.1% and 34.2%, respectively. We further tested its performance on 57 difficult CASP10 targets that had no known homologs in PDB: 38 compatible templates were identified by our approach and 66% of these hits yielded correctly predicted structures. This method scales up well and offers promising perspectives for structural annotations at the genomic level. It has been implemented in the form of a web-server that is freely available at http://www.bo-protscience.fr/forsa.
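The idea of scoring a query sequence against a PB-encoded fold with one amino acid occurrence matrix per PB can be sketched as follows. This is a simplified illustration under stated assumptions, not the published method: the matrices here are plain log-odds of P(amino acid | PB) against the background frequency, estimated with a pseudocount, and the threading is gapless with one score per aligned position. The function names and the toy training pair are hypothetical.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_tables(pairs, pseudocount=1.0):
    """Estimate log(P(aa | PB) / P(aa)) from (sequence, PB string) training pairs."""
    per_pb = defaultdict(Counter)
    background = Counter()
    for seq, pbs in pairs:
        for aa, pb in zip(seq, pbs):
            per_pb[pb][aa] += 1
            background[aa] += 1
    bg_total = sum(background.values()) + pseudocount * len(AMINO_ACIDS)
    tables = {}
    for pb, counts in per_pb.items():
        pb_total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        tables[pb] = {
            aa: math.log(((counts[aa] + pseudocount) / pb_total)
                         / ((background[aa] + pseudocount) / bg_total))
            for aa in AMINO_ACIDS
        }
    return tables


def gapless_threading_score(seq, pbs, tables):
    """Sum per-position log-odds for a query laid directly onto a PB-encoded fold."""
    if len(seq) != len(pbs):
        raise ValueError("gapless threading needs equal lengths")
    # PBs or residues missing from the tables contribute a neutral score of 0.
    return sum(tables.get(pb, {}).get(aa, 0.0) for aa, pb in zip(seq, pbs))


# Toy training pair (invented): Ala is only seen with PB 'm', Gly only with PB 'd'.
tables = log_odds_tables([("AAAAGGGG", "mmmmdddd")])
print(gapless_threading_score("AAAA", "mmmm", tables))
print(gapless_threading_score("GGGG", "mmmm", tables))
```

A compatible sequence scores higher than an incompatible one on the same PB string. Turning such raw scores into Z-scores, as the abstract describes, would require a reference distribution that this sketch does not model.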
Affiliation(s)
- Swapnil Mahajan
- Université de La Réunion, DSIMB, UMR-S S1134, Saint Denis Messag Cedex 09, La Réunion, F-97715, France; INSERM, UMR-S 1134, DSIMB, F-75739, Paris, France; Laboratoire d'Excellence, GR-Ex, Paris, F-75739, France; Université de Nantes, UFIP CNRS UMR 6286 Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44392, Nantes Cedex 03, France
|
19
|
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
|
20
|
Mao W, Cong P, Wang Z, Lu L, Zhu Z, Li T. NMRDSP: an accurate prediction of protein shape strings from NMR chemical shifts and sequence data. PLoS One 2013; 8:e83532. [PMID: 24376713 PMCID: PMC3871590 DOI: 10.1371/journal.pone.0083532] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/27/2013] [Accepted: 11/04/2013] [Indexed: 11/28/2022] Open
Abstract
A shape string is a structural sequence and an extremely important representation of protein backbone conformations. Nuclear magnetic resonance chemical shifts correlate strongly with the local protein structure and are exploited to predict protein structures in conjunction with computational approaches. Here we demonstrate a novel approach, NMRDSP, which can accurately predict the protein shape string based on nuclear magnetic resonance chemical shifts and structural profiles obtained from sequence data. NMRDSP uses six chemical shifts (HA, H, N, CA, CB and C) and eight elements of structure profiles as features, a non-redundant set (1,003 entries) as the training set, and a conditional random field as the classification algorithm. For an independent testing set (203 entries), we achieved an accuracy of 75.8% for S8 (the eight-state accuracy) and 87.8% for S3 (the three-state accuracy). This is higher than when using only chemical shifts or only sequence data, and confirms that the chemical shift and the structure profile are significant features for shape string prediction and that their combination markedly improves the accuracy of the predictor. We have constructed the NMRDSP web server and believe it could be employed to provide a solid platform to predict other protein structures and functions. The NMRDSP web server is freely available at http://cal.tongji.edu.cn/NMRDSP/index.jsp.
Affiliation(s)
- Wusong Mao
- Department of Chemistry, Tongji University, Shanghai, China
| | - Peisheng Cong
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
| | - Zhiheng Wang
- Department of Chemistry, Tongji University, Shanghai, China
| | - Longjian Lu
- Department of Chemistry, Tongji University, Shanghai, China
| | - Zhongliang Zhu
- Department of Chemistry, Tongji University, Shanghai, China
| | - Tonghua Li
- Department of Chemistry, Tongji University, Shanghai, China
- * E-mail: (PC); (TL)
|
21
|
Craveur P, Joseph AP, Rebehmed J, de Brevern AG. β-Bulges: extensive structural analyses of β-sheets irregularities. Protein Sci 2013; 22:1366-78. [PMID: 23904395 DOI: 10.1002/pro.2324] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/29/2013] [Revised: 07/19/2013] [Accepted: 07/22/2013] [Indexed: 12/30/2022]
Abstract
β-Sheets are quite frequent in protein structures and are stabilized by regular main-chain hydrogen bond patterns. Irregularities in β-sheets, named β-bulges, are distorted regions between two consecutive hydrogen bonds. They disrupt the classical alternation of side chain direction and can alter the directionality of β-strands. They are implicated in protein-protein interactions and are introduced to avoid β-strand aggregation. Five different types of β-bulges are defined. Previous studies on β-bulges were performed on a limited number of protein structures or one specific family. These studies evoked a potential conservation during evolution. In this work, we analyze the β-bulge distribution and conservation in terms of local backbone conformations and amino acid composition. Our dataset consists of 66 times more β-bulges than the last systematic study (Chan et al. Protein Science 1993, 2:1574-1590). Novel amino acid preferences are underlined and local structure conformations are highlighted by the use of a structural alphabet. We observed that β-bulges are preferably localized at the N- and C-termini of β-strands, but contrary to the earlier studies, no significant conservation of β-bulges was observed among structural homologues. Displacement of β-bulges along the sequence was also investigated by Molecular Dynamics simulations.
Affiliation(s)
- Pierrick Craveur
- INSERM, U665, DSIMB, F-75739, Paris, France; University of Paris Diderot, Sorbonne Paris Cité, UMR_S 665, F-75739, Paris, France; Institut National de la Transfusion Sanguine (INTS), F-75739, Paris, France; Laboratoire d'Excellence GR-Ex, F-75739, Paris, France
|
22
|
Léonard S, Joseph AP, Srinivasan N, Gelly JC, de Brevern AG. mulPBA: an efficient multiple protein structure alignment method based on a structural alphabet. J Biomol Struct Dyn 2013; 32:661-8. [PMID: 23659291 DOI: 10.1080/07391102.2013.787026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 10/26/2022]
Abstract
The increasing number of available protein structures requires efficient tools for multiple structure comparison. Indeed, multiple structural alignments are essential for the analysis of the function, evolution and architecture of protein structures. For this purpose, we propose a new web server called multiple Protein Block Alignment (mulPBA). This server implements a method based on a structural alphabet that describes the backbone conformation of a protein chain in terms of dihedral angles. This 'sequence-like' representation enables the use of powerful sequence alignment methods for primary structure comparison, followed by an iterative refinement of the structural superposition. This approach yields alignments superior to most of the rigid-body alignment methods and highly comparable with the flexible structure comparison approaches. We implement this method in a web server designed to perform multiple structure superimpositions from a set of structures given by the user. Outputs are given as both a sequence alignment and superposed 3D structures, visualized directly through static images generated by PyMOL or through a Jmol applet allowing dynamic interaction. Multiple global quality measures are given. Relatedness between structures is indicated by a distance dendrogram. Superimposed structures in PDB format can also be downloaded, and the results are quickly obtained. The mulPBA server can be accessed at www.dsimb.inserm.fr/dsimb_tools/mulpba/.
Affiliation(s)
- Sylvain Léonard
- INSERM UMR-S 665, DSIMB, 6, rue Alexandre Cabanel, F-75739, Paris, France
|
23
|
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022] Open
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly applied to studying protein function and to drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I′ and II′ are also identified as preferred sites for insertions.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
| | | | - Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
24
|
Joseph AP, Srinivasan N, de Brevern AG. Progressive structure-based alignment of homologous proteins: Adopting sequence comparison strategies. Biochimie 2012; 94:2025-34. [PMID: 22676903 DOI: 10.1016/j.biochi.2012.05.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 12/22/2011] [Accepted: 05/21/2012] [Indexed: 12/30/2022]
Abstract
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used Structural Alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB-based alignments are used to obtain a three-dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs show that the alignment quality is better than that of MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods.
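Because PB strings are ordinary 1D strings, the pairwise step that a progressive strategy builds on can be illustrated with a standard global alignment. The sketch below is a plain Needleman-Wunsch alignment of two PB strings, not the mulPBA implementation: the match, mismatch and gap values are placeholders standing in for a PB substitution matrix and gap penalties, and the function name `align_pb` and the toy strings are hypothetical.

```python
def align_pb(a, b, match=2, mismatch=-1, gap=-2):
    """Globally align two PB strings; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy PB strings: the second lacks one helical 'm' block.
total, top, bottom = align_pb("mmmmdd", "mmmdd")
print(total)
print(top)
print(bottom)
```

The aligned columns give residue equivalences, which is the information the abstract says is then used for the three-dimensional fit.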
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|
25
|
Joseph AP, Srinivasan N, de Brevern AG. Cis-trans peptide variations in structurally similar proteins. Amino Acids 2012; 43:1369-81. [PMID: 22227866 DOI: 10.1007/s00726-011-1211-9] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 12/17/2011] [Accepted: 12/22/2011] [Indexed: 12/30/2022]
Abstract
The presence of energetically less favourable cis peptides in protein structures has been observed to be strongly associated with structural integrity and function. Inter-conversion between the cis and trans conformations also has an important role in the folding process. In this study, we analyse the extent of conservation of cis peptides among similar folds. We look at both the amino acid preferences and local structural changes associated with such variations. Nearly 34% of the Xaa-Proline cis bonds are not conserved in structural relatives; Proline also has a high tendency to get replaced by another amino acid in the trans conformer. At both positions bounding the peptide bond, Glycine has a higher tendency to lose the cis conformation. The cis conformation of more than 30% of β turns of type VIb and IV is not found to be conserved in similar structures. A different view, using a Protein Block-based description of backbone conformation, suggests that many of the local conformational changes are highly different from the general local structural variations observed among structurally similar proteins. Changes between cis and trans conformations are found to be associated with the evolution of new functions facilitated by local structural changes. This is most frequent in enzymes where new catalytic activity emerges with local changes in the active site. Cis-trans changes are also seen to facilitate inter-domain and inter-protein interactions. As in the case of folding, cis-trans conversions have been used as an important driving factor in evolution.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, Université Denis Diderot-Paris 7, INTS, 6 rue Alexandre Cabanel, Paris Cedex 15, France
|
26
|
Suresh V, Ganesan K, Parthasarathy S. PDB-2-PB: a curated online protein block sequence database. J Appl Crystallogr 2011. [DOI: 10.1107/s0021889811052356] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022] Open
Abstract
This article describes the development of a curated online protein block sequence database, PDB-2-PB. The protein block sequences for protein structures with complete backbone coordinates have been encoded using the encoding procedure of de Brevern, Etchebest & Hazout [Proteins(2000),41, 271–287]. In the current release of the PDB-2-PB database (version 1.0), the protein entries from a recent release of the World Wide Protein Data Bank (wwPDB), which has 74 297 solved PDB entries as of 7 July 2011, have been used as a primary source. The PDB-2-PB database stores the protein block sequences for all the chains present in a protein structure. PDB-2-PB version 1.0 has the curated protein block sequences for 103 252 PDB chain entries (93 547 X-ray, 7033 NMR and 2672 other experimental chain entries). From the PDB-2-PB database, users can extract the curated protein block sequence and its corresponding amino acid sequence, which is extracted from the PDB ATOM records. Users can download these sequences either by using the PDB code or by using various parameters listed in the database. The PDB-2-PB database is freely available at http://bioinfo.bdu.ac.in/~pb/.
|
27
|
Gelly JC, Joseph AP, Srinivasan N, de Brevern AG. iPBA: a tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Res 2011; 39:W18-23. [PMID: 21586582 PMCID: PMC3125758 DOI: 10.1093/nar/gkr333] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Indexed: 01/03/2023] Open
Abstract
With the immense growth in the number of available protein structures, fast and accurate structure comparison has become essential. We propose an efficient method for structure comparison, based on a structural alphabet. Protein Blocks (PBs) is a widely used structural alphabet with 16 pentapeptide conformations that can fairly approximate a complete protein chain. Thus a 3D structure can be translated into a 1D sequence of PBs. With a simple Needleman–Wunsch approach and a raw PB substitution matrix, PB-based structural alignments were better than many popular methods. The iPBA web server presents an improved alignment approach using (i) specialized PB Substitution Matrices (SM) and (ii) an anchor-based alignment methodology. With these developments, the quality of ∼88% of alignments was improved. iPBA alignments were also better than DALI, MUSTANG and GANGSTA+ in >80% of the cases. The web server is designed for both pairwise comparisons and database searches. Outputs are given as sequence alignments and superposed 3D structures displayed using PyMol and Jmol. A local alignment option for detecting sub-structural similarity is also embedded. As a fast and efficient ‘sequence-based’ structure comparison tool, we believe that it will be quite useful to the scientific community. iPBA can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/ipba/.
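The "simple Needleman–Wunsch approach" on PB strings mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the iPBA implementation: the scoring function `toy_pb_score` and the gap penalty are placeholder assumptions standing in for the real PB substitution matrix, and the anchor-based refinement is not shown. PB sequences are assumed to be plain strings over the 16 letters a to p.

```python
def toy_pb_score(x, y):
    # Hypothetical stand-in for a PB substitution matrix:
    # +2 for identical Protein Blocks, -1 otherwise.
    return 2 if x == y else -1


def needleman_wunsch(seq_a, seq_b, score=toy_pb_score, gap=-2):
    """Global alignment of two PB strings with a linear gap penalty."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = best score aligning seq_a[:i] with seq_b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                dp[i - 1][j] + gap,   # gap in seq_b
                dp[i][j - 1] + gap,   # gap in seq_a
            )
    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("mmnop", "mmop")
    print(total)
    print(top)
    print(bottom)
```

A real PB alignment would replace `toy_pb_score` with a lookup into a 16 x 16 substitution matrix; the dynamic programming itself stays the same.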
Affiliation(s)
- Jean-Christophe Gelly
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine, 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France
|
28
|
Joseph AP, Srinivasan N, de Brevern AG. Improvement of protein structure comparison using a structural alphabet. Biochimie 2011; 93:1434-45. [PMID: 21569819 DOI: 10.1016/j.biochi.2011.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 11/19/2010] [Accepted: 04/12/2011] [Indexed: 12/29/2022]
Abstract
The three-dimensional structure of a protein provides major insights into its function. Protein structure comparison has implications in functional and evolutionary studies. A structural alphabet (SA) is a library of local protein structure prototypes that can abstract every part of protein main chain conformation. Protein Blocks (PBs) is a widely used SA, composed of 16 prototypes, each representing a pentapeptide backbone conformation defined in terms of dihedral angles. Through this description, the 3D structural information can be translated into a 1D sequence of PBs. In a previous study, we used this approach to compare protein structures encoded in terms of PBs. A classical sequence alignment procedure based on dynamic programming was used, with a dedicated PB Substitution Matrix (SM). The PB-based pairwise structural alignment method gave excellent performance when compared to other established methods for mining. In this study, we have (i) refined the SMs and (ii) improved the Protein Block Alignment methodology (named iPBA). The SM was normalized with regard to sequence and structural similarity. Alignment of protein structures often involves similar structural regions separated by dissimilar stretches. A dynamic programming algorithm that weighs these local similar stretches has been designed. Amino acid substitution scores were also coupled linearly with the PB substitutions. With iPBA, (i) the mining efficiency rate improves by 6.8% and (ii) more than 82% of the alignments have a better quality. A higher efficiency in aligning multi-domain proteins could also be demonstrated. The quality of alignment is better than that of DALI and MUSTANG in 81.3% of the cases. Thus our study has resulted in an impressive improvement in the quality of protein structural alignment.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France.
|
29
|
Species specific amino acid sequence–protein local structure relationships: An analysis in the light of a structural alphabet. J Theor Biol 2011; 276:209-17. [DOI: 10.1016/j.jtbi.2011.01.047] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/09/2010] [Revised: 01/28/2011] [Accepted: 01/31/2011] [Indexed: 11/24/2022]
|
30
|
Mansiaux Y, Joseph AP, Gelly JC, de Brevern AG. Assignment of PolyProline II conformation and analysis of sequence-structure relationship. PLoS One 2011; 6:e18401. [PMID: 21483785 PMCID: PMC3069088 DOI: 10.1371/journal.pone.0018401] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Received: 08/12/2010] [Accepted: 03/07/2011] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns, and the rest is associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. PolyProline II (PPII) helix is yet another interesting repetitive structure which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could have an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS A major factor that limits the study of PPII is that its assignment cannot be carried out with the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined in the frame of DSSP secondary structure assignment. Considering the ambiguity in PPII assignments by different methods, a consensus assignment strategy was utilized. To define the most consensual rule of PPII assignment, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to have a maximum coverage of all assignments made by these SSAMs. Not many constraints were added to the assignment and only PPII helices of at least 2 residues in length are defined. CONCLUSIONS/SIGNIFICANCE The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acids as PPII. Sequence-structure relationships associated with PPII, defined by the different SSAMs, underline a few striking differences. A specific study of amino acid preferences in their N and C-cap regions was carried out, as well as of their solvent accessibility and contact patterns. The assignment of PPII can thus be coupled with DSSP, which opens a simple way for further analysis in this field.
Affiliation(s)
- Yohann Mansiaux
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Jean-Christophe Gelly
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Université Paris Diderot - Paris 7, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
31
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022] Open
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One kind of difference between 3D structures of homologous proteins is rigid body displacement. A glaring example is that of equivalent regions of homologous proteins that adopt α-helical conformation with different spatial orientations and are therefore not well superimposed. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, and the most common example is topologically equivalent loops of two homologues with different conformations. In the current study, we present a refined view of the “structurally variable” regions which may contain local similarity obscured in global alignment of homologous protein structures. As a structural alphabet is able to describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through protein blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
- Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
32
|
Joseph AP, Agarwal G, Mahajan S, Gelly JC, Swapna LS, Offmann B, Cadet F, Bornot A, Tyagi M, Valadié H, Schneider B, Etchebest C, Srinivasan N, De Brevern AG. A short survey on protein blocks. Biophys Rev 2010; 2:137-147. [PMID: 21731588 DOI: 10.1007/s12551-010-0036-1] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.9] [Indexed: 11/24/2022] Open
Abstract
Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations, such as uncertainties in the assignment of boundaries of helical and β-strand regions. Further, on average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus different research teams have focused on abstracting the conformation of the protein backbone in localized short stretches. Using different geometric measures, local stretches in protein structures are clustered into a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structure prototypes are named "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate protein structures, but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
Affiliation(s)
- Agnel Praveen Joseph
- DSIMB, Dynamique des Structures et Interactions des Macromolécules Biologiques Université Paris-Diderot - Paris VII INTS INSERM : U665 INTS, 6 rue Alexandre Cabanel, 75739 Paris Cedex 15 FRANCE,FR
|
33
|
Abstract
Motivation: Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever greater number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities. Results: We propose a new projection-based approach that can rapidly detect global as well as local structural similarities. Local structural search is enabled by a topology-inspired writhe decomposition protocol that produces a small number of fragments while ensuring that similar structures are cut in a similar manner. In benchmark tests, we show that our method, writher, improves accuracy over existing projection methods in terms of recognizing SCOP domains out of multi-domain proteins, while maintaining accuracy comparable with existing projection methods in a standard single-domain benchmark test. Availability: The source code is available at the following website: http://compbio.berkeley.edu/proj/writher/ Contact: dzhi@compbio.berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
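The general idea of a projection method described in this abstract (map each structure to a point in a vector space, then compare points by distance) can be sketched as follows. This is only an illustration under stated assumptions: the descriptor used here, the mean Cα–Cα distance at a few sequence separations, is a hypothetical stand-in chosen because it is unchanged by translation and rotation. It is not the writhe-based descriptor of the paper, and the function names are invented for this example.

```python
import math


def project(coords, separations=(2, 3, 4, 5)):
    """Map a chain of 3D points to a short feature vector.

    Each component is the mean distance between residues i and i+k,
    for one sequence separation k. The chain must be longer than the
    largest separation.
    """
    vector = []
    for k in separations:
        dists = [math.dist(coords[i], coords[i + k]) for i in range(len(coords) - k)]
        vector.append(sum(dists) / len(dists))
    return vector


def projection_distance(coords_a, coords_b):
    """Compare two structures by the Euclidean distance of their projections."""
    return math.dist(project(coords_a), project(coords_b))


if __name__ == "__main__":
    straight = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in straight]
    zigzag = [(1.9 * i, 3.0 * (i % 2), 0.0) for i in range(10)]
    print(projection_distance(straight, shifted))  # same shape, different position
    print(projection_distance(straight, zigzag))   # different shape
```

Because each structure is reduced to a fixed-length vector once, a database search only needs vector comparisons, which is where the speed advantage over residue-level alignment comes from.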
Affiliation(s)
- Degui Zhi
- Department of Plant and Microbial Biology, UC Berkeley and Physical Biosciences Division, LBNL, Berkeley, CA 94720, USA.
|
34
|
Pandini A, Fornili A, Kleinjung J. Structural alphabets derived from attractors in conformational space. BMC Bioinformatics 2010; 11:97. [PMID: 20170534 PMCID: PMC2838871 DOI: 10.1186/1471-2105-11-97] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Received: 08/24/2009] [Accepted: 02/20/2010] [Indexed: 11/20/2022] Open
Abstract
Background The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
Affiliation(s)
- Alessandro Pandini
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, UK
|
35
|
Zuo YC, Li QZ. Using reduced amino acid composition to predict defensin family and subfamily: Integrating similarity measure and structural alphabet. Peptides 2009; 30:1788-93. [PMID: 19591890 DOI: 10.1016/j.peptides.2009.06.032] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Received: 05/09/2009] [Revised: 06/27/2009] [Accepted: 06/30/2009] [Indexed: 11/17/2022]
Abstract
Defensins are essentially ancient natural antibiotics with potent activity extending from lower organisms to humans. They can inhibit the growth or virulence of micro-organisms directly or indirectly enhance the host's immune system. The successful prediction of defensin peptides will provide very useful information and insights for the basic research of defensins. In this study, by selecting the N-peptide composition of a reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameters, the increment of diversity (ID) is developed for the first time to predict defensin family and subfamily. The jackknife test based on the 2-peptide composition of the RAAA with 13 reduced amino acids shows that the overall prediction accuracies are 91.36% for defensin family and 94.21% for defensin subfamily. The results indicate that ID_RAAA is a simple and efficient prediction method for defensin peptides.
Affiliation(s)
- Yong-Chun Zuo
- Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China
|
36
|
Bornot A, Etchebest C, de Brevern AG. A new prediction strategy for long local protein structures using an original description. Proteins 2009; 76:570-87. [PMID: 19241475 DOI: 10.1002/prot.22370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
A relevant and accurate description of three-dimensional (3D) protein structures can be achieved by characterizing recurrent local structures. In a previous study, we developed a library of 120 3D structural prototypes encompassing all known 11-residue-long local protein structures and ensuring a good quality of structural approximation. A local structure prediction method was also proposed. Here, overlapping properties of local protein structures in global ones are taken into account to characterize frequent local networks. At the same time, we propose a new long local structure prediction strategy which involves the use of evolutionary information coupled with Support Vector Machines (SVMs). Our prediction is evaluated by a stringent geometrical assessment. Every local structure prediction with a Cα RMSD of less than 2.5 Å from the true local structure is considered correct. A global prediction rate of 63.1% is then reached, corresponding to an improvement of 7.7 points compared with the previous strategy. In the same way, the prediction of 88.33% of the 120 structural classes is improved, with an 8.65% mean gain. 85.33% of proteins have better prediction results, with a 9.43% average gain. An analysis of prediction rate per local network also supports the global improvement and gives insights into the potential of our method for predicting super local structures. Moreover, a confidence index for the direct estimation of prediction quality is proposed. Finally, our method is shown to be very competitive with cutting-edge strategies encompassing three categories of local structure predictions.
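The geometric criterion stated in this abstract (a prediction counts as correct when its Cα RMSD to the true local structure is below 2.5 Å) can be written as a short check. The sketch below is an illustration rather than the authors' evaluation code. It assumes that the two fragments are given as equal-length lists of Cα coordinates that have already been optimally superimposed; the superposition step itself is not shown, and the function names are hypothetical.

```python
import math


def ca_rmsd(predicted, reference):
    """Root-mean-square deviation between two pre-superimposed Cα traces."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("fragments must be non-empty and of equal length")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def is_correct_prediction(predicted, reference, threshold=2.5):
    """Apply the 'RMSD less than 2.5 Å' criterion from the abstract."""
    return ca_rmsd(predicted, reference) < threshold


if __name__ == "__main__":
    # An 11-residue fragment laid out on a line, purely for illustration.
    true_fragment = [(3.8 * i, 0.0, 0.0) for i in range(11)]
    close_guess = [(x, y + 1.0, z) for x, y, z in true_fragment]
    poor_guess = [(x, y + 3.0, z) for x, y, z in true_fragment]
    print(is_correct_prediction(close_guess, true_fragment))
    print(is_correct_prediction(poor_guess, true_fragment))
```

Without the prior superposition the value computed here is only an upper bound on the minimal RMSD, so a real evaluation would fit the fragments first.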
Affiliation(s)
- Aurélie Bornot
- INSERM UMR-S, Université Paris Diderot, Institut National de la Transfusion Sanguine, France.
|
37
|
Tyagi M, Bornot A, Offmann B, de Brevern AG. Protein short loop prediction in terms of a structural alphabet. Comput Biol Chem 2009; 33:329-33. [PMID: 19625218 DOI: 10.1016/j.compbiolchem.2009.06.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 12/12/2008] [Revised: 06/17/2009] [Accepted: 06/17/2009] [Indexed: 11/20/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play crucial biological roles. To bypass the limitation of secondary structure description, we previously defined a structural alphabet composed of 16 structural prototypes, called Protein Blocks (PBs). It leads to an accurate description of every region of 3D protein backbones and has been used in local structure prediction. In the present study, we used our structural alphabet to predict the loops connecting two repetitive structures. We showed the value of taking the flanking regions into account, which improves the prediction rate by up to 19.8%, but we also underline the sensitivity of such an approach. This research can be used to propose different structures for the loops and to probe and sample their flexibility. It is a useful tool for ab initio loop prediction and leads to insights into flexible docking approaches.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
|
38
|
Fast Structural Alignment of Biomolecules Using a Hash Table, N-Grams and String Descriptors. Algorithms 2009. [DOI: 10.3390/a2020692] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022]
|
39
|
Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009; 91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 07/24/2008] [Accepted: 04/13/2009] [Indexed: 11/18/2022]
Abstract
Three-dimensional structures of proteins are the support of their biological functions. Their folds are maintained by inter-residue interactions, which are one of the main focuses in understanding the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for obtaining an accurate structure and function determination. In previous studies, we shed light on protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is thus a relevant tool in the context of protein folding research, particularly regarding the hierarchical model proposed by George Rose. Here, we accurately analyze the PUs at different levels of cutting, using a non-redundant protein databank. The distribution of PU sizes, the number of PUs and their accessibility are screened to determine their common and different features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.
Affiliation(s)
- Guilhem Faure
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), DSIMB, Université Paris Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France
|
40
|
Benros C, de Brevern AG, Hazout S. Analyzing the sequence–structure relationship of a library of local structural prototypes. J Theor Biol 2009; 256:215-26. [DOI: 10.1016/j.jtbi.2008.08.032] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/23/2008] [Revised: 08/23/2008] [Accepted: 08/31/2008] [Indexed: 10/21/2022]
|
41
|
Zimmermann O, Hansmann UHE. LOCUSTRA: accurate prediction of local protein structure using a two-layer support vector machine approach. J Chem Inf Model 2008; 48:1903-8. [PMID: 18763837 DOI: 10.1021/ci800178a] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Abstract
Constraint generation for 3D structure prediction and structure-based database searches benefit from fine-grained prediction of local structure. In this work, we present LOCUSTRA, a novel scheme for the multiclass prediction of local structure that uses two layers of support vector machines (SVM). Using a 16-letter structural alphabet from de Brevern et al. (Proteins: Struct., Funct., Bioinf. 2000, 41, 271-287), we assess its prediction ability for an independent test set of 222 proteins and compare our method to three-class secondary structure prediction and direct prediction of dihedral angles. The prediction accuracy is Q16=61.0% for the 16 classes of the structural alphabet and Q3=79.2% for a simple mapping to the three secondary classes helix, sheet, and coil. We achieve a mean phi(psi) error of 24.74 degrees (38.35 degrees) and a median RMSDA (root-mean-square deviation of the (dihedral) angles) per protein chain of 52.1 degrees. These results compare favorably with related approaches. The LOCUSTRA web server is freely available to researchers at http://www.fz-juelich.de/nic/cbb/service/service.php.
Affiliation(s)
- Olav Zimmermann
- John von Neumann Institut für Computing, Research Centre Jülich, 52425 Jülich, Germany.
|
42
|
Ku SY, Hu YJ. Protein structure search and local structure characterization. BMC Bioinformatics 2008; 9:349. [PMID: 18721472 PMCID: PMC2529324 DOI: 10.1186/1471-2105-9-349] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 02/09/2008] [Accepted: 08/22/2008] [Indexed: 11/10/2022] Open
Abstract
Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.
Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at .
Conclusion: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Secondly, we wanted to open the door to researchers who have done substantial work in biological sequences but have yet to enter the field of protein structure research. Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
Affiliation(s)
- Shih-Yen Ku
- Department of Computer Science, National Chiao Tung University, 1001 University Rd. Hsinchu, Taiwan.
|
43
|
Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008; 90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.5] [Received: 06/14/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]
|