1
Huynh AT, Nguyen TTN, Villegas CA, Montemorso S, Strauss B, Pearson RA, Graham JG, Oribello J, Suresh R, Lustig B, Wang N. Prediction and confirmation of a switch-like region within the N-terminal domain of hSIRT1. Biochem Biophys Rep 2022; 30:101275. [PMID: 35592613 PMCID: PMC9112024 DOI: 10.1016/j.bbrep.2022.101275] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/13/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 11/28/2022]
Abstract
Many proteins display conformational changes resulting from allosteric regulation. Often only a few residues are crucial in conveying these structural and functional allosteric changes. These regions that undergo a significant change in structure upon receiving an input signal, such as molecular recognition, are defined as switch-like regions. Identifying these key residues within switch-like regions can help elucidate the mechanism of allosteric regulation and provide guidance for synthetic regulation. In this study, we combine a novel computational workflow with biochemical methods to identify a switch-like region in the N-terminal domain of human SIRT1 (hSIRT1), a lysine deacetylase that plays important roles in regulating cellular pathways. Based on primary sequence, computational methods predicted a region between residues 186-193 in hSIRT1 to exhibit switch-like behavior. Mutations were then introduced in this region and the resulting mutants were tested for allosteric reactions to resveratrol, a known hSIRT1 allosteric regulator. After fine-tuning the mutations based on comparison of known secondary structures, we were able to pinpoint M193 as the residue essential for allosteric regulation, likely by communicating the allosteric signal. Mutation of this residue maintained enzyme activity but abolished allosteric regulation by resveratrol. Our findings suggest a method to predict switch-like regions in allosterically regulated enzymes based on the primary sequence. If further validated, this could be an efficient way to identify key residues in enzymes for therapeutic drug targeting and other applications.
Affiliation(s)
- Angelina T. Huynh: Department of Chemistry, San José State University, San José, California, 95192, USA
- Thi-Tina N. Nguyen: Department of Biological Sciences, San José State University, San José, California, 95192, USA
- Carina A. Villegas: Department of Biological Sciences, San José State University, San José, California, 95192, USA
- Saira Montemorso: Department of Chemistry, San José State University, San José, California, 95192, USA
- Benjamin Strauss: Department of Computer Science, San José State University, San José, California, 95192, USA
- Richard A. Pearson: Department of Chemistry, San José State University, San José, California, 95192, USA
- Jason G. Graham: Department of Biomedical, Chemical, and Materials Engineering, San José State University, San José, California, 95192, USA
- Jonathan Oribello: Department of Chemistry, San José State University, San José, California, 95192, USA
- Rohit Suresh: Department of Chemistry, San José State University, San José, California, 95192, USA
- Brooke Lustig: Department of Chemistry, San José State University, San José, California, 95192, USA
- Ningkun Wang: Department of Chemistry, San José State University, San José, California, 95192, USA
2
Veevers R, Cawley G, Hayward S. Investigation of sequence features of hinge-bending regions in proteins with domain movements using kernel logistic regression. BMC Bioinformatics 2020; 21:137. [PMID: 32272894 PMCID: PMC7147021 DOI: 10.1186/s12859-020-3464-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2019] [Accepted: 03/20/2020] [Indexed: 11/12/2022]
Abstract
Background Hinge-bending movements in proteins comprising two or more domains form a large class of functional movements. Hinge-bending regions demarcate protein domains and collectively control the domain movement. Consequently, the ability to recognise sequence features of hinge-bending regions and to be able to predict them from sequence alone would benefit various areas of protein research. For example, an understanding of how the sequence features of these regions relate to dynamic properties in multi-domain proteins would aid in the rational design of linkers in therapeutic fusion proteins. Results The DynDom database of protein domain movements comprises sequences annotated to indicate whether the amino acid residue is located within a hinge-bending region or within an intradomain region. Using statistical methods and Kernel Logistic Regression (KLR) models, this data was used to determine sequence features that favour or disfavour hinge-bending regions. This is a difficult classification problem as the number of negative cases (intradomain residues) is much larger than the number of positive cases (hinge residues). The statistical methods and the KLR models both show that cysteine has the lowest propensity for hinge-bending regions and proline has the highest, even though it is the most rigid amino acid. As hinge-bending regions have been previously shown to occur frequently at the terminal regions of the secondary structures, the propensity for proline at these regions is likely due to its tendency to break secondary structures. The KLR models also indicate that isoleucine may act as a domain-capping residue. We have found that a quadratic KLR model outperforms a linear KLR model and that improvement in performance occurs up to very long window lengths (eighty residues) indicating long-range correlations. 
Conclusion In contrast to the only other approach that focused solely on interdomain hinge-bending regions, the method provides a modest and statistically significant improvement over a random classifier. An explanation of the KLR results is that, in the prediction of hinge-bending regions, a long-range correlation is at play between a small number of amino acids that either favour or disfavour hinge-bending regions. The resulting sequence-based prediction tool, HingeSeek, is available to run through a webserver at hingeseek.cmp.uea.ac.uk.
Affiliation(s)
- Ruth Veevers: Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Gavin Cawley: Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
- Steven Hayward: Computational Biology Laboratory, School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
3
Characterization and Prediction of Protein Flexibility Based on Structural Alphabets. BioMed Research International 2016; 2016:4628025. [PMID: 27660756 PMCID: PMC5021887 DOI: 10.1155/2016/4628025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/13/2016] [Accepted: 08/02/2016] [Indexed: 11/25/2022]
Abstract
Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformational entropy and the protein flexibility. We then predict the protein flexibility from the basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by a large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility.
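The abstract does not give the exact entropy definition, so the following is only a minimal sketch that assumes a per-position Shannon entropy over the structural-alphabet letters observed across a set of decoy structures. The function name `positional_entropy` and the toy 3-class strings are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def positional_entropy(encoded_structures):
    """Shannon entropy (in bits) of the letter distribution at each position.

    `encoded_structures` is a list of equal-length strings, one per decoy,
    where each character is a structural-alphabet letter (assumed encoding).
    """
    length = len(encoded_structures[0])
    if any(len(s) != length for s in encoded_structures):
        raise ValueError("all encoded structures must have the same length")

    entropies = []
    for column in zip(*encoded_structures):
        counts = Counter(column)
        total = len(column)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h)
    return entropies


# Toy decoy set in a 3-class alphabet (H = helix, E = strand, C = coil).
decoys = ["HHHC", "HHEC", "HHCC", "HHEC"]
entropy_profile = positional_entropy(decoys)
```

Under this assumption, positions whose letter varies across decoys (the third position in the toy set) get a higher entropy and would be read as more flexible, while invariant positions get zero.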
4
Górecki A, Bonarek P, Górka AK, Figiel M, Wilamowski M, Dziedzicka-Wasylewska M. Intrinsic disorder of human Yin Yang 1 protein. Proteins 2015; 83:1284-96. [PMID: 25963536 DOI: 10.1002/prot.24822] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/05/2014] [Revised: 04/27/2015] [Accepted: 05/02/2015] [Indexed: 01/26/2023]
Abstract
YY1 (Yin Yang 1) is a zinc finger protein with an essential role in various biological functions via DNA- and protein-protein interactions with numerous partners. YY1 is involved in the regulation of a broad spectrum of cellular processes such as embryogenesis, proliferation, tumorigenesis, and snRNA transcription. The more than 100 reported targets of the YY1 protein suggest that it contains intrinsically disordered regions that are involved in such diverse interactions. Here, we present a study of the structural properties of human YY1 using several biochemical and biophysical techniques (fluorescence, circular dichroism, gel filtration chromatography, proteolytic susceptibility) together with various bioinformatics approaches. To facilitate our exploration of the YY1 structure, the full-length protein as well as an N-terminal fragment (residues 1-295) and the C-terminal DNA binding domain were used. We found the N-terminus to be a non-compact fragment of YY1 with little residual secondary structure and lacking a well-defined tertiary structure. The results of our study indicate that YY1 belongs to the family of intrinsically disordered proteins (IDPs), which exist natively in a partially unfolded conformation.
Affiliation(s)
- Andrzej Górecki: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
- Piotr Bonarek: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
- Adam Kazimierz Górka: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
- Małgorzata Figiel: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
- Mateusz Wilamowski: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
- Marta Dziedzicka-Wasylewska: Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, Kraków, 30-387, Poland
5
Diaz C, Corentin H, Thierry V, Chantal A, Tanguy B, David S, Jean-Marc H, Pascual F, Françoise B, Edgardo F. Virtual screening on an α-helix to β-strand switchable region of the FGFR2 extracellular domain revealed positive and negative modulators. Proteins 2014; 82:2982-97. [PMID: 25082719 DOI: 10.1002/prot.24657] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/21/2014] [Revised: 06/30/2014] [Accepted: 07/03/2014] [Indexed: 12/15/2022]
Abstract
The secondary structure of some protein segments may vary between α-helix and β-strand. To predict these switchable segments, we have developed an algorithm, Switch-P, based solely on the protein sequence. This algorithm was used on the extracellular parts of FGF receptors. For FGFR2, it predicted that the β4 and β5 strands of the third Ig-like domain were highly switchable. These two strands possess a high number of somatic mutations associated with cancer. Analysis of PDB structures of FGF receptors confirmed the switchability prediction for β5. We thus evaluated whether compound-driven α-helix/β-strand switching of β5 could modulate FGFR2 signaling. We performed the virtual screening of a library containing 1.4 million chemical compounds with two models of the third Ig-like domain of FGFR2 showing different secondary structures for β5, and we selected 32 compounds. Experimental testing using proliferation assays with FGF7-stimulated SNU-16 cells and an FGFR2-dependent Erk1/2 phosphorylation assay with FGFR2-transfected L6 cells revealed activators and inhibitors of FGFR2. Our method for the identification of switchable proteinic regions, associated with our virtual screening approach, provides an opportunity to discover a new generation of drugs with under-explored mechanisms of action.
Affiliation(s)
- Constantino Diaz: Exploratory Unit, Sanofi-Aventis Research and Development, 195 Route d'Espagne, 31036, Toulouse, France
6
Chang DTH, Yao TJ, Fan CY, Chiang CY, Bai YH. AH-DB: collecting protein structure pairs before and after binding. Nucleic Acids Res 2012; 40:D472-8. [PMID: 22084200 PMCID: PMC3245139 DOI: 10.1093/nar/gkr940] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 08/16/2011] [Revised: 10/10/2011] [Accepted: 10/12/2011] [Indexed: 01/29/2023]
Abstract
This work presents the Apo-Holo DataBase (AH-DB, http://ahdb.ee.ncku.edu.tw/ and http://ahdb.csbb.ntu.edu.tw/), which provides corresponding pairs of protein structures before and after binding. Conformational transitions are commonly observed in various protein interactions that are involved in important biological functions. For example, copper-zinc superoxide dismutase (SOD1), which destroys free superoxide radicals in the body, undergoes a large conformational transition from an 'open' state (apo structure) to a 'closed' state (holo structure). Many studies have utilized collections of apo-holo structure pairs to investigate the conformational transitions and critical residues. However, the collection process is usually complicated, varies from study to study and produces a small-scale data set. AH-DB is designed to provide an easy and unified way to prepare such data, which is generated by identifying/mapping molecules in different Protein Data Bank (PDB) entries. Conformational transitions are identified based on a refined alignment scheme to overcome the challenge that many structures in the PDB database are only protein fragments and not complete proteins. There are 746,314 apo-holo pairs in AH-DB, which is about 30 times those in the second largest collection of similar data. AH-DB provides sophisticated interfaces for searching apo-holo structure pairs and exploring conformational transitions from apo structures to the corresponding holo structures.
Affiliation(s)
- Darby Tien-Hao Chang: Department of Electrical Engineering, National Cheng Kung University, Tainan 70101, Taiwan
7
Wang Z, Zhao F, Peng J, Xu J. Protein 8-class secondary structure prediction using conditional neural fields. Proteomics 2011; 11:3786-92. [PMID: 21805636 DOI: 10.1002/pmic.201100196] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.6] [Received: 04/15/2011] [Revised: 06/16/2011] [Accepted: 07/01/2011] [Indexed: 11/10/2022]
Abstract
Compared with the protein 3-class secondary structure (SS) prediction, the 8-class prediction gains less attention and is also much more challenging, especially for proteins with few sequence homologs. This paper presents a new probabilistic method for 8-class SS prediction using conditional neural fields (CNFs), a recently invented probabilistic graphical model. This CNF method not only models the complex relationship between sequence features and SS, but also exploits the interdependency among SS types of adjacent residues. In addition to sequence profiles, our method also makes use of non-evolutionary information for SS prediction. Tested on the CB513 and RS126 data sets, our method achieves Q8 accuracy of 64.9 and 64.7%, respectively, which are much better than the SSpro8 web server (51.0 and 48.0%, respectively). Our method can also be used to predict other structure properties (e.g. solvent accessibility) of a protein or the SS of RNA.
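For readers unfamiliar with the metric, Q8 accuracy is the percentage of residues whose predicted 8-state label matches the observed one. The sketch below only illustrates that arithmetic; the function name `q_accuracy` and the two toy label strings are illustrative and are not data from the paper.

```python
def q_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state.

    Works for any state alphabet; with 8-state labels it yields Q8,
    with 3-state labels Q3.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Toy 8-state strings (letters follow the usual DSSP-style codes).
observed = "HHHHEEEETTSC"
predicted = "HHHGEEEETTCC"
q8 = q_accuracy(predicted, observed)  # 10 of 12 residues agree
```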
Affiliation(s)
- Zhiyong Wang: Toyota Technological Institute at Chicago, 6045 S Kenwood, Chicago, IL 60637, USA
8
Murayama T, Kurebayashi N, Oba T, Oyamada H, Oguchi K, Sakurai T, Ogawa Y. Role of amino-terminal half of the S4-S5 linker in type 1 ryanodine receptor (RyR1) channel gating. J Biol Chem 2011; 286:35571-35577. [PMID: 21862589 DOI: 10.1074/jbc.m111.255240] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/06/2022]
Abstract
The type 1 ryanodine receptor (RyR1) is a Ca²⁺ release channel found in the sarcoplasmic reticulum of skeletal muscle and plays a pivotal role in excitation-contraction coupling. The RyR1 channel is activated by a conformational change of the dihydropyridine receptor upon depolarization of the transverse tubule, or by Ca²⁺ itself, i.e. Ca²⁺-induced Ca²⁺ release (CICR). The molecular events transmitting such signals to the ion gate of the channel are unknown. The S4-S5 linker, a cytosolic loop connecting the S4 and S5 transmembrane segments in six-transmembrane type channels, forms an α-helical structure and mediates signal transmission in a wide variety of channels. To address the role of the S4-S5 linker in RyR1 channel gating, we performed an alanine substitution scan of the N-terminal half of the putative S4-S5 linker (Thr4825-Ser4829), which exhibits high helix probability. The mutant RyR1 was expressed in HEK cells, and CICR activity was investigated by caffeine-induced Ca²⁺ release, single-channel current recordings, and [³H]ryanodine binding. Four mutants (T4825A, I4826A, S4828A, and S4829A) had reduced CICR activity without changing Ca²⁺ sensitivity, whereas the L4827A mutant formed a constitutively active channel. T4825I, a disease-associated mutation for malignant hyperthermia, exhibited enhanced CICR activity. An α-helical wheel representation of the N-terminal S4-S5 linker provides a rational explanation for the observed activities of the mutants. These results suggest that the N-terminal half of the S4-S5 linker may form an α-helical structure and play an important role in RyR1 channel gating.
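A helical wheel places successive residues of an ideal α-helix about 100° apart (3.6 residues per turn), which is how it shows which side chains share a face. The sketch below computes those angles for the scanned residues; the five-letter sequence is inferred from the mutant names listed in the abstract, and the function name `helical_wheel` is illustrative rather than the authors' tool.

```python
def helical_wheel(sequence, start_index, degrees_per_residue=100.0):
    """Return (residue_number, amino_acid, angle_in_degrees) for each residue.

    Angles are measured around the helix axis relative to the first residue,
    assuming an ideal alpha-helix (100 degrees per residue).
    """
    return [
        (start_index + i, aa, (i * degrees_per_residue) % 360.0)
        for i, aa in enumerate(sequence)
    ]


# Residues 4825-4829 as implied by the mutants T4825A, I4826A, L4827A,
# S4828A and S4829A.
for number, aa, angle in helical_wheel("TILSS", 4825):
    print(f"{aa}{number}: {angle:.0f} degrees")
```

In this idealized projection, residues one position apart point in quite different directions, so a single substitution that behaves differently from its neighbours is consistent with that residue sitting on a distinct face of the helix.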
Affiliation(s)
- Takashi Murayama: Department of Cellular and Molecular Pharmacology, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan
- Nagomi Kurebayashi: Department of Cellular and Molecular Pharmacology, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan
- Toshiharu Oba: Department of Cell Physiology, Nagoya City University Graduate School of Medical Sciences, Nagoya 467-8601, Japan
- Hideto Oyamada: Department of Pharmacology, School of Medicine, Showa University, Tokyo 142-8555, Japan
- Katsuji Oguchi: Department of Pharmacology, School of Medicine, Showa University, Tokyo 142-8555, Japan
- Takashi Sakurai: Department of Cellular and Molecular Pharmacology, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan
- Yasuo Ogawa: Department of Cellular and Molecular Pharmacology, Juntendo University Graduate School of Medicine, Tokyo 113-8421, Japan
9
Bornot A, Etchebest C, de Brevern AG. Predicting protein flexibility through the prediction of local structures. Proteins 2010; 79:839-52. [PMID: 21287616 DOI: 10.1002/prot.22922] [Citation(s) in RCA: 76] [Impact Index Per Article: 5.4] [Received: 04/01/2010] [Revised: 09/28/2010] [Accepted: 09/29/2010] [Indexed: 11/06/2022]
Abstract
Protein structures are valuable tools for understanding protein function. However, protein dynamics is also considered a key element in protein function. Therefore, in addition to structural analysis, fully understanding protein function at the molecular level now requires accounting for flexibility. However, experimental techniques that produce both types of information simultaneously are still limited. Prediction approaches are useful alternative tools for obtaining otherwise unavailable data. It has been shown that protein structure can be described by a limited set of recurring local structures. In this context, we previously established a library composed of 120 overlapping long structural prototypes (LSPs) representing fragments of 11 residues in length and covering all known local protein structures. On the basis of the close sequence-structure relationship observed in LSPs, we developed a novel prediction method that proposes structural candidates in terms of LSPs along a given sequence. The prediction accuracy rate was high given the number of structural classes. In this study, we use this methodology to predict protein flexibility. We first examine flexibility according to two different descriptors, the B-factor and root mean square fluctuations from molecular dynamics simulations. We then show the relevance of using both descriptors together. We define three flexibility classes and propose a method based on the LSP prediction method for predicting flexibility along the sequence. The prediction rate reaches 49.6%. This method competes rather efficiently with the most recent, cutting-edge methods based on true flexibility data learning with sophisticated algorithms. Accordingly, flexibility information should be taken into account in structural prediction assessments.
Affiliation(s)
- Aurélie Bornot: INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), University Paris-Diderot, Institut National de Transfusion Sanguine, INTS, 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France
10
Hirose S, Yokota K, Kuroda Y, Wako H, Endo S, Kanai S, Noguchi T. Prediction of protein motions from amino acid sequence and its application to protein-protein interaction. BMC Structural Biology 2010; 10:20. [PMID: 20626880 PMCID: PMC3245509 DOI: 10.1186/1472-6807-10-20] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 09/09/2009] [Accepted: 07/13/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Structural flexibility is an important characteristic of proteins because it is often associated with their function. The movement of a polypeptide segment in a protein can be broken down into two types of motions: internal and external ones. The former is deformation of the segment itself, but the latter involves only rotational and translational motions as a rigid body. Normal Mode Analysis (NMA) can derive these two motions, but its application remains limited because it necessitates the gathering of complete structural information. RESULTS In this work, we present a novel method for predicting two kinds of protein motions in ordered structures. The prediction uses only information from the amino acid sequence. We prepared a dataset of the internal and external motions of segments in many proteins by application of NMA. Subsequently, we analyzed the relation between thermal motion assessed from X-ray crystallographic B-factor and internal/external motions calculated by NMA. Results show that attributes of amino acids related to the internal motion have different features from those related to the B-factors, although those related to the external motion are correlated strongly with the B-factors. Next, we developed a method to predict internal and external motions from amino acid sequences based on the Random Forest algorithm. The proposed method uses information associated with adjacent amino acid residues and secondary structures predicted from the amino acid sequence. The proposed method exhibited moderate correlation between predicted internal and external motions with those calculated by NMA. It has the highest prediction accuracy compared to a naïve model and three published predictors. CONCLUSIONS Finally, we applied the proposed method predicting the internal motion to a set of 20 proteins that undergo large conformational change upon protein-protein interaction. Results show significant overlaps between the predicted high internal motion regions and the observed conformational change regions.
Affiliation(s)
- Shuichi Hirose: Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-42, Aomi, Koto-ku, Tokyo, 135-0064, Japan
11
Dan A, Ofran Y, Kliger Y. Large-scale analysis of secondary structure changes in proteins suggests a role for disorder-to-order transitions in nucleotide binding proteins. Proteins 2010; 78:236-48. [PMID: 19676113 DOI: 10.1002/prot.22531] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
Conformational changes in proteins often involve secondary structure transitions. Such transitions can be divided into two types: disorder-to-order changes, in which a disordered segment acquires an ordered secondary structure (e.g., disorder to alpha-helix, disorder to beta-strand), and order-to-order changes, where a segment switches from one ordered secondary structure to another (e.g., alpha-helix to beta-strand, alpha-helix to turn). In this study, we explore the distribution of these transitions in the proteome. Using a comprehensive, yet highly conservative method, we compared solved three-dimensional structures of identical protein sequences, looking for differences in the secondary structures with which they were assigned. Protein chains in which such secondary structure transitions were detected, were classified into two sets according to the type of transition that is involved (disorder-to-order or order-to-order), allowing us to characterize each set by examining enrichment of gene ontology terms. The results reveal that the disorder-to-order set is significantly enriched with nucleotide binding proteins, whereas the order-to-order set is more diverse. Remarkably, further examination reveals that >22% of the purine nucleotide binding proteins include segments which undergo disorder-to-order transitions, suggesting that such transitions play an important role in this process.
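The comparison described above can be pictured as aligning two secondary-structure assignments of the same sequence and labelling every stretch where they disagree. The sketch below is a simplified illustration, not the authors' pipeline: it assumes `-` marks disordered or unassigned residues, and the names `classify_position` and `find_transitions` are hypothetical.

```python
from itertools import groupby

DISORDER = "-"  # assumed symbol for disordered / unassigned residues


def classify_position(a, b):
    """Label one aligned position; None means the two assignments agree."""
    if a == b:
        return None
    if a == DISORDER or b == DISORDER:
        # Either direction counts here, since the structure pair is unordered.
        return "disorder-to-order"
    return "order-to-order"


def find_transitions(ss_a, ss_b, min_length=1):
    """Return (start, end, label) for each run of differing positions.

    `start` is inclusive and `end` exclusive, both 0-based.
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("assignments must cover the same sequence")
    labels = [classify_position(a, b) for a, b in zip(ss_a, ss_b)]
    segments, position = [], 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        if label is not None and run_length >= min_length:
            segments.append((position, position + run_length, label))
        position += run_length
    return segments


# Two toy assignments for one 12-residue chain.
print(find_transitions("HHHH----EEEE", "HHHHHHHHHHEE"))
```

A conservative analysis would likely raise `min_length` so that single-residue disagreements, which often reflect assignment noise, are not counted as transitions.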
Affiliation(s)
- Adi Dan: Compugen Ltd., Tel Aviv, 69512, Israel
12
Liu YC, Yang MH, Lin WL, Huang CK, Oyang YJ. A sequence-based hybrid predictor for identifying conformationally ambivalent regions in proteins. BMC Genomics 2009; 10 Suppl 3:S22. [PMID: 19958486 PMCID: PMC2788375 DOI: 10.1186/1471-2164-10-s3-s22] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
Background Proteins are dynamic macromolecules which may undergo conformational transitions upon changes in environment. As it has been observed in laboratories that protein flexibility is correlated to essential biological functions, scientists have been designing various types of predictors for identifying structurally flexible regions in proteins. In this respect, there are two major categories of predictors. One category of predictors attempts to identify conformationally flexible regions through analysis of protein tertiary structures. Another category of predictors works completely based on analysis of the polypeptide sequences. As the availability of protein tertiary structures is generally limited, the design of predictors that work completely based on sequence information is crucial for advances of molecular biology research. Results In this article, we propose a novel approach to design a sequence-based predictor for identifying conformationally ambivalent regions in proteins. The novelty in the design stems from incorporating two classifiers based on two distinctive supervised learning algorithms that provide complementary prediction powers. Experimental results show that the overall performance delivered by the hybrid predictor proposed in this article is superior to the performance delivered by the existing predictors. Furthermore, the case study presented in this article demonstrates that the proposed hybrid predictor is capable of providing the biologists with valuable clues about the functional sites in a protein chain. The proposed hybrid predictor provides the users with two optional modes, namely, the high-sensitivity mode and the high-specificity mode. 
The experimental results with an independent testing data set show that the proposed hybrid predictor is capable of delivering sensitivity of 0.710 and specificity of 0.608 under the high-sensitivity mode, while delivering sensitivity of 0.451 and specificity of 0.787 under the high-specificity mode. Conclusion Though experimental results show that the hybrid approach designed to exploit the complementary prediction powers of distinctive supervised learning algorithms works more effectively than conventional approaches, there remains considerable room for further improvement with respect to the achieved performance. In this respect, it is of interest to investigate the effects of exploiting additional physicochemical properties that are related to conformational ambivalence. Furthermore, it is of interest to investigate the effects of incorporating recently developed machine learning approaches, e.g. the random forest design and the multi-stage design. As conformational transition plays a key role in carrying out several essential types of biological functions, the design of more advanced predictors for identifying conformationally ambivalent regions in proteins deserves our continuous attention.
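The trade-off between the two modes is easier to read with the definitions in hand: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below shows the arithmetic only. The confusion-matrix counts are hypothetical values chosen so that the ratios match the reported high-sensitivity figures; they are not counts from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    return sensitivity, specificity


# Hypothetical counts: 100 ambivalent residues and 1000 other residues.
sens, spec = sensitivity_specificity(tp=71, fn=29, tn=608, fp=392)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```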
Affiliation(s)
- Yu-Cheng Liu: Institute of Biomedical Engineering, National Taiwan University, Taipei, Taiwan, Republic of China
13
Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved disorder prediction by combination of orthogonal approaches. PLoS One 2009; 4:e4433. [PMID: 19209228 PMCID: PMC2635965 DOI: 10.1371/journal.pone.0004433] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Received: 08/15/2008] [Accepted: 12/15/2008] [Indexed: 12/15/2022]
Abstract
Disordered proteins are highly abundant in regulatory processes such as transcription and cell-signaling. Different methods have been developed to predict protein disorder often focusing on different types of disordered regions. Here, we present MD, a novel META-Disorder prediction method that molds various sources of information predominantly obtained from orthogonal prediction methods, to significantly improve in performance over its constituents. In sustained cross-validation, MD not only outperforms its origins, but it also compares favorably to other state-of-the-art prediction methods in a variety of tests that we applied. Availability: http://www.rostlab.org/services/md/
Affiliation(s)
- Avner Schlessinger
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America.
|
14
|
Kuznetsov IB, McDuffie M. FlexPred: a web-server for predicting residue positions involved in conformational switches in proteins. Bioinformation 2008; 3:134-6. [PMID: 19238251 PMCID: PMC2639688 DOI: 10.6026/97320630003134] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 10/30/2008] [Accepted: 11/01/2008] [Indexed: 11/23/2022] Open
Abstract
Conformational switches observed in the protein backbone play a key role in a variety of fundamental biological activities.
This paper describes a web-server that implements a pattern recognition algorithm trained on the examples from the Database
of Macromolecular Movements to predict residue positions involved in conformational switches. Prediction can be performed at
an adjustable false positive rate using a user-supplied protein sequence in FASTA format or a structure in a Protein Data
Bank (PDB) file. If a protein sequence is submitted, then the web-server uses sequence-derived information only (such as
evolutionary conservation of residue positions). If a PDB file is submitted, then the web-server uses sequence-derived
information and residue solvent accessibility calculated from this file.
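The abstract mentions prediction at an adjustable false positive rate. One common way to realize such a setting, sketched here under assumptions, is to pick a score cutoff from classifier scores of known negative examples. The function `threshold_for_fpr` and the toy scores are hypothetical; the abstract does not say how the FlexPred server implements this option.

```python
def threshold_for_fpr(negative_scores, target_fpr):
    """Pick a score cutoff so that at most `target_fpr` of the known
    negatives score strictly above it.

    Assumption: a residue is called positive when score > cutoff.
    """
    if not negative_scores:
        raise ValueError("negative_scores must not be empty")
    if not 0.0 <= target_fpr <= 1.0:
        raise ValueError("target_fpr must lie in [0, 1]")

    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives we may allow above the cutoff.
    allowed = int(target_fpr * len(ranked))
    if allowed >= len(ranked):
        return float("-inf")  # every negative may be called positive
    # At most `allowed` negatives score strictly above this value.
    return ranked[allowed]


# Toy example (hypothetical scores for residues known to be negative).
neg_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
cutoff = threshold_for_fpr(neg_scores, target_fpr=0.2)
observed_fpr = sum(s > cutoff for s in neg_scores) / len(neg_scores)
```

Because ties at the cutoff are excluded by the strict comparison, the observed rate on the calibration set can only be at or below the requested one.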
Affiliation(s)
- Igor B Kuznetsov
- GenNY*sis Center for Excellence in Cancer Genomics, Department of Epidemiology and Biostatistics, One Discovery Drive, University at Albany, Rensselaer, NY 12144, USA.
|
15
|
A stringent test for hydrophobicity scales: two proteins with 88% sequence identity but different structure and function. Proc Natl Acad Sci U S A 2008; 105:9233-7. [PMID: 18591657 DOI: 10.1073/pnas.0803264105] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions (protein functionalities) are mediated by water, which compacts individual proteins and promotes close and temporarily stable large-area protein-protein interfaces. In their classic article, Kyte and Doolittle (KD) concluded that the "simplicity and graphic nature of hydrophobicity scales make them very useful tools for the evaluation of protein structures." In practice, however, attempts to develop hydrophobicity scales (for example, compatible with classical force fields (CFF) in calculating the energetics of protein folding) have encountered many difficulties. Here, we suggest an entirely different approach based on the idea that proteins are self-organized networks, subject to evolving finite-scale criticality (like some network glasses). We test this proposal against two small proteins that are delicately balanced between alpha and alpha/beta structures, with different functions encoded with only 12% of their amino acids. This example explains why protein structure prediction is so challenging, and it provides a severe test for the accuracy and content of hydrophobicity scales. This method confirms KD's evaluation and at the same time suggests that protein structure, dynamics, and function can be best discussed without using CFF.
|
16
|
Kuznetsov IB. Ordered conformational change in the protein backbone: Prediction of conformationally variable positions from sequence and low-resolution structural data. Proteins 2008; 72:74-87. [DOI: 10.1002/prot.21899] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 01/29/2023]
|
17
|
Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Struct Biol 2007; 7:25. [PMID: 17437643 PMCID: PMC1863424 DOI: 10.1186/1472-6807-7-25] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.2] [Received: 09/22/2006] [Accepted: 04/16/2007] [Indexed: 11/12/2022]
Abstract
Background: Traditionally, it is believed that the native structure of a protein corresponds to a global minimum of its free energy. However, with the growing number of known tertiary (3D) protein structures, researchers have discovered that some proteins can alter their structures in response to a change in their surroundings or with the help of other proteins or ligands. Such structural shifts play a crucial role with respect to the protein function. To this end, we propose a machine learning method for the prediction of the flexible/rigid regions of proteins (referred to as FlexRP); the method is based on a novel sequence representation and feature selection. Knowledge of the flexible/rigid regions may provide insights into the protein folding process and the 3D structure prediction.

Results: The flexible/rigid regions were defined based on a dataset, which includes protein sequences that have multiple experimental structures, and which was previously used to study the structural conservation of proteins. Sequences drawn from this dataset were represented based on feature sets that were proposed in prior research, such as PSI-BLAST profiles, composition vector and binary sequence encoding, and a newly proposed representation based on frequencies of k-spaced amino acid pairs. These representations were processed by feature selection to reduce the dimensionality. Several machine learning methods for the prediction of flexible/rigid regions and two recently proposed methods for the prediction of conformational changes and unstructured regions were compared with the proposed method. The FlexRP method, which applies Logistic Regression and collocation-based representation with 95 features, obtained 79.5% accuracy. The two runner-up methods, which apply the same sequence representation and Support Vector Machines (SVM) and Naïve Bayes classifiers, obtained 79.2% and 78.4% accuracy, respectively. The remaining considered methods are characterized by accuracies below 70%. Finally, the Naïve Bayes method is shown to provide the highest sensitivity for the prediction of flexible regions, while FlexRP and SVM give the highest sensitivity for rigid regions.

Conclusion: A new sequence representation that uses k-spaced amino acid pairs is shown to be the most efficient in the prediction of the flexible/rigid regions of protein sequences. The proposed FlexRP method provides the highest prediction accuracy of about 80%. The experimental tests show that the FlexRP and SVM methods achieved high overall accuracy and the highest sensitivity for rigid regions, while the best quality of the predictions for flexible regions is achieved by the Naïve Bayes method.
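To make the representation named in this abstract concrete, here is a minimal sketch of frequencies of k-spaced amino acid pairs. It assumes that a k-spaced pair is two residues separated by exactly k other residues (so k = 0 gives adjacent dipeptides) and that counts are normalized by the number of pairs at each spacing. The function name, the `max_k` default, and the normalization are illustrative assumptions; the abstract does not specify them, and the feature selection step that reduces the representation to 95 features is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def k_spaced_pair_features(sequence, max_k=2):
    """Map (k, pair) -> frequency of residue pairs separated by k residues.

    Produces 400 features per spacing k in 0..max_k. Pairs containing a
    non-standard residue (e.g. 'X') are skipped but still counted in the
    denominator; this is a simplifying assumption.
    """
    seq = sequence.upper()
    features = {}
    for k in range(max_k + 1):
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        n_pairs = len(seq) - k - 1  # windows available at this spacing
        for i in range(max(n_pairs, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        for pair, count in counts.items():
            features[(k, pair)] = count / n_pairs if n_pairs > 0 else 0.0
    return features


# Toy example on a hypothetical four-residue fragment.
feats = k_spaced_pair_features("ACAC", max_k=1)
```

For the fragment `ACAC`, spacing k = 0 yields the adjacent pairs AC, CA, AC, and spacing k = 1 yields AA and CC. A vector built this way can be passed to any standard classifier, such as the logistic regression, SVM, or Naïve Bayes models compared in the abstract.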
|