51
Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 2011; 21:412-8. [PMID: 21514145 DOI: 10.1016/j.sbi.2011.03.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.6] [Received: 02/14/2011] [Revised: 03/29/2011] [Accepted: 03/29/2011] [Indexed: 11/21/2022]
Abstract
As an operational definition, we refer to regions in proteins that do not adopt regular three-dimensional structures in isolation as disordered regions. An antipode to disorder would be 'well-structured' rather than 'ordered'. Here, we argue for the following three hypotheses. Firstly, it is more useful to picture disorder as a distinct phenomenon in structural biology than as an extreme example of protein flexibility. Secondly, there are many very different flavors of protein disorder; nevertheless, it seems advantageous to portray the universe of all possible proteins in terms of two main types: well-structured and disordered. There might be a third type, 'other', but so far we have no positive evidence for this. Thirdly, nature uses protein disorder as a tool to adapt to different environments. Protein disorder is evolutionarily conserved and this maintenance of disorder is highly nontrivial. Increasingly integrating protein disorder into the toolbox of a living cell was a crucial step in the evolution from simple bacteria to complex eukaryotes. We need new advanced computational methods to study this new milestone in the advance of protein biology.
52
Aravind L, Abhiman S, Iyer LM. Natural history of the eukaryotic chromatin protein methylation system. Prog Mol Biol Transl Sci 2011; 101:105-76. [PMID: 21507350 DOI: 10.1016/b978-0-12-387685-0.00004-4] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.8] [Indexed: 12/16/2022]
Abstract
In eukaryotes, methylation of nucleosomal histones and other nuclear proteins is a central aspect of chromatin structure and dynamics. The past 15 years have seen an enormous advance in our understanding of the biochemistry of these modifications, and of their role in establishing the epigenetic code. We provide a synthetic overview, from an evolutionary perspective, of the main players in the eukaryotic chromatin protein methylation system, with an emphasis on catalytic domains. Several components of the eukaryotic protein methylation system had their origins in bacteria. In particular, the Rossmann fold protein methylases (PRMTs and DOT1), and the LSD1 and jumonji-related demethylases and oxidases, appear to have emerged in the context of bacterial peptide methylation and hydroxylation systems. These systems were originally involved in synthesis of peptide secondary metabolites, such as antibiotics, toxins, and siderophores. The peptidylarginine deiminases appear to have been acquired by animals from bacterial enzymes that modify cell-surface proteins. SET domain methylases, which display the β-clip fold, apparently first emerged in prokaryotes from the SAF superfamily of carbohydrate-binding domains. However, even in bacteria, a subset of the SET domains might have evolved a chromatin-related role in conjunction with a BAF60a/b-like SWIB domain protein and topoisomerases. By the time of the last eukaryotic common ancestor, multiple SET and PRMT methylases were already in place and are likely to have mediated methylation at the H3K4, H3K9, H3K36, and H4K20 positions, and carried out both asymmetric and symmetric arginine dimethylation. Inference of H3K27 methylation in the ancestral eukaryote appears uncertain, though it was certainly in place a little later in eukaryotic evolution. Current data suggest that unlike SET methylases, which are universally present in eukaryotes, demethylases are not. 
They appear to be absent in the earliest-branching eukaryotic lineages, and emerged later along with several other chromatin proteins, such as the Dot1-methylase, prior to divergence of the kinetoplastid-heterolobosean lineage from the remaining eukaryotes. This period also corresponds to the point of origin of DNA cytosine methylation by DNMT1. Origin of major lineages of SET domains such as the Trithorax, Su(var)3-9, Ash1, SMYD, and TTLL12 and E(Z) might have played the initial role in the establishment of multiple distinct heterochromatic and euchromatic states that are likely to have been present, in some form, through much of eukaryotic evolution. Elaboration of these chromatin states might have gone hand-in-hand with acquisition of multiple jumonji-related and LSD1-like demethylases, and functional linkages with the DNA methylation and RNAi systems. Throughout eukaryotic evolution, there were several lineage-specific expansions of SET domain proteins, which might be related to a special transcription regulation process in trypanosomes, acquisition of new meiotic recombination hotspots in animals, and methylation and associated modifications of the diatom silaffin proteins involved in silica biomineralization. The use of specific domains to "read" the methylation marks appears to have been present in the ancestral eukaryote itself. Of these the chromo-like domains appear to have been acquired from bacterial secreted proteins that might have a role in binding cell-surface peptides or peptidoglycan. Domain architectures of the primary enzymes involved in the eukaryotic protein methylation system indicate key features relating to interactions with each other and other modifications in chromatin, such as acetylation. They also emphasize the profound functional distinction between the role of demethylation and deacetylation in regulation of chromatin dynamics.
Affiliation(s)
- L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
53
Niu S, Huang T, Feng K, Cai Y, Li Y. Prediction of Tyrosine Sulfation with mRMR Feature Selection and Analysis. J Proteome Res 2010; 9:6490-7. [DOI: 10.1021/pr1007152] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Affiliation(s)
- Shen Niu
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China, Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China, Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China, and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, P. R. China
- Tao Huang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China, Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China, Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China, and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, P. R. China
- Kaiyan Feng
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China, Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China, Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China, and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, P. R. China
- Yudong Cai
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China, Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China, Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China, and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, P. R. China
- Yixue Li
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P. R. China, Institute of Systems Biology, Shanghai University, Shanghai 200444, P. R. China, Shanghai Center for Bioinformation Technology, Shanghai 200235, P. R. China, and Centre for Computational Systems Biology, Fudan University, Shanghai 200433, P. R. China
54
Habchi J, Mamelli L, Darbon H, Longhi S. Structural disorder within Henipavirus nucleoprotein and phosphoprotein: from predictions to experimental assessment. PLoS One 2010; 5:e11684. [PMID: 20657787 PMCID: PMC2908138 DOI: 10.1371/journal.pone.0011684] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Received: 05/23/2010] [Accepted: 06/21/2010] [Indexed: 12/30/2022] Open
Abstract
Henipaviruses are newly emerged viruses within the Paramyxoviridae family. Their negative-strand RNA genome is packaged by the nucleoprotein (N) within alpha-helical nucleocapsid that recruits the polymerase complex made of the L protein and the phosphoprotein (P). To date structural data on Henipaviruses are scarce, and their N and P proteins have never been characterized so far. Using both computational and experimental approaches we herein show that Henipaviruses N and P proteins possess large intrinsically disordered regions. By combining several disorder prediction methods, we show that the N-terminal domain of P (PNT) and the C-terminal domain of N (NTAIL) are both mostly disordered, although they contain short order-prone segments. We then report the cloning, the bacterial expression, purification and characterization of Henipavirus PNT and NTAIL domains. By combining gel filtration, dynamic light scattering, circular dichroism and nuclear magnetic resonance, we show that both NTAIL and PNT belong to the premolten globule sub-family within the class of intrinsically disordered proteins. This study is the first reported experimental characterization of Henipavirus P and N proteins. The evidence that their respective N-terminal and C-terminal domains are highly disordered under native conditions is expected to be invaluable for future structural studies by helping to delineate N and P protein domains amenable to crystallization. In addition, following previous hints establishing a relationship between structural disorder and protein interactivity, the present results suggest that Henipavirus PNT and NTAIL domains could be involved in manifold protein-protein interactions.
Affiliation(s)
- Johnny Habchi
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Campus de Luminy, Marseille, France
- Laurent Mamelli
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Campus de Luminy, Marseille, France
- Hervé Darbon
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Campus de Luminy, Marseille, France
- Sonia Longhi
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Campus de Luminy, Marseille, France
55
Fong JH, Panchenko AR. Intrinsic disorder and protein multibinding in domain, terminal, and linker regions. Mol Biosyst 2010; 6:1821-8. [PMID: 20544079 DOI: 10.1039/c005144f] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 02/02/2023]
Abstract
Intrinsic disorder is believed to contribute to the ability of some proteins to interact with multiple partners which is important for protein functional promiscuity and regulation of the cross-talk between pathways. To better understand the mechanisms of molecular recognition through disordered regions, here, we systematically investigate the coupling between disorder and binding within domain families in a structure interaction network and in terminal and inter-domain linker regions. We showed that the canonical domain-domain interaction model should take into account contributions of N- and C-termini and inter-domain linkers, which may form all or part of the binding interfaces. For the majority of proteins, binding interfaces on domain and terminal regions were predicted to be less disordered than non-interface regions. Analysis of all domain families revealed several exceptions, such as kinases, DNA/RNA binding proteins, certain enzymes, and regulatory proteins, which are candidates for disorder-to-order transitions that can occur upon binding. Domain interfaces that bind single or multiple partners do not exhibit significant difference in disorder content if normalized by the number of interactions. In general, protein families with more diverse interactions exhibit less average disorder over all members of the family. Our results shed light on recent controversies regarding the relationship between disorder and binding of multiple partners at common interfaces. In particular, they support the hypothesis that protein domains with many interacting partners should have a pleiotropic effect on functional pathways and consequently might be more constrained in evolution.
Affiliation(s)
- Jessica H Fong
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
56
Haritos VS, Niranjane A, Weisman S, Trueman HE, Sriskantha A, Sutherland TD. Harnessing disorder: onychophorans use highly unstructured proteins, not silks, for prey capture. Proc Biol Sci 2010; 277:3255-63. [PMID: 20519222 DOI: 10.1098/rspb.2010.0604] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 01/23/2023] Open
Abstract
Onychophora are ancient, carnivorous soft-bodied invertebrates which capture their prey in slime that originates from dedicated glands located on either side of the head. While the biochemical composition of the slime is known, its unusual nature and the mechanism of ensnaring thread formation have remained elusive. We have examined gene expression in the slime gland from an Australian onychophoran, Euperipatoides rowelli, and matched expressed sequence tags to separated proteins from the slime. The analysis revealed three categories of protein present: unique high-molecular-weight proline-rich proteins, and smaller concentrations of lectins and small peptides, the latter two likely to act as protease inhibitors and antimicrobial agents. The predominant proline-rich proteins (200 kDa+) are composed of tandem repeated motifs and distinguished by an unusually high proline and charged residue content. Unlike the highly structured proteins such as silks used for prey capture by spiders and insects, these proteins lack ordered secondary structure over their entire length. We propose that on expulsion of slime from the gland onto prey, evaporative water loss triggers a glass transition change in the protein solution, resulting in adhesive and enmeshing thread formation, assisted by cross-linking of complementary charged and hydrophobic regions of the protein. Euperipatoides rowelli has developed an entirely new method of capturing prey by harnessing disordered proteins rather than structured, silk-like proteins.
57
Uversky VN, Dunker AK. Understanding protein non-folding. Biochim Biophys Acta 2010; 1804:1231-64. [PMID: 20117254 PMCID: PMC2882790 DOI: 10.1016/j.bbapap.2010.01.017] [Citation(s) in RCA: 901] [Impact Index Per Article: 64.4] [Received: 11/21/2009] [Revised: 01/09/2010] [Accepted: 01/21/2010] [Indexed: 02/07/2023]
Abstract
This review describes the family of intrinsically disordered proteins, members of which fail to form rigid 3-D structures under physiological conditions, either along their entire lengths or only in localized regions. Instead, these intriguing proteins/regions exist as dynamic ensembles within which atom positions and backbone Ramachandran angles exhibit extreme temporal fluctuations without specific equilibrium values. Many of these intrinsically disordered proteins are known to carry out important biological functions which, in fact, depend on the absence of a specific 3-D structure. The existence of such proteins does not fit the prevailing structure-function paradigm, which states that a unique 3-D structure is a prerequisite to function. Thus, the protein structure-function paradigm has to be expanded to include intrinsically disordered proteins and alternative relationships among protein sequence, structure, and function. This shift in the paradigm represents a major breakthrough for biochemistry, biophysics and molecular biology, as it opens new levels of understanding with regard to the complex life of proteins. This review will try to answer the following questions: how were intrinsically disordered proteins discovered? Why don't these proteins fold? What is so special about intrinsic disorder? What are the functional advantages of disordered proteins/regions? What is the functional repertoire of these proteins? What are the relationships between intrinsically disordered proteins and human diseases?
Affiliation(s)
- Vladimir N Uversky
- Institute for Intrinsically Disordered Protein Research, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
58
Xue B, Li L, Meroueh SO, Uversky VN, Dunker AK. Analysis of structured and intrinsically disordered regions of transmembrane proteins. Mol Biosyst 2010; 5:1688-1702. [PMID: 19585006 DOI: 10.1039/b905913j] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/21/2022]
Abstract
Integral membrane proteins display two major types of transmembrane structure, helical bundles and beta barrels. The main functional roles of transmembrane proteins are the transport of small molecules and cell signaling, and sometimes these two roles are coupled. For cytosolic, water-soluble proteins, signaling and regulatory functions are often carried out by intrinsically disordered regions. Our long-range goal is to determine whether integral membrane proteins likewise use disordered regions for signaling and regulation. Here we carried out a systematic bioinformatics investigation of intrinsically disordered regions obtained from integral membrane proteins for which crystal structures have been determined, and for which the intrinsic disorder was identified as missing electron density. We found 120 disorder-containing integral membrane proteins having a total of 33675 residues, with 3209 of the residues distributed among 240 different disordered regions. These disordered regions were compared with those obtained from water-soluble proteins with regard to their amino acid compositional biases, and to the accuracies of various disorder predictors. The results of these analyses show that the disordered regions from helical bundle integral membrane proteins, those from beta barrel integral membrane proteins, and those from water-soluble proteins all exhibit statistically distinct amino acid compositional biases. Despite these differences in composition, current algorithms make reasonably accurate predictions of disorder for these membrane proteins. Although the small size of the current data sets is limiting, these results suggest that developing new predictors that make use of data from disordered regions in helical bundles and beta barrels, especially as these datasets increase in size, will likely lead to significantly more accurate disorder predictions for these two classes of integral membrane proteins.
Affiliation(s)
- Bin Xue
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
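The counts quoted in the abstract of entry 58 imply a few simple ratios that the abstract itself does not state. The sketch below only derives them from the four reported numbers; the function name `summarize_disorder` and the dictionary keys are hypothetical, chosen for illustration, and are not taken from the paper.

```python
def summarize_disorder(total_residues, disordered_residues, regions, proteins):
    """Derive simple ratios from the counts reported in the abstract."""
    return {
        # Share of all residues that fall in disordered regions.
        "disordered_fraction": disordered_residues / total_residues,
        # Average number of residues per disordered region.
        "mean_region_length": disordered_residues / regions,
        # Average number of disordered regions per protein.
        "regions_per_protein": regions / proteins,
    }


# Counts as stated in the abstract: 33675 residues, 3209 disordered,
# 240 disordered regions, 120 proteins.
summary = summarize_disorder(33675, 3209, 240, 120)
print(f"disordered fraction: {summary['disordered_fraction']:.1%}")          # 9.5%
print(f"mean region length:  {summary['mean_region_length']:.1f} residues")  # 13.4
print(f"regions per protein: {summary['regions_per_protein']:.1f}")          # 2.0
```

These are averages over the whole data set, so they say nothing about how region lengths are distributed across individual proteins.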
59
Arnot CJ, Gay NJ, Gangloff M. Molecular mechanism that induces activation of Spätzle, the ligand for the Drosophila Toll receptor. J Biol Chem 2010; 285:19502-9. [PMID: 20378549 DOI: 10.1074/jbc.m109.098186] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Indexed: 11/06/2022] Open
Abstract
The Drosophila Toll receptor is activated by an endogenous cytokine ligand Spätzle. Active ligand is generated in response to positional cues in embryonic dorso-ventral patterning and microbial pathogens in the insect immune response. Spätzle is secreted as a pro-protein and is processed into an active form by the serine endoproteases Easter and Spätzle-processing enzyme during dorso-ventral patterning and infection, respectively. Here, we provide evidence for the molecular mechanism of this activation process. We show that the Spätzle prodomain masks a predominantly hydrophobic region of Spätzle and that proteolysis causes a conformational change that exposes determinants that are critical for binding to the Toll receptor. We also gather that a conserved sequence motif in the prodomain presents features of an amphipathic helix likely to bind a hydrophobic cleft in Spätzle thereby occluding the putative Toll binding region. This mechanism of activation has a striking similarity to that of coagulogen, a clotting factor of the horseshoe crab, an invertebrate that has changed little in 400 million years. Taken together, our findings demonstrate that an ancient passive defense system has been adapted during evolution and converted for use in a critical pathway of innate immune signaling and embryonic morphogenesis.
Affiliation(s)
- Christopher J Arnot
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, United Kingdom
60
Schaefer C, Schlessinger A, Rost B. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be. Bioinformatics 2010; 26:625-31. [PMID: 20081223 PMCID: PMC2828120 DOI: 10.1093/bioinformatics/btq012] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
Motivation: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. Results: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely. Contact: schaefer@rostlab.org Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christian Schaefer
- Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics (C2B2), Columbia University, 1130 St Nicholas Ave., New York, NY 10032, USA.
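The abstract of entry 60 describes mutating native sequences step by step toward random-like sequences and tracking how predictions change. The sketch below illustrates only the mutation and sequence-change part, under loud assumptions: the abstract does not give the mutation model, so uniform random single-residue substitution is assumed here, and the function names, the seed, and the example sequence are all hypothetical. The secondary structure and disorder prediction step is deliberately left out, since the predictors used are not named in the abstract.

```python
import random

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate_step(sequence, rng):
    """Replace one randomly chosen residue with a different residue.

    Assumption: uniform choice of position and of replacement residue.
    """
    pos = rng.randrange(len(sequence))
    choices = [aa for aa in AMINO_ACIDS if aa != sequence[pos]]
    return sequence[:pos] + rng.choice(choices) + sequence[pos + 1:]


def sequence_identity(a, b):
    """Fraction of aligned positions with the same residue (equal lengths)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def mutation_trajectory(native, steps, seed=0):
    """Return (step, sequence, identity to native) after each mutation step."""
    rng = random.Random(seed)
    current = native
    trajectory = [(0, current, 1.0)]
    for step in range(1, steps + 1):
        current = mutate_step(current, rng)
        trajectory.append((step, current, sequence_identity(native, current)))
    return trajectory


# Hypothetical example sequence, for illustration only.
for step, seq, identity in mutation_trajectory("MKTAYIAKQRQISFVKSHFSRQ", 5):
    print(step, seq, f"{identity:.2f}")
```

In a fuller version, each sequence in the trajectory would be passed to a secondary structure predictor and a disorder predictor, and the change in predictions would be compared against the drop in sequence identity.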
61
Pentony MM, Ward J, Jones DT. Computational resources for the prediction and analysis of native disorder in proteins. Methods Mol Biol 2010; 604:369-93. [PMID: 20013384 DOI: 10.1007/978-1-60761-444-9_25] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 12/16/2022]
Abstract
Proteomics attempts to characterise the gene products expressed in a cell or tissue via a range of biophysical techniques including crystallography and NMR and, more relevantly to this volume, chromatography and mass spectrometry. It is becoming increasingly clear that the native states of segments of many of the cellular proteins are not stable, folded structures, and much of the proteome is in an unfolded, disordered state. These proteins and their disordered segments have functionally interesting properties and provide novel challenges for the biophysical techniques that are used to study them. This chapter focuses on computational approaches to predicting such regions and analyzing the functions linked to them, and has implications for protein scientists who wish to study such properties as molecular recognition and post-translational modifications. We also discuss resources where the results of predictions have been collated, making them publicly available to the wider biological community.
Affiliation(s)
- Melissa M Pentony
- Department of Computer Science, University College London, Gower Street, London, UK
62
Swaminathan K, Adamczak R, Porollo A, Meller J. Enhanced Prediction of Conformational Flexibility and Phosphorylation in Proteins. Adv Exp Med Biol 2010; 680:307-19. [DOI: 10.1007/978-1-4419-5913-3_35] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/21/2023]
63
Rossi P, Swapna GVT, Huang YJ, Aramini JM, Anklin C, Conover K, Hamilton K, Xiao R, Acton TB, Ertekin A, Everett JK, Montelione GT. A microscale protein NMR sample screening pipeline. J Biomol NMR 2010; 46:11-22. [PMID: 19915800 PMCID: PMC2797623 DOI: 10.1007/s10858-009-9386-z] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.4] [Received: 10/09/2009] [Accepted: 10/14/2009] [Indexed: 05/14/2023]
Abstract
As part of efforts to develop improved methods for NMR protein sample preparation and structure determination, the Northeast Structural Genomics Consortium (NESG) has implemented an NMR screening pipeline for protein target selection, construct optimization, and buffer optimization, incorporating efficient microscale NMR screening of proteins using a micro-cryoprobe. The process is feasible because the newest generation probe requires only small amounts of protein, typically 30-200 µg in 8-35 µl volume. Extensive automation has been made possible by the combination of database tools, mechanization of key process steps, and the use of a micro-cryoprobe that gives excellent data while requiring little optimization and manual setup. In this perspective, we describe the overall process used by the NESG for screening NMR samples as part of a sample optimization process, assessing optimal construct design and solution conditions, as well as for determining protein rotational correlation times in order to assess protein oligomerization states. Database infrastructure has been developed to allow for flexible implementation of new screening protocols and harvesting of the resulting output. The NESG micro NMR screening pipeline has also been used for detergent screening of membrane proteins. Descriptions of the individual steps in the NESG NMR sample design, production, and screening pipeline are presented in the format of a standard operating procedure.
Affiliation(s)
- Paolo Rossi
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- G. V. T. Swapna
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Yuanpeng J. Huang
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- James M. Aramini
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Clemens Anklin
- Bruker Biospin Corporation, 15 Fortune Drive, Billerica, MA 01821 USA
- Kenith Conover
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Keith Hamilton
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Rong Xiao
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Thomas B. Acton
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Asli Ertekin
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- John K. Everett
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Gaetano T. Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, 679 Hoes Lane, Piscataway, NJ 08854 USA
- Northeast Structural Genomics Consortium, Piscataway, NJ USA
- Department of Biochemistry, Robert Wood Johnson Medical School, UMDNJ, Piscataway, NJ 08854 USA
|
64
|
Abstract
In recent years, it has been shown that a large number of proteins are either fully or partially disordered. Intrinsically disordered proteins are ubiquitous proteins that fulfill essential biological functions while lacking a stable 3D structure. Despite the large abundance of disorder, disordered regions are still poorly detected. The identification of disordered regions facilitates the functional annotation of proteins and is instrumental in delineating boundaries of protein domains amenable to crystallization. This chapter focuses on the methods currently employed for predicting disorder and identifying regions involved in induced folding.
Affiliation(s)
- Sonia Longhi
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Marseille, France
|
65
|
Lee JH, Kim HJ, Kim HD, Lee BC, Chun JS, Park CS. Modulation of the conductance-voltage relationship of the BK(Ca) channel by shortening the cytosolic loop connecting two RCK domains. Biophys J 2009; 97:730-7. [PMID: 19651031 DOI: 10.1016/j.bpj.2009.04.058] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/20/2008] [Revised: 04/20/2009] [Accepted: 04/24/2009] [Indexed: 12/25/2022] Open
Abstract
Calcium-dependent gating of large-conductance calcium-activated potassium (BK(Ca)) channels is mediated by the intracellular carboxyl terminus, which contains two domains of regulator of K(+) conductance (RCK). In mammalian BK(Ca) channels, the two RCK domains are separated by a protein segment of 101 residues that is poorly conserved in evolution and predicted to have no regular secondary structures. We investigated the functional importance of this loop using a series of deletion mutations. We found that the length, rather than the specific sequence at the central region of the segment, is critical for the functionality of the channel. As the length of the loop is progressively shortened, the conductance-voltage relationship gradually shifts toward more positive voltages with a minimum length of 70 amino acids, in an apparent response to increased tension within the loop. Thus, the functional activity of the BK(Ca) channel can be modulated by altering the tension of this loop region.
Affiliation(s)
- Ju-Ho Lee
- Department of Life Science, Gwangju Institute of Science and Technology, Gwangju, Korea
|
66
|
Gerard FCA, Ribeiro EDA, Leyrat C, Ivanov I, Blondel D, Longhi S, Ruigrok RWH, Jamin M. Modular organization of rabies virus phosphoprotein. J Mol Biol 2009; 388:978-96. [PMID: 19341745 DOI: 10.1016/j.jmb.2009.03.061] [Citation(s) in RCA: 89] [Impact Index Per Article: 5.9] [Received: 11/12/2008] [Revised: 03/23/2009] [Accepted: 03/25/2009] [Indexed: 10/20/2022]
Abstract
A phosphoprotein (P) is found in all viruses of the Mononegavirales order. These proteins form homo-oligomers, fulfil similar roles in the replication cycles of the various viruses, but differ in their length and oligomerization state. Sequence alignments reveal no sequence similarity among proteins from viruses belonging to the same family. Sequence analysis and experimental data show that phosphoproteins from viruses of the Paramyxoviridae contain structured domains alternating with intrinsically disordered regions. Here, we used predictions of disorder and of secondary structure, together with an analysis of sequence conservation, to predict the domain organization of the phosphoprotein from Sendai virus, vesicular stomatitis virus (VSV) and rabies virus (RV P). We devised a new procedure for combining the results from multiple prediction methods and locating the boundaries between disordered regions and structured domains. To validate the proposed modular organization predicted for RV P and to confirm that the putative structured domains correspond to autonomous folding units, we used two-hybrid and biochemical approaches to characterize the properties of several fragments of RV P. We found that both central and C-terminal domains can fold in isolation, that the central domain is the oligomerization domain, and that the C-terminal domain binds to nucleocapsids. Our results suggest a conserved organization of P proteins in the Rhabdoviridae family in concatenated functional domains resembling that of the P proteins in the Paramyxoviridae family.
Affiliation(s)
- Francine C A Gerard
- UJF-EMBL-CNRS UMI 3265 - Unit of Virus Host Cell Interactions, Grenoble, France
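The abstract of entry 66 mentions combining the results of multiple disorder predictors and locating boundaries between disordered regions and structured domains, but it does not give the procedure. The following is only a minimal sketch of one plausible way to do this, assuming a simple per-residue vote fraction and a fixed threshold; the function names, the threshold of 0.5, and the minimum segment length are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def consensus_disorder(predictions: List[List[int]]) -> List[float]:
    """Fraction of predictors calling each residue disordered (1 = disordered)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    return [sum(p[i] for p in predictions) / len(predictions) for i in range(length)]


def structured_segments(
    score: List[float], threshold: float = 0.5, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where the score is below threshold.

    The threshold and min_len defaults are illustrative assumptions.
    """
    segments = []
    start = None
    for i, s in enumerate(score):
        if s < threshold:
            if start is None:
                start = i  # open a new putative structured run
        elif start is not None:
            if i - start >= min_len:
                segments.append((start + 1, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(score) - start >= min_len:
        segments.append((start + 1, len(score)))
    return segments


# Toy example: three hypothetical predictors over a 10-residue sequence.
p1 = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
p2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
p3 = [1, 0, 0, 0, 0, 0, 1, 1, 1, 1]
print(structured_segments(consensus_disorder([p1, p2, p3])))  # [(3, 7)]
```

A vote fraction keeps every predictor on an equal footing; a weighted scheme would be a natural variation if some predictors are known to be more reliable for a given protein family.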
|
67
|
A LIM-9 (FHL)/SCPL-1 (SCP) complex interacts with the C-terminal protein kinase regions of UNC-89 (obscurin) in Caenorhabditis elegans muscle. J Mol Biol 2009; 386:976-88. [PMID: 19244614 DOI: 10.1016/j.jmb.2009.01.016] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Indexed: 11/22/2022]
Abstract
The C. elegans gene unc-89 encodes a set of mostly giant polypeptides (up to 900 kDa) that contain multiple immunoglobulin (Ig) and fibronectin type 3 (Fn3), a triplet of SH3-DH-PH, and two protein kinase domains. The loss of function mutant phenotype and localization of antibodies to UNC-89 proteins indicate that the function of UNC-89 is to help organize sarcomeric A-bands, especially M-lines. Recently, we reported that each of the protein kinase domains interacts with SCPL-1, which contains a CTD-type protein phosphatase domain. Here, we report that SCPL-1 interacts with LIM-9 (FHL), a protein that we first discovered as an interactor of UNC-97 (PINCH) and UNC-96, components of an M-line costamere in nematode muscle. We show that LIM-9 can interact with UNC-89 through its first kinase domain and a portion of unique sequence lying between the two kinase domains. All the interactions were confirmed by biochemical methods. A yeast three-hybrid assay demonstrates a ternary complex between the two protein kinase regions and SCPL-1. Evidence that the UNC-89/SCPL-1 interaction occurs in vivo was provided by showing that over-expression of SCPL-1 results in disorganization of UNC-89 at M-lines. We suggest two structural models for the interactions of SCPL-1 and LIM-9 with UNC-89 at the M-line.
|
68
|
Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved disorder prediction by combination of orthogonal approaches. PLoS One 2009; 4:e4433. [PMID: 19209228 PMCID: PMC2635965 DOI: 10.1371/journal.pone.0004433] [Citation(s) in RCA: 161] [Impact Index Per Article: 10.7] [Received: 08/15/2008] [Accepted: 12/15/2008] [Indexed: 12/15/2022] Open
Abstract
Disordered proteins are highly abundant in regulatory processes such as transcription and cell-signaling. Different methods have been developed to predict protein disorder often focusing on different types of disordered regions. Here, we present MD, a novel META-Disorder prediction method that molds various sources of information predominantly obtained from orthogonal prediction methods, to significantly improve in performance over its constituents. In sustained cross-validation, MD not only outperforms its origins, but it also compares favorably to other state-of-the-art prediction methods in a variety of tests that we applied. Availability: http://www.rostlab.org/services/md/
Affiliation(s)
- Avner Schlessinger
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America.
|
69
|
Han P, Zhang X, Feng ZP. Predicting disordered regions in proteins using the profiles of amino acid indices. BMC Bioinformatics 2009; 10 Suppl 1:S42. [PMID: 19208144 PMCID: PMC2648739 DOI: 10.1186/1471-2105-10-s1-s42] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Intrinsically unstructured or disordered proteins are common and functionally important. Prediction of disordered regions in proteins can provide useful information for understanding protein function and for high-throughput determination of protein structures. RESULTS In this paper, algorithms are presented to predict long and short disordered regions in proteins, namely the long disordered region prediction algorithm DRaai-L and the short disordered region prediction algorithm DRaai-S. These algorithms are developed based on the Random Forest machine learning model and the profiles of amino acid indices representing various physicochemical and biochemical properties of the 20 amino acids. CONCLUSION Experiments on DisProt3.6 and CASP7 demonstrate that some sets of the amino acid indices have strong association with the ordered and disordered status of residues. Our algorithms, which use the profiles of these amino acid indices as input features to predict disordered regions in proteins, outperform those based on amino acid composition and reduced amino acid composition, and also outperform many existing algorithms. Our studies suggest that the profiles of amino acid indices combined with the Random Forest learning model are an important complementary method for pinpointing disordered regions in proteins.
Affiliation(s)
- Pengfei Han
- School of Computer Science and IT, RMIT University, Melbourne, VIC 3001, Australia.
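The abstract of entry 69 describes using profiles of amino acid indices as input features. As a rough illustration of what such a profile might look like, the sketch below computes a sliding-window average of a single index along a sequence. The choice of the Kyte-Doolittle hydropathy scale, the window size, and the function name `index_profile` are assumptions made for this example; the abstract does not state which indices or window sizes DRaai-L and DRaai-S use, and the Random Forest training step is not shown.

```python
from typing import Dict, List

# Kyte-Doolittle hydropathy values, used here only as an example amino acid index.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_profile(
    seq: str, index: Dict[str, float] = KYTE_DOOLITTLE, window: int = 5
) -> List[float]:
    """Per-residue mean of an amino acid index over a centered window.

    Windows are truncated at the sequence termini rather than padded.
    Raises KeyError for residues missing from the index.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    values = [index[aa] for aa in seq.upper()]
    profile = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        profile.append(sum(chunk) / len(chunk))
    return profile


# One feature value per residue; several indices could be stacked column-wise
# to form the input matrix of a classifier.
print([round(v, 2) for v in index_profile("GIG", window=3)])  # [2.05, 1.23, 2.05]
```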
|
70
|
Han P, Zhang X, Norton RS, Feng ZP. Large-scale prediction of long disordered regions in proteins using random forests. BMC Bioinformatics 2009; 10:8. [PMID: 19128505 PMCID: PMC2637845 DOI: 10.1186/1471-2105-10-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 08/30/2008] [Accepted: 01/07/2009] [Indexed: 12/02/2022] Open
Abstract
Background Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies. Results A new algorithm, IUPforest-L, for predicting long disordered regions using the random forest learning model is proposed in this paper. IUPforest-L is based on the Moreau-Broto auto-correlation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences. In 10-fold cross validation tests, IUPforest-L can achieve an area of 89.5% under the receiver operating characteristic (ROC) curve. Compared with existing disorder predictors, IUPforest-L has high prediction accuracy and is efficient for predicting long disordered regions in large-scale proteomes. Conclusion The random forest model based on the auto-correlation functions of the AAIs within a protein fragment and other physicochemical features could effectively detect long disordered regions in proteins. A new predictor, IUPforest-L, was developed to batch predict long disordered regions in proteins, and the server can be accessed from
Affiliation(s)
- Pengfei Han
- The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia.
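The abstract of entry 70 states that IUPforest-L is based on the Moreau-Broto autocorrelation function of amino acid indices. A minimal sketch of that descriptor is given below, assuming the commonly used length-normalized form AC(d) = (1/(N-d)) * sum of P(i)*P(i+d) over i; whether the paper normalizes or standardizes the index values in this way is not stated in the abstract, and the function name is hypothetical.

```python
from typing import List, Sequence


def moreau_broto_autocorrelation(values: Sequence[float], max_lag: int) -> List[float]:
    """Length-normalized Moreau-Broto autocorrelation for lags 1..max_lag.

    `values` holds one amino acid index value per residue of a fragment.
    """
    n = len(values)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(values)")
    result = []
    for d in range(1, max_lag + 1):
        # Sum of products of index values for residues separated by d positions.
        total = sum(values[i] * values[i + d] for i in range(n - d))
        result.append(total / (n - d))
    return result


# Toy fragment with made-up index values.
print(moreau_broto_autocorrelation([1.0, 2.0, 3.0, 4.0], max_lag=2))
```

Each lag yields one number per index, so a fragment is summarized by a fixed-length feature vector regardless of its sequence length, which suits batch prediction.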
|
71
|
Schlessinger A, Liu J, Rost B. Natively unstructured loops differ from other loops. PLoS Comput Biol 2007; 3:e140. [PMID: 17658943 PMCID: PMC1924875 DOI: 10.1371/journal.pcbi.0030140] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Received: 09/27/2006] [Accepted: 06/05/2007] [Indexed: 11/24/2022] Open
Abstract
Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks. The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in the distinction between well-structured and natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Due to its different premises, the method captured very different aspects of unstructured regions than other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.
Affiliation(s)
- Avner Schlessinger
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
|
72
|
Ribeiro EA, Favier A, Gerard FCA, Leyrat C, Brutscher B, Blondel D, Ruigrok RWH, Blackledge M, Jamin M. Solution structure of the C-terminal nucleoprotein-RNA binding domain of the vesicular stomatitis virus phosphoprotein. J Mol Biol 2008; 382:525-38. [PMID: 18657547 DOI: 10.1016/j.jmb.2008.07.028] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Received: 06/10/2008] [Accepted: 07/07/2008] [Indexed: 10/21/2022]
Abstract
Beyond common features in their genome organization and replication mechanisms, the evolutionary relationships among viruses of the Rhabdoviridae family are difficult to decipher because of the great variability in the amino acid sequence of their proteins. The phosphoprotein (P) of vesicular stomatitis virus (VSV) is an essential component of the RNA transcription and replication machinery; in particular, it contains binding sites for the RNA-dependent RNA polymerase and for the nucleoprotein. Here, we devised a new method for defining boundaries of structured domains from multiple disorder prediction algorithms, and we identified an autonomous folding C-terminal domain in VSV P (P(CTD)). We show that, like the C-terminal domain of rabies virus (RV) P, VSV P(CTD) binds to the viral nucleocapsid (nucleoprotein-RNA complex). We solved the three-dimensional structure of VSV P(CTD) by NMR spectroscopy and found that the topology of its polypeptide chain resembles that of RV P(CTD). The common part of both proteins could be superimposed with a backbone RMSD from mean atomic coordinates of 2.6 Å. VSV P(CTD) has a shorter N-terminal helix (α1) than RV P(CTD); it lacks two α-helices (helices α3 and α6 of RV P), and the loop between strands β1 and β2 is longer than that in RV. Dynamical properties measured by NMR relaxation revealed the presence of fast motions (below the nanosecond timescale) in loop regions (amino acids 209-214) and slower conformational exchange in the N- and C-terminal helices. Characterization of a longer construct indicated that P(CTD) is preceded by a flexible linker. The results presented here support a modular organization of VSV P, with independent folded domains separated by flexible linkers, which is conserved among different genera of Rhabdoviridae and is similar to that proposed for the P proteins of the Paramyxoviridae.
Affiliation(s)
- Euripedes A Ribeiro
- UJF-EMBL-CNRS-UMR 5233-Unit of Virus Host Cell Interactions, 6 rue Jules Horowitz, 38042 Grenoble Cedex 9, France
|
73
|
Cortese MS, Uversky VN, Dunker AK. Intrinsic disorder in scaffold proteins: getting more from less. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2008; 98:85-106. [PMID: 18619997 DOI: 10.1016/j.pbiomolbio.2008.05.007] [Citation(s) in RCA: 224] [Impact Index Per Article: 14.0] [Indexed: 01/23/2023]
Abstract
Regulation, recognition and cell signaling involve the coordinated actions of many players. Signaling scaffolds, with their ability to bring together proteins belonging to common and/or interlinked pathways, play crucial roles in orchestrating numerous events by coordinating specific interactions among signaling proteins. This review examines the roles of intrinsic disorder (ID) in signaling scaffold protein function. Several well-characterized scaffold proteins with structurally and functionally characterized ID regions are used here to illustrate the importance of ID for scaffolding function. These examples include scaffolds that are mostly disordered, only partially disordered or those in which the ID resides in a scaffold partner. Specific scaffolds discussed include RNase, voltage-activated potassium channels, axin, BRCA1, GSK-3beta, p53, Ste5, titin, Fus3, BRCA1, MAP2, D-AKAP2 and AKAP250. Among the mechanisms discussed are: molecular recognition features, fly-casting, ease of encounter complex formation, structural isolation of partners, modulation of interactions between bound partners, masking of intramolecular interaction sites, maximized interaction surface per residue, toleration of high evolutionary rates, binding site overlap, allosteric modification, palindromic binding, reduced constraints for alternative splicing, efficient regulation via posttranslational modification, efficient regulation via rapid degradation, protection of normally solvent-exposed sites, enhancing the plasticity of interaction and molecular crowding. We conclude that ID can enhance scaffold function by a diverse array of mechanisms. In other words, scaffold proteins utilize several ID-facilitated mechanisms to enhance function, and by doing so, get more functionality from less structure.
Affiliation(s)
- Marc S Cortese
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
74
|
Liu J, Zhang Y, Lei X, Zhang Z. Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective. Genome Biol 2008. [PMID: 18397526 DOI: 10.1186/gb-2008-9-4-r69] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The rates of molecular evolution for protein-coding genes depend on the stringency of functional or structural constraints. The Ka/Ks ratio has been commonly used as an indicator of selective constraints and is typically calculated from interspecies alignments. Recent accumulation of single nucleotide polymorphism (SNP) data has enabled the derivation of Ka/Ks ratios for polymorphism (SNP A/S ratios). RESULTS Using data from the dbSNP database, we conducted the first large-scale survey of SNP A/S ratios for different structural and functional properties. We confirmed that the SNP A/S ratio is largely correlated with Ka/Ks for divergence. We observed stronger selective constraints for proteins that have high mRNA expression levels or broad expression patterns, have no paralogs, arose earlier in evolution, have natively disordered regions, are located in cytoplasm and nucleus, or are related to human diseases. On the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, natively disordered regions or low complexity regions, or are in the signal peptides of secreted proteins. Our analysis also revealed that histones and protein kinases are among the protein families that are under the strongest selective constraints, whereas olfactory and taste receptors are among the most variable groups. CONCLUSION Our study suggests that the SNP A/S ratio is a robust measure for selective constraints. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection of various structural or functional properties, particularly for human-specific genes and constraints within the human lineage.
Affiliation(s)
- Jinfeng Liu
- Department of Bioinformatics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA
|
75
|
Liu J, Zhang Y, Lei X, Zhang Z. Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective. Genome Biol 2008; 9:R69. [PMID: 18397526 PMCID: PMC2643940 DOI: 10.1186/gb-2008-9-4-r69] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Received: 03/20/2008] [Revised: 03/25/2008] [Accepted: 04/08/2008] [Indexed: 09/03/2023] Open
Abstract
A large-scale survey using single nucleotide polymorphism data from dbSNP provides insights into the evolutionary selection constraints on human proteins of different structural and functional categories. Background The rates of molecular evolution for protein-coding genes depend on the stringency of functional or structural constraints. The Ka/Ks ratio has been commonly used as an indicator of selective constraints and is typically calculated from interspecies alignments. Recent accumulation of single nucleotide polymorphism (SNP) data has enabled the derivation of Ka/Ks ratios for polymorphism (SNP A/S ratios). Results Using data from the dbSNP database, we conducted the first large-scale survey of SNP A/S ratios for different structural and functional properties. We confirmed that the SNP A/S ratio is largely correlated with Ka/Ks for divergence. We observed stronger selective constraints for proteins that have high mRNA expression levels or broad expression patterns, have no paralogs, arose earlier in evolution, have natively disordered regions, are located in cytoplasm and nucleus, or are related to human diseases. On the residue level, we found higher degrees of variation for residues that are exposed to solvent, are in a loop conformation, natively disordered regions or low complexity regions, or are in the signal peptides of secreted proteins. Our analysis also revealed that histones and protein kinases are among the protein families that are under the strongest selective constraints, whereas olfactory and taste receptors are among the most variable groups. Conclusion Our study suggests that the SNP A/S ratio is a robust measure for selective constraints. The correlations between SNP A/S ratios and other variables provide valuable insights into the natural selection of various structural or functional properties, particularly for human-specific genes and constraints within the human lineage.
Affiliation(s)
- Jinfeng Liu
- Department of Bioinformatics, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA
|
76
|
Higurashi M, Ishida T, Kinoshita K. Identification of transient hub proteins and the possible structural basis for their multiple interactions. Protein Sci 2008; 17:72-8. [PMID: 18156468 DOI: 10.1110/ps.073196308] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Indexed: 10/22/2022]
Abstract
Proteins that can interact with multiple partners play central roles in the network of protein-protein interactions. They are called hub proteins, and recently it was suggested that an abundance of intrinsically disordered regions on their surfaces facilitates their binding to multiple partners. However, in those studies, the hub proteins were identified as proteins with multiple partners, regardless of whether the interactions were transient or permanent. As a result, a certain number of hub proteins are subunits of stable multi-subunit proteins, such as supramolecules. It is well known that stable complexes and transient complexes have different structural features, and thus the statistics based on the current definition of hub proteins will hide the true nature of hub proteins. Therefore, in this paper, we first describe a new approach to identify proteins with multiple partners dynamically, using the Protein Data Bank, and then we performed statistical analyses of the structural features of these proteins. We refer to the proteins as transient hub proteins or sociable proteins, to clarify the difference with hub proteins. As a result, we found that the main difference between sociable and nonsociable proteins is not the abundance of disordered regions, in contrast to the previous studies, but rather the structural flexibility of the entire protein. We also found greater predominance of charged and polar residues in sociable proteins than previously reported.
Affiliation(s)
- Miho Higurashi
- Institute of Medical Science, The University of Tokyo, Tokyo, Japan
|
77
|
Bannen RM, Bingman CA, Phillips GN. Effect of low-complexity regions on protein structure determination. J Struct Funct Genomics 2008; 8:217-26. [PMID: 18302007 DOI: 10.1007/s10969-008-9039-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 08/03/2007] [Accepted: 02/05/2008] [Indexed: 11/24/2022]
Abstract
It has been previously shown that protein sequences containing a quasi-repetitive assortment of amino acids are common in genomes and databases such as Swiss-Prot but are under-represented in the structure-based Protein Data Bank (PDB). Structural genomics groups have been using the absence of these "low-complexity" sequences for several years as a way to select proteins that have a good chance of successful structure determination. In this study, we examine the data deposited in the PDB as well as the available data from structural genomics groups in TargetDB and PepcDB to reveal interesting trends that could be taken into consideration when using low-complexity sequences as part of the target selection process.
Affiliation(s)
- Ryan M Bannen
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53711, USA
|
78
|
Abstract
The recent advance in our understanding of the relation of protein structure and function cautions that many proteins, or regions of proteins, exist and function without a well-defined three-dimensional structure. These intrinsically disordered/unstructured proteins (IDP/IUP) are frequent in proteomes and carry out essential functions, but their lack of stable structures hampers efforts of solving structures at high resolution by x-ray crystallography and/or NMR. Thus, filtering such proteins/regions out of high-throughput structural genomics pipelines would be of significant benefit in terms of cost and success rate. This chapter outlines the theoretical background of structural disorder, and provides practical advice on the application of advanced bioinformatic predictors to this end, that is to recognize fully/mostly disordered proteins or regions, which are incompatible with structure determination. An emphasis is also given to a somewhat different approach, in which ordered/disordered regions are explicitly delineated to the end of making constructs amenable for structure determination even when disordered regions are present.
Affiliation(s)
- Zsuzsanna Dosztányi
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
|
79
|
Iyer LM, Anantharaman V, Wolf MY, Aravind L. Comparative genomics of transcription factors and chromatin proteins in parasitic protists and other eukaryotes. Int J Parasitol 2007; 38:1-31. [PMID: 17949725 DOI: 10.1016/j.ijpara.2007.07.018] [Citation(s) in RCA: 192] [Impact Index Per Article: 11.3] [Received: 04/18/2007] [Revised: 07/26/2007] [Accepted: 07/30/2007] [Indexed: 11/18/2022]
Abstract
Comparative genomics of parasitic protists and their free-living relatives are profoundly impacting our understanding of the regulatory systems involved in transcription and chromatin dynamics. While some parts of these systems are highly conserved, other parts are rapidly evolving, thereby providing the molecular basis for the variety in the regulatory adaptations of eukaryotes. The gross number of specific transcription factors and chromatin proteins is positively correlated with proteome size in eukaryotes. However, the individual types of specific transcription factors show an enormous variety across different eukaryotic lineages. The dominant families of specific transcription factors even differ between sister lineages, and have been shaped by gene loss and lineage-specific expansions. Recognition of this principle has helped in identifying the hitherto unknown, major specific transcription factors of several parasites, such as apicomplexans, Entamoeba histolytica, Trichomonas vaginalis, Phytophthora and ciliates. Comparative analysis of predicted chromatin proteins from protists allows reconstruction of the early evolutionary history of histone and DNA modification, nucleosome assembly and chromatin-remodeling systems. Many key catalytic, peptide-binding and DNA-binding domains in these systems ultimately had bacterial precursors, but were put together into distinctive regulatory complexes that are unique to the eukaryotes. In the case of histone methylases, histone demethylases and SWI2/SNF2 ATPases, proliferation of paralogous families, followed by acquisition of novel domain architectures, seems to have played a major role in producing a diverse set of enzymes that create and respond to an epigenetic code of modified histones. 
The diversification of histone acetylases and DNA methylases appears to have proceeded via repeated emergence of new versions, most probably via transfers from bacteria to different eukaryotic lineages, again resulting in lineage-specific diversity in epigenetic signals. Even though the key histone modifications are universal to eukaryotes, domain architectures of proteins binding post-translationally modified-histones vary considerably across eukaryotes. This indicates that the histone code might be "interpreted" differently from model organisms in parasitic protists and their relatives. The complexity of domain architectures of chromatin proteins appears to have increased during eukaryotic evolution. Thus, Trichomonas, Giardia, Naegleria and kinetoplastids have relatively simple domain architectures, whereas apicomplexans and oomycetes have more complex architectures. RNA-dependent post-transcriptional silencing systems, which interact with chromatin-level regulatory systems, show considerable variability across parasitic protists, with complete loss in many apicomplexans and partial loss in Trichomonas vaginalis. This evolutionary synthesis offers a robust scaffold for future investigation of transcription and chromatin structure in parasitic protists.
Affiliation(s)
- Lakshminarayan M Iyer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
80
Schlessinger A, Punta M, Rost B. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics 2007; 23:2376-84. [PMID: 17709338 DOI: 10.1093/bioinformatics/btm349] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.6] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Natively unstructured (also dubbed intrinsically disordered) regions in proteins lack a defined 3D structure under physiological conditions and often adopt regular structures under particular conditions. Proteins with such regions are overly abundant in eukaryotes, they may increase functional complexity of organisms and they usually evade structure determination in the unbound form. Low propensity for the formation of internal residue contacts has been previously used to predict natively unstructured regions. RESULTS: We combined PROFcon predictions for protein-specific contacts with a generic pairwise potential to predict unstructured regions. This novel method, Ucon, outperformed the best available methods in predicting proteins with long unstructured regions. Furthermore, Ucon correctly identified cases missed by other methods. By computing the difference between predictions based on specific contacts (approach introduced here) and those based on generic potentials (realized in other methods), we might identify unstructured regions that are involved in protein-protein binding. We discussed one example to illustrate this ambitious aim. Overall, Ucon added quality and an orthogonal aspect that may help in the experimental study of unstructured regions in network hubs. AVAILABILITY: http://www.predictprotein.org/submit_ucon.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Avner Schlessinger
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
81
John SP, Wang T, Steffen S, Longhi S, Schmaljohn CS, Jonsson CB. Ebola virus VP30 is an RNA binding protein. J Virol 2007; 81:8967-76. [PMID: 17567691 PMCID: PMC1951390 DOI: 10.1128/jvi.02523-06] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.9] [Indexed: 01/24/2023] Open
Abstract
The Ebola virus (EBOV) genome encodes for several proteins that are necessary and sufficient for replication and transcription of the viral RNAs in vitro; NP, VP30, VP35, and L. VP30 acts in trans with an RNA secondary structure upstream of the first transcriptional start site to modulate transcription. Using a bioinformatics approach, we identified a region within the N terminus of VP30 with sequence features that typify intrinsically disordered regions and a putative RNA binding site. To experimentally assess the ability of VP30 to directly interact with the viral RNA, we purified recombinant EBOV VP30 to >90% homogeneity and assessed RNA binding by UV cross-linking and filter-binding assays. VP30 is a strongly acidophilic protein; RNA binding became stronger as pH was decreased. Zn(2+), but not Mg(2+), enhanced activity. Enhancement of transcription by VP30 requires a RNA stem-loop located within nucleotides 54 to 80 of the leader region. VP30 showed low binding affinity to the predicted stem-loop alone or to double-stranded RNA but showed a good binding affinity for the stem-loop when placed in the context of upstream and downstream sequences. To map the region responsible for interacting with RNA, we constructed, purified, and assayed a series of N-terminal deletion mutations of VP30 for RNA binding. The key amino acids supporting RNA binding activity map to residues 26 to 40, a region rich in arginine. Thus, we show for the first time the direct interaction of EBOV VP30 with RNA and the importance of the N-terminal region for binding RNA.
Affiliation(s)
- Sinu P John
- Graduate Program in Biochemistry and Molecular Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
82
Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK. Intrinsic disorder and functional proteomics. Biophys J 2007; 92:1439-56. [PMID: 17158572 PMCID: PMC1796814 DOI: 10.1529/biophysj.106.094045] [Citation(s) in RCA: 549] [Impact Index Per Article: 32.3] [Received: 07/26/2006] [Accepted: 11/15/2006] [Indexed: 11/18/2022] Open
Abstract
The recent advances in the prediction of intrinsically disordered proteins and the use of protein disorder prediction in the fields of molecular biology and bioinformatics are reviewed here, especially with regard to protein function. First, a close look is taken at intrinsically disordered proteins and then at the methods used for their experimental characterization. Next, the major statistical properties of disordered regions are summarized, and prediction models developed thus far are described, including their numerous applications in functional proteomics. The future of the prediction of protein disorder and the future uses of such predictions in functional proteomics comprise the last section of this article.
Affiliation(s)
- Predrag Radivojac
- School of Informatics, Indiana University, Bloomington, Indiana, USA
83
Ferrara TM, Flaherty DB, Benian GM. Titin/connectin-related proteins in C. elegans: a review and new findings. J Muscle Res Cell Motil 2007; 26:435-47. [PMID: 16453163 DOI: 10.1007/s10974-005-9027-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 10/25/2022]
Affiliation(s)
- Tracey M Ferrara
- Department of Pathology, Emory University, Atlanta, GA 30322, USA
84
Sitbon E, Pietrokovski S. Occurrence of protein structure elements in conserved sequence regions. BMC Struct Biol 2007; 7:3. [PMID: 17210087 PMCID: PMC1781454 DOI: 10.1186/1472-6807-7-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 09/04/2006] [Accepted: 01/09/2007] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions. By means of an integrated analysis of large-scale protein structure and sequence data, structural features of conserved protein sequence regions were identified. RESULTS: Helices and turns were found to be underrepresented in conserved regions, while strands were found to be overrepresented. Similar numbers of loops were found in conserved and random regions. CONCLUSION: These results can be understood in light of the structural constraints on different secondary structure elements, and their role in protein structural stabilization and topology. Strands can tolerate fewer sequence changes and nonetheless keep their specific shape and function. They thus tend to be more conserved than helices, which can keep their shape and function with more changes. Loop behavior can be explained by the presence of both constrained and freely changing loops in proteins. Our detailed statistical analysis of diverse proteins links protein evolution to the biophysics of protein thermodynamic stability and folding. The basic structural features of conserved sequence regions are also important determinants of protein structure motifs and their function.
Affiliation(s)
- Einat Sitbon
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Shmuel Pietrokovski
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
85
Uversky VN, Radivojac P, Iakoucheva LM, Obradovic Z, Dunker AK. Prediction of intrinsic disorder and its use in functional proteomics. Methods Mol Biol 2007; 408:69-92. [PMID: 18314578 DOI: 10.1007/978-1-59745-547-3_5] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 05/26/2023]
Abstract
The number of experimentally verified, intrinsically disordered (ID) proteins is rapidly rising. Research is often focused on a structural characterization of a given protein, looking for several key features. However, ID proteins with their dynamic structures that interconvert on a number of time-scales are difficult targets for the majority of traditional biophysical and biochemical techniques. Structural and functional analyses of these proteins can be significantly aided by disorder predictions. The current advances in the prediction of ID proteins and the use of protein disorder prediction in the fields of molecular biology and bioinformatics are briefly overviewed herein. A method is provided to utilize intrinsic disorder knowledge to gain structural and functional information related to individual proteins, protein groups, families, classes, and even entire proteomes.
86
Singh GP, Ganapathi M, Dash D. Role of intrinsic disorder in transient interactions of hub proteins. Proteins 2006; 66:761-5. [PMID: 17154416 DOI: 10.1002/prot.21281] [Citation(s) in RCA: 127] [Impact Index Per Article: 7.1] [Indexed: 11/07/2022]
Abstract
Hubs in the protein-protein interaction network have been classified as "party" hubs, which are highly correlated in their mRNA expression with their partners while "date" hubs show lesser correlation. In this study, we explored the role of intrinsic disorder in date and party hub interactions. The data reveals that intrinsic disorder is significantly enriched in date hub proteins when compared with party hub proteins. Intrinsic disorder has been largely implicated in transient binding interactions. The disorder to order transition, which occurs during binding interactions in disordered regions, renders the interaction highly reversible while maintaining the high specificity. The enrichment of intrinsic disorder in date hubs may facilitate transient interactions, which might be required for date hubs to interact with different partners at different times.
Affiliation(s)
- Gajinder Pal Singh
- Institute of Genomics and Integrative Biology (CSIR), Delhi University Campus, Delhi, India
87
Han P, Zhang X, Norton RS, Feng ZP. Predicting Disordered Regions in Proteins Based on Decision Trees of Reduced Amino Acid Composition. J Comput Biol 2006; 13:1723-34. [PMID: 17238841 DOI: 10.1089/cmb.2006.13.1723] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022] Open
Abstract
Intrinsically unstructured proteins (IUPs) are proteins lacking a fixed three dimensional structure or containing long disordered regions. IUPs play an important role in biology and disease. Identifying disordered regions in protein sequences can provide useful information on protein structure and function, and can assist high-throughput protein structure determination. In this paper we present a system for predicting disordered regions in proteins based on decision trees and reduced amino acid composition. Concise rules based on biochemical properties of amino acid side chains are generated for prediction. Coarser information extracted from the composition of amino acids can not only improve the prediction accuracy but also increase the learning efficiency. In cross-validation tests, with four groups of reduced amino acid composition, our system can achieve a recall of 80% at a 13% false positive rate for predicting disordered regions, and the overall accuracy can reach 83.4%. This prediction accuracy is comparable to most, and better than some, existing predictors. Advantages of our approach are high prediction accuracy for long disordered regions and efficiency for large-scale sequence analysis. Our software is freely available for academic use upon request.
Affiliation(s)
- Pengfei Han
- School of Computer Science and IT, RMIT University, Melbourne, Victoria, Australia
88
Feng ZP, Zhang X, Han P, Arora N, Anders RF, Norton RS. Abundance of intrinsically unstructured proteins in P. falciparum and other apicomplexan parasite proteomes. Mol Biochem Parasitol 2006; 150:256-67. [PMID: 17010454 DOI: 10.1016/j.molbiopara.2006.08.011] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.5] [Received: 05/01/2006] [Revised: 08/28/2006] [Accepted: 08/28/2006] [Indexed: 11/21/2022]
Abstract
Preliminary sequence analysis of Plasmodium falciparum has shown that the proteome of this organism is enriched in intrinsically unstructured proteins (IUPs), which are either completely disordered or contain large disordered regions. IUPs have been characterized as a unique class of proteins that plays an important role in biology and disease. In this study, the IUP contents in the proteomes of apicomplexan parasites, especially the proteome of P. falciparum and its various life cycle stages, have been evaluated with DisEMBL-1.4. Compared with other proteomes, apicomplexan species are extremely abundant in proteins containing long disordered regions, and the IUP contents in mammalian Plasmodium species are higher than in most other apicomplexan parasites. The proteome of the P. falciparum sporozoite appears to be distinct from the other life cycle stages in having an even higher content of disordered proteins. The abundance of IUPs in the P. falciparum proteome correlates with its enrichment in repetitive sequences. The structural plasticity of IUPs, which allows promiscuous binding interactions, may favour parasite survival both by inhibiting the generation of effective high affinity antibody responses and by facilitating the interactions with host molecules necessary for attachment and invasion of host cells.
Affiliation(s)
- Zhi-Ping Feng
- Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic. 3050, Australia.
89
Han P, Zhang X, Norton RS, Feng ZP. Predicting Disordered Regions in Proteins Based on Decision Trees of Reduced Amino Acid Composition. J Comput Biol 2006. [DOI: 10.1089/cmb.2006.13.1579] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Affiliation(s)
- Pengfei Han
- School of Computer Science and IT, RMIT University, Melbourne, Victoria, Australia
- Xiuzhen Zhang
- School of Computer Science and IT, RMIT University, Melbourne, Victoria, Australia
- Raymond S. Norton
- Division of Structural Biology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
- Zhi-Ping Feng
- Division of Structural Biology, The Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
90
Sobolevsky Y, Trifonov EN. Protein Modules Conserved Since LUCA. J Mol Evol 2006; 63:622-34. [PMID: 17075700 DOI: 10.1007/s00239-005-0190-4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 08/02/2005] [Accepted: 12/02/2005] [Indexed: 11/28/2022]
Abstract
Universal scale of the sequence conservation has been recently introduced based on omnipresence of the protein sequence motifs across species. A large spectrum of short sequences, up to eight residues has been found to reside in all or almost all prokaryotic organisms. By this discovery a principally novel quantitative approach is introduced to the problem of reconstruction of the last universal common ancestor (LUCA). The most conserved elements (protein modules) with defined structures and sequences harboring the omnipresent motifs are outlined in this work, by combining the sequence and protein crystal structure data. The structurally conserved modules involve 25-30 amino acid residues and have appearance of closed loops, loop-n-lock structures. This confirms earlier conclusions on the loop-fold structure of globular proteins. Many of the topmost conserved modules represent the primary closed loop prototypes, that have been derived by whole genome sequence searches. The data presented, thus, make a basis for further developments toward the earliest stages of protein evolution.
Affiliation(s)
- Yehoshua Sobolevsky
- Genome Diversity Center, Institute of Evolution, University of Haifa, Haifa 31905, Israel
91
Coronado JE, Attie O, Epstein SL, Qiu WG, Lipke PN. Composition-modified matrices improve identification of homologs of Saccharomyces cerevisiae low-complexity glycoproteins. Eukaryot Cell 2006; 5:628-37. [PMID: 16607010 PMCID: PMC1459670 DOI: 10.1128/ec.5.4.628-637.2006] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022]
Abstract
Yeast glycoproteins are representative of low-complexity sequences, those sequences rich in a few types of amino acids. Low-complexity protein sequences comprise more than 10% of the proteome but are poorly aligned by existing methods. Under default conditions, BLAST and FASTA use the scoring matrix BLOSUM62, which is optimized for sequences with diverse amino acid compositions. Because low-complexity sequences are rich in a few amino acids, these tools tend to align the most common residues in nonhomologous positions, thereby generating anomalously high scores, deviations from the expected extreme value distribution, and small e values. This anomalous scoring prevents BLOSUM62-based BLAST and FASTA from identifying correct homologs for proteins with low-complexity sequences, including Saccharomyces cerevisiae wall proteins. We have devised and empirically tested scoring matrices that compensate for the overrepresentation of some amino acids in any query sequence in different ways. These matrices were tested for sensitivity in finding true homologs, discrimination against nonhomologous and random sequences, conformance to the extreme value distribution, and accuracy of e values. Of the tested matrices, the two best matrices (called E and gtQ) gave reliable alignments in BLAST and FASTA searches, identified a consistent set of paralogs of the yeast cell wall test set proteins, and improved the consistency of secondary structure predictions for cell wall proteins.
Affiliation(s)
- Juan E Coronado
- Department of Biological Sciences, Hunter College, 695 Park Ave., New York, NY 10021, USA
92
Ferron F, Longhi S, Canard B, Karlin D. A practical overview of protein disorder prediction methods. Proteins 2006; 65:1-14. [PMID: 16856179 DOI: 10.1002/prot.21075] [Citation(s) in RCA: 205] [Impact Index Per Article: 11.4] [Indexed: 11/11/2022]
Abstract
In the past few years there has been a growing awareness that a large number of proteins contain long disordered (unstructured) regions that often play a functional role. However, these disordered regions are still poorly detected. Recognition of disordered regions in a protein is important for two main reasons: reducing bias in sequence similarity analysis by avoiding alignment of disordered regions against ordered ones, and helping to delineate boundaries of protein domains to guide structural and functional studies. As none of the available methods for disorder prediction can be taken as fully reliable on its own, we present an overview of the methods currently employed, highlighting their advantages and drawbacks. We show a few practical examples of how they can be combined to avoid pitfalls and to achieve more reliable predictions.
Affiliation(s)
- François Ferron
- Architecture et Fonction des Macromolécules Biologiques, UMR 6098 CNRS et Universités Aix-Marseille I et II, Marseille, France
93
Vullo A, Bortolami O, Pollastri G, Tosatto SCE. Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines. Nucleic Acids Res 2006; 34:W164-8. [PMID: 16844983 PMCID: PMC1538873 DOI: 10.1093/nar/gkl166] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Indexed: 01/10/2023] Open
Abstract
Intrinsically disordered proteins have long stretches of their polypeptide chain that do not adopt a single native structure composed of stable secondary and tertiary structure in the absence of binding partners. The prediction of intrinsically disordered regions in proteins from sequence is increasingly becoming of interest, as the presence of many such regions in complete genome sequences is discovered and important functional roles are associated with them. We have developed a machine learning approach based on two support vector machines (SVM) to discriminate disordered regions from sequence. The SVM are trained and benchmarked on two sets, representing long and short disordered regions. A preliminary version of Spritz was shown to perform consistently well at the recent biannual CASP-6 experiment [Critical Assessment of Techniques for Protein Structure Prediction (CASP), 2004]. The fully developed Spritz method is freely available as a web server.
Affiliation(s)
- Oscar Bortolami
- Department of Biology and CRIBI Biotechnology Centre, University of Padova, Italy
- Gianluca Pollastri
- To whom correspondence should be addressed. Tel: +353 1 716 2926; fax: +353 1 269 7262
94
Romov PA, Li F, Lipke PN, Epstein SL, Qiu WG. Comparative genomics reveals long, evolutionarily conserved, low-complexity islands in yeast proteins. J Mol Evol 2006; 63:415-25. [PMID: 16927006 DOI: 10.1007/s00239-005-0291-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 11/30/2005] [Accepted: 04/27/2006] [Indexed: 01/12/2023]
Abstract
Eukaryotic proteomes abound in low-complexity sequences, including tandem repeats and regions with significantly biased amino acid compositions. We assessed the functional importance of compositionally biased sequences in the yeast proteome using an evolutionary analysis of 2838 orthologous open reading frame (ORF) families from three Saccharomyces species (S. cerevisiae, S. bayanus, and S. paradoxus). Sequence conservation was measured by the amino acid sequence variability and by the ratio of nonsynonymous-to-synonymous nucleotide substitutions (K(a)/K(s)) between pairs of orthologous ORFs. A total of 1033 ORF families contained one or more long (at least 45 residues), low-complexity islands as defined by a measure based on the Shannon information index. Low-complexity islands were generally less conserved than ORFs as a whole; on average they were 50% more variable in amino acid sequences and 50% higher in K(a)/K(s) ratios. Fast-evolving low-complexity sequences outnumbered conserved low-complexity sequences by a ratio of 10 to 1. Sequence differences between orthologous ORFs fit well to a selectively neutral Poisson model of sequence divergence. We therefore used the Poisson model to identify conserved low-complexity sequences. ORFs containing the 33 most conserved low-complexity sequences were overrepresented by those encoding nucleic acid binding proteins, cytoskeleton components, and intracellular transporters. While a few conserved low-complexity islands were known functional domains (e.g., DNA/RNA-binding domains), most were uncharacterized. We discuss how comparative genomics of closely related species can be employed further to distinguish functionally important, shorter, low-complexity sequences from the vast majority of such sequences likely maintained by neutral processes.
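As a rough illustration of the kind of measure this abstract mentions, the following Python sketch scores sequence windows by the Shannon entropy of their residue composition and merges overlapping low-entropy windows into islands. It is not the authors' procedure: the function names, the 2.5-bit threshold, and the window-merging rule are assumptions made for illustration, and only the 45-residue minimum length comes from the abstract.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_islands(seq: str, window: int = 45, max_entropy: float = 2.5):
    """Return half-open (start, end) spans covered by low-entropy windows.

    `window` follows the 45-residue minimum quoted in the abstract;
    `max_entropy` is a hypothetical cutoff chosen only for this sketch.
    """
    spans = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) <= max_entropy:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Extend the previous island when windows overlap or touch.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans
```

A homopolymeric stretch has zero entropy, whereas a window that uses all 20 amino acids about equally approaches log2(20), roughly 4.3 bits, so any cutoff between those extremes trades sensitivity against specificity.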
Affiliation(s)
- Philip A Romov
- Department of Computer Science, Hunter College, City University of New York, New York, New York 10021, USA
95
Abstract
Structural flexibility has been associated with various biological processes such as molecular recognition and catalytic activity. In silico studies of protein flexibility have attempted to characterize and predict flexible regions based on simple principles. B-values derived from experimental data are widely used to measure residue flexibility. Here, we present the most comprehensive large-scale analysis of B-values. We used this analysis to develop a neural network-based method that predicts flexible-rigid residues from amino acid sequence. The system uses both global and local information (i.e., features from the entire protein such as secondary structure composition, protein length, and fraction of surface residues, and features from a local window of sequence-consecutive residues). The most important local feature was the evolutionary exchange profile reflecting sequence conservation in a family of related proteins. To illustrate its potential, we applied our method to 4 different case studies, each of which related our predictions to aspects of function. The first 2 were the prediction of regions that undergo conformational switches upon environmental changes (switch II region in Ras) and the prediction of surface regions, the rigidity of which is crucial for their function (tunnel in propeller folds). Both were correctly captured by our method. The third study established that residues in active sites of enzymes are predicted by our method to have unexpectedly low B-values. The final study demonstrated how well our predictions correlated with NMR order parameters to reflect motion. Our method had not been set up to address any of the tasks in those 4 case studies. Therefore, we expect that this method will assist in many attempts at inferring aspects of function.
Affiliation(s)
- Avner Schlessinger
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
96
Sharma S, Ang SL, Shaw M, Mackey DA, Gécz J, McAvoy JW, Craig JE. Nance-Horan syndrome protein, NHS, associates with epithelial cell junctions. Hum Mol Genet 2006; 15:1972-83. [PMID: 16675532 DOI: 10.1093/hmg/ddl120] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
Nance-Horan syndrome, characterized by congenital cataracts, craniofacial, dental abnormalities and mental disturbances, is an X-linked disorder with significant phenotypic heterogeneity. Affected individuals have mutations in the NHS (Nance-Horan syndrome) gene typically resulting in premature truncation of the protein. This report underlines the complexity of the regulation of the NHS gene that transcribes several isoforms. We demonstrate the differential expression of the two NHS isoforms, NHS-A and NHS-1A, and differences in the subcellular localization of the proteins encoded by these isoforms. This may in part explain the pleiotropic features of the syndrome. We show that the endogenous and exogenous NHS-A isoform localizes to the cell membrane of mammalian cells in a cell-type-dependent manner and that it co-localizes with the tight junction (TJ) protein ZO-1 in the apical aspect of cell membrane in epithelial cells. We also show that the NHS-1A isoform is a cytoplasmic protein. In the developing mammalian lens, we found continuous expression of NHS that became restricted to the lens epithelium in pre- and postnatal lens. Consistent with the in vitro findings, the NHS-A isoform associates with the apical cell membrane in the lens epithelium. This study suggests that disturbances in intercellular contacts underlie cataractogenesis in the Nance-Horan syndrome. NHS is the first gene localized at TJs that has been implicated in congenital cataracts.
Affiliation(s)
- Shiwani Sharma
- Department of Ophthalmology, Flinders University, Australia.
97
Nardini M, Svergun D, Konarev PV, Spanò S, Fasano M, Bracco C, Pesce A, Donadini A, Cericola C, Secundo F, Luini A, Corda D, Bolognesi M. The C-terminal domain of the transcriptional corepressor CtBP is intrinsically unstructured. Protein Sci 2006; 15:1042-50. [PMID: 16597837 PMCID: PMC2242513 DOI: 10.1110/ps.062115406] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Indexed: 12/24/2022]
Abstract
C-terminal binding proteins (CtBPs) are moonlighting proteins involved in nuclear transcriptional corepression and in Golgi membrane tubule fission. Structural information on CtBPs is available for their substrate-binding domain, responsible for transcriptional repressor recognition/binding, and for the nucleotide-binding domain, involved in NAD(H)-binding and dimerization. On the contrary, little is known about the structure of CtBP C-terminal region (approximately 90 residues), hosting sites for post-translational modifications. In the present communication we apply a combined approach based on bioinformatics, nuclear magnetic resonance, circular dichroism spectroscopy, and small-angle X-ray scattering, and we show that the CtBP C-terminal region is intrinsically unstructured in the full-length CtBP and in constructs lacking the substrate- and/or the nucleotide-binding domains. The flexible nature of this protein region, and its structural transitions, may be instrumental for CtBP recognition and binding to diverse molecular partners.
Affiliation(s)
- Marco Nardini
- Department of Biomolecular Sciences and Biotechnology, and CNR-INFM, University of Milano, I-20131 Milano, Italy
|
98
|
Singh GP, Ganapathi M, Sandhu KS, Dash D. Intrinsic unstructuredness and abundance of PEST motifs in eukaryotic proteomes. Proteins 2006; 62:309-15. [PMID: 16299712 DOI: 10.1002/prot.20746] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.3] [Indexed: 11/07/2022]
Abstract
The study of unfolded protein regions has gained importance because of their prevalence and important roles in various cellular functions. These regions have characteristically high net charge and low hydrophobicity. The amino acid sequence determines the intrinsic unstructuredness of a region and, therefore, efforts are ongoing to delineate the sequence motifs that might contribute to protein disorder. We find that PEST motifs are enriched in the characterized disordered regions as compared with globular ones. Analysis of representative PDB chains revealed very few structures containing PEST sequences, and the majority of them lacked regular secondary structure. A proteome-wide study in completely sequenced eukaryotes with predicted unfolded and folded proteins shows that PEST proteins make up a larger fraction of the unfolded dataset than of the folded dataset. Our data also reveal the prevalence of PEST proteins in eukaryotic proteomes (approximately 25%). Functional classification of the PEST-containing proteins shows an over- and under-representation in proteins involved in regulation and metabolism, respectively. Furthermore, our analysis shows that predicted PEST regions do not exhibit any preference to be localized in the C termini of proteins, as reported earlier.
Affiliation(s)
- Gajinder Pal Singh
- Institute of Genomics and Integrative Biology (CSIR), Delhi University Campus, Delhi, India
|
99
|
Bhalla J, Storchan GB, MacCarthy CM, Uversky VN, Tcherkasskaya O. Local flexibility in molecular function paradigm. Mol Cell Proteomics 2006; 5:1212-23. [PMID: 16571897 DOI: 10.1074/mcp.m500315-mcp200] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
It is generally accepted that the functional activity of biological macromolecules requires tightly packed three-dimensional structures. Recent theoretical and experimental evidence indicates, however, the importance of molecular flexibility for the proper functioning of some proteins. We examined high resolution structures of proteins in various functional categories with respect to the secondary structure assessment. The latter was considered as a characteristic of the inherent flexibility of a polypeptide chain. We found that the proteins in functionally competent conformational states might be comprised of 20-70% flexible residues. For instance, proteins involved in gene regulation, e.g. transcription factors, are on average largely disordered molecules with over 60% of amino acids residing in "coiled" configurations. In contrast, oxygen transporters constitute a class of relatively rigid molecules with only 30% of residues being locally flexible. Phylogenetic comparison of a large number of protein families with respect to the propagation of secondary structure illuminates the growing role of the local flexibility in organisms of greater complexity. Furthermore, the local flexibility in protein molecules appears to be dependent on the molecular confinement and is essentially larger in extracellular proteins.
Affiliation(s)
- Jag Bhalla
- Biochemistry and Molecular & Cellular Biology, Georgetown University School of Medicine, Washington, DC 20007, USA
|
100
|
Turlure F, Maertens G, Rahman S, Cherepanov P, Engelman A. A tripartite DNA-binding element, comprised of the nuclear localization signal and two AT-hook motifs, mediates the association of LEDGF/p75 with chromatin in vivo. Nucleic Acids Res 2006; 34:1653-65. [PMID: 16549878 PMCID: PMC1405818 DOI: 10.1093/nar/gkl052] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.9] [Indexed: 11/14/2022]
Abstract
Lens epithelium-derived growth factor p75 (LEDGF/p75) is a DNA-binding, transcriptional co-activator that participates in HIV-1 integration site targeting. Using complementary approaches, we determined the mechanisms of LEDGF/p75 DNA-binding in vitro and chromatin-association in living cells. The binding of highly purified, recombinant protein was assayed by surface plasmon resonance (SPR) and electrophoretic mobility gel shift. Neither assay revealed evidence for sequence-specific DNA-binding. Residues 146-197 spanning the nuclear localization signal (NLS) and two AT-hook motifs mediated non-specific DNA-binding, and DNA-binding deficient mutants retained the ability to efficiently stimulate HIV-1 integrase activity in vitro. Chromatin-association was assessed by visualizing the localization of EGFP fusion proteins in interphase and mitotic cells. Although a conserved N-terminal PWWP domain was not required for binding to condensed mitotic chromosomes, its deletion subtly affected the nucleoplasmic distribution of the protein during interphase. A dual AT-hook mutant associated normally with chromatin, yet when the mutations were combined with NLS changes or deletion of the PWWP domain, chromatin-binding function was lost. As the PWWP domain did not readily bind free DNA in vitro, our results indicate that chromatin-association is primarily effected through DNA-binding, with the PWWP domain likely contributing a protein interaction to the overall affinity of LEDGF/p75 for human chromatin.
Affiliation(s)
- Goedele Maertens
- Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
- Peter Cherepanov
- Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
- Alan Engelman
- Department of Pathology, Harvard Medical School, Boston, MA 02115, USA
- To whom correspondence should be addressed. Tel: +1 617 632 4361; Fax: +1 617 632 3113;
|