1
Abstract
Since the year 2000 a number of large RNA three-dimensional structures have been determined by X-ray crystallography. Structures composed of more than 100 nucleotide residues include the signal recognition particle RNA, group I intron, the GlmS ribozyme, RNase P RNA, and ribosomal RNAs from Haloarcula marismortui, Escherichia coli, Thermus thermophilus, and Deinococcus radiodurans. These large RNAs are constructed from the same secondary and tertiary structural motifs identified in smaller RNAs but appear to have a larger organizational architecture. They are dominated by long continuous interhelical base stacking, tend to segregate into domains, and are planar in overall shape as opposed to their globular protein counterparts. These findings have consequences in RNA folding, intermolecular interaction, and packing, in addition to studies of design and engineering and structure prediction.
Affiliation(s)
- Stephen R Holbrook
- Structural Biology Department, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
2
Ding C, He X, Xiong H, Peng H, Holbrook SR. Transitive closure and metric inequality of weighted graphs: detecting protein interaction modules using cliques. Int J Data Min Bioinform 2008; 1:162-77. [PMID: 18399069 DOI: 10.1504/ijdmb.2006.010854] [Indexed: 11/21/2022]
Abstract
We study transitivity properties of edge weights in complex networks. We show that enforcing transitivity leads to a transitivity inequality which is equivalent to ultra-metric inequality. This can be used to define transitive closure on weighted undirected graphs, which can be computed using a modified Floyd-Warshall algorithm. These new concepts are extended to dissimilarity graphs and triangle inequalities. From this, we extend the clique concept from unweighted graph to weighted graph. We outline several applications and present results of detecting protein functional modules in a protein interaction network.
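The transitive closure described in this abstract can be illustrated with a short sketch. It assumes that edge weights are similarities, that the transitivity inequality takes the form w[i][j] >= min(w[i][k], w[k][j]), and that the closure replaces each weight with the best "weakest link" over all connecting paths. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def maxmin_transitive_closure(w: Matrix) -> Matrix:
    """Floyd-Warshall variant using (max, min) instead of (min, +).

    Assumption: w is a symmetric similarity matrix. The result holds, for
    each pair, the largest achievable minimum edge weight over all paths.
    """
    n = len(w)
    c = [row[:] for row in w]  # work on a copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue  # the diagonal plays no role in the closure
                via_k = min(c[i][k], c[k][j])
                if via_k > c[i][j]:
                    c[i][j] = via_k
    return c


def satisfies_transitivity(w: Matrix, tol: float = 1e-12) -> bool:
    """Check w[i][j] >= min(w[i][k], w[k][j]) for all i != j and all k."""
    n = len(w)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if w[i][j] + tol < min(w[i][k], w[k][j]):
                    return False
    return True


# Hypothetical three-protein network: 0-1 and 1-2 interact strongly,
# 0-2 only weakly, so the raw weights violate the inequality.
weights = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.8],
    [0.1, 0.8, 0.0],
]
closure = maxmin_transitive_closure(weights)
```

In this toy case the weak 0-2 edge is raised to 0.8, the weakest link on the path through protein 1, after which the matrix satisfies the inequality. Thresholding such a closed matrix is one plausible route to the weighted cliques the abstract mentions.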
Affiliation(s)
- Chris Ding
- Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA.
3
Wang C, Ding C, Yang Q, Holbrook SR. Consistent dissection of the protein interaction network by combining global and local metrics. Genome Biol 2008; 8:R271. [PMID: 18154653 PMCID: PMC2246273 DOI: 10.1186/gb-2007-8-12-r271] [Received: 06/22/2007] [Revised: 12/14/2007] [Accepted: 12/21/2007] [Indexed: 11/15/2022]
Abstract
We propose a new network decomposition method to systematically identify protein interaction modules in the protein interaction network. Our method incorporates both a global metric and a local metric for balance and consistency. We have compared the performance of our method with several earlier approaches on both simulated and real datasets using different criteria, and show that our method is more robust to network alterations and more effective at discovering functional protein modules.
Affiliation(s)
- Chunlin Wang
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
4
Zeng E, Ding C, Narasimhan G, Holbrook SR. Estimating support for protein-protein interaction data with applications to function prediction. Comput Syst Bioinformatics Conf 2008; 7:73-84. [PMID: 19642270] [Indexed: 05/28/2023]
Abstract
Almost every cellular process requires the interactions of pairs or larger complexes of proteins. High throughput protein-protein interaction (PPI) data have been generated using techniques such as the yeast two-hybrid systems, mass spectrometry method, and many more. Such data provide us with a new perspective to predict protein functions and to generate protein-protein interaction networks, and many recent algorithms have been developed for this purpose. However, PPI data generated using high throughput techniques contain a large number of false positives. In this paper, we have proposed a novel method to evaluate the support for PPI data based on gene ontology information. If the semantic similarity between genes is computed using gene ontology information and using Resnik's formula, then our results show that we can model the PPI data as a mixture model predicated on the assumption that true protein-protein interactions will have higher support than the false positives in the data. Thus semantic similarity between genes serves as a metric of support for PPI data. Taking it one step further, new function prediction approaches are also being proposed with the help of the proposed metric of the support for the PPI data. These new function prediction approaches outperform their conventional counterparts. New evaluation methods are also proposed.
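As an illustration of the support metric, the sketch below computes Resnik's similarity, the information content of the most informative common ancestor of two ontology terms. The ontology fragment, the term probabilities, and the rule of taking the best-scoring term pair for two genes are hypothetical choices for the example, not values or procedures taken from the paper.

```python
import math
from typing import Dict, Iterable, List, Set

# Hypothetical ontology fragment: each term maps to its direct parents.
PARENTS: Dict[str, List[str]] = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "rna_binding": ["binding"],
    "dna_binding": ["binding"],
}

# Hypothetical probability that a gene is annotated with the term or one
# of its descendants (the root covers every annotated gene).
PROB: Dict[str, float] = {
    "root": 1.0,
    "binding": 0.5,
    "catalysis": 0.5,
    "rna_binding": 0.2,
    "dna_binding": 0.3,
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(term_a: str, term_b: str) -> float:
    """Resnik similarity: max of -log p(c) over common ancestors c."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(-math.log(PROB[c]) for c in common)


def gene_support(terms_a: Iterable[str], terms_b: Iterable[str]) -> float:
    """Assumed gene-level score: the best-scoring pair of annotations."""
    terms_b = list(terms_b)
    return max(resnik(a, b) for a in terms_a for b in terms_b)
```

With these toy numbers, two genes annotated with `rna_binding` and `dna_binding` share the ancestor `binding` and score log 2, while a pair whose only common ancestor is the root scores zero. Under the abstract's mixture-model assumption, true interactions would tend toward the higher-scoring component.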
Affiliation(s)
- Erliang Zeng
- Bioinformatics Research Group, School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA.
5
Ames GF, Mimura CS, Holbrook SR, Shyamala V. Traffic ATPases: a superfamily of transport proteins operating from Escherichia coli to humans. Adv Enzymol Relat Areas Mol Biol 2006; 65:1-47. [PMID: 1533298 DOI: 10.1002/9780470123119.ch1] [Indexed: 12/27/2022]
Affiliation(s)
- G F Ames
- Department of Molecular and Cell Biology, University of California, Berkeley
6
Abstract
MOTIVATION: Small non-coding RNA (ncRNA) genes play important regulatory roles in a variety of cellular processes. However, detection of ncRNA genes is a great challenge to both experimental and computational approaches. In this study, we describe a new approach called positive sample only learning (PSoL) to predict ncRNA genes in the Escherichia coli genome. Although PSoL is a machine learning method for classification, it requires no negative training data, which, in general, is hard to define properly and affects the performance of machine learning dramatically. In addition, using the support vector machine (SVM) as the core learning algorithm, PSoL can integrate many different kinds of information to improve the accuracy of prediction. Besides the application of PSoL for predicting ncRNAs, PSoL is applicable to many other bioinformatics problems as well. RESULTS: The PSoL method is assessed by 5-fold cross-validation experiments which show that PSoL can achieve about 80% accuracy in recovery of known ncRNAs. We compared PSoL predictions with five previously published results. The PSoL method has the highest percentage of predictions overlapping with those from other methods.
Affiliation(s)
- Chunlin Wang
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
7
Abstract
RNAs are modular biomolecules, composed largely of conserved structural subunits, or motifs. These structural motifs comprise the secondary structure of RNA and are knit together via tertiary interactions into a compact, functional, three-dimensional structure and are to be distinguished from motifs defined by sequence or function. A relatively small number of structural motifs are found repeatedly in RNA hairpin and internal loops, and are observed to be composed of a limited number of common 'structural elements'. In addition to secondary and tertiary structure motifs, there are functional motifs specific for certain biological roles and binding motifs that serve to complex metals or other ligands. Research is continuing into the identification and classification of RNA structural motifs and is being initiated to predict motifs from sequence, to trace their phylogenetic relationships and to use them as building blocks in RNA engineering.
Affiliation(s)
- Donna K Hendrix
- Department of Plant & Microbial Biology, University of California, Berkeley, CA, USA
8
Kim SH, Shin DH, Liu J, Oganesyan V, Chen S, Xu QS, Kim JS, Das D, Schulze-Gahmen U, Holbrook SR, Holbrook EL, Martinez BA, Oganesyan N, DeGiovanni A, Lou Y, Henriquez M, Huang C, Jancarik J, Pufan R, Choi IG, Chandonia JM, Hou J, Gold B, Yokota H, Brenner SE, Adams PD, Kim R. Structural genomics of minimal organisms and protein fold space. J Struct Funct Genomics 2006; 6:63-70. [PMID: 16211501 DOI: 10.1007/s10969-005-2651-9] [Received: 06/24/2004] [Accepted: 02/15/2005] [Indexed: 11/29/2022]
Abstract
The initial aim of the Berkeley Structural Genomics Center is to obtain a near-complete structural complement of two minimal organisms, closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter fewer than 700 genes. To achieve this goal, the current protein targets have been selected starting with those predicted to be most tractable and likely to yield new structural and functional information. During the past 3 years, the semi-automated structural genomics pipeline has been set up from cloning, expression, purification, and ultimately to structural determination. The results from the pipeline substantially increased the coverage of the protein fold space of M. pneumoniae and M. genitalium. Furthermore, about 1/2 of the structures of 'unique' protein sequences revealed new and novel folds, and over 2/3 of the structures of previously annotated 'hypothetical proteins' inferred their molecular functions.
Affiliation(s)
- Sung-Hou Kim
- Department of Chemistry, University of California, Berkeley, 94720-5230, USA.
9
Abstract
Sequence alignment underpins common tasks in molecular biology, including genome annotation, molecular phylogenetics, and homology modeling. Fundamental to sequence alignment is the placement of gaps, which represent character insertions or deletions. We assessed the ability of a generalized affine gap cost model to reliably detect remote protein homology and to produce high-quality alignments. Generalized affine gap alignment with optimal gap parameters performed as well as the traditional affine gap model in remote homology detection. Evaluation of alignment quality showed that the generalized affine model aligns fewer residue pairs than the traditional affine model but achieves significantly higher per-residue accuracy. We conclude that generalized affine gap costs should be used when alignment accuracy carries more importance than aligned sequence length.
Affiliation(s)
- Marcus A Zachariah
- Department of Plant and Microbial Biology, University of California, Berkeley, USA
10
Jang SB, Jeong MS, Carter RJ, Holbrook EL, Comolli LR, Holbrook SR. Novel crystal form of the ColE1 Rom protein. Acta Crystallogr D Biol Crystallogr 2006; 62:619-27. [PMID: 16699189 DOI: 10.1107/s0907444906012388] [Received: 10/21/2005] [Accepted: 04/05/2006] [Indexed: 11/10/2022]
Abstract
The RNA I modulator protein (Rom) acts as a co-regulator of ColE1 plasmid copy number by binding to RNA kissing hairpins and stabilizing their interaction. The structure of Rom has been determined in a new crystal form from X-ray diffraction data to 2.5 Å resolution. In this structure, a dimer of the 57-amino-acid protein is found in the asymmetric unit. Each subunit consists almost entirely of two antiparallel alpha-helices joined by a short hairpin bend. The dimer contains a non-crystallographic twofold axis and forms a highly regular four-alpha-helical bundle. The structural packing in this novel crystal form is different from previously known Rom structures. The asymmetric unit contains one dimer, giving a crystal volume per protein weight (V_M) of 1.83 Å³ Da⁻¹ and a low solvent content of 30%. Strong packing interactions and low solvation are characteristic of the structure. The Rom protein was cocrystallized with the Tar-Tar* kissing hairpin RNA. Although the electron-density maps do not show bound RNA, altered conformations in the side chains of Rom that are known to be involved in RNA binding have been identified. These results provide additional information about Rom protein conformational flexibility and suggest that the presence of a highly charged polymer such as RNA can promote tight packing of an RNA-binding protein, even when the RNA itself is not observed in the crystal.
Affiliation(s)
- Se Bok Jang
- Korea Nanobiotechnology Center, Pusan National University, Jangjeon-dong, Keumjeong-gu, Busan 609-735, South Korea.
11
Leontis NB, Altman RB, Berman HM, Brenner SE, Brown JW, Engelke DR, Harvey SC, Holbrook SR, Jossinet F, Lewis SE, Major F, Mathews DH, Richardson JS, Williamson JR, Westhof E. The RNA Ontology Consortium: an open invitation to the RNA community. RNA 2006; 12:533-41. [PMID: 16484377 PMCID: PMC1421088 DOI: 10.1261/rna.2343206] [Indexed: 05/05/2023]
Abstract
The aim of the RNA Ontology Consortium (ROC) is to create an integrated conceptual framework, an RNA Ontology (RO), with a common, dynamic, controlled, and structured vocabulary to describe and characterize RNA sequences, secondary structures, three-dimensional structures, and dynamics pertaining to RNA function. The RO should produce tools for clear communication about RNA structure and function for multiple uses, including the integration of RNA electronic resources into the Semantic Web. These tools should allow the accurate description in computer-interpretable form of the coupling between RNA architecture, function, and evolution. The purposes for creating the RO are, therefore, (1) to integrate sequence and structural databases; (2) to allow different computational tools to interoperate; (3) to create powerful software tools that bring advanced computational methods to the bench scientist; and (4) to facilitate precise searches for all relevant information pertaining to RNA. For example, one initial objective of the ROC is to define, identify, and classify RNA structural motifs described in the literature or appearing in databases and to agree on a computer-interpretable definition for each of these motifs. To achieve these aims, the ROC will foster communication and promote collaboration among RNA scientists by coordinating frequent face-to-face workshops to discuss, debate, and resolve difficult conceptual issues. These meeting opportunities will create new directions at various levels of RNA research. The ROC will work closely with the PDB/NDB structural databases and the Gene, Sequence, and Open Biomedical Ontology Consortia to integrate the RO with existing biological ontologies to extend existing content while maintaining interoperability.
12
Jang SB, Hung LW, Jeong MS, Holbrook EL, Chen X, Turner DH, Holbrook SR. The crystal structure at 1.5 angstroms resolution of an RNA octamer duplex containing tandem G.U basepairs. Biophys J 2006; 90:4530-7. [PMID: 16581850 PMCID: PMC1471874 DOI: 10.1529/biophysj.106.081018] [Indexed: 11/18/2022]
Abstract
The crystal structure of the RNA octamer, 5'-GGCGUGCC-3' has been determined from x-ray diffraction data to 1.5 angstroms resolution. In the crystal, this oligonucleotide forms five self-complementary double-helices in the asymmetric unit. Tandem 5'GU/3'UG basepairs comprise an internal loop in the middle of each duplex. The NMR structure of this octameric RNA sequence is also known, allowing comparison of the variation among the five crystallographic duplexes and the solution structure. The G.U pairs in the five duplexes of the crystal form two direct hydrogen bonds and are stabilized by water molecules that bridge between the base of guanine (N2) and the sugar (O2') of uracil. This contrasts with the NMR structure in which only one direct hydrogen bond is observed for the G.U pairs. The reduced stability of the r(CGUG)2 motif relative to the r(GGUC)2 motif may be explained by the lack of stacking of the uracil bases between the Watson-Crick and G.U pairs as observed in the crystal structure.
Affiliation(s)
- Se Bok Jang
- Korea Nanobiotechnology Center, Pusan National University, Jangjeon-dong, Keumjeong-gu, Busan, Korea.
13
Ye Y, Ding C, Holbrook SR. Identification of conserved protein modules from multiple organisms using transitive closure and cliques in protein interaction networks. FASEB J 2006. [DOI: 10.1096/fasebj.20.4.a530-b] [Indexed: 11/11/2022]
Affiliation(s)
- Yong Ye
- Computer Science, UNC at Charlotte, MS64‐0123, 1 Cyclotron Road, Berkeley, CA 94720
- Chris Ding
- Computational Research Division, Lawrence Berkeley Lab, MS50‐1608, 1 Cyclotron Road, Berkeley, CA 94720
- Stephen R Holbrook
- Physical Biosciences, Lawrence Berkeley Lab, MS64‐0123, 1 Cyclotron Road, Berkeley, CA 94720
14
Abstract
Metal ions are essential for the folding of RNA into stable tertiary structures and for the catalytic activity of some RNA enzymes. To aid in the study of the roles of metal ions in RNA structural biology, we have created MeRNA (Metals in RNA), a comprehensive compilation of all metal binding sites identified in RNA 3D structures available from the PDB and Nucleic Acid Database. Currently, our database contains information relating to binding of 9764 metal ions corresponding to 23 distinct elements, in 256 RNA structures. The metal ion locations were confirmed and ligands characterized using original literature references. MeRNA includes eight manually identified metal-ion binding motifs, which are described in the literature. MeRNA is searchable by PDB identifier, metal ion, method of structure determination, resolution and R-values for X-ray structure and distance from metal to any RNA atom or to water. New structures with their respective binding motifs will be added to the database as they become available. The MeRNA database will further our understanding of the roles of metal ions in RNA folding and catalysis and have applications in structural and functional analysis, RNA design and engineering. The MeRNA database is accessible at .
Affiliation(s)
- Liliana R. Stefan
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Rui Zhang
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Aaron G. Levitan
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Donna K. Hendrix
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California at Berkeley, Berkeley, CA 94720-3102, USA
- Steven E. Brenner
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California at Berkeley, Berkeley, CA 94720-3102, USA
- Stephen R. Holbrook
- Department of Structural Biology, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- To whom correspondence should be addressed. Tel: +1 510 486 4304; Fax: +1 510 486 6798;
15
Abstract
The x-ray crystal structure of a 417-nt ribonuclease P RNA from Bacillus stearothermophilus was solved to 3.3-Å resolution. This RNA enzyme is constructed from a number of coaxially stacked helical domains joined together by local and long-range interactions. These helical domains are arranged to form a remarkably flat surface, which is implicated by a wealth of biochemical data in the binding and cleavage of the precursors of transfer RNA substrate. Previous photoaffinity crosslinking data are used to position the substrate on the crystal structure and to identify the chemically active site of the ribozyme. This site is located in a highly conserved core structure formed by intricately interlaced long-range interactions between interhelical sequences.
Affiliation(s)
- Alexei V Kazantsev
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309, USA
16
Abstract
The database of RNA structure has grown tremendously since the crystal structure analyses of ribosomal subunits in 2000–2001. During the past year, the trend toward determining the structure of large, complex biological RNAs has accelerated, with the analysis of three intact group I introns, A- and B-type ribonuclease P RNAs, a riboswitch–substrate complex and other structures. The growing database of RNA structures, coupled with efforts directed at the standardization of nomenclature and classification of motifs, has resulted in the identification and characterization of numerous RNA secondary and tertiary structure motifs. Because a large proportion of RNA structure can now be shown to be composed of these recurring structural motifs, a view of RNA as a modular structure built from a combination of these building blocks and tertiary linkers is beginning to emerge. At the same time, however, more detailed analysis of water, metal, ligand and protein binding to RNA is revealing the effect of these moieties on folding and structure formation. The balance between the views of RNA structure either as strictly a construct of preformed building blocks linked in a limited number of ways or as a flexible polymer assuming a global fold influenced by its environment will be the focus of current and future RNA structural biology.
Affiliation(s)
- Stephen R Holbrook
- Structural Biology Department, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
17
Thompson JD, Holbrook SR, Katoh K, Koehl P, Moras D, Westhof E, Poch O. MAO: a Multiple Alignment Ontology for nucleic acid and protein sequences. Nucleic Acids Res 2005; 33:4164-71. [PMID: 16043635 PMCID: PMC1180671 DOI: 10.1093/nar/gki735] [Indexed: 11/14/2022]
Abstract
The application of high-throughput techniques such as genomics, proteomics or transcriptomics means that vast amounts of heterogeneous data are now available in the public databases. Bioinformatics is responding to the challenge with new integrated management systems for data collection, validation and analysis. Multiple alignments of genomic and protein sequences provide an ideal environment for the integration of this mass of information. In the context of the sequence family, structural and functional data can be evaluated and propagated from known to unknown sequences. However, effective integration is being hindered by syntactic and semantic differences between the different data resources and the alignment techniques employed. One solution to this problem is the development of an ontology that systematically defines the terms used in a specific domain. Ontologies are used to share data from different resources, to automatically analyse information and to represent domain knowledge for non-experts. Here, we present MAO, a new ontology for multiple alignments of nucleic and protein sequences. MAO is designed to improve interoperation and data sharing between different alignment protocols for the construction of a high quality, reliable multiple alignment in order to facilitate knowledge extraction and the presentation of the most pertinent information to the biologist.
Affiliation(s)
- Julie D Thompson
- Institut de Génétique et de Biologie Moléculaire et Cellulaire, 1 rue Laurent Fries, B.P. 10142, 67404 Illkirch Cedex, France.
18
Zhang Y, Chandonia JM, Ding C, Holbrook SR. Comparative mapping of sequence-based and structure-based protein domains. BMC Bioinformatics 2005; 6:77. [PMID: 15790427 PMCID: PMC1087832 DOI: 10.1186/1471-2105-6-77] [Received: 11/03/2004] [Accepted: 03/25/2005] [Indexed: 11/28/2022]
Abstract
Background: Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used. But whether these types of models alone can capture all essential features of domains is still an open question. Methods: Here we provide insight on domain definitions through comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of the mapping, and the properties of the mapping matrices are studied. Results: The mapping results show a general agreement between the two databases, as well as many interesting areas of disagreement. In the cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.
Affiliation(s)
- Ya Zhang
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802, USA
- John-Marc Chandonia
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Chris Ding
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Stephen R Holbrook
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
19
Xiong H, He X, Ding C, Zhang Y, Kumar V, Holbrook SR. Identification of functional modules in protein complexes via hyperclique pattern discovery. Pac Symp Biocomput 2005:221-32. [PMID: 15759628] [Indexed: 05/02/2023]
Abstract
Proteins usually do not act in isolation in a cell but function within complicated cellular pathways, interacting with other proteins either in pairs or as components of larger complexes. While many protein complexes have been identified by large-scale experimental studies, due to a large number of false-positive interactions existing in current protein complexes, it is still difficult to obtain an accurate understanding of functional modules, which encompass groups of proteins involved in a common elementary biological function. In this paper, we present a hyperclique pattern discovery approach for extracting functional modules (hyperclique patterns) from protein complexes. A hyperclique pattern is a type of association pattern containing proteins that are highly affiliated with each other. The analysis of hyperclique patterns shows that proteins within the same pattern tend to be present in the protein complex together. Also, statistically significant annotations of proteins in a pattern using the Gene Ontology suggest that proteins within the same hyperclique pattern more likely perform the same function and participate in the same biological process. More interestingly, the 3-D structural view of proteins within a hyperclique pattern reveals that these proteins physically interact with each other. In addition, we show that several hyperclique patterns corresponding to different functions can participate in the same protein complex as independent modules. Finally, we demonstrate that a hyperclique pattern can be involved in different complexes performing different higher-order biological functions, although the pattern corresponds to a specific elementary biological function.
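The notion of proteins being "highly affiliated" can be made concrete with a small sketch. It assumes the h-confidence measure commonly used to define hyperclique patterns: the support of the whole pattern divided by the largest support of any single member. The abstract does not state this formula, and the complexes and protein names below are invented for illustration.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical protein complexes, each treated as one "transaction".
COMPLEXES: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "B"}),
    frozenset({"A", "B", "D"}),
    frozenset({"C", "D"}),
    frozenset({"D"}),
]


def support_count(pattern: Iterable[str], complexes: List[FrozenSet[str]]) -> int:
    """Number of complexes that contain every protein in the pattern."""
    wanted = set(pattern)
    return sum(1 for members in complexes if wanted <= members)


def h_confidence(pattern: Iterable[str], complexes: List[FrozenSet[str]]) -> float:
    """Assumed definition: supp(pattern) / max over members of supp({member})."""
    proteins = list(pattern)
    largest_single = max(support_count([p], complexes) for p in proteins)
    if largest_single == 0:
        return 0.0  # none of the proteins occurs in any complex
    return support_count(proteins, complexes) / largest_single


def is_hyperclique(pattern: Iterable[str], complexes: List[FrozenSet[str]],
                   threshold: float) -> bool:
    """A pattern qualifies when its h-confidence reaches the threshold."""
    return h_confidence(pattern, complexes) >= threshold
```

In this toy data proteins A and B always appear together, so the pattern {A, B} has h-confidence 1.0, whereas {C, D} co-occur in only one of the three complexes that contain D and score one third. The threshold value is a free parameter in this sketch.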
Affiliation(s)
- Hui Xiong
- Computer Science & Engineering, University of Minnesota, MN, USA.
20
Karklin Y, Meraz RF, Holbrook SR. Classification of non-coding RNA using graph representations of secondary structure. Pac Symp Biocomput 2005:4-15. [PMID: 15759609] [Indexed: 05/02/2023]
Abstract
Some genes produce transcripts that function directly in regulatory, catalytic, or structural roles in the cell. These non-coding RNAs are prevalent in all living organisms, and methods that aid the understanding of their functional roles are essential. RNA secondary structure, the pattern of base-pairing, contains the critical information for determining the three dimensional structure and function of the molecule. In this work we examine whether the basic geometric and topological properties of secondary structure are sufficient to distinguish between RNA families in a learning framework. First, we develop a labeled dual graph representation of RNA secondary structure by adding biologically meaningful labels to the dual graphs proposed by Gan et al [1]. Next, we define a similarity measure directly on the labeled dual graphs using the recently developed marginalized kernels [2]. Using this similarity measure, we were able to train Support Vector Machine classifiers to distinguish RNAs of known families from random RNAs with similar statistics. For 22 of the 25 families tested, the classifier achieved better than 70% accuracy, with much higher accuracy rates for some families. Training a set of classifiers to automatically assign family labels to RNAs using a one vs. all multi-class scheme also yielded encouraging results. From these initial learning experiments, we suggest that the labeled dual graph representation, together with kernel machine methods, has potential for use in automated analysis and classification of uncharacterized RNA molecules or efficient genome-wide screens for RNA molecules from existing families.
Affiliation(s)
- Yan Karklin
- Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. yan+@cs.cmu.edu
|
21
|
Buchko GW, Ni S, Holbrook SR, Kennedy MA. Solution structure of hypothetical Nudix hydrolase DR0079 from extremely radiation-resistant Deinococcus radiodurans bacterium. Proteins 2004; 56:28-39. [PMID: 15162484 DOI: 10.1002/prot.20082] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Using nuclear magnetic resonance (NMR) based methods, including residual dipolar coupling restraints, we have determined the solution structure of the hypothetical Deinococcus radiodurans Nudix protein DR0079 (171 residues, MW = 19.3 kDa). The protein contains eight beta-strands and three alpha-helices organized into three subdomains: an N-terminal beta-sheet (1-34), a central Nudix core (35-140), and a C-terminal helix-turn-helix (141-171). The Nudix core and the C-terminal helix-turn-helix form the fundamental fold common to the Nudix family, a large mixed beta-sheet sandwiched between alpha-helices. The residues that compose the signature Nudix sequence, GX5EX7REUXEEXGU (where U = I, L, or V and X = any amino acid), are contained in a turn-helix-turn motif on the face of the mixed beta-sheet. Chemical shift mapping experiments suggest that DR0079 binds Mg2+. Experiments designed to determine the biological function of the protein indicate that it is not a type I isopentenyl-diphosphate delta-isomerase and that it does not bind alpha,beta-methyleneadenosine 5'-triphosphate (AMPCPP) or guanosine 5'-[beta,gamma-imido]triphosphate (GMPPNP). In this article, the structure of DR0079 is compared to other known Nudix protein structures, a potential substrate-binding surface is proposed, and its possible biological function is discussed.
Affiliation(s)
- Garry W Buchko
- Fundamental Sciences, Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington 99352, USA
|
22
|
Abstract
The protein interaction network presents one perspective for understanding cellular processes. Recent experiments employing high-throughput mass spectrometric characterizations have resulted in large data sets of physiologically relevant multiprotein complexes. We present a unified representation of such data sets based on an underlying bipartite graph model that is an advance over existing models of the network. Our unified representation allows for weighting of connections between proteins shared in more than one complex, as well as addressing the higher level organization that occurs when the network is viewed as consisting of protein complexes that share components. This representation also allows for the application of the rigorous MinMaxCut graph clustering algorithm for the determination of relevant protein modules in the networks. Statistically significant annotations of clusters in the protein-protein and complex-complex networks using terms from the Gene Ontology indicate that this method will be useful for posing hypotheses about uncharacterized components of protein complexes or uncharacterized relationships between protein complexes.
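As a rough illustration of the bipartite representation described above, the sketch below projects a table of complex memberships onto two weighted networks: protein pairs weighted by the number of complexes they share, and complex pairs weighted by the number of proteins they share. The counting-based weights, the input format, and the name `project_bipartite` are assumptions made for illustration; the abstract does not give the weighting scheme, and the MinMaxCut clustering step is not shown.

```python
from collections import Counter
from itertools import combinations


def project_bipartite(complexes):
    """Project a protein/complex bipartite graph onto two weighted graphs.

    complexes -- dict mapping a complex id to the set of its protein ids
                 (hypothetical input format, not taken from the paper)
    """
    # Protein-protein weight: number of complexes containing both proteins.
    protein_w = Counter()
    for members in complexes.values():
        for a, b in combinations(sorted(members), 2):
            protein_w[(a, b)] += 1

    # Complex-complex weight: number of proteins the two complexes share.
    complex_w = {}
    for c1, c2 in combinations(sorted(complexes), 2):
        shared = len(complexes[c1] & complexes[c2])
        if shared:
            complex_w[(c1, c2)] = shared

    return dict(protein_w), complex_w


if __name__ == "__main__":
    toy = {"C1": {"A", "B", "C"}, "C2": {"B", "C", "D"}, "C3": {"E"}}
    proteins, cplx = project_bipartite(toy)
    print(proteins)
    print(cplx)
```

Either projection could then be handed to a graph clustering routine; which one is used depends on whether protein modules or groups of related complexes are of interest.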
Affiliation(s)
- Chris Ding
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
|
23
|
Ranatunga W, Hill EE, Mooster JL, Holbrook EL, Schulze-Gahmen U, Xu W, Bessman MJ, Brenner SE, Holbrook SR. Structural studies of the Nudix hydrolase DR1025 from Deinococcus radiodurans and its ligand complexes. J Mol Biol 2004; 339:103-16. [PMID: 15123424 DOI: 10.1016/j.jmb.2004.01.065] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 01/22/2004] [Accepted: 01/29/2004] [Indexed: 11/20/2022]
Abstract
We have determined the crystal structure, at 1.4 Å, of the Nudix hydrolase DR1025 from the extremely radiation-resistant bacterium Deinococcus radiodurans. The protein forms an intertwined homodimer by exchanging N-terminal segments between chains. We have identified additional conserved elements of the Nudix fold, including the metal-binding motif, a kinked beta-strand characterized by a proline two positions upstream of the Nudix consensus sequence, and participation of the N-terminal extension in the formation of the substrate-binding pocket. Crystal structures were also solved of DR1025 crystallized in the presence of magnesium and either a GTP analog or Ap4A (both at 1.6 Å resolution). In the Ap4A co-crystal, the electron density indicated that the product of asymmetric hydrolysis, ATP, was bound to the enzyme. The structure with the GTP analog showed that GTP was bound almost identically to ATP. Neither nucleoside triphosphate was further cleaved.
Affiliation(s)
- Wasantha Ranatunga
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
|
24
|
Klosterman PS, Hendrix DK, Tamura M, Holbrook SR, Brenner SE. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops and U-turns. Nucleic Acids Res 2004; 32:2342-52. [PMID: 15121895 PMCID: PMC419439 DOI: 10.1093/nar/gkh537] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
Release 2.0.1 of the Structural Classification of RNA (SCOR) database, http://scor.lbl.gov, contains a classification of the internal and hairpin loops in a comprehensive collection of 497 NMR and X-ray RNA structures. This report discusses findings of the classification that have not been reported previously. The SCOR database contains multiple examples of a newly described RNA motif, the extruded helical single strand. Internal loop base triples are classified in SCOR according to their three-dimensional context. These internal loop triples contain several examples of a frequently found motif, the minor groove AGC triple. SCOR also presents the predominant and alternate conformations of hairpin loops, as shown in the most well represented tetraloops, with consensus sequences GNRA, UNCG and ANYA. The ubiquity of the GNRA hairpin turn motif is illustrated by its presence in complex internal loops.
Affiliation(s)
- Peter S Klosterman
- Department of Plant and Microbial Biology, University of California at Berkeley, 111 Koshland Hall, Berkeley, CA 94720-3102, USA
|
25
|
Jang SB, Baeyens K, Jeong MS, SantaLucia J, Turner D, Holbrook SR. Structures of two RNA octamers containing tandem G.A base pairs. Acta Crystallogr D Biol Crystallogr 2004; 60:829-35. [PMID: 15103128 DOI: 10.1107/s0907444904003804] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Received: 12/26/2003] [Accepted: 02/18/2004] [Indexed: 11/10/2022]
Abstract
The crystal structures of two RNA octamers, 5'-GGC(GA)GCC-3' and 5'-GIC(GA)GCC-3', have been determined from X-ray diffraction data to 2.8 and 2.7 Å resolution, respectively. The RNA octamers crystallize in isomorphous unit cells containing two mispairs arranged in a self-complementary manner and one single strand in the asymmetric unit. The single strand pairs with another single strand related by crystallographic symmetry to form a third unique double helix. Tandem non-Watson-Crick G.A/A.G base pairs of the sheared type comprise an internal loop in the middle of each duplex. The NMR structure of this octameric RNA sequence is also known, allowing comparison of the variation between the six crystallographic duplexes and the solution structure. In the symmetric duplex of the octamer containing inosine, the sheared G.A pairs incorporate a bound water molecule. This duplex also binds one water molecule per strand in the minor groove adjacent to the G.A pairs.
Affiliation(s)
- Se Bok Jang
- Korea Nanobiotechnology Center, Pusan National University, Busan 609-735, South Korea.
|
26
|
Abstract
SCOR, the Structural Classification of RNA (http://scor.lbl.gov), is a database designed to provide a comprehensive perspective and understanding of RNA motif three-dimensional structure, function, tertiary interactions and their relationships. SCOR 2.0 represents a major expansion and introduces a new classification organization. The new version represents the classification as a Directed Acyclic Graph (DAG), which allows a classification node to have multiple parents, in contrast to the strictly hierarchical classification used in SCOR 1.2. SCOR 2.0 supports three types of query terms in the updated search engine: PDB or NDB identifier, nucleotide sequence and keyword. We also provide parseable XML files for all information. This new release contains 511 RNA entries from the PDB as of 15 May 2003. A total of 5880 secondary structural elements are classified: 2104 hairpin loops and 3776 internal loops. RNA motifs reported in the literature, such as 'Kink turn' and 'GNRA loops', are now incorporated into the structural classification along with definitions and descriptions.
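The difference between a strict hierarchy and a DAG classification can be sketched in a few lines. In the toy example below a node may list more than one parent, which a tree cannot express, and `ancestors` collects every classification a node falls under. The node names, the parent assignments, and the function name are hypothetical placeholders and are not taken from the SCOR database itself.

```python
def ancestors(parents, node):
    """Return every node reachable from `node` by following parent links.

    parents -- dict mapping a node to a list of its parent nodes; a node
               with several parents is what distinguishes a DAG from a tree
    """
    seen = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


if __name__ == "__main__":
    # Hypothetical classification fragment: one motif filed under two parents.
    toy_parents = {
        "GNRA tetraloop": ["hairpin loop", "tertiary interaction motif"],
        "hairpin loop": ["RNA motif"],
        "tertiary interaction motif": ["RNA motif"],
    }
    print(sorted(ancestors(toy_parents, "GNRA tetraloop")))
```

The `seen` set ensures a shared ancestor reached along two paths is reported once, which matters only because multiple parents are allowed.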
Affiliation(s)
- Makio Tamura
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
27
|
Kazantsev AV, Krivenko AA, Harrington DJ, Carter RJ, Holbrook SR, Adams PD, Pace NR. High-resolution structure of RNase P protein from Thermotoga maritima. Proc Natl Acad Sci U S A 2003; 100:7497-502. [PMID: 12799461 PMCID: PMC164615 DOI: 10.1073/pnas.0932597100] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
The structure of RNase P protein from the hyperthermophilic bacterium Thermotoga maritima was determined at 1.2-Å resolution by using x-ray crystallography. This protein structure is from an ancestral-type RNase P and bears remarkable similarity to the recently determined structures of RNase P proteins from bacteria that have the distinct, Bacillus type of RNase P. These two types of protein span the extent of bacterial RNase P diversity, so the results generalize the structure of the bacterial RNase P protein. The broad phylogenetic conservation of structure and distribution of potential RNA-binding elements in the RNase P proteins indicate that all of these homologous proteins bind to their cognate RNAs primarily by interaction with the phylogenetically conserved core of the RNA. The protein is found to dimerize through an extensive, well-ordered interface. This dimerization may reflect a mechanism of thermal stability of the protein before assembly with the RNA moiety of the holoenzyme.
Affiliation(s)
- Alexei V Kazantsev
- Department of Molecular, Cellular, and Developmental Biology, University of Colorado, Boulder, CO 80309-0347, USA
|
28
|
Holbrook EL, Schulze-Gahmen U, Buchko GW, Ni S, Kennedy MA, Holbrook SR. Purification, crystallization and preliminary X-ray analysis of two nudix hydrolases from Deinococcus radiodurans. Acta Crystallogr D Biol Crystallogr 2003; 59:737-40. [PMID: 12657797 DOI: 10.1107/s0907444903002671] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/30/2002] [Accepted: 01/30/2003] [Indexed: 11/10/2022]
Abstract
Two nudix hydrolases from Deinococcus radiodurans have been purified and crystallized. Diffraction data have been collected to 1.4 and 1.9 Å resolution for DR1025 and DR0079, respectively. DR1025 belongs to space group P4₁2₁2/P4₃2₁2, with unit-cell parameters a = b = 53.2, c = 122.6 Å (unit-cell volume 346 883 Å³, V_M = 2.5 Å³ Da⁻¹, solvent content 50.2%). DR0079 belongs to space group C222₁, with unit-cell parameters a = 34.1, b = 157.2, c = 126.5 Å (unit-cell volume 677 308 Å³, V_M = 2.2 Å³ Da⁻¹, solvent content 44.0%). The calculated cell content of DR1025 indicates the presence of one molecule in the asymmetric unit. Dynamic light scattering and gel filtration suggest it to be a dimer in solution. The space group and unit-cell parameters of DR0079 indicate the presence of two molecules per asymmetric unit. Gel filtration and NMR spectroscopy suggest it to be a monomer in solution.
|
29
|
Buchko GW, Ni S, Holbrook SR, Kennedy MA. ¹H, ¹³C, and ¹⁵N NMR assignments of the hypothetical Nudix protein DR0079 from the extremely radiation-resistant bacterium Deinococcus radiodurans. J Biomol NMR 2003; 25:169-170. [PMID: 12652130 DOI: 10.1023/a:1022243724501] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 05/24/2023]
|
30
|
Abstract
The "ribose zipper", an important element of RNA tertiary structure, is characterized by consecutive hydrogen-bonding interactions between ribose 2'-hydroxyls from different regions of an RNA chain or between RNA chains. These tertiary contacts have previously been observed to also involve base-backbone and base-base interactions (A-minor type). We searched for ribose zipper tertiary interactions in the crystal structures of the large ribosomal subunit RNAs of Haloarcula marismortui and Deinococcus radiodurans, and the small ribosomal subunit RNA of Thermus thermophilus and identified a total of 97 ribose zippers. Of these, 20 were found in T. thermophilus 16 S rRNA, 44 in H. marismortui 23 S rRNA (plus 2 bridging 5 S and 23 S rRNAs) and 30 in D. radiodurans 23 S rRNA (plus 1 bridging 5 S and 23 S rRNAs). These were analyzed in terms of sequence conservation, structural conservation and stability, location in secondary structure, and phylogenetic conservation. Eleven types of ribose zippers were defined based on ribose-base interactions. Of these 11, seven were observed in the ribosomal RNAs. The most common of these is the canonical ribose zipper, originally observed in the P4-P6 group I intron fragment. All ribose zippers were formed by antiparallel chain interactions and only a single example extended beyond two residues, forming an overlapping ribose zipper of three consecutive residues near the small subunit A-site. Almost all ribose zippers link stem (Watson-Crick duplex) or stem-like (base-paired), with loop (external, internal, or junction) chain segments. About two-thirds of the observed ribose zippers interact with ribosomal proteins. Most of these ribosomal proteins bridge the ribose zipper chain segments with basic amino acid residues hydrogen bonding to the RNA backbone. Proteins involved in crucial ribosome function and in early stages of ribosomal assembly also stabilize ribose zipper interactions. 
All ribose zippers show strong sequence conservation both within these three ribosomal RNA structures and in a large database of aligned prokaryotic sequences. The physical basis of the sequence conservation is stacked base triples formed between consecutive base-pairs on the stem or stem-like segment with bases (often adenines) from the loop-side segment. These triples have previously been characterized as Type I and Type II A-minor motifs and are stabilized by base-base and base-ribose hydrogen bonds. The sequence and structure conservation of ribose zippers can be directly used in tertiary structure prediction and may have applications in molecular modeling and design.
MESH Headings
- Bacteria/chemistry
- Bacteria/genetics
- Conserved Sequence
- Haloarcula marismortui/chemistry
- Haloarcula marismortui/genetics
- Hydrogen Bonding
- Models, Molecular
- Nucleic Acid Conformation
- Phylogeny
- Protein Binding
- RNA, Archaeal/chemistry
- RNA, Archaeal/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- Ribose/chemistry
- Ribosomal Proteins/chemistry
- Thermus thermophilus/chemistry
- Thermus thermophilus/genetics
Affiliation(s)
- Makio Tamura
- Lawrence Berkeley National Laboratory, Structural Biology Department, Physical Biosciences Division, 1 Cyclotron Road, 132 Melvin Calvin Lab, Bldg 3, Berkeley, CA 94720, USA
|
31
|
Klosterman PS, Tamura M, Holbrook SR, Brenner SE. SCOR: a Structural Classification of RNA database. Nucleic Acids Res 2002; 30:392-4. [PMID: 11752346 PMCID: PMC99131 DOI: 10.1093/nar/30.1.392] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.9] [Received: 08/17/2001] [Revised: 10/10/2001] [Accepted: 10/10/2001] [Indexed: 11/13/2022] Open
Abstract
The Structural Classification of RNA (SCOR) database provides a survey of the three-dimensional motifs contained in 259 NMR and X-ray RNA structures. In one classification, the structures are grouped according to function. The RNA motifs, including internal and external loops, are also organized in a hierarchical classification. The 259 database entries contain 223 internal and 203 external loops; 52 entries consist of fully complementary duplexes. A classification of the well-characterized tertiary interactions found in the larger RNA structures is also included along with examples. The SCOR database is accessible at http://scor.lbl.gov.
Affiliation(s)
- Peter S Klosterman
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
|
32
|
Abstract
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
Affiliation(s)
- R J Carter
- Computational and Theoretical Biology Department, Physical Biosciences Division, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
|
33
|
Abstract
Even as the number of RNA structures determined and under study multiplies, the critical step in X-ray diffraction analysis, growth of single well-ordered crystals, remains at the boundary between art and science. Recent advances in methods of RNA synthesis, purification, and characterization, as well as empirical and technical improvements in crystallization techniques, the development of cryo-crystallography, and the wider availability of bright, tunable, X-rays from synchrotron sources are improving the chances of obtaining RNA crystals suitable for X-ray structural analysis. In this review, we summarize the current status of the design, preparation, purification, and analysis of RNA for crystallization and describe the latest approaches to obtaining diffraction-quality crystals.
Affiliation(s)
- S R Holbrook
- Structural Biology Department, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
|
34
|
Abstract
The current state of three-dimensional structure analysis of RNA by x-ray crystallography is summarized. The methods of sample preparation, crystallization, data collection, and structure solution are discussed, followed by a review of the RNA structures that have been determined and of common structural features, and finally, an appraisal of future prospects for x-ray crystal structure analysis of RNA.
Affiliation(s)
- S R Holbrook
- Structural Biology Division, Lawrence Berkeley National Laboratory, University of California at Berkeley 94720, USA
|
35
|
Hung LW, Holbrook EL, Holbrook SR. The crystal structure of the Rev binding element of HIV-1 reveals novel base pairing and conformational variability. Proc Natl Acad Sci U S A 2000; 97:5107-12. [PMID: 10792052 PMCID: PMC25789 DOI: 10.1073/pnas.090588197] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
The crystal and molecular structure of an RNA duplex corresponding to the high affinity Rev protein binding element (RBE) has been determined at 2.1-Å resolution. Four unique duplexes are present in the crystal, comprising two structural variants. In each duplex, the RNA double helix consists of an annealed 12-mer and 14-mer that form an asymmetric internal loop consisting of G-G and G-A noncanonical base pairs and a flipped-out uridine. The 12-mer strand has an A-form conformation, whereas the 14-mer strand is distorted to accommodate the bulges and noncanonical base pairing. In contrast to the NMR model of the unbound RBE, an asymmetric G-G pair with N2-N7 and N1-O6 hydrogen bonding is formed in each helix. The G-A base pairing agrees with the NMR structure in one structural variant, but forms a novel water-mediated pair in the other. A backbone flip and reorientation of the G-G base pair is required to assume the RBE conformation present in the NMR model of the complex between the RBE and the Rev peptide.
Affiliation(s)
- L W Hung
- Macromolecular Crystallography Facility and Structural Biology Department, Melvin Calvin Building, Physical Biosciences Division, Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
36
|
Abstract
The crystal structure of the RNA octamer 5'-CGC(CA)GCG-3' has been determined from X-ray diffraction data to 2.3 Å resolution. In the crystal, this oligomer forms a self-complementary double helix in the asymmetric unit. Tandem non-Watson-Crick C-A and A-C base pairs comprise an internal loop in the middle of the duplex, which is incorporated with little distortion of the A-form double helix. From the geometry of the C-A base pairs, it is inferred that the adenosine imino group is protonated and donates a hydrogen bond to the carbonyl group of the cytosine. The wobble geometry of the C-A+ base pairs is very similar to that of the common U-G non-Watson-Crick pair.
Affiliation(s)
- S B Jang
- Structural Biology Department, Physical Biosciences Division, Lawrence Berkeley National Laboratory, University of California, Berkeley 94720, USA
|
37
|
Carter RJ, Baeyens KJ, SantaLucia J, Turner DH, Holbrook SR. The crystal structure of an RNA oligomer incorporating tandem adenosine-inosine mismatches. Nucleic Acids Res 1997; 25:4117-22. [PMID: 9321667 PMCID: PMC146998 DOI: 10.1093/nar/25.20.4117] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023] Open
Abstract
The X-ray crystallographic structure of the RNA duplex [r(CGCAIGCG)]2 has been refined to 2.5 Å. It shows a symmetric internal loop of two non-Watson-Crick base pairs which form in the middle of the duplex. The tandem A-I/I-A pairs are related by a crystallographic two-fold axis. Both A(anti)-I(anti) mismatches are in a head-to-head conformation forming hydrogen bonds using the Watson-Crick positions. The octamer duplexes stack above one another in the cell forming a pseudo-infinite helix throughout the crystal. A hydrated calcium ion bridges between the 3'-terminal of one molecule and the backbone of another. The tandem A-I mismatches are incorporated with only minor distortion to the backbone. This is in contrast to the large helical perturbations often produced by sheared G-A pairs in RNA oligonucleotides.
Affiliation(s)
- R J Carter
- Structural Biology Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720, USA
|
38
|
Baeyens KJ, De Bondt HL, Pardi A, Holbrook SR. A curved RNA helix incorporating an internal loop with G.A and A.A non-Watson-Crick base pairing. Proc Natl Acad Sci U S A 1996; 93:12851-5. [PMID: 8917508 PMCID: PMC24009 DOI: 10.1073/pnas.93.23.12851] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.4] [Received: 08/06/1996] [Accepted: 08/16/1996] [Indexed: 02/03/2023] Open
Abstract
The crystal structure of the RNA dodecamer 5'-GGCC(GAAA)GGCC-3' has been determined from x-ray diffraction data to 2.3-Å resolution. In the crystal, these oligomers form double helices around twofold symmetry axes. Four consecutive non-Watson-Crick base pairs make up an internal loop in the middle of the duplex, including sheared G.A pairs and novel asymmetric A.A pairs. This internal loop sequence produces a significant curvature and narrowing of the double helix. The helix is curved by 34 degrees from end to end and the diameter is narrowed by 24% in the internal loop. A Mn2+ ion is bound directly to the N7 of the first guanine in the Watson-Crick region following the internal loop and the phosphate of the preceding residue. This Mn2+ location corresponds to a metal binding site observed in the hammerhead catalytic RNA.
Affiliation(s)
- K J Baeyens
- Structural Biology Division, Lawrence Berkeley National Laboratory, University of California, Berkeley 94720, USA
|
39
|
Abstract
We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.
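The composition, transition, and distribution descriptors mentioned above can be sketched as follows for a chain whose residues have already been mapped to attribute classes (for example, hydrophobicity groups). The class encoding, the quantile positions used for the distribution, and the function name `ctd_descriptors` are assumptions chosen for illustration; the abstract does not state the exact definitions, and the neural network and voting steps are not shown.

```python
def ctd_descriptors(labels, classes):
    """Composition, transition and distribution of a per-residue attribute.

    labels  -- string of class symbols, one per residue (hypothetical encoding)
    classes -- string of the class symbols to report, e.g. "HNP"
    """
    n = len(labels)
    if n == 0:
        raise ValueError("empty chain")

    # Composition: fraction of residues falling in each class.
    composition = {c: labels.count(c) / n for c in classes}

    # Transition: fraction of adjacent residue pairs that switch between
    # two given classes, counted in either direction.
    pairs = list(zip(labels, labels[1:]))
    transition = {}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            switches = sum(1 for x, y in pairs if {x, y} == {a, b})
            transition[(a, b)] = switches / max(len(pairs), 1)

    # Distribution: relative chain positions of the first, 25%, 50%, 75%
    # and last occurrence of each class that is present.
    distribution = {}
    for c in classes:
        pos = [i + 1 for i, x in enumerate(labels) if x == c]
        if pos:
            distribution[c] = tuple(
                pos[int(q * (len(pos) - 1))] / n for q in (0, 0.25, 0.5, 0.75, 1)
            )
    return composition, transition, distribution


if __name__ == "__main__":
    comp, trans, dist = ctd_descriptors("AABBA", "AB")
    print(comp, trans, dist)
```

Concatenating the three dictionaries for several attributes gives a fixed-length vector regardless of chain length, which is what makes such global descriptors convenient inputs for a classifier.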
Affiliation(s)
- I Dubchak
- Department of Chemistry, University of California, Berkeley 94720, USA
|
40
|
Baeyens KJ, De Bondt HL, Holbrook SR. Structure of an RNA double helix including uracil-uracil base pairs in an internal loop. Nat Struct Biol 1995; 2:56-62. [PMID: 7719854 DOI: 10.1038/nsb0195-56] [Citation(s) in RCA: 108] [Impact Index Per Article: 3.7] [Indexed: 01/26/2023]
Abstract
The crystal structure of the RNA dodecamer 5'-GGACUUUGGUCC-3' has been determined from X-ray diffraction data to 2.6 Å resolution. This oligomer forms an asymmetric double helix in the crystal. Four consecutive non-Watson-Crick base-pairs are formed in the middle of the duplex including the first intrahelical U-U (or T-T) pairs observed in an oligonucleotide crystal structure. Two different conformations of U-U pairs are observed in the context of the surrounding sequence. One of these pairs is highly twisted, allowing a bound water to bridge across strands in the major groove. The crystal packing illustrates a new form of RNA helix-helix interaction.
Affiliation(s)
- K J Baeyens
- Structural Biology Division, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
41
|
Baeyens KJ, Jancarik J, Holbrook SR. Use of low-molecular-weight polyethylene glycol in the crystallization of RNA oligomers. Acta Crystallogr D Biol Crystallogr 1994; 50:764-7. [PMID: 15299375 DOI: 10.1107/s0907444994003458] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
We have crystallized a variety of RNA oligonucleotides in a form suitable for X-ray diffraction studies using polyethylene glycol with a low-molecular-weight distribution (PEG 400) as the precipitant. Crystallization experiments on a set of 26 RNA oligomers ranging from eight to 12 nucleotides in length resulted in eight diffraction-quality crystals. Of these eight RNA crystals, six utilized PEG 400 as the precipitating agent. We have also been able to obtain large single crystals of a DNA-RNA hybrid, transfer RNA (two different conditions) and a catalytic RNA from PEG 400 solutions. These results suggest that PEG 400 may be a generally useful alternative to 2-methyl-2,4-pentanediol (MPD) which has, thus far, been the most successful precipitant for DNA oligomers.
Affiliation(s)
- K J Baeyens
- Structural Biology Division, Melvin Calvin Laboratory, University of California, Berkeley 94720, USA
|
42
|
Holbrook SR, Dubchak I, Kim SH. PROBE: a computer program employing an integrated neural network approach to protein structure prediction. Biotechniques 1993; 14:984-9. [PMID: 8333967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023] Open
Abstract
A computer program, PROBE, has been designed for the prediction of protein structural features from amino acid sequence. This program integrates a variety of computer-simulated neural networks, each predicting an aspect of protein structure, into a single, easy-to-use package. The surface accessibility of each residue, the presence of disulfide bonds, the overall secondary structure composition and the residue secondary structures, including beta-turn type, are predicted. In addition, the overall amino acid composition and relative hydrophobicity are used to determine whether a protein belongs to one of four common folding motifs. PROBE is able to compare and synergistically improve the predictions by allowing communication between the different networks.
Affiliation(s)
- S R Holbrook
- Structural Biology Division, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
43
|
Abstract
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer-simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) four-alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide-binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.
Affiliation(s)
- I Dubchak
- Department of Chemistry, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
44
|
Holbrook SR. Application of computational neural networks to the prediction of protein structural features. Genet Eng (N Y) 1993; 15:1-19. [PMID: 7763836 DOI: 10.1007/978-1-4899-1666-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
Affiliation(s)
- S R Holbrook
- Structural Biology Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720
|
45
|
Abstract
The crystal structure of the RNA dodecamer duplex (r-GGACUUCGGUCC)2 has been determined. The dodecamers stack end-to-end in the crystal, simulating infinite A-form helices with only a break in the phosphodiester chain. These infinite helices are held together in the crystal by hydrogen bonding between ribose hydroxyl groups and a variety of donors and acceptors. The four noncomplementary nucleotides in the middle of the sequence did not form an internal loop, but rather a highly regular double helix incorporating the non-Watson-Crick base pairs G.U and U.C. This is the first direct observation of a U.C (or T.C) base pair in a crystal structure. The U.C pairs each form only a single base-base hydrogen bond, but are stabilized by a water molecule that bridges between the ring nitrogens and by four waters in the major groove that link the bases and phosphates. The lack of distortion introduced in the double helix by the U.C mismatch may explain its low efficiency of repair in DNA. The G.U wobble pair is also stabilized by a minor-groove water that bridges between the unpaired guanine amino and the ribose hydroxyl of the uracil. This structure emphasizes the importance of specific hydrogen bonding between not only the nucleotide bases, but also the ribose hydroxyls, phosphate oxygens and tightly bound waters in stabilization of the intramolecular and intermolecular structures of double-helical RNA.
Affiliation(s)
- S R Holbrook
- Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
46
|
Tamura T, Holbrook SR, Kim SH. A Macintosh computer program for designing DNA sequences that code for specific peptides and proteins. Biotechniques 1991; 10:782-4. [PMID: 1878215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022]
Abstract
A computer program (PINCERS) is described for use in the design of synthetic genes and mixed-probe DNA sequences. A protein sequence is reverse translated with generation of synonymous codons at each position, producing a degenerate sequence. In order to locate potential restriction enzyme sites, the degenerate sequence is searched with a library of restriction enzymes for sites that utilize any combination of synonymous codons. These sites are indicated in a map so that they may be incorporated into the synthetic gene sequence. The program allows the user to select the appropriate codon usage table for the organism of interest and then to set a threshold usage frequency below which codons are not generated. PINCERS may also be used to assist in planning the synthesis of mixed-probe DNA sequences for cross-hybridization experiments. It can identify regions of specified length within the protein sequence that have the least overall degeneracy, thereby minimizing the number of probes to be synthesized and, therefore, maximizing the concentration of a given probe sequence.
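The least-degeneracy search described above can be sketched in a few lines. This is not the PINCERS code: the function names are hypothetical, the codon counts are those of the standard genetic code, and the codon usage threshold mentioned in the abstract is deliberately left out.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def window_degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(protein, length):
    """Return (start, peptide, degeneracy) for the window needing the fewest probes."""
    if not 0 < length <= len(protein):
        raise ValueError("length must be between 1 and len(protein)")
    starts = range(len(protein) - length + 1)
    # min() keeps the first window when several share the lowest degeneracy.
    best = min(starts, key=lambda i: window_degeneracy(protein[i:i + length]))
    peptide = protein[best:best + length]
    return best, peptide, window_degeneracy(peptide)


if __name__ == "__main__":
    print(least_degenerate_window("LSRMWMKA", 3))
```

The degeneracy of a window is the product of the per-residue codon counts, so windows rich in Met and Trp (one codon each) need the fewest probes.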
Affiliation(s)
- T Tamura
- Melvin Calvin Laboratory, University of California, Berkeley 94720
|
47
|
Abstract
The bonding states of cysteine play important functional and structural roles in proteins. In particular, disulfide bond formation is one of the most important factors influencing the three-dimensional fold of proteins. Proteins of known structure were used to teach computer-simulated neural networks rules for predicting the disulfide-bonding state of a cysteine given only its flanking amino acid sequence. Resulting networks make accurate predictions on sequences different from those used in training, suggesting that local sequence greatly influences cysteines in disulfide bond formation. The average prediction rate after seven independent network experiments is 81.4% for disulfide-bonded and 80.0% for non-disulfide-bonded scenarios. Predictive accuracy is related to the strength of network output activities. Network weights reveal interesting position-dependent amino acid preferences and provide a physical basis for understanding the correlation between the flanking sequence and a cysteine's disulfide-bonding state. Network predictions may be used to increase or decrease the stability of existing disulfide bonds or to aid the search for potential sites to introduce new disulfide bonds.
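To make the notion of a cysteine's flanking sequence concrete, the sketch below extracts a fixed-width window around each cysteine. The function name, the default flank width, and the padding character are assumptions for illustration; the abstract does not state the window size or encoding used for the networks.

```python
def cysteine_windows(sequence, flank=3, pad="-"):
    """Return (position, window) for every cysteine, padding past the chain ends.

    `flank` residues are taken on each side; `flank` and `pad` are illustrative
    choices, not values taken from the original study.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # In the padded string the cysteine sits at index i + flank,
            # so the window starts at i and spans 2 * flank + 1 characters.
            windows.append((i, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    for position, window in cysteine_windows("ACDCE", flank=2):
        print(position, window)
```

Each window has the cysteine at its center, so a predictor sees the same positional layout for every example.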
Affiliation(s)
- S M Muskal
- Department of Chemistry, University of California, Berkeley 94720
|
48
|
Abstract
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with greater than 80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.
Affiliation(s)
- S R Holbrook
- Chemical Biodynamics Division, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
49
|
Abstract
A structural model of guanine nucleotide-binding regulatory protein alpha subunits (G alpha subunits) is proposed based on the crystal structure of the catalytic domain of the human HRAS protein (p21ras). Because of low overall sequence similarity, structural and functional constraints were used to align the G alpha consensus sequence with that of p21ras. The resulting G alpha model specifies the spatial relationship among the guanine nucleotide-binding site, the binding site of the beta gamma subunit complex, likely regions of effector and receptor interaction, and sites of cholera and pertussis toxin modification. The locations in the model of the experimentally determined sites of proteolytic digestion, point mutation, monoclonal antibody binding, and toxin modification are consistent with and help explain the observed biological activity. Two important findings from our model are (i) the orientation of the G alpha model with respect to the membrane and (ii) the identification of the spatial proximity of the N- and C-terminal regions. Furthermore, by analogy to p21ras, the model assigns specific residues in G alpha required for binding the guanosine (G-box) and phosphates (PO4-box) and identifies residues potentially involved in the conformational switch mechanism (S-box). Specification of these critical regions in the G alpha model suggests guidelines for construction of mutants and chimeric proteins to experimentally test structural and functional hypotheses.
Affiliation(s)
- S R Holbrook
- Division of Chemical Biodynamics, Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|
50
|
Abstract
The local mobility of the complex between the anti-tumor drug daunomycin and a DNA hexanucleotide duplex of sequence d(CpGpTpApCpG)2 has been determined by anisotropic refinement of single-crystal X-ray diffraction data of 1.2 Å resolution (1 Å = 0.1 nm). The directions and amplitudes of the local motion indicate that changes in mobility of DNA due to daunomycin binding are primarily limited to the residues forming the intercalation site and do not propagate to the neighboring residues. The intercalated daunomycin ring system (aglycone) is rigidly fixed in the base stack, apparently serving as an anchor for the amino sugar segment of the drug, which is one of the most mobile regions of the entire complex. The high flexibility of this amino sugar may be important for inhibition of replication and transcription not only by sterically blocking the minor groove, but also by allowing nonproductive interactions to be formed with various polymerases or other DNA-binding proteins. The crystallographic model is improved sufficiently by the rigid group anisotropic refinement to allow additional bound water molecules to be located.
Affiliation(s)
- S R Holbrook
- Lawrence Berkeley Laboratory, University of California, Berkeley 94720
|