26
Tamura M, Hendrix DK, Klosterman PS, Schimmelman NRB, Brenner SE, Holbrook SR. SCOR: Structural Classification of RNA, version 2.0. Nucleic Acids Res 2004; 32:D182-4. [PMID: 14681389 PMCID: PMC308814 DOI: 10.1093/nar/gkh080] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.4] [Indexed: 11/14/2022] Open
Abstract
SCOR, the Structural Classification of RNA (http://scor.lbl.gov), is a database designed to provide a comprehensive perspective and understanding of RNA motif three-dimensional structure, function, tertiary interactions and their relationships. SCOR 2.0 represents a major expansion and introduces a new classification organization. The new version represents the classification as a Directed Acyclic Graph (DAG), which allows a classification node to have multiple parents, in contrast to the strictly hierarchical classification used in SCOR 1.2. SCOR 2.0 supports three types of query terms in the updated search engine: PDB or NDB identifier, nucleotide sequence and keyword. We also provide parseable XML files for all information. This new release contains 511 RNA entries from the PDB as of 15 May 2003. A total of 5880 secondary structural elements are classified: 2104 hairpin loops and 3776 internal loops. RNA motifs reported in the literature, such as 'Kink turn' and 'GNRA loops', are now incorporated into the structural classification along with definitions and descriptions.
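The contrast between the strict hierarchy of SCOR 1.2 and the DAG of SCOR 2.0 can be illustrated with a minimal Python sketch. The node names and the `PARENTS` mapping are hypothetical and are not taken from the SCOR schema or its XML files; they only show what "a classification node with multiple parents" means in practice.

```python
# Hypothetical classification fragment: each node maps to its list of parents.
# In a strict hierarchy every list has at most one entry; a DAG allows more.
PARENTS = {
    "RNA motifs": [],
    "hairpin loops": ["RNA motifs"],
    "tetraloops": ["hairpin loops"],
    "tertiary interaction partners": ["RNA motifs"],
    "GNRA loops": ["tetraloops", "tertiary interaction partners"],
}


def ancestors(node, parents=PARENTS):
    """Return every node reachable by following parent links upward."""
    seen = set()
    stack = list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_strict_hierarchy(parents=PARENTS):
    """True only if no node has more than one parent."""
    return all(len(p) <= 1 for p in parents.values())
```

In this toy fragment, "GNRA loops" is reachable from two branches, so the structure is a DAG rather than a tree, and a query starting from either parent would find it.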
27
Kazantsev AV, Krivenko AA, Harrington DJ, Carter RJ, Holbrook SR, Adams PD, Pace NR. High-resolution structure of RNase P protein from Thermotoga maritima. Proc Natl Acad Sci U S A 2003; 100:7497-502. [PMID: 12799461 PMCID: PMC164615 DOI: 10.1073/pnas.0932597100] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
The structure of RNase P protein from the hyperthermophilic bacterium Thermotoga maritima was determined at 1.2-Å resolution by using x-ray crystallography. This protein structure is from an ancestral-type RNase P and bears remarkable similarity to the recently determined structures of RNase P proteins from bacteria that have the distinct, Bacillus type of RNase P. These two types of protein span the extent of bacterial RNase P diversity, so the results generalize the structure of the bacterial RNase P protein. The broad phylogenetic conservation of structure and distribution of potential RNA-binding elements in the RNase P proteins indicate that all of these homologous proteins bind to their cognate RNAs primarily by interaction with the phylogenetically conserved core of the RNA. The protein is found to dimerize through an extensive, well-ordered interface. This dimerization may reflect a mechanism of thermal stability of the protein before assembly with the RNA moiety of the holoenzyme.
28
Holbrook EL, Schulze-Gahmen U, Buchko GW, Ni S, Kennedy MA, Holbrook SR. Purification, crystallization and preliminary X-ray analysis of two nudix hydrolases from Deinococcus radiodurans. Acta Crystallographica Section D: Biological Crystallography 2003; 59:737-40. [PMID: 12657797 DOI: 10.1107/s0907444903002671] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/30/2002] [Accepted: 01/30/2003] [Indexed: 11/10/2022]
Abstract
Two nudix hydrolases from Deinococcus radiodurans have been purified and crystallized. Diffraction data have been collected to 1.4 and 1.9 Å resolution for DR1025 and DR0079, respectively. DR1025 belongs to space group P4₁2₁2/P4₃2₁2, with unit-cell parameters a = b = 53.2, c = 122.6 Å (unit-cell volume 346 883 Å³, V_M = 2.5 Å³ Da⁻¹, solvent content 50.2%). DR0079 belongs to space group C222₁, with unit-cell parameters a = 34.1, b = 157.2, c = 126.5 Å (unit-cell volume 677 308 Å³, V_M = 2.2 Å³ Da⁻¹, solvent content 44.0%). The calculated cell content of DR1025 indicates the presence of one molecule in the asymmetric unit. Dynamic light scattering and gel filtration suggest it to be a dimer in solution. The space group and unit-cell parameters of DR0079 indicate the presence of two molecules per asymmetric unit. Gel filtration and NMR spectroscopy suggest it to be a monomer in solution.
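The cell volumes and solvent contents quoted above can be roughly reproduced with a short Python sketch. It assumes all cell angles are 90 degrees (true for the tetragonal and orthorhombic space groups reported) and uses the common Matthews approximation, solvent fraction = 1 - 1.23 / V_M. The function names are illustrative, and the small differences from the published figures presumably reflect rounding of the reported cell parameters and V_M values.

```python
def orthogonal_cell_volume(a, b, c):
    """Volume (cubic angstroms) of a unit cell whose angles are all 90 degrees."""
    return a * b * c


def solvent_fraction(v_m):
    """Matthews estimate of the solvent fraction from V_M (cubic angstroms per dalton)."""
    return 1.0 - 1.23 / v_m


# Cell parameters and V_M values as quoted in the abstract.
dr1025_volume = orthogonal_cell_volume(53.2, 53.2, 122.6)
dr0079_volume = orthogonal_cell_volume(34.1, 157.2, 126.5)
dr1025_solvent = solvent_fraction(2.5)
dr0079_solvent = solvent_fraction(2.2)

print(f"DR1025: volume {dr1025_volume:.0f}, solvent {dr1025_solvent:.1%}")
print(f"DR0079: volume {dr0079_volume:.0f}, solvent {dr0079_solvent:.1%}")
```

Both computed volumes agree with the abstract to within about 0.2%, and both solvent fractions to within about one percentage point.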
29
Buchko GW, Ni S, Holbrook SR, Kennedy MA. ¹H, ¹³C, and ¹⁵N NMR assignments of the hypothetical Nudix protein DR0079 from the extremely radiation-resistant bacterium Deinococcus radiodurans. Journal of Biomolecular NMR 2003; 25:169-170. [PMID: 12652130 DOI: 10.1023/a:1022243724501] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 05/24/2023]
30
Abstract
The "ribose zipper", an important element of RNA tertiary structure, is characterized by consecutive hydrogen-bonding interactions between ribose 2'-hydroxyls from different regions of an RNA chain or between RNA chains. These tertiary contacts have previously been observed to also involve base-backbone and base-base interactions (A-minor type). We searched for ribose zipper tertiary interactions in the crystal structures of the large ribosomal subunit RNAs of Haloarcula marismortui and Deinococcus radiodurans, and the small ribosomal subunit RNA of Thermus thermophilus and identified a total of 97 ribose zippers. Of these, 20 were found in T. thermophilus 16 S rRNA, 44 in H. marismortui 23 S rRNA (plus 2 bridging 5 S and 23 S rRNAs) and 30 in D. radiodurans 23 S rRNA (plus 1 bridging 5 S and 23 S rRNAs). These were analyzed in terms of sequence conservation, structural conservation and stability, location in secondary structure, and phylogenetic conservation. Eleven types of ribose zippers were defined based on ribose-base interactions. Of these 11, seven were observed in the ribosomal RNAs. The most common of these is the canonical ribose zipper, originally observed in the P4-P6 group I intron fragment. All ribose zippers were formed by antiparallel chain interactions and only a single example extended beyond two residues, forming an overlapping ribose zipper of three consecutive residues near the small subunit A-site. Almost all ribose zippers link stem (Watson-Crick duplex) or stem-like (base-paired), with loop (external, internal, or junction) chain segments. About two-thirds of the observed ribose zippers interact with ribosomal proteins. Most of these ribosomal proteins bridge the ribose zipper chain segments with basic amino acid residues hydrogen bonding to the RNA backbone. Proteins involved in crucial ribosome function and in early stages of ribosomal assembly also stabilize ribose zipper interactions. 
All ribose zippers show strong sequence conservation both within these three ribosomal RNA structures and in a large database of aligned prokaryotic sequences. The physical basis of the sequence conservation is stacked base triples formed between consecutive base-pairs on the stem or stem-like segment with bases (often adenines) from the loop-side segment. These triples have previously been characterized as Type I and Type II A-minor motifs and are stabilized by base-base and base-ribose hydrogen bonds. The sequence and structure conservation of ribose zippers can be directly used in tertiary structure prediction and may have applications in molecular modeling and design.
MESH Headings
- Bacteria/chemistry
- Bacteria/genetics
- Conserved Sequence
- Haloarcula marismortui/chemistry
- Haloarcula marismortui/genetics
- Hydrogen Bonding
- Models, Molecular
- Nucleic Acid Conformation
- Phylogeny
- Protein Binding
- RNA, Archaeal/chemistry
- RNA, Archaeal/genetics
- RNA, Bacterial/chemistry
- RNA, Bacterial/genetics
- RNA, Ribosomal/chemistry
- RNA, Ribosomal/genetics
- RNA, Ribosomal, 16S/chemistry
- RNA, Ribosomal, 16S/genetics
- RNA, Ribosomal, 23S/chemistry
- RNA, Ribosomal, 23S/genetics
- Ribose/chemistry
- Ribosomal Proteins/chemistry
- Thermus thermophilus/chemistry
- Thermus thermophilus/genetics
31
Klosterman PS, Tamura M, Holbrook SR, Brenner SE. SCOR: a Structural Classification of RNA database. Nucleic Acids Res 2002; 30:392-4. [PMID: 11752346 PMCID: PMC99131 DOI: 10.1093/nar/30.1.392] [Citation(s) in RCA: 85] [Impact Index Per Article: 3.9] [Received: 08/17/2001] [Revised: 10/10/2001] [Accepted: 10/10/2001] [Indexed: 11/13/2022] Open
Abstract
The Structural Classification of RNA (SCOR) database provides a survey of the three-dimensional motifs contained in 259 NMR and X-ray RNA structures. In one classification, the structures are grouped according to function. The RNA motifs, including internal and external loops, are also organized in a hierarchical classification. The 259 database entries contain 223 internal and 203 external loops; 52 entries consist of fully complementary duplexes. A classification of the well-characterized tertiary interactions found in the larger RNA structures is also included along with examples. The SCOR database is accessible at http://scor.lbl.gov.
32
Carter RJ, Dubchak I, Holbrook SR. A computational approach to identify genes for functional RNAs in genomic sequences. Nucleic Acids Res 2001; 29:3928-38. [PMID: 11574674 PMCID: PMC60242 DOI: 10.1093/nar/29.19.3928] [Citation(s) in RCA: 148] [Impact Index Per Article: 6.4] [Indexed: 11/14/2022] Open
Abstract
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80-90% accurate in jackknife testing experiments for bacteria and 90-99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
33
Abstract
The current state of three-dimensional structure analysis of RNA by x-ray crystallography is summarized. The methods of sample preparation, crystallization, data collection, and structure solution are discussed, followed by a review of the RNA structures that have been determined and of common structural features, and finally, an appraisal of future prospects for x-ray crystal structure analysis of RNA.
34
Hung LW, Holbrook EL, Holbrook SR. The crystal structure of the Rev binding element of HIV-1 reveals novel base pairing and conformational variability. Proc Natl Acad Sci U S A 2000; 97:5107-12. [PMID: 10792052 PMCID: PMC25789 DOI: 10.1073/pnas.090588197] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
The crystal and molecular structure of an RNA duplex corresponding to the high affinity Rev protein binding element (RBE) has been determined at 2.1-Å resolution. Four unique duplexes are present in the crystal, comprising two structural variants. In each duplex, the RNA double helix consists of an annealed 12-mer and 14-mer that form an asymmetric internal loop consisting of G-G and G-A noncanonical base pairs and a flipped-out uridine. The 12-mer strand has an A-form conformation, whereas the 14-mer strand is distorted to accommodate the bulges and noncanonical base pairing. In contrast to the NMR model of the unbound RBE, an asymmetric G-G pair with N2-N7 and N1-O6 hydrogen bonding is formed in each helix. The G-A base pairing agrees with the NMR structure in one structural variant, but forms a novel water-mediated pair in the other. A backbone flip and reorientation of the G-G base pair is required to assume the RBE conformation present in the NMR model of the complex between the RBE and the Rev peptide.
35
Jang SB, Hung LW, Chi YI, Holbrook EL, Carter RJ, Holbrook SR. Structure of an RNA internal loop consisting of tandem C-A+ base pairs. Biochemistry 1998; 37:11726-31. [PMID: 9718295 DOI: 10.1021/bi980758j] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
The crystal structure of the RNA octamer 5'-CGC(CA)GCG-3' has been determined from X-ray diffraction data to 2.3 Å resolution. In the crystal, this oligomer forms a self-complementary double helix in the asymmetric unit. Tandem non-Watson-Crick C-A and A-C base pairs comprise an internal loop in the middle of the duplex, which is incorporated with little distortion of the A-form double helix. From the geometry of the C-A base pairs, it is inferred that the adenosine imino group is protonated and donates a hydrogen bond to the carbonyl group of the cytosine. The wobble geometry of the C-A+ base pairs is very similar to that of the common U-G non-Watson-Crick pair.
36
Carter RJ, Baeyens KJ, SantaLucia J, Turner DH, Holbrook SR. The crystal structure of an RNA oligomer incorporating tandem adenosine-inosine mismatches. Nucleic Acids Res 1997; 25:4117-22. [PMID: 9321667 PMCID: PMC146998 DOI: 10.1093/nar/25.20.4117] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.9] [Indexed: 02/05/2023] Open
Abstract
The X-ray crystallographic structure of the RNA duplex [r(CGCAIGCG)]2 has been refined to 2.5 Å. It shows a symmetric internal loop of two non-Watson-Crick base pairs which form in the middle of the duplex. The tandem A-I/I-A pairs are related by a crystallographic two-fold axis. Both A(anti)-I(anti) mismatches are in a head-to-head conformation forming hydrogen bonds using the Watson-Crick positions. The octamer duplexes stack above one another in the cell forming a pseudo-infinite helix throughout the crystal. A hydrated calcium ion bridges between the 3'-terminal of one molecule and the backbone of another. The tandem A-I mismatches are incorporated with only minor distortion to the backbone. This is in contrast to the large helical perturbations often produced by sheared G-A pairs in RNA oligonucleotides.
37
Baeyens KJ, De Bondt HL, Pardi A, Holbrook SR. A curved RNA helix incorporating an internal loop with G.A and A.A non-Watson-Crick base pairing. Proc Natl Acad Sci U S A 1996; 93:12851-5. [PMID: 8917508 PMCID: PMC24009 DOI: 10.1073/pnas.93.23.12851] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.4] [Received: 08/06/1996] [Accepted: 08/16/1996] [Indexed: 02/03/2023] Open
Abstract
The crystal structure of the RNA dodecamer 5'-GGCC(GAAA)GGCC-3' has been determined from x-ray diffraction data to 2.3-Å resolution. In the crystal, these oligomers form double helices around twofold symmetry axes. Four consecutive non-Watson-Crick base pairs make up an internal loop in the middle of the duplex, including sheared G.A pairs and novel asymmetric A.A pairs. This internal loop sequence produces a significant curvature and narrowing of the double helix. The helix is curved by 34 degrees from end to end and the diameter is narrowed by 24% in the internal loop. A Mn2+ ion is bound directly to the N7 of the first guanine in the Watson-Crick region following the internal loop and the phosphate of the preceding residue. This Mn2+ location corresponds to a metal binding site observed in the hammerhead catalytic RNA.
38
Dubchak I, Muchnik I, Holbrook SR, Kim SH. Prediction of protein folding class using global description of amino acid sequence. Proc Natl Acad Sci U S A 1995; 92:8700-4. [PMID: 7568000 PMCID: PMC41034 DOI: 10.1073/pnas.92.19.8700] [Citation(s) in RCA: 348] [Impact Index Per Article: 12.0] [Indexed: 01/26/2023] Open
Abstract
We present a method for predicting protein folding class based on global protein chain description and a voting process. Selection of the best descriptors was achieved by a computer-simulated neural network trained on a data base consisting of 83 folding classes. Protein-chain descriptors include overall composition, transition, and distribution of amino acid attributes, such as relative hydrophobicity, predicted secondary structure, and predicted solvent exposure. Cross-validation testing was performed on 15 of the largest classes. The test shows that proteins were assigned to the correct class (correct positive prediction) with an average accuracy of 71.7%, whereas the inverse prediction of proteins as not belonging to a particular class (correct negative prediction) was 90-95% accurate. When tested on 254 structures used in this study, the top two predictions contained the correct class in 91% of the cases.
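One plausible reading of the "composition" and "transition" descriptors named above is sketched below in Python. The three-way hydrophobicity grouping in `GROUPS` is an assumption made for illustration, as are the function names; the abstract does not specify the groupings, and the "distribution" descriptor and the neural-network voting step are not modeled here.

```python
# Assumed three-way hydrophobicity grouping, for illustration only.
GROUPS = {"polar": "RKEDQN", "neutral": "GASTPHY", "hydrophobic": "CLVIMFW"}
CLASS_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def composition(seq):
    """Fraction of residues falling in each attribute class."""
    classes = [CLASS_OF[aa] for aa in seq]
    return {name: classes.count(name) / len(classes) for name in GROUPS}


def transition(seq):
    """Fraction of adjacent residue pairs that switch between two given classes."""
    classes = [CLASS_OF[aa] for aa in seq]
    pairs = list(zip(classes, classes[1:]))
    names = list(GROUPS)
    result = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            changes = sum(1 for p in pairs if set(p) == {first, second})
            result[(first, second)] = changes / len(pairs)
    return result


example = "KLGA"  # classes: polar, hydrophobic, neutral, neutral
print(composition(example))
print(transition(example))
```

Descriptors of this kind have a fixed length regardless of chain length, which is what makes them usable as inputs to a classifier over whole protein chains.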
39
Baeyens KJ, De Bondt HL, Holbrook SR. Structure of an RNA double helix including uracil-uracil base pairs in an internal loop. Nature Structural Biology 1995; 2:56-62. [PMID: 7719854 DOI: 10.1038/nsb0195-56] [Citation(s) in RCA: 108] [Impact Index Per Article: 3.7] [Indexed: 01/26/2023]
Abstract
The crystal structure of the RNA dodecamer 5'-GGACUUUGGUCC-3' has been determined from X-ray diffraction data to 2.6 Å resolution. This oligomer forms an asymmetric double helix in the crystal. Four consecutive non-Watson-Crick base-pairs are formed in the middle of the duplex including the first intrahelical U-U (or T-T) pairs observed in an oligonucleotide crystal structure. Two different conformations of U-U pairs are observed in the context of the surrounding sequence. One of these pairs is highly twisted, allowing a bound water to bridge across strands in the major groove. The crystal packing illustrates a new form of RNA helix-helix interaction.
40
Baeyens KJ, Jancarik J, Holbrook SR. Use of low-molecular-weight polyethylene glycol in the crystallization of RNA oligomers. Acta Crystallographica Section D: Biological Crystallography 1994; 50:764-7. [PMID: 15299375 DOI: 10.1107/s0907444994003458] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
We have crystallized a variety of RNA oligonucleotides in a form suitable for X-ray diffraction studies using polyethylene glycol with a low-molecular-weight distribution (PEG 400) as the precipitant. Crystallization experiments on a set of 26 RNA oligomers ranging from eight to 12 nucleotides in length resulted in eight diffraction-quality crystals. Of these eight RNA crystals, six utilized PEG 400 as the precipitating agent. We have also been able to obtain large single crystals of a DNA-RNA hybrid, transfer RNA (two different conditions) and a catalytic RNA from PEG 400 solutions. These results suggest that PEG 400 may be a generally useful alternative to 2-methyl-2,4-pentanediol (MPD) which has, thus far, been the most successful precipitant for DNA oligomers.
41
Holbrook SR, Dubchak I, Kim SH. PROBE: a computer program employing an integrated neural network approach to protein structure prediction. Biotechniques 1993; 14:984-9. [PMID: 8333967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/29/2023] Open
Abstract
A computer program, PROBE, has been designed for the prediction of protein structural features from amino acid sequence. This program integrates a variety of computer-simulated neural networks, each predicting an aspect of protein structure, into a single, easy-to-use package. The surface accessibility of each residue, the presence of disulfide bonds, the overall secondary structure composition and the residue secondary structures, including beta-turn type, are predicted. In addition, the overall amino acid composition and relative hydrophobicity are used to determine whether a protein belongs to one of four common folding motifs. PROBE is able to compare and synergistically improve the predictions by allowing communication between the different networks.
42
Abstract
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4 alpha-helical bundles, (2) parallel (alpha/beta)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class.
43
Holbrook SR. Application of computational neural networks to the prediction of protein structural features. Genetic Engineering 1993; 15:1-19. [PMID: 7763836 DOI: 10.1007/978-1-4899-1666-2_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/27/2023]
44
Holbrook SR, Cheong C, Tinoco I, Kim SH. Crystal structure of an RNA double helix incorporating a track of non-Watson-Crick base pairs. Nature 1991; 353:579-81. [PMID: 1922368 DOI: 10.1038/353579a0] [Citation(s) in RCA: 255] [Impact Index Per Article: 7.7] [Indexed: 12/29/2022]
Abstract
The crystal structure of the RNA dodecamer duplex (r-GGACUUCGGUCC)2 has been determined. The dodecamers stack end-to-end in the crystal, simulating infinite A-form helices with only a break in the phosphodiester chain. These infinite helices are held together in the crystal by hydrogen bonding between ribose hydroxyl groups and a variety of donors and acceptors. The four noncomplementary nucleotides in the middle of the sequence did not form an internal loop, but rather a highly regular double-helix incorporating the non-Watson-Crick base pairs, G.U and U.C. This is the first direct observation of a U.C (or T.C) base pair in a crystal structure. The U.C pairs each form only a single base-base hydrogen bond, but are stabilized by a water molecule which bridges between the ring nitrogens and by four waters in the major groove which link the bases and phosphates. The lack of distortion introduced in the double helix by the U.C mismatch may explain its low efficiency of repair in DNA. The G.U wobble pair is also stabilized by a minor-groove water which bridges between the unpaired guanine amino and the ribose hydroxyl of the uracil. This structure emphasizes the importance of specific hydrogen bonding between not only the nucleotide bases, but also the ribose hydroxyls, phosphate oxygens and tightly bound waters in stabilization of the intramolecular and intermolecular structures of double helical RNA.
45
Tamura T, Holbrook SR, Kim SH. A Macintosh computer program for designing DNA sequences that code for specific peptides and proteins. Biotechniques 1991; 10:782-4. [PMID: 1878215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/29/2022] Open
Abstract
A computer program (PINCERS) is described for use in the design of synthetic genes and mixed-probe DNA sequences. A protein sequence is reverse translated with generation of synonymous codons at each position producing a degenerate sequence. In order to locate potential restriction enzyme sites, the degenerate sequence is searched with a library of restriction enzymes for sites that utilize any combination of synonymous codons. These sites are indicated in a map so that they may be incorporated into the synthetic gene sequence. The program allows the user to select the appropriate codon usage table for the organism of interest and then to set a threshold usage frequency below which codons are not generated. PINCERS may also be used to assist in planning the synthesis of mixed-probe DNA sequences for cross-hybridization experiments. It can identify regions of specified length within the protein sequence that have the least overall degeneracy, thereby minimizing the number of probes to be synthesized and, therefore, maximizing the concentration of a given probe sequence.
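The idea of locating the region of least overall degeneracy can be sketched in a few lines of Python. This is not the PINCERS implementation: the function names are hypothetical, the codon counts are those of the standard genetic code, and the codon-usage threshold and restriction-site search described above are not modeled.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(protein, length):
    """Return (start, peptide, degeneracy) for the least degenerate window.

    Ties are resolved in favor of the earliest window.
    """
    best = None
    for start in range(len(protein) - length + 1):
        window = protein[start:start + length]
        score = degeneracy(window)
        if best is None or score < best[2]:
            best = (start, window, score)
    return best


print(least_degenerate_window("LSRMWKDE", 3))  # (3, 'MWK', 2)
```

For the toy sequence `LSRMWKDE`, the window `MWK` needs only two probe sequences, whereas `LSR` would need 216, which illustrates why minimizing degeneracy raises the effective concentration of each probe.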
46
Muskal SM, Holbrook SR, Kim SH. Prediction of the disulfide-bonding state of cysteine in proteins. Protein Engineering 1990; 3:667-72. [PMID: 2217140 DOI: 10.1093/protein/3.8.667] [Citation(s) in RCA: 73] [Impact Index Per Article: 2.1] [Indexed: 12/30/2022]
Abstract
The bonding states of cysteine play important functional and structural roles in proteins. In particular, disulfide bond formation is one of the most important factors influencing the three-dimensional fold of proteins. Proteins of known structure were used to teach computer-simulated neural networks rules for predicting the disulfide-bonding state of a cysteine given only its flanking amino acid sequence. Resulting networks make accurate predictions on sequences different from those used in training, suggesting that local sequence greatly influences cysteines in disulfide bond formation. The average prediction rate after seven independent network experiments is 81.4% for disulfide-bonded and 80.0% for non-disulfide-bonded scenarios. Predictive accuracy is related to the strength of network output activities. Network weights reveal interesting position-dependent amino acid preferences and provide a physical basis for understanding the correlation between the flanking sequence and a cysteine's disulfide-bonding state. Network predictions may be used to increase or decrease the stability of existing disulfide bonds or to aid the search for potential sites to introduce new disulfide bonds.
47
Holbrook SR, Muskal SM, Kim SH. Predicting surface exposure of amino acids from protein sequence. Protein Engineering 1990; 3:659-65. [PMID: 2217139 DOI: 10.1093/protein/3.8.659] [Citation(s) in RCA: 76] [Impact Index Per Article: 2.2] [Indexed: 12/30/2022]
Abstract
The amino acid residues on a protein surface play a key role in interaction with other molecules, determine many physical properties, and constrain the structure of the folded protein. A database of monomeric protein crystal structures was used to teach computer-simulated neural networks rules for predicting surface exposure from local sequence. These trained networks are able to correctly predict surface exposure for 72% of residues in a testing set using a binary model (buried/exposed) and for 54% of residues using a ternary model (buried/intermediate/exposed). In the ternary model, only 11% of the exposed residues are predicted as buried and only 5% of the buried residues are predicted as exposed. Also, since the networks are able to predict exposure with a quantitative confidence estimate, it is possible to assign exposure for over half of the residues in a binary model with greater than 80% accuracy. Even more accurate predictions are obtained by making a consensus prediction of exposure for a homologous family. The effect of the local environment of an amino acid on its accessibility, though smaller than expected, is significant and accounts for the higher success rate of prediction than obtained with previously used criteria. In the absence of a three-dimensional structure, the ability to predict surface accessibility of amino acids directly from the sequence is a valuable tool in choosing sites of chemical modification or specific mutations and in studies of molecular interaction.
48
Holbrook SR, Kim SH. Molecular model of the G protein alpha subunit based on the crystal structure of the HRAS protein. Proc Natl Acad Sci U S A 1989; 86:1751-5. [PMID: 2494654 PMCID: PMC286782 DOI: 10.1073/pnas.86.6.1751] [Citation(s) in RCA: 67] [Impact Index Per Article: 1.9] [Indexed: 01/01/2023] Open
Abstract
A structural model of guanine nucleotide-binding regulatory protein alpha subunits (G alpha subunits) is proposed based on the crystal structure of the catalytic domain of the human HRAS protein (p21ras). Because of low overall sequence similarity, structural and functional constraints were used to align the G alpha consensus sequence with that of p21ras. The resulting G alpha model specifies the spatial relationship among the guanine nucleotide-binding site, the binding site of the beta gamma subunit complex, likely regions of effector and receptor interaction, and sites of cholera and pertussis toxin modification. The locations in the model of the experimentally determined sites of proteolytic digestion, point mutation, monoclonal antibody binding, and toxin modification are consistent with and help explain the observed biological activity. Two important findings from our model are (i) the orientation of the G alpha model with respect to the membrane and (ii) the identification of the spatial proximity of the N- and C-terminal regions. Furthermore, by analogy to p21ras, the model assigns specific residues in G alpha required for binding the guanosine (G-box) and phosphates (PO4-box) and identifies residues potentially involved in the conformational switch mechanism (S-box). Specification of these critical regions in the G alpha model suggests guidelines for construction of mutants and chimeric proteins to experimentally test structural and functional hypotheses.
|
49
|
Holbrook SR, Wang AH, Rich A, Kim SH. Local mobility of nucleic acids as determined from crystallographic data. III. A daunomycin-DNA complex. J Mol Biol 1988; 199:349-57. [PMID: 3351928 DOI: 10.1016/0022-2836(88)90318-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.4] [Indexed: 01/05/2023]
Abstract
The local mobility of the complex between the anti-tumor drug daunomycin and a DNA hexanucleotide duplex of sequence d(CpGpTpApCpG)2 has been determined by anisotropic refinement of single crystal X-ray diffraction data of 1.2 Å resolution (1 Å = 0.1 nm). The directions and amplitudes of the local motion indicate that changes in mobility of DNA due to daunomycin binding are primarily limited to the residues forming the intercalation site and do not propagate to the neighboring residues. The intercalated daunomycin ring system (aglycone) is rigidly fixed in the base stack, apparently serving as an anchor for the amino sugar segment of the drug, which is one of the most mobile regions of the entire complex. The high flexibility of this amino sugar may be important for inhibition of replication and transcription not only by sterically blocking the minor groove, but also by allowing nonproductive interactions to be formed with various polymerases or other DNA-binding proteins. The crystallographic model is improved sufficiently by the rigid group anisotropic refinement to allow additional bound water molecules to be located.
|
50
|
Husain I, Sancar GB, Holbrook SR, Sancar A. Mechanism of damage recognition by Escherichia coli DNA photolyase. J Biol Chem 1987; 262:13188-97. [PMID: 3308872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2023] Open
Abstract
Escherichia coli DNA photolyase binds to DNA containing pyrimidine dimers with high affinity and then breaks the cyclobutane ring joining the two pyrimidines of the dimer in a light- (300-500 nm) dependent reaction. In order to determine the structural features important for this level of specificity, we have constructed a 43-base-pair (bp) long DNA substrate that contains a thymine dimer at a unique location and studied its interaction with photolyase. We find that the enzyme protects a 12-16-bp region around the dimer from DNase I digestion and only a 6-bp region from methidiumpropyl-EDTA-Fe(II) digestion. Chemical footprinting experiments reveal that photolyase contacts the phosphodiester bond immediately 5' and the three phosphodiester bonds immediately 3' to the dimer but not the phosphodiester bond between the two thymines that make up the dimer. Methylation protection and interference experiments indicate that the enzyme makes major groove contacts with the first base 5' and the second base 3' to the dimer. These data are consistent with photolyase binding in the major groove over a 4-6-bp region. However, major groove contacts cannot be of major significance in substrate recognition, as the enzyme binds equally well to a thymine dimer in a 44-base-long single-stranded DNA and protects a 10-nucleotide-long region around the dimer from DNase I digestion. It is therefore concluded that the unique configuration of the phosphodiester backbone in the strand containing the pyrimidine dimer, as well as the cyclobutane ring of the dimer itself, are the important structural determinants of the substrate for recognition by photolyase.
|