51
Abstract
Buried within the genomes of many microorganisms are genetic elements that encode rare-cutting homing endonucleases that assist in the mobility of the elements that encode them, such as the self-splicing group I and II introns and in some cases inteins. There are several different families of homing endonucleases and their ability to initiate and target specific sequences for lateral transfers makes them attractive reagents for gene targeting. Homing endonucleases have been applied in promoting DNA modification or genome editing such as gene repair or "gene knockouts". This review examines the categories of homing endonucleases that have been described so far and their possible applications to biotechnology. Strategies to engineer homing endonucleases to alter target site specificities will also be addressed. Alternatives to homing endonucleases such as zinc finger nucleases, transcription activator-like effector nucleases, triplex forming oligonucleotide nucleases, and targetrons are also briefly discussed.
Affiliation(s)
- Mohamed Hafez
- Department of Microbiology, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
52
Sun N, Abil Z, Zhao H. Recent advances in targeted genome engineering in mammalian systems. Biotechnol J 2012; 7:1074-87. [PMID: 22777886 DOI: 10.1002/biot.201200038] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 04/16/2012] [Revised: 05/22/2012] [Accepted: 06/15/2012] [Indexed: 12/21/2022]
Abstract
Targeted genome engineering enables researchers to disrupt, insert, or replace a genomic sequence precisely at a predetermined locus. One well-established technology to edit a mammalian genome is known as gene targeting, which is based on the homologous recombination (HR) mechanism. However, the low HR frequency in mammalian cells (except for mice) prevents its wide application. To address this limitation, a custom-designed nuclease is used to introduce a site-specific DNA double-strand break (DSB) on the chromosome and the subsequent repair of the DSB by the HR mechanism or the non-homologous end joining mechanism results in efficient targeted genome modifications. Engineered homing endonucleases (also called meganucleases), zinc finger nucleases, and transcription activator-like effector nucleases represent the three major classes of custom-designed nucleases that have been successfully applied in many different organisms for targeted genome engineering. This article reviews the recent developments of these genome engineering tools and highlights a few representative applications in mammalian systems. Recent advances in gene delivery strategies of these custom-designed nucleases are also briefly discussed.
Affiliation(s)
- Ning Sun
- Department of Biochemistry, University of Illinois at Urbana-Champaign, 61801, USA
53
Abstract
Targeted manipulation of complex genomes often requires the introduction of a double-strand break at defined locations by site-specific DNA endonucleases. Here, we describe a monomeric nuclease domain derived from GIY-YIG homing endonucleases for genome-editing applications. Fusion of the GIY-YIG nuclease domain to three-member zinc-finger DNA binding domains generated chimeric GIY-zinc finger endonucleases (GIY-ZFEs). Significantly, the I-TevI-derived fusions (Tev-ZFEs) function in vitro as monomers to introduce a double-strand break, and discriminate in vitro and in bacterial and yeast assays against substrates lacking a preferred 5'-CNNNG-3' cleavage motif. The Tev-ZFEs function to induce recombination in a yeast-based assay with activity on par with a homodimeric Zif268 zinc-finger nuclease. We also fused the I-TevI nuclease domain to a catalytically inactive LADGLIDADG homing endonuclease (LHE) scaffold. The monomeric Tev-LHEs are active in vivo and similarly discriminate against substrates lacking the 5'-CNNNG-3' motif. The monomeric Tev-ZFEs and Tev-LHEs are distinct from the FokI-derived zinc-finger nuclease and TAL effector nuclease platforms as the GIY-YIG domain alleviates the requirement to design two nuclease fusions to target a given sequence, highlighting the diversity of nuclease domains with distinctive biochemical properties suitable for genome-editing applications.
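The 5'-CNNNG-3' preference described above can be made concrete with a small motif scan. This is an illustrative sketch only: `find_cnnng` and `has_cnnng` are hypothetical helper names, N is assumed to mean any of A, C, G or T, and the code says nothing about where within the motif I-TevI actually cuts. Because CNNNG is its own reverse complement, scanning the top strand also finds every bottom-strand occurrence.

```python
import re

# Zero-width lookahead so overlapping CNNNG motifs are all reported.
CNNNG = re.compile(r"(?=(C[ACGT]{3}G))")


def find_cnnng(seq: str) -> list[int]:
    """Return 0-based start positions of every 5'-CNNNG-3' motif in seq."""
    return [m.start() for m in CNNNG.finditer(seq.upper())]


def has_cnnng(seq: str) -> bool:
    """True if the substrate carries at least one CNNNG motif."""
    return bool(find_cnnng(seq))


if __name__ == "__main__":
    print(find_cnnng("ttCAACGcCTTTGg"))  # [2, 8]
```

A substrate for which `has_cnnng` returns False corresponds to the "lacking a preferred motif" case that the Tev-ZFEs discriminate against.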
54
Abstract
Many devastating human diseases are caused by mutations in a single gene that prevent a somatic cell from carrying out its essential functions, or by genetic changes acquired as a result of infectious disease or in the course of cell transformation. Targeted gene therapies have emerged as potential strategies for treatment of such diseases. These therapies depend upon rare-cutting endonucleases to cleave at specific sites in or near disease genes. Targeted gene correction provides a template for homology-directed repair, enabling the cell's own repair pathways to erase the mutation and replace it with the correct sequence. Targeted gene disruption ablates the disease gene, disabling its function. Gene targeting can also promote other kinds of genome engineering, including mutation, insertion, or gene deletion. Targeted gene therapies present significant advantages compared to approaches to gene therapy that depend upon delivery of stably expressing transgenes. Recent progress has been fueled by advances in nuclease discovery and design, and by new strategies that maximize efficiency of targeting and minimize off-target damage. Future progress will build on deeper mechanistic understanding of critical factors and pathways.
Affiliation(s)
- Olivier Humbert
- Departments of Immunology and Biochemistry, University of Washington School of Medicine, Seattle, WA 98195, USA
55
Abstract
Summary: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, with ‘human’ being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. Availability: http://www.rostlab.org/services/snpdbe Contact: schaefer@rostlab.org; snpdbe@rostlab.org
Affiliation(s)
- Christian Schaefer
- Technische Universitaet Muenchen, Bioinformatics - I12, Informatik, Boltzmannstrasse 3, Muenchen, Germany.
56
Jewison T, Knox C, Neveu V, Djoumbou Y, Guo AC, Lee J, Liu P, Mandal R, Krishnamurthy R, Sinelnikov I, Wilson M, Wishart DS. YMDB: the Yeast Metabolome Database. Nucleic Acids Res 2011; 40:D815-20. [PMID: 22064855 PMCID: PMC3245085 DOI: 10.1093/nar/gkr916] [Citation(s) in RCA: 133] [Impact Index Per Article: 10.2] [Indexed: 11/13/2022]
Abstract
The Yeast Metabolome Database (YMDB, http://www.ymdb.ca) is a richly annotated ‘metabolomic’ database containing detailed information about the metabolome of Saccharomyces cerevisiae. Modeled closely after the Human Metabolome Database, the YMDB contains >2000 metabolites with links to 995 different genes/proteins, including enzymes and transporters. The information in YMDB has been gathered from hundreds of books, journal articles and electronic databases. In addition to its comprehensive literature-derived data, the YMDB also contains an extensive collection of experimental intracellular and extracellular metabolite concentration data compiled from detailed Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) metabolomic analyses performed in our lab. This is further supplemented with thousands of NMR and MS spectra collected on pure, reference yeast metabolites. Each metabolite entry in the YMDB contains an average of 80 separate data fields including comprehensive compound description, names and synonyms, structural information, physico-chemical data, reference NMR and MS spectra, intracellular/extracellular concentrations, growth conditions and substrates, pathway information, enzyme data, gene/protein sequence data, as well as numerous hyperlinks to images, references and other public databases. Extensive searching, relational querying and data browsing tools are also provided that support text, chemical structure, spectral, molecular weight and gene/protein sequence queries. Because of S. cerevisiae's importance as a model organism for biologists and as a biofactory for industry, we believe this kind of database could have considerable appeal not only to metabolomics researchers, but also to yeast biologists, systems biologists, the fermentation industry, as well as the beer, wine and spirit industry.
Affiliation(s)
- Timothy Jewison
- Department of Computing Science, Department of Biological Sciences, University of Alberta, Edmonton, AB T6G 2E8, Canada
57
Kinjo AR, Suzuki H, Yamashita R, Ikegawa Y, Kudou T, Igarashi R, Kengaku Y, Cho H, Standley DM, Nakagawa A, Nakamura H. Protein Data Bank Japan (PDBj): maintaining a structural data archive and resource description framework format. Nucleic Acids Res 2011; 40:D453-60. [PMID: 21976737 PMCID: PMC3245181 DOI: 10.1093/nar/gkr811] [Citation(s) in RCA: 97] [Impact Index Per Article: 7.5] [Indexed: 11/29/2022]
Abstract
The Protein Data Bank Japan (PDBj, http://pdbj.org) is a member of the worldwide Protein Data Bank (wwPDB) and accepts and processes the deposited data of experimentally determined macromolecular structures. While maintaining the archive in collaboration with other wwPDB partners, PDBj also provides a wide range of services and tools for analyzing structures and functions of proteins, which are summarized in this article. To enhance the interoperability of the PDB data, we have recently developed PDB/RDF, PDB data in the Resource Description Framework (RDF) format, along with its ontology in the Web Ontology Language (OWL) based on the PDB mmCIF Exchange Dictionary. Being in the standard format for the Semantic Web, the PDB/RDF data provide a means to integrate the PDB with other biological information resources.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research and Immunology Frontier Research Center, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
58
Tapping natural reservoirs of homing endonucleases for targeted gene modification. Proc Natl Acad Sci U S A 2011; 108:13077-82. [PMID: 21784983 DOI: 10.1073/pnas.1107719108] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.9] [Indexed: 12/25/2022]
Abstract
Homing endonucleases mobilize their own genes by generating double-strand breaks at individual target sites within potential host DNA. Because of their high specificity, these proteins are used for "genome editing" in higher eukaryotes. However, alteration of homing endonuclease specificity is quite challenging. Here we describe the identification and phylogenetic analysis of over 200 naturally occurring LAGLIDADG homing endonucleases (LHEs). Biochemical and structural characterization of endonucleases from one clade within the phylogenetic tree demonstrates strong conservation of protein structure contrasted against highly diverged DNA target sites and indicates that a significant fraction of these proteins are sufficiently stable and active to serve as engineering scaffolds. This information was exploited to create a targeting enzyme to disrupt the endogenous monoamine oxidase B gene in human cells. The ubiquitous presence and diversity of LHEs described in this study may facilitate the creation of many tailored nucleases for genome editing.
59
Sjölander K, Datta RS, Shen Y, Shoffner GM. Ortholog identification in the presence of domain architecture rearrangement. Brief Bioinform 2011; 12:413-22. [PMID: 21712343 PMCID: PMC3178056 DOI: 10.1093/bib/bbr036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 02/07/2023]
Abstract
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.
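Pairwise-comparison methods of the kind mentioned above are often implemented as reciprocal best hits (RBH). The sketch below is a generic RBH illustration, not the method reviewed or proposed in this article; it assumes that similarity scores (for example, BLAST bit scores) have already been computed in both directions, and all gene names and scores are made up.

```python
# Scores keyed by (query, subject); e.g. bit scores from an all-vs-all search.
Scores = dict[tuple[str, str], float]


def best_hits(scores: Scores) -> dict[str, str]:
    """Map each query to its highest-scoring subject (ties: smallest name)."""
    best: dict[str, tuple[float, str]] = {}
    for (query, subject), score in scores.items():
        current = best.get(query)
        if (current is None or score > current[0]
                or (score == current[0] and subject < current[1])):
            best[query] = (score, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(scores_ab: Scores, scores_ba: Scores) -> list[tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit AND a is b's best hit."""
    ab = best_hits(scores_ab)
    ba = best_hits(scores_ba)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


if __name__ == "__main__":
    scores_ab = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
                 ("a2", "b1"): 130.0, ("a2", "b2"): 120.0}
    scores_ba = {("b1", "a1"): 245.0, ("b1", "a2"): 128.0,
                 ("b2", "a2"): 118.0}
    print(reciprocal_best_hits(scores_ab, scores_ba))  # [('a1', 'b1')]
```

In the toy data, a2's best hit is b1, which already pairs reciprocally with a1, so a2 receives no ortholog at all. A shared promiscuous domain could plausibly produce this kind of misleading best hit, which is the sort of problem the abstract attributes to heterogeneous domain architectures.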
Affiliation(s)
- Kimmen Sjölander
- 308C Stanley Hall #1762, Department of Bioengineering, University of California, Berkeley, CA 94720, USA.
60
Barzel A, Privman E, Peeri M, Naor A, Shachar E, Burstein D, Lazary R, Gophna U, Pupko T, Kupiec M. Native homing endonucleases can target conserved genes in humans and in animal models. Nucleic Acids Res 2011; 39:6646-59. [PMID: 21525128 PMCID: PMC3159444 DOI: 10.1093/nar/gkr242] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
In recent years, both homing endonucleases (HEases) and zinc-finger nucleases (ZFNs) have been engineered and selected for the targeting of desired human loci for gene therapy. However, enzyme engineering is lengthy and expensive, and the off-target effects of the manufactured endonucleases are difficult to predict. Moreover, enzymes selected to cleave a human DNA locus may not cleave the homologous locus in the genome of animal models because of sequence divergence, thus hampering attempts to assess the in vivo efficacy and safety of any engineered enzyme prior to its application in human trials. Here, we show that naturally occurring HEases that cleave desirable human targets can be found. Some of these enzymes are also shown to cleave the homologous sequence in the genome of animal models. In addition, the distribution of off-target effects may be more predictable for native HEases. Based on our experimental observations, we present the HomeBase algorithm, database and web server that allow a high-throughput computational search and assignment of HEases for the targeting of specific loci in the human and other genomes. We experimentally validate the predicted target specificity of candidate fungal, bacterial and archaeal HEases using cell-free, yeast and archaeal assays.
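The abstract does not describe how HomeBase searches a genome, so the following is only a generic illustration of the underlying idea: scanning both strands of a sequence for near matches to a homing endonuclease target site while tolerating a few mismatches. `scan_target`, the mismatch threshold and the toy sequences are all hypothetical; real target recognition has position-specific tolerance that a plain Hamming distance ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_target(genome: str, site: str, max_mismatches: int = 2) -> list[tuple[int, str, int]]:
    """Hypothetical scan: return (position, strand, mismatches) for every window
    on either strand within max_mismatches of the target site."""
    genome, site = genome.upper(), site.upper()
    k = len(site)
    hits = []
    # A minus-strand match is a top-strand window equal to the site's reverse complement.
    for strand, query in (("+", site), ("-", revcomp(site))):
        for i in range(len(genome) - k + 1):
            mm = hamming(genome[i:i + k], query)
            if mm <= max_mismatches:
                hits.append((i, strand, mm))
    return sorted(hits)


if __name__ == "__main__":
    for hit in scan_target("CCGATTACACCTGTAATCCGATTTCA", "GATTACA", max_mismatches=1):
        print(hit)
```

In this toy genome the site occurs exactly at position 2 (+ strand), as its reverse complement at position 11 (- strand), and with one mismatch at position 19 (+ strand).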
Affiliation(s)
- Adi Barzel
- Department of Molecular Microbiology and Biotechnology, Tel Aviv University, Ramat Aviv 69978, Israel.
61
Cline MS, Karchin R. Using bioinformatics to predict the functional impact of SNVs. Bioinformatics 2011; 27:441-8. [PMID: 21159622 PMCID: PMC3105482 DOI: 10.1093/bioinformatics/btq695] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 10/20/2010] [Revised: 11/21/2010] [Accepted: 12/12/2010] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The past decade has seen the introduction of fast and relatively inexpensive methods to detect genetic variation across the genome and exponential growth in the number of known single nucleotide variants (SNVs). There is increasing interest in bioinformatics approaches to identify variants that are functionally important from millions of candidate variants. Here, we describe the essential components of bioinformatics tools that predict functional SNVs. RESULTS: Bioinformatics tools have great potential to identify functional SNVs, but the black box nature of many tools can be a pitfall for researchers. Understanding the underlying methods, assumptions and biases of these tools is essential to their intelligent application.
Affiliation(s)
- Melissa S Cline
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, CA, USA
62
Ulge UY, Baker DA, Monnat RJ. Comprehensive computational design of mCreI homing endonuclease cleavage specificity for genome engineering. Nucleic Acids Res 2011; 39:4330-9. [PMID: 21288879 PMCID: PMC3105429 DOI: 10.1093/nar/gkr022] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
Homing endonucleases (HEs) cleave long (∼ 20 bp) DNA target sites with high site specificity to catalyze the lateral transfer of parasitic DNA elements. In order to determine whether comprehensive computational design could be used as a general strategy to engineer new HE target site specificities, we used RosettaDesign (RD) to generate 3200 different variants of the mCreI LAGLIDADG HE towards 16 different base pair positions in the 22 bp mCreI target site. Experimental verification of a range of these designs demonstrated that over 2/3 (24 of 35 designs, 69%) had the intended new site specificity, and that 14 of the 15 attempted specificity shifts (93%) were achieved. These results demonstrate the feasibility of using structure-based computational design to engineer HE variants with novel target site specificities to facilitate genome engineering.
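As a quick arithmetic check of the success rates reported above, the raw fractions 24/35 and 14/15 round to the quoted 69% and 93%, and 24/35 does exceed 2/3. The helper name `percent` is arbitrary.

```python
def percent(successes: int, attempts: int) -> float:
    """Return successes / attempts expressed as a percentage."""
    return 100.0 * successes / attempts


print(f"{percent(24, 35):.1f}%")  # 68.6%  (designs with the intended specificity)
print(f"{percent(14, 15):.1f}%")  # 93.3%  (attempted specificity shifts achieved)
print(24 / 35 > 2 / 3)            # True, consistent with "over 2/3"
```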
Affiliation(s)
- Umut Y Ulge
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Box 357705, Seattle, WA 98195, USA
63
Abstract
Herpes simplex virus type 1 (HSV1) is a major health problem. As for most viral diseases, current antiviral treatments are based on the inhibition of viral replication once it has already started. As a consequence, they impair neither the viral cycle at its early stages nor the latent form of the virus, and thus cannot be considered as real preventive treatments. Latent HSV1 virus could be addressed by rare-cutting endonucleases, such as meganucleases. As a proof-of-concept study, we generated several meganucleases recognizing HSV1 sequences and assessed their antiviral activity in cultured cells. We demonstrate that expression of these proteins in African green monkey kidney fibroblast (COS-7) and BSR cells inhibits infection by HSV1 at low and moderate multiplicities of infection (MOIs), inducing a significant reduction of the viral load. Furthermore, the remaining viral genomes display a high rate of mutation (up to 16%) at the meganuclease cleavage site, consistent with a mechanism of action based on the cleavage of the viral genome. This specific mechanism of action qualifies meganucleases as an alternative class of antiviral agent, with the potential to address replicative as well as latent DNA viral forms.
64
Turinsky AL, Razick S, Turner B, Donaldson IM, Wodak SJ. Literature curation of protein interactions: measuring agreement across major public databases. Database: The Journal of Biological Databases and Curation 2010; 2010:baq026. [PMID: 21183497 PMCID: PMC3011985 DOI: 10.1093/database/baq026] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 01/20/2023]
Abstract
Literature curation of protein interaction data faces a number of challenges. Although curators increasingly adhere to standard data representations, the data that various databases actually record from the same published information may differ significantly. Some of the reasons underlying these differences are well known, but their global impact on the interactions collectively curated by major public databases has not been evaluated. Here we quantify the agreement between curated interactions from 15 471 publications shared across nine major public databases. Results show that on average, two databases fully agree on 42% of the interactions and 62% of the proteins curated from the same publication. Furthermore, a sizable fraction of the measured differences can be attributed to divergent assignments of organism or splice isoforms, different organism focus and alternative representations of multi-protein complexes. Our findings highlight the impact of divergent curation policies across databases, and should be relevant to both curators and data consumers interested in analyzing protein-interaction data generated by the scientific community. Database URL: http://wodaklab.org/iRefWeb
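The abstract does not spell out its agreement measure. One plausible way to express "agreement on interactions curated from the same publication" is a pooled Jaccard-style ratio over the publications both databases curated, sketched below; the formula, the function names and the toy records are assumptions, not the measure used in the study. The optional isoform collapsing, which strips a UniProt-style "-N" suffix, illustrates how divergent isoform assignments alone can lower measured agreement.

```python
Interaction = tuple[str, str]
Curation = dict[str, set[Interaction]]  # publication ID -> curated interactions


def canonical(pair: Interaction, collapse_isoforms: bool) -> Interaction:
    """Order the two proteins so (A, B) and (B, A) compare equal."""
    a, b = pair
    if collapse_isoforms:
        # Assumed normalization: "P1-2" (isoform 2) becomes "P1".
        a, b = a.split("-")[0], b.split("-")[0]
    return (a, b) if a <= b else (b, a)


def interaction_agreement(db1: Curation, db2: Curation,
                          collapse_isoforms: bool = False) -> float:
    """Shared interactions / all interactions, pooled over shared publications."""
    shared, total = 0, 0
    for pub in db1.keys() & db2.keys():
        s1 = {canonical(p, collapse_isoforms) for p in db1[pub]}
        s2 = {canonical(p, collapse_isoforms) for p in db2[pub]}
        shared += len(s1 & s2)
        total += len(s1 | s2)
    return shared / total if total else 0.0


if __name__ == "__main__":
    db1 = {"pub1": {("P1-2", "Q1"), ("Q1", "R1")}, "pub2": {("S1", "T1")}}
    db2 = {"pub1": {("P1", "Q1"), ("R1", "Q1")}, "pub3": {("X1", "Y1")}}
    print(interaction_agreement(db1, db2))                          # 0.3333333333333333
    print(interaction_agreement(db1, db2, collapse_isoforms=True))  # 1.0
```

Only "pub1" is curated by both toy databases, so "pub2" and "pub3" do not affect the score.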
Affiliation(s)
- Andrei L Turinsky
- Molecular Structure and Function Program, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario, Canada
65
Pieper U, Webb BM, Barkan DT, Schneidman-Duhovny D, Schlessinger A, Braberg H, Yang Z, Meng EC, Pettersen EF, Huang CC, Datta RS, Sampathkumar P, Madhusudhan MS, Sjölander K, Ferrin TE, Burley SK, Sali A. ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 2010; 39:D465-74. [PMID: 21097780 PMCID: PMC3013688 DOI: 10.1093/nar/gkq1091] [Citation(s) in RCA: 240] [Impact Index Per Article: 17.1] [Indexed: 12/22/2022]
Abstract
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence–structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains 10 355 444 reliable models for domains in 2 421 920 unique protein sequences. ModBase allows users to update comparative models on demand, and request modeling of additional sequences through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are available through the ModBase interface as well as the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the SALIGN server for multiple sequence and structure alignment (http://salilab.org/salign), the ModEval server for predicting the accuracy of protein structure models (http://salilab.org/modeval), the PCSS server for predicting which peptides bind to a given protein (http://salilab.org/pcss) and the FoXS server for calculating and fitting Small Angle X-ray Scattering profiles (http://salilab.org/foxs).
Affiliation(s)
- Ursula Pieper
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, University of California at San Francisco, CA 94158, USA
66
Lourenco A, Carneiro S, Rocha M, Ferreira EC, Rocha I. Challenges in integrating Escherichia coli molecular biology data. Brief Bioinform 2010; 12:91-103. [DOI: 10.1093/bib/bbq067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
67
Evolution of I-SceI homing endonucleases with increased DNA recognition site specificity. J Mol Biol 2010; 405:185-200. [PMID: 21029741 DOI: 10.1016/j.jmb.2010.10.029] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 08/31/2010] [Revised: 10/14/2010] [Accepted: 10/18/2010] [Indexed: 12/22/2022]
Abstract
Elucidating how homing endonucleases undergo changes in recognition site specificity will facilitate efforts to engineer proteins for gene therapy applications. I-SceI is a monomeric homing endonuclease that recognizes and cleaves within an 18-bp target. It tolerates limited degeneracy in its target sequence, including substitution of a C:G(+4) base pair for the wild-type A:T(+4) base pair. Libraries encoding randomized amino acids at I-SceI residue positions that contact or are proximal to A:T(+4) were used in conjunction with a bacterial one-hybrid system to select I-SceI derivatives that bind to recognition sites containing either the A:T(+4) or the C:G(+4) base pairs. As expected, isolates encoding wild-type residues at the randomized positions were selected using either target sequence. All I-SceI proteins isolated using the C:G(+4) recognition site included small side-chain substitutions at G100 and either contained (K86R/G100T, K86R/G100S and K86R/G100C) or lacked (G100A, G100T) a K86R substitution. Interestingly, the binding affinities of the selected variants for the wild-type A:T(+4) target are 4- to 11-fold lower than that of wild-type I-SceI, whereas those for the C:G(+4) target are similar. The increased specificity of the mutant proteins is also evident in binding experiments in vivo. These differences in binding affinities account for the observed ∼36-fold difference in target preference between the K86R/G100T and wild-type proteins in DNA cleavage assays. An X-ray crystal structure of the K86R/G100T mutant protein bound to a DNA duplex containing the C:G(+4) substitution suggests how sequence specificity of a homing enzyme can increase. This biochemical and structural analysis defines one pathway by which site specificity is augmented for a homing endonuclease.
68
Tsirigos KD, Bagos PG, Hamodrakas SJ. OMPdb: a database of β-barrel outer membrane proteins from Gram-negative bacteria. Nucleic Acids Res 2010; 39:D324-31. [PMID: 20952406 PMCID: PMC3013764 DOI: 10.1093/nar/gkq863] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/14/2022]
Abstract
We describe here OMPdb, which is currently the most complete and comprehensive collection of integral β-barrel outer membrane proteins from Gram-negative bacteria. The database currently contains 69,354 proteins, which are classified into 85 families, based mainly on structural and functional criteria. Although OMPdb follows the annotation scheme of Pfam, many of the families included in the database were not previously described or annotated in other publicly available databases. There are also cross-references to other databases, references to the literature and annotation for sequence features, like transmembrane segments and signal peptides. Furthermore, via the web interface, the user can not only browse the available data, but submit advanced text searches and run BLAST queries against the database protein sequences or domain searches against the collection of profile Hidden Markov Models that represent each family's domain organization as well. The database is freely accessible for academic users at http://bioinformatics.biol.uoa.gr/OMPdb and we expect it to be useful for genome-wide analyses, comparative genomics as well as for providing training and test sets for predictive algorithms regarding transmembrane β-barrels.
Affiliation(s)
- Konstantinos D Tsirigos
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 15701, Greece
69
Kinjo AR, Yamashita R, Nakamura H. PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan. Database: The Journal of Biological Databases and Curation 2010; 2010:baq021. [PMID: 20798081 PMCID: PMC2997606 DOI: 10.1093/database/baq021] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022]
Abstract
This article is a tutorial for PDBj Mine, a new database and its interface for Protein Data Bank Japan (PDBj). In PDBj Mine, data are loaded from files in the PDBMLplus format (an extension of PDBML, PDB's canonical XML format, enriched with annotations) and then served to PDBj users via the World Wide Web (WWW). We describe the basic design of the relational database (RDB) and web interfaces of PDBj Mine. The contents of PDBMLplus files are first broken into XPath entities, and these paths and data are indexed in a way that reflects the hierarchical structure of the XML files. The data for each XPath type are saved into a corresponding relational table named after the XPath itself. The generation of table definitions from the PDBMLplus XML schema is fully automated. For efficient search, frequently queried terms are compiled into a brief summary table. Casual users can perform a simple keyword search or an 'Advanced Search' that specifies various conditions on the entries. More experienced users can query the database using SQL statements, which can be constructed in a uniform manner. Thus, PDBj Mine combines the flexibility of XML documents with the robustness of the RDB. Database URL: http://www.pdbj.org/
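The XPath-per-table layout described above can be illustrated with Python's standard `xml.etree.ElementTree` and `sqlite3` modules. This is a toy sketch under stated assumptions, not PDBj Mine's implementation: it ignores attributes and XML namespaces (real PDBML elements are namespaced), stores only element text, and gives every table a made-up two-column schema `(entry_id, value)`.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import defaultdict

Rows = dict[str, list[tuple[str, str]]]  # XPath -> [(entry_id, text), ...]


def xpath_rows(xml_text: str, entry_id: str) -> Rows:
    """Break an XML document into rows grouped by the element's XPath."""
    rows: Rows = defaultdict(list)

    def walk(elem: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:  # keep only elements that carry a value
            rows[path].append((entry_id, text))
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows


def load(conn: sqlite3.Connection, rows: Rows) -> None:
    """Create one table per XPath, named after the path itself, and insert rows."""
    for path, values in rows.items():
        table = '"' + path.replace('"', '""') + '"'  # quoted SQL identifier
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (entry_id TEXT, value TEXT)")
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", values)


if __name__ == "__main__":
    doc = ("<entry><struct><title>Toy structure</title></struct>"
           "<cell><length_a>10.5</length_a></cell></entry>")
    conn = sqlite3.connect(":memory:")
    load(conn, xpath_rows(doc, "0XYZ"))
    print(conn.execute('SELECT entry_id, value FROM "/entry/cell/length_a"').fetchall())
    # [('0XYZ', '10.5')]
```

Because each table holds exactly one XPath, a query for a given field reads a single narrow table; the summary table that the abstract mentions for frequently queried terms is not modeled here.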
Affiliation(s)
- Akira R Kinjo
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan.
70
Guo J, Gaj T, Barbas CF. Directed evolution of an enhanced and highly efficient FokI cleavage domain for zinc finger nucleases. J Mol Biol 2010; 400:96-107. [PMID: 20447404 PMCID: PMC2885538 DOI: 10.1016/j.jmb.2010.04.060] [Citation(s) in RCA: 156] [Impact Index Per Article: 11.1] [Received: 01/27/2010] [Revised: 04/27/2010] [Accepted: 04/28/2010] [Indexed: 10/19/2022]
Abstract
Zinc finger nucleases (ZFNs) are powerful tools for gene therapy and genetic engineering. The high specificity and affinity of these chimeric enzymes are based on custom-designed zinc finger proteins (ZFPs). To improve the performance of existing ZFN technology, we developed an in vivo evolution-based approach to improve the efficacy of the FokI cleavage domain (FCD). After multiple rounds of cycling mutagenesis and DNA shuffling, a more efficient nuclease variant (Sharkey) was generated. In vivo analyses indicated that Sharkey is >15-fold more active than wild-type FCD on a diverse panel of cleavage sites. Further, a mammalian cell-based assay showed a three- to sixfold improvement in targeted mutagenesis for ZFNs containing derivatives of the Sharkey cleavage domain. We also identified mutations that impart sequence specificity to the FCD that might be utilized in future studies to further refine ZFNs through cooperative specificity. In addition, Sharkey was observed to enhance the cleavage profiles of previously published and newly selected heterodimer ZFN architectures. This enhanced and highly efficient cleavage domain will aid in a variety of ZFN applications in medicine and biology.
Affiliation(s)
- Jing Guo
- The Skaggs Institute for Chemical Biology and the Departments of Molecular Biology and Chemistry, The Scripps Research Institute, La Jolla, California, USA
- Thomas Gaj
- The Skaggs Institute for Chemical Biology and the Departments of Molecular Biology and Chemistry, The Scripps Research Institute, La Jolla, California, USA
- Carlos F. Barbas
- The Skaggs Institute for Chemical Biology and the Departments of Molecular Biology and Chemistry, The Scripps Research Institute, La Jolla, California, USA
72
Davey NE, Haslam NJ, Shields DC, Edwards RJ. SLiMFinder: a web server to find novel, significantly over-represented, short protein motifs. Nucleic Acids Res 2010; 38:W534-9. [PMID: 20497999 PMCID: PMC2896084 DOI: 10.1093/nar/gkq440] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 12/15/2022]
Abstract
Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein–protein interactions. The Short, Linear Motif Finder (SLiMFinder) web server is a de novo motif discovery tool that identifies statistically over-represented motifs in a set of protein sequences, accounting for the evolutionary relationships between them. Motifs are returned with an intuitive P-value that greatly reduces the problem of false positives and is accessible to biologists of all disciplines. Input can be uploaded by the user or extracted directly from UniProt. Numerous masking options give the user great control over the contextual information to be included in the analyses. The SLiMFinder server combines these with user-friendly output and visualizations of motif context to allow the user to quickly gain insight into the validity of a putatively functional motif. These visualizations include alignments of motif occurrences, alignments of motifs and their homologues and a visual schematic of the top-ranked motifs. Returned motifs can also be compared with known SLiMs from the literature using CompariMotif. All results are available for download. The SLiMFinder server is available at: http://bioware.ucd.ie/slimfinder.html.
Affiliation(s)
- Norman E Davey
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, 69117 Heidelberg, Germany
73
Kleinstiver BP, Fernandes AD, Gloor GB, Edgell DR. A unified genetic, computational and experimental framework identifies functionally relevant residues of the homing endonuclease I-BmoI. Nucleic Acids Res 2010; 38:2411-27. [PMID: 20061372 PMCID: PMC2853131 DOI: 10.1093/nar/gkp1223] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 11/11/2009] [Revised: 12/18/2009] [Accepted: 12/20/2009] [Indexed: 11/14/2022]
Abstract
Insight into protein structure and function is best obtained through a synthesis of experimental, structural and bioinformatic data. Here, we outline a framework that we call MUSE (mutual information, unigenic evolution and structure-guided elucidation), which facilitated the identification of previously unknown residues that are relevant for function of the GIY-YIG homing endonuclease I-BmoI. Our approach synthesizes three types of data: mutual information analyses that identify co-evolving residues within the GIY-YIG catalytic domain; a unigenic evolution strategy that identifies hyper- and hypo-mutable residues of I-BmoI; and interpretation of the unigenic and co-evolution data using a homology model. In particular, we identify novel positions within the GIY-YIG domain as functionally important. Proof-of-principle experiments implicate the non-conserved I71 as functionally relevant, with an I71N mutant accumulating a nicked cleavage intermediate. Moreover, many additional positions within the catalytic, linker and C-terminal domains of I-BmoI were implicated as important for function. Our results represent a platform on which to pursue future studies of I-BmoI and other GIY-YIG-containing proteins, and demonstrate that MUSE can successfully identify novel functionally critical residues that would be ignored in a traditional structure-function analysis within an extensively studied small domain of approximately 90 amino acids.
Affiliation(s)
- Benjamin P. Kleinstiver
- Department of Biochemistry, Schulich School of Medicine & Dentistry and Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5C1, Canada
- Andrew D. Fernandes
- Department of Biochemistry, Schulich School of Medicine & Dentistry and Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5C1, Canada
- Gregory B. Gloor
- Department of Biochemistry, Schulich School of Medicine & Dentistry and Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5C1, Canada
- David R. Edgell
- Department of Biochemistry, Schulich School of Medicine & Dentistry and Department of Applied Mathematics, The University of Western Ontario, London, ON N6A 5C1, Canada
74
Takaki Y, Shimamura S, Nakagawa S, Fukuhara Y, Horikawa H, Ankai A, Harada T, Hosoyama A, Oguchi A, Fukui S, Fujita N, Takami H, Takai K. Bacterial lifestyle in a deep-sea hydrothermal vent chimney revealed by the genome sequence of the thermophilic bacterium Deferribacter desulfuricans SSM1. DNA Res 2010; 17:123-37. [PMID: 20189949 PMCID: PMC2885270 DOI: 10.1093/dnares/dsq005] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
The complete genome sequence of the thermophilic sulphur-reducing bacterium Deferribacter desulfuricans SSM1, isolated from a hydrothermal vent chimney, has been determined. The genome comprises a single circular chromosome of 2 234 389 bp and a megaplasmid of 308 544 bp. Many genes encoded in the genome are most similar to the genes of sulphur- or sulphate-reducing bacterial species within Deltaproteobacteria. The reconstructed central metabolism indicated a heterotrophic lifestyle driven primarily by C1 to C3 organics, e.g. formate, acetate and pyruvate, and also suggested that the inability to grow autotrophically via a reductive tricarboxylic acid cycle may be due to the lack of ATP-dependent citrate lyase. In addition, the genome encodes numerous genes for chemoreceptors, chemotaxis-like systems and signal transduction machineries. These signalling networks may be linked to this bacterium's versatile energy metabolism and may provide ecophysiological advantages for D. desulfuricans SSM1 thriving in the physically and chemically fluctuating environments near hydrothermal vents. This is the first genome sequence from the phylum Deferribacteres.
Affiliation(s)
- Yoshihiro Takaki
- Microbial Genome Research Group, Extremobiosphere Research Program, Institute of Biogeosciences, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Kanagawa 237-0061, Japan.
75
Galetto R, Duchateau P, Pâques F. Targeted approaches for gene therapy and the emergence of engineered meganucleases. Expert Opin Biol Ther 2009; 9:1289-303. [PMID: 19689185 DOI: 10.1517/14712590903213669] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 02/06/2023]
Abstract
BACKGROUND In spite of significant advances in gene transfer strategies in the field of gene therapy, there is a strong emphasis on the development of alternative methods that provide better control of transgene expression and insertion patterns. OBJECTIVE Several new approaches consist of targeting a desired transgene or gene modification to a well-defined locus, and we collectively refer to them as 'targeted approaches'. The use of redesigned meganucleases is one of these emerging technologies. Here we try to define the potential of this method in the larger scope of targeted strategies. METHODS We survey the different types of targeted strategies, presenting their achievements and potential applications, with a special emphasis on the use of redesigned endonucleases. CONCLUSION Redesigned endonucleases represent one of the most promising tools for targeted approaches, and the opening of a clinical trial for AIDS patients has recently shown the maturity of these strategies. However, there is still a 'quest' for the best reagents, that is, the endonucleases providing the best efficacy:toxicity ratio. New advances in protein design have allowed the engineering of new scaffolds, such as meganucleases, and the landscape of existing methods is likely to change over the next few years.
Affiliation(s)
- Roman Galetto
- Cellectis Genome Surgery, 102 Avenue Gaston Roussel, 93 340 Romainville Cedex, France
76
Guerrero FD, Dowd SE, Djikeng A, Wiley G, Macmil S, Saldivar L, Najar F, Roe BA. A database of expressed genes from Cochliomyia hominivorax (Diptera: Calliphoridae). J Med Entomol 2009; 46:1109-16. [PMID: 19769042 DOI: 10.1603/033.046.0518] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/28/2023]
Abstract
We used an expressed sequence tag and 454 pyrosequencing approach to initiate a study of the genome of the screwworm, Cochliomyia hominivorax (Coquerel) (Diptera: Calliphoridae). Two normalized cDNA libraries were constructed from RNA isolated from embryos and second-instar larvae of the Panama 95 strain. Approximately 5,400 clones from each library were sequenced from both the 5' and 3' directions using the Sanger method. In addition, double-stranded cDNA was prepared from random-primed polyA RNA purified from embryos, second-instar larvae, adult males and adult females. These four cDNA samples were used for 454 pyrosequencing, which produced approximately 300,000 independent sequences. The sequences were assembled into a database of contigs and singletons, which was used to search public protein databases and annotate the sequences. The full database consists of 6,076 contigs and 58,221 singletons assembled from both the traditional expressed sequence tag (EST) and 454 sequences. Annotation of the data led to the identification of several gene coding regions with possible roles in sex determination in the screwworm. This database will facilitate the design of microarray and other experiments to study screwworm gene expression on a larger scale than previously possible.
Affiliation(s)
- F D Guerrero
- USDA-ARS Knipling-Bushland U.S. Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA.
77
Encinar JA, Fernandez-Ballester G, Sánchez IE, Hurtado-Gomez E, Stricher F, Beltrao P, Serrano L. ADAN: a database for prediction of protein-protein interaction of modular domains mediated by linear motifs. Bioinformatics 2009; 25:2418-24. [PMID: 19602529 DOI: 10.1093/bioinformatics/btp424] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION The structures and functions of most globular domains in the proteome are still unknown. We can use high-resolution structures of different modular domains in combination with automatic protein design algorithms to predict the genome-wide potential interactions of a protein. The ADAN database and related web tools are online resources for the predictive analysis of ligand-domain complexes. The ADAN database is a collection of different modular protein domains (SH2, SH3, PDZ, WW, etc.). It contains 3505 entries with extensive structural and functional information, manually integrated, curated and annotated with cross-references to other databases, biochemical and thermodynamic data, simplified coordinate files, sequence files and alignments. Prediadan, a subset of the ADAN database, offers position-specific scoring matrices for protein-protein interactions, calculated by FoldX, and predictions of optimum ligands and putative binding partners. Users can also scan a query sequence against selected matrices, or improve a ligand-domain interaction. AVAILABILITY ADAN is accessible at http://adan-embl.ibmc.umh.es/ or http://adan.crg.es/.
Affiliation(s)
- J A Encinar
- Instituto de Biologia Molecular y Celular, Edificio Torregaitan, Universidad Miguel Hernandez, Elche, Alicante, Spain
78
Grizot S, Smith J, Daboussi F, Prieto J, Redondo P, Merino N, Villate M, Thomas S, Lemaire L, Montoya G, Blanco FJ, Pâques F, Duchateau P. Efficient targeting of a SCID gene by an engineered single-chain homing endonuclease. Nucleic Acids Res 2009; 37:5405-19. [PMID: 19584299 PMCID: PMC2760784 DOI: 10.1093/nar/gkp548] [Citation(s) in RCA: 122] [Impact Index Per Article: 8.1] [Indexed: 01/12/2023]
Abstract
Sequence-specific endonucleases recognizing long target sequences are emerging as powerful tools for genome engineering. These endonucleases could be used to correct deleterious mutations or to inactivate viruses, in a new approach to molecular medicine. However, such applications are highly demanding in terms of safety. Mutations in the human RAG1 gene cause severe combined immunodeficiency (SCID). Using the I-CreI dimeric LAGLIDADG meganuclease as a scaffold, we describe here the engineering of a series of endonucleases cleaving the human RAG1 gene, including obligate heterodimers and single-chain molecules. We show that a novel single-chain design, in which two different monomers are linked to form a single molecule, can induce high levels of recombination while safeguarding more effectively against potential genotoxicity. We provide here the first demonstration that an engineered meganuclease can induce targeted recombination at an endogenous locus in up to 6% of transfected human cells. These properties rank this new generation of endonucleases among the best molecular scissors available for genome surgery strategies, potentially avoiding the deleterious effects of previous gene therapy approaches.
Affiliation(s)
- Sylvestre Grizot
- Cellectis SA, Cellectis Genome Surgery, 93235 Romainville, France
79
Nam SH, Kim DW, Jung TS, Choi YS, Kim DW, Choi HS, Choi SH, Park HS. PESTAS: a web server for EST analysis and sequence mining. Bioinformatics 2009; 25:1846-8. [PMID: 19414531 DOI: 10.1093/bioinformatics/btp293] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 11/13/2022]
Abstract
SUMMARY We have developed a web server for the high-throughput annotation of expressed sequence tags (ESTs) called pipeline for EST analysis service (PESTAS). PESTAS processes entire datasets with an automated pipeline of 13 analytic services, then deposits the data into the MySQL database and transforms it into three kinds of reports: preprocessing, assembling and annotation. All annotated information is provided to the scientist and can be downloaded through a web browser. To get more relevant functional annotation results, a curation function was introduced with which biologists can easily change the best-hit annotation information. We included a gene chip module that detects gene expression differences between libraries by comparing accession number counts from BLAST search results. PESTAS also provides access to the pathway information of KEGG, which is useful for mapping the relationships among networks of annotated enzymes, and is especially valuable for those researchers interested in biological pathways. AVAILABILITY PESTAS is available at http://pestas.kribb.re.kr/.
Affiliation(s)
- Seong-Hyeuk Nam
- Industrial Biotechnology & Bioenergy Research Center, Korea Research Institute of Bioscience and Biotechnology, Daejeon 305-806, Korea
80
Lippow SM, Aha PM, Parker MH, Blake WJ, Baynes BM, Lipovsek D. Creation of a type IIS restriction endonuclease with a long recognition sequence. Nucleic Acids Res 2009; 37:3061-73. [PMID: 19304757 PMCID: PMC2685105 DOI: 10.1093/nar/gkp182] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 02/04/2009] [Revised: 02/27/2009] [Accepted: 03/05/2009] [Indexed: 12/19/2022]
Abstract
Type IIS restriction endonucleases cleave DNA outside their recognition sequences, and are therefore particularly useful in the assembly of DNA from smaller fragments. A limitation of type IIS restriction endonucleases in assembly of long DNA sequences is the relative abundance of their target sites. To facilitate ligation-based assembly of extremely long pieces of DNA, we have engineered a new type IIS restriction endonuclease that combines the specificity of the homing endonuclease I-SceI with the type IIS cleavage pattern of FokI. We linked a non-cleaving mutant of I-SceI, which conveys to the chimeric enzyme its specificity for an 18-bp DNA sequence, to the catalytic domain of FokI, which cuts DNA at a defined site outside the target site. Whereas previously described chimeric endonucleases do not produce type IIS-like precise DNA overhangs suitable for ligation, our chimeric endonuclease cleaves double-stranded DNA exactly 2 and 6 nt from the target site to generate homogeneous, 5', four-base overhangs, which can be ligated with 90% fidelity. We anticipate that these enzymes will be particularly useful in manipulation of DNA fragments larger than a thousand bases, which are very likely to contain target sites for all natural type IIS restriction endonucleases.
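The staggered-cut geometry in the Lippow et al. abstract can be made concrete with a minimal Python sketch. It assumes, as a hypothetical reading that the abstract does not spell out, that the top strand is cut 2 nt and the bottom strand 6 nt downstream of the 3' end of the 18-bp I-SceI site, both measured in top-strand coordinates; the function name and offset parameters are illustrative only. Because the top-strand cut lies 4 nt closer to the site, the downstream fragment carries a four-base 5' overhang on its top strand.

```python
# I-SceI 18-bp recognition sequence (top strand, 5'->3').
ISCEI_SITE = "TAGGGATAACAGGGTAAT"


def foki_style_cut(seq, site=ISCEI_SITE, top_offset=2, bottom_offset=6):
    """Simulate a staggered cut downstream of a single recognition site.

    Hypothetical convention: top_offset and bottom_offset count bases from the
    3' end of the site along the top strand. Returns the top-strand sequences
    of the upstream and downstream fragments and the 5' overhang carried by
    the downstream fragment (length bottom_offset - top_offset).
    """
    start = seq.find(site)
    if start == -1:
        raise ValueError("target site not found")
    end = start + len(site)           # index of the first base after the site
    top_cut = end + top_offset        # top-strand cut position
    bottom_cut = end + bottom_offset  # bottom-strand cut, in top-strand coordinates
    if bottom_cut > len(seq):
        raise ValueError("sequence too short downstream of the site")
    overhang = seq[top_cut:bottom_cut]  # single-stranded 5' extension
    return seq[:top_cut], seq[top_cut:], overhang


if __name__ == "__main__":
    dna = "CC" + ISCEI_SITE + "GATCGAAA"
    left, right, overhang = foki_style_cut(dna)
    print(overhang)  # TCGA
```

Whatever the true strand assignment, the overhang length depends only on the 4-nt stagger between the two cuts, which is why every fragment cut this way ends in a homogeneous four-base overhang suitable for ligation.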
81
Guerrero FD, Dowd SE, Sun Y, Saldivar L, Wiley GB, Macmil SL, Najar F, Roe BA, Foil LD. Microarray analysis of female- and larval-specific gene expression in the horn fly (Diptera: Muscidae). J Med Entomol 2009; 46:257-70. [PMID: 19351076 DOI: 10.1603/033.046.0210] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/27/2023]
Abstract
The horn fly, Haematobia irritans L., is an obligate blood-feeding parasite of cattle, and control of this pest is a continuing problem because the fly is becoming resistant to pesticides. Dominant conditional lethal gene systems are being studied as population control technologies against agricultural pests. One of the components of these systems is a female-specific gene promoter that drives expression of a lethality-inducing gene. To identify candidate genes to supply this promoter, microarrays were designed from a horn fly expressed sequence tag (EST) database and probed to identify female-specific and larval-specific gene expression. Analysis of dye swap experiments found 432 and 417 transcripts whose expression levels were higher or lower, respectively, in adult female flies compared with adult male flies. Additionally, 419 and 871 transcripts were identified whose expression levels were higher or lower, respectively, in first-instar larvae compared with adult flies. Three transcripts were expressed more highly in adult female flies than in adult males and also more highly in the first-instar larval life stage than in adult flies. One of these transcripts, a putative nanos ortholog, has a high female-to-male expression ratio and a moderate expression level in first-instar larvae, and its counterpart has been well characterized in Drosophila melanogaster (Meigen). In conclusion, we used microarray technology, verified by reverse transcriptase-polymerase chain reaction and massively parallel pyrosequencing, to study life stage- and sex-specific gene expression in the horn fly and identified three gene candidates for detailed evaluation as a gene promoter source for the development of a female-specific conditional lethality system.
Affiliation(s)
- Felix D Guerrero
- USDA-ARS Knipling-Bushland U.S. Livestock Insects Research Laboratory, 2700 Fredericksburg Rd., Kerrville, TX 78028, USA.
82
Chen Z, Wen F, Sun N, Zhao H. Directed evolution of homing endonuclease I-SceI with altered sequence specificity. Protein Eng Des Sel 2009; 22:249-56. [PMID: 19176595 DOI: 10.1093/protein/gzp001] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
Homing endonucleases recognize specific long DNA sequences and catalyze double-stranded breaks that significantly stimulate homologous recombination, representing an attractive tool for genome targeting and editing. We previously described a two-plasmid selection system that couples enzymatic DNA cleavage with the survival of host cells, and enables directed evolution of homing endonucleases with altered cleavage sequence specificity. Using this selection system, we successfully evolved mutant I-SceI homing endonucleases with greatly increased cleavage activity towards a new target DNA sequence that differs from the wild-type cleavage sequence by 4 bp. The most highly evolved mutant showed a survival rate approximately 100-fold higher than that of wild-type I-SceI enzyme. The degree of selectivity displayed by a mutant isolated from one round of saturation mutagenesis for the new target sequence is comparable to that of wild-type I-SceI for the natural sequence. These results highlight the ability and efficiency of our selection system for engineering homing endonucleases with novel DNA cleavage specificities. The mutant identified from this study can potentially be used in vivo for targeting the new cleavage sequence within genomic DNA.
Affiliation(s)
- Zhilei Chen
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
83
Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T. The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 2009; 37:D387-92. [PMID: 18931379 PMCID: PMC2686475 DOI: 10.1093/nar/gkn750] [Citation(s) in RCA: 1570] [Impact Index Per Article: 104.7] [Received: 09/15/2008] [Accepted: 10/05/2008] [Indexed: 12/12/2022]
Abstract
SWISS-MODEL Repository (http://swissmodel.expasy.org/repository/) is a database of 3D protein structure models generated by the SWISS-MODEL homology-modelling pipeline. The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated 3D protein models generated by automated homology modelling for all sequences in Swiss-Prot and for relevant model organisms. Regular updates ensure that target coverage is complete, that models are built using the most recent sequence and template structure databases, and that improvements in the underlying modelling pipeline are fully utilised. As of September 2008, the database contains 3.4 million entries for 2.7 million different protein sequences from the UniProt database. SWISS-MODEL Repository allows users to assess the quality of the models in the database, search for alternative template structures, and build models interactively via SWISS-MODEL Workspace (http://swissmodel.expasy.org/workspace/). Annotation of models with functional information and cross-linking with other databases such as the Protein Model Portal (http://www.proteinmodelportal.org) of the PSI Structural Genomics Knowledge Base facilitate navigation between protein sequence and structure resources.
Affiliation(s)
- Florian Kiefer
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Konstantin Arnold
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Michael Künzli
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Lorenza Bordoli
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland
84
Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, Davis FP, Sali A. MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 2009; 37:D347-54. [PMID: 18948282 PMCID: PMC2686492 DOI: 10.1093/nar/gkn791] [Citation(s) in RCA: 132] [Impact Index Per Article: 8.8] [Received: 09/15/2008] [Accepted: 10/08/2008] [Indexed: 11/14/2022]
Abstract
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
Affiliation(s)
- Ursula Pieper
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Narayanan Eswar
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- Ben M. Webb
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
- David Eramian
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Libusha Kelly
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - David T. Barkan
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Hannah Carter
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Parminder Mankoo
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Rachel Karchin
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Marc A. Marti-Renom
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Fred P. Davis
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences, Byers Hall at Mission Bay, Office 503B, University of California at San Francisco, 1700 4th Street, San Francisco, CA 94158, Graduate Group in Biophysics, Graduate Group in Bioinformatics, University of California at San Francisco, CA, Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA, Structural Genomics Unit, Bioinformatics & Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Avda. Autopista del Saler 16, Valencia 46012, Spain and Howard Hughes Medical Institute, Janelia Farm, 19700 Helix Drive, Ashburn, VA 20147, USA
| |
Collapse
|
85
Chelala C, Khan A, Lemoine NR. SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms. Bioinformatics 2008; 25:655-61. [PMID: 19098027 PMCID: PMC2647830 DOI: 10.1093/bioinformatics/btn653] [Citation(s) in RCA: 144] [Impact Index Per Article: 9.0] [Indexed: 11/13/2022]
Abstract
Motivation: To design a new computational tool that allows scientists to functionally annotate newly discovered and public-domain single nucleotide polymorphisms (SNPs), in order to help prioritize targets for further disease studies and large-scale genotyping projects. Summary: The SNPnexus database provides functional annotation for both novel and public SNPs. Possible effects at the transcriptome and proteome levels are characterized and reported from five major annotation systems, providing the most extensive information on alternative splicing. Additional information on HapMap genotype and allele frequency, overlaps with potential regulatory elements or structural variations, and related genetic diseases can also be retrieved. The SNPnexus database has a user-friendly web interface that provides single or batch query options using SNP identifiers from dbSNP as well as genomic locations on clones, contigs or chromosomes. Therefore, SNPnexus is the only database currently providing a complete set of functional annotations for SNPs both in public databases and newly detected by sequencing projects. Here, we describe SNPnexus and provide details of the query options and annotation categories, as well as biological examples of use. Availability: The SNPnexus database is freely available at http://www.snp-nexus.org. Contact: claude.chelala@cancer.org.uk
Affiliation(s)
- Claude Chelala
- Centre for Molecular Oncology and Imaging, Institute of Cancer & CR-UK Clinical Centre, Barts & The London School of Medicine (QMUL), Charterhouse Square, London EC1M 6BQ, UK.
86
Baumbach J, Tauch A, Rahmann S. Towards the integrated analysis, visualization and reconstruction of microbial gene regulatory networks. Brief Bioinform 2008; 10:75-83. [PMID: 19074493 DOI: 10.1093/bib/bbn055] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
Abstract
To handle changing environmental surroundings and to manage unfavorable conditions, microbial organisms have evolved complex transcriptional regulatory networks. To comprehensively analyze these gene regulatory networks, several publicly available online databases and analysis platforms have been implemented and established. In this article, we address the typical cycle of scientific knowledge exploration and integration in the area of prokaryotic transcriptional gene regulation. We briefly review five popular, publicly available systems that support (i) the integration of existing knowledge, (ii) visualization and (iii) computational analysis to predict promising wet-lab targets. We illustrate the benefits of such integrated data analysis platforms with four application cases performed using the corynebacterial reference database CoryneRegNet.
Affiliation(s)
- Jan Baumbach
- International Computer Science Institute, Berkeley, USA.
87
Torii M, Hu Z, Wu CH, Liu H. BioTagger-GM: a gene/protein name recognition system. J Am Med Inform Assoc 2008; 16:247-55. [PMID: 19074302 DOI: 10.1197/jamia.m2844] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
OBJECTIVES Biomedical named entity recognition (BNER) is a critical component of automated systems that mine biomedical knowledge from free text. Among the different types of entities in the domain, genes/proteins are arguably the most studied for BNER. Our goal is to develop a gene/protein name recognition system, BioTagger-GM, that exploits rich information in terminology sources using powerful machine learning frameworks and system combination. DESIGN BioTagger-GM consists of four main components: (1) dictionary lookup: gene/protein names in BioThesaurus and biomedical terms in the UMLS Metathesaurus are tagged in text; (2) machine learning: machine learning systems are trained using the dictionary lookup results as one type of feature; (3) post-processing: heuristic rules are used to correct recognition errors; and (4) system combination: a voting scheme is used to combine recognition results from multiple systems. MEASUREMENTS The BioCreAtIvE II Gene Mention (GM) corpus was used to evaluate the proposed method. To test its general applicability, the method was also evaluated on the JNLPBA corpus modified for gene/protein name recognition. The performance of the systems was evaluated through cross-validation tests and measured using precision, recall, and F-measure. RESULTS BioTagger-GM achieved an F-measure of 0.8887 on the BioCreAtIvE II GM corpus, which is higher than that of the first-place system in the BioCreAtIvE II challenge. The applicability of the method was also confirmed on the modified JNLPBA corpus. CONCLUSION The results suggest that terminology sources, powerful machine learning frameworks, and system combination can be integrated to build an effective BNER system.
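The system-combination step (a voting scheme over the outputs of several recognizers) can be illustrated with a minimal sketch. This is not BioTagger-GM's implementation: it assumes each system returns a set of `(start, end)` character spans, that votes are counted only for exact span matches, and that a span is kept when at least a hypothetical threshold `min_votes` of systems propose it. The function and system names are illustrative.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets of a tagged gene/protein mention


def combine_by_vote(system_outputs: Iterable[Set[Span]], min_votes: int) -> Set[Span]:
    """Keep every span proposed by at least `min_votes` systems.

    Hypothetical sketch: exact-match voting and the threshold are assumptions,
    not details taken from the BioTagger-GM paper.
    """
    votes: Counter = Counter()
    for spans in system_outputs:
        votes.update(set(spans))  # each system votes at most once per span
    return {span for span, count in votes.items() if count >= min_votes}


if __name__ == "__main__":
    # Outputs of three hypothetical recognizers on the same sentence.
    tagger_a = {(0, 4), (10, 18)}
    tagger_b = {(0, 4), (20, 25)}
    dictionary_tagger = {(0, 4), (10, 18), (30, 35)}
    print(sorted(combine_by_vote([tagger_a, tagger_b, dictionary_tagger], min_votes=2)))
```

With three systems and `min_votes=2`, this is simple majority voting; raising the threshold trades recall for precision, and a real system might also weight systems differently or merge overlapping rather than identical spans.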
Affiliation(s)
- Manabu Torii
- The Imaging Science and Information Systems Center, Department of Oncology, Georgetown University Medical Center, 2115 Wisconsin Avenue NW, Washington, DC 20057, USA.
88
Toll-Riera M, Bosch N, Bellora N, Castelo R, Armengol L, Estivill X, Albà MM. Origin of primate orphan genes: a comparative genomics approach. Mol Biol Evol 2008; 26:603-12. [PMID: 19064677 DOI: 10.1093/molbev/msn281] [Citation(s) in RCA: 182] [Impact Index Per Article: 11.4] [Indexed: 12/18/2022]
Abstract
Genomes contain a large number of genes that do not have recognizable homologues in other species and that are likely to be involved in important species-specific adaptive processes. The origin of many such "orphan" genes remains unknown. Here we present the first systematic study of the characteristics and mechanisms of formation of primate-specific orphan genes. We determine that codon usage values for most orphan genes fall within the bulk of the codon usage distribution of bona fide human proteins, supporting their current protein-coding annotation. We also show that primate orphan genes display distinctive features in relation to genes of wider phylogenetic distribution: higher tissue specificity, more rapid evolution, and shorter peptide size. We estimate that around 24% are highly divergent members of mammalian protein families. Interestingly, around 53% of the orphan genes contain sequences derived from transposable elements (TEs) and are mostly located in primate-specific genomic regions. This indicates frequent recruitment of TEs as part of novel genes. Finally, we also obtain evidence that a small fraction of primate orphan genes, around 5.5%, might have originated de novo from mammalian noncoding genomic regions.
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Biomedical Informatics Research Programme, Fundació Institut Municipal d'Investigació Mèdica, Barcelona, Spain
89
Abstract
Protein-protein interactions (PPIs) play a vital role in initiating infection in a number of pathogens. Identifying which interactions allow a pathogen to infect its host can help us understand mechanisms of pathogenesis and provide potential targets for therapeutics. Public resources for studying host-pathogen systems, in particular PPIs, are scarce. To facilitate the study of host-pathogen PPIs, we have collected and integrated host-pathogen PPI (HP-PPI) data from a number of public resources to create the Pathogen Interaction Gateway (PIG). PIG provides a text-based search and a BLAST interface for searching the HP-PPI data. Each entry in PIG includes information such as the functional annotations and the domains present in the interacting proteins. PIG provides links to external databases to allow easy navigation among the various websites. Additionally, PIG includes a tool for visualizing a single HP-PPI network or two HP-PPI networks. PIG can be accessed at http://pig.vbi.vt.edu.
Affiliation(s)
- Tim Driscoll
- Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
90
Zhao H. Protein engineering of gene switches and scissors for human gene therapy. J Biotechnol 2008. [DOI: 10.1016/j.jbiotec.2008.07.390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
91
Bromberg Y, Rost B. Comprehensive in silico mutagenesis highlights functionally important residues in proteins. Bioinformatics 2008; 24:i207-12. [PMID: 18689826 PMCID: PMC2597370 DOI: 10.1093/bioinformatics/btn268] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022]
Abstract
MOTIVATION Mutating residues into alanine (alanine scanning) is one of the fastest experimental means of probing hypotheses about protein function. Alanine scans can reveal functional hot spots, i.e. residues that alter function upon mutation. In vitro mutagenesis is cumbersome and costly: probing all residues in a protein is typically as infeasible as substituting each of them with all non-native amino acids. In contrast, such exhaustive mutagenesis is feasible in silico. RESULTS Previously, we developed SNAP to predict functional changes due to non-synonymous single nucleotide polymorphisms. Here, we applied SNAP to all experimental mutations in the ASEdb database of alanine scans; we identified 70% of the hot spots (≥1 kcal/mol change in binding energy); more severe changes were predicted more accurately. Encouraged, we carried out a complete all-against-all in silico mutagenesis for human glucokinase. Many of the residues predicted as functionally important have indeed been confirmed in the literature, others await experimental verification, and our method is ready to aid in the design of in vitro mutagenesis. AVAILABILITY ASEdb and glucokinase scores are available at http://www.rostlab.org/services/SNAP. For submissions of large/whole proteins for processing, please contact the author.
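An all-against-all in silico mutagenesis, as described in this abstract, starts by enumerating every single-residue substitution: 19 variants per position, or 19 × L for a protein of length L, with an alanine scan as the subset in which each non-alanine residue becomes A. The sketch below only generates that variant list in the common shorthand `M1A` (wild type, 1-based position, mutant); scoring each variant, which is what SNAP does, is not modeled. The function names are hypothetical.

```python
from typing import List

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def all_single_mutants(sequence: str) -> List[str]:
    """Return every single-residue substitution, e.g. 'M1A', in sequence order.

    Hypothetical helper: it only enumerates variants and does not score them.
    """
    mutants = []
    for position, wild_type in enumerate(sequence.upper(), start=1):
        if wild_type not in AMINO_ACIDS:
            raise ValueError(f"non-standard residue {wild_type!r} at position {position}")
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the identity "substitution"
                mutants.append(f"{wild_type}{position}{mutant}")
    return mutants


def alanine_scan(sequence: str) -> List[str]:
    """Alanine-scan subset: every non-alanine residue mutated to alanine."""
    return [m for m in all_single_mutants(sequence) if m.endswith("A")]


if __name__ == "__main__":
    print(len(all_single_mutants("MKV")))  # 3 positions x 19 substitutions
    print(alanine_scan("MAK"))             # the alanine at position 2 is skipped
```

Keeping enumeration separate from scoring makes it easy to plug in any per-variant predictor and to restrict the exhaustive scan to an alanine scan for comparison with experimental data.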
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th St, New York, NY 10032, USA.
92
Lemoine F, Labedan B, Froidevaux C. GenoQuery: a new querying module for functional annotation in a genomic warehouse. Bioinformatics 2008; 24:i322-9. [PMID: 18586731 PMCID: PMC2718637 DOI: 10.1093/bioinformatics/btn159] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Motivation: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by the high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve the functional annotation of genomes. This requires designing elaborate integration systems, such as warehouses, for storing and querying these data. Results: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate the usefulness of this querying module with a thorough example. Availability: http://www.lri.fr/~lemoine/GenoQuery/ Contact: chris@lri.fr, lemoine@lri.fr
Affiliation(s)
- Frédéric Lemoine
- Institut de Génétique et Microbiologie, Université Paris-Sud XI, 91405 Orsay Cedex, France
93
Bazzicalupi C, Bencini A, Bonaccini C, Giorgi C, Gratteri P, Moro S, Palumbo M, Simionato A, Sgrignani J, Sissi C, Valtancoli B. Tuning the Activity of Zn(II) Complexes in DNA Cleavage: Clues for Design of New Efficient Metallo-Hydrolases. Inorg Chem 2008; 47:5473-84. [DOI: 10.1021/ic800085n] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Indexed: 12/31/2022]
Affiliation(s)
- Carla Bazzicalupi
- Dipartimento di Chimica, Università degli Studi di Firenze, Via della Lastruccia 3, 50019, Sesto Fiorentino, Firenze, Italy, Laboratorio di Molecular Modeling, Cheminformatics and QSAR, Dipartimento di Scienze Farmaceutiche, Laboratorio di Progettazione, Sintesi e Studio di Eterocicli Biologicamente Attivi, Polo Scientifico, Università degli Studi di Firenze, Via Ugo Schiff, 6, 50019 Sesto Fiorentino (FI), Italy, and Dipartimento di Scienze Farmaceutiche, Università degli Studi di Padova, Via Marzolo 5,
- Andrea Bencini
- Claudia Bonaccini
- Claudia Giorgi
- Paola Gratteri
- Stefano Moro
- Manlio Palumbo
- Alessandro Simionato
- Jacopo Sgrignani
- Claudia Sissi
- Barbara Valtancoli
94
Chen YH, Liu CK, Chang SC, Lin YJ, Tsai MF, Chen YT, Yao A. GenoWatch: a disease gene mining browser for association study. Nucleic Acids Res 2008; 36:W336-40. [PMID: 18440974 PMCID: PMC2447740 DOI: 10.1093/nar/gkn214] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 12/01/2022]
Abstract
A human gene association study often involves several genomic markers, such as single nucleotide polymorphisms (SNPs) or short tandem repeat polymorphisms, and many statistically significant markers may be identified during the study. GenoWatch can efficiently extract up-to-date information about multiple markers and their associated genes in batch mode from many relevant biological databases in real time. The comprehensive gene information retrieved includes gene ontology, function, pathway, disease, related articles in PubMed and so on. Subsequent SNP functional impact analysis and primer design of a target gene for re-sequencing can also be done in a few clicks. The presentation of results has been carefully designed to be as intuitive as possible for all users. GenoWatch is available at http://genepipe.ngc.sinica.edu.tw/genowatch
Affiliation(s)
- Yan-Hau Chen
- National Genotyping Center (NGC) and Institute of Biomedical Sciences (IBMS), Academia Sinica, Taipei, Taiwan 11529, R.O.C
95
Hackenberg M, Matthiesen R. Annotation-Modules: a tool for finding significant combinations of multisource annotations for gene lists. Bioinformatics 2008; 24:1386-93. [PMID: 18434345 DOI: 10.1093/bioinformatics/btn178] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Affiliation(s)
- Michael Hackenberg
- Bioinformatics Group, CIC bioGUNE, CIBER-HEPAD, Technology Park of Bizkaia, 48160 Derio, Bizkaia, Spain.
96
Tusnády GE, Kalmár L, Hegyi H, Tompa P, Simon I. TOPDOM: database of domains and motifs with conservative location in transmembrane proteins. Bioinformatics 2008; 24:1469-70. [PMID: 18434342 PMCID: PMC2427164 DOI: 10.1093/bioinformatics/btn202] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Summary: The TOPDOM database is a collection of domains and sequence motifs located consistently on the same side of the membrane in α-helical transmembrane proteins. The database was created by scanning well-annotated transmembrane protein sequences in the UniProt database with specific domain- or motif-detecting algorithms. The identified domains or motifs were added to the database if they were uniformly annotated on the same side of the membrane across the various proteins in the UniProt database. The information about the location of the collected domains and motifs can be incorporated into constrained topology prediction algorithms, such as HMMTOP, increasing the prediction accuracy. Availability: The TOPDOM database and the constrained HMMTOP prediction server are available at http://topdom.enzim.hu Contact: tusi@enzim.hu; lkalmar@enzim.hu
Affiliation(s)
- Gábor E Tusnády
- Institute of Enzymology, BRC, Hungarian Academy of Sciences, H-1113 Karolina út 29, Budapest, Hungary.
97
Abstract
Directed evolution has been successfully used to engineer proteins for basic and applied biological research. However, engineering novel protein functions by directed evolution remains an overwhelming challenge. This challenge may stem from the fact that multiple simultaneous or synergistic mutations are required to create a novel protein function. Here we review the key developments in engineering novel protein functions using either directed evolution or a combined approach of directed evolution and rational or computational design. Specific attention will be paid to a molecular evolution model for the generation of novel proteins. The engineered novel proteins should not only broaden the range of applications of proteins but also provide new insights into protein structure-function relationships and protein evolution.
Affiliation(s)
- Huimin Zhao
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, USA.
98
Sprenger J, Lynn Fink J, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD. LOCATE: a mammalian protein subcellular localization database. Nucleic Acids Res 2007; 36:D230-3. [PMID: 17986452 PMCID: PMC2238969 DOI: 10.1093/nar/gkm950] [Citation(s) in RCA: 101] [Impact Index Per Article: 5.9] [Indexed: 11/24/2022]
Abstract
LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets, which contain 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third-party submission of literature data. Collectively, this database provides localization proteomes for individual subcellular compartments that will underpin future systematic investigations of these regions. It is available at http://locate.imb.uq.edu.au/
Affiliation(s)
- Josefine Sprenger
- ARC Centre of Excellence in Bioinformatics, Institute for Molecular Bioscience, The University of Queensland, St Lucia, Queensland 4072, Australia
99
Abstract
PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures through a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer, either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist in identifying distant homologs that share a domain architecture. AVAILABILITY PfamAlyzer has been integrated with the Pfam database and can be accessed at http://pfam.cgb.ki.se/pfamalyzer.
Affiliation(s)
- Volker Hollich
- Department of Cell and Molecular Biology, Karolinska Institutet, S-171 77 Stockholm, Sweden
100
Domingues FS, Rahnenführer J, Lengauer T. Conformational analysis of alternative protein structures. Bioinformatics 2007; 23:3131-8. [PMID: 17933849 DOI: 10.1093/bioinformatics/btm499]
Abstract
MOTIVATION Experimentally determined alternative structural models are available for an increasing number of proteins. Structural and functional studies of these proteins need to take these models into account, because the models can differ considerably in structure. Characterizing the structural differences and similarities between these models is a fundamental task in structural biology that requires appropriate methods. RESULTS We propose a method for characterizing sets of alternative structural models. Three types of analysis are performed: grouping according to structural similarity; visualization and detection of structural variation; and comparison of subsets to identify and locate distinct conformational states. Alpha carbon atoms are used to analyse backbone conformations; alternatively, side-chain atoms are used for detailed conformational analysis of specific sites. The method takes into account estimates of atom coordinate uncertainty. The invariant regions are used to generate optimal superpositions of the models. We present results for three proteins showing different degrees of conformational variability: relative motion of two structurally conserved subdomains, a disordered subdomain, and flexibility in the functional site associated with ligand binding. The method has been applied to the analysis of the alternative models available in SCOP. Considerable structural variability can be observed for most proteins. AVAILABILITY The results of the analysis of the SCOP alternative models, the estimates of coordinate uncertainty, and the source code of the implementation are available at the STRuster web site: http://struster.bioinf.mpi-inf.mpg.de.