951
Relationship between insertion/deletion (indel) frequency of proteins and essentiality. BMC Bioinformatics 2007; 8:227. [PMID: 17598914 PMCID: PMC1925122 DOI: 10.1186/1471-2105-8-227] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Received: 11/06/2006] [Accepted: 06/28/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large scale in-silico screening. This resulted in selective antibodies and small compounds which were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than non-essential proteins. RESULTS We have investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data are available. Using these data, we demonstrated with t-test calculations that the mean indel frequencies in essential proteins were greater than those of non-essential proteins in the three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operator Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs).
CONCLUSION Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.
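The abstract above reports that mean indel frequency was higher in essential proteins, yet ROC curves showed indel frequency alone was a poor discriminator. The following Python sketch illustrates how both can hold at once. The indel frequencies are made-up toy values, not data from the study, and `roc_auc` is a hypothetical helper that computes the area under the ROC curve through its rank-sum interpretation.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy indel frequencies (illustrative assumption, not from the paper).
essential = [0.30, 0.10, 0.25, 0.05]
non_essential = [0.20, 0.08, 0.15, 0.12]

mean_essential = sum(essential) / len(essential)
mean_non_essential = sum(non_essential) / len(non_essential)
auc = roc_auc(essential, non_essential)

print(mean_essential > mean_non_essential)  # the group means differ
print(auc)                                  # but the AUC stays close to 0.5
```

In this toy case the essential group has the higher mean, while the AUC is only slightly above the 0.5 expected from random guessing, because the two distributions overlap heavily.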
952
Zimmer A, Lang D, Richardt S, Frank W, Reski R, Rensing SA. Dating the early evolution of plants: detection and molecular clock analyses of orthologs. Mol Genet Genomics 2007; 278:393-402. [PMID: 17593393 DOI: 10.1007/s00438-007-0257-6] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.5] [Received: 04/10/2007] [Accepted: 05/24/2007] [Indexed: 11/28/2022]
Abstract
Orthologs are generally under selective pressure against loss of function, while paralogs usually accumulate mutations and finally die or deviate in terms of function or regulation. Most ortholog detection methods contaminate the resulting datasets with a substantial number of paralogs. Therefore, we aimed to implement a straightforward method that allows the detection of ortholog clusters with a reduced number of paralogs from completely sequenced genomes. The described cross-species expansion of the reciprocal best BLAST hit method is a time-effective method for ortholog detection, yielding 68% truly orthologous clusters, and the procedure specifically enriches single-copy orthologs. The detection of true orthologs can provide a phylogenetic toolkit to better understand evolutionary processes. In a study across six photosynthetic eukaryotes, nuclear genes of putative mitochondrial origin were shown to be over-represented among single-copy orthologs. These orthologs are involved in fundamental biological processes like amino acid metabolism or translation. Molecular clock analyses based on this dataset yielded divergence time estimates for the red/green algae (1,142 MYA), green algae/land plant (725 MYA), mosses/seed plant (496 MYA), gymno-/angiosperm (385 MYA) and monocotyledons/core eudicotyledons (301 MYA) splits.
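The reciprocal best hit idea underlying the method in this abstract can be sketched in a few lines of Python. This shows only the plain pairwise criterion, not the cross-species expansion the authors describe. The gene identifiers and the `reciprocal_best_hits` function are hypothetical, and the best-hit tables are assumed to have been extracted from BLAST output beforehand.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where b is the best hit of a in genome B
    and a is the best hit of b in genome A."""
    return sorted(
        (a, b) for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a
    )


# Hypothetical best-hit tables (query gene -> best-scoring subject gene).
a_to_b = {"A1": "B1", "A2": "B2", "A3": "B2"}
b_to_a = {"B1": "A1", "B2": "A3", "B3": "A2"}

pairs = reciprocal_best_hits(a_to_b, b_to_a)
print(pairs)  # [('A1', 'B1'), ('A3', 'B2')]
```

Here A2 is dropped because its best hit B2 points back to A3 rather than to A2, which is the kind of asymmetry that paralogs tend to produce.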
Affiliation(s)
- Andreas Zimmer
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Schaenzlestr. 1, 79104, Freiburg, Germany
953
Bio::NEXUS: a Perl API for the NEXUS format for comparative biological data. BMC Bioinformatics 2007; 8:191. [PMID: 17559666 PMCID: PMC1913543 DOI: 10.1186/1471-2105-8-191] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 02/28/2007] [Accepted: 06/08/2007] [Indexed: 11/30/2022]
Abstract
Background Evolutionary analysis provides a formal framework for comparative analysis of genomic and other data. In evolutionary analysis, observed data are treated as the terminal states of characters that have evolved (via transitions between states) along the branches of a tree. The NEXUS standard of Maddison, et al. (1997; Syst. Biol. 46: 590–621) provides a portable, expressive and flexible text format for representing character-state data and trees. However, due to its complexity, NEXUS is not well supported by software and is not easily accessible to bioinformatics users and developers. Results Bio::NEXUS is an application programming interface (API) implemented in Perl, available from CPAN and SourceForge. The 22 Bio::NEXUS modules define 351 methods in 4229 lines of code, with 2706 lines of POD (Plain Old Documentation). Bio::NEXUS provides an object-oriented interface to reading, writing and manipulating the contents of NEXUS files. It closely follows the extensive explanation of the NEXUS format provided by Maddison et al., supplemented with a few extensions such as support for the NHX (New Hampshire Extended) tree format. Conclusion In spite of some limitations owing to the complexity of NEXUS files and the lack of a formal grammar, NEXUS will continue to be useful for years to come. Bio::NEXUS provides a user-friendly API for NEXUS supplemented with an extensive set of methods for manipulations such as re-rooting trees and selecting subsets of data. Bio::NEXUS can be used as glue code for connecting existing software that uses NEXUS, or as a framework for new applications.
954
Ou HY, He X, Harrison EM, Kulasekara BR, Thani AB, Kadioglu A, Lory S, Hinton JCD, Barer MR, Deng Z, Rajakumar K. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands. Nucleic Acids Res 2007; 35:W97-W104. [PMID: 17537813 PMCID: PMC1933208 DOI: 10.1093/nar/gkm380] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or 'mobile genome' (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate 'inferred contigs' produced by merging adjacent genes classified as 'present'. Collectively these 'fragments' represent a hypothetical 'microarray-visualized genome (MVG)'. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands.
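The 'inferred contig' step described for ArrayOme, merging adjacent genes classified as 'present', can be illustrated with a small Python sketch. This is an assumption-laden illustration rather than ArrayOme's actual implementation: the function name `inferred_contigs`, the gene identifiers, and the input layout (genes listed in chromosomal order with a present/absent call) are all hypothetical.

```python
def inferred_contigs(calls):
    """Merge runs of adjacent 'present' genes into contigs.

    calls: list of (gene_id, is_present) tuples in chromosomal order.
    Returns a list of contigs, each a list of gene IDs.
    """
    contigs, current = [], []
    for gene, present in calls:
        if present:
            current.append(gene)
        elif current:
            # An 'absent' gene ends the current run.
            contigs.append(current)
            current = []
    if current:
        contigs.append(current)
    return contigs


# Hypothetical microarray calls for seven genes.
calls = [
    ("g1", True), ("g2", True), ("g3", False),
    ("g4", True), ("g5", False), ("g6", True), ("g7", True),
]
contigs = inferred_contigs(calls)
print(contigs)  # [['g1', 'g2'], ['g4'], ['g6', 'g7']]
```

Summing the lengths of the genes in these contigs would give the kind of microarray-visualized genome size that the abstract compares against the physical genome size.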
Affiliation(s)
- Hong-Yu Ou, Xinyi He, Ewan M. Harrison, Bridget R. Kulasekara, Ali Bin Thani, Aras Kadioglu, Stephen Lory, Jay C. D. Hinton, Michael R. Barer, Zixin Deng, Kumar Rajakumar
- Laboratory of Microbial Metabolism and School of Life Science & Biotechnology, Shanghai Jiaotong University, P. R. China, Department of Infection, Immunity and Inflammation, Leicester Medical School, University of Leicester, Leicester LE1 9HN, UK, Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, MA 02115, USA, Molecular Microbiology Group, Institute of Food Research, Norwich Research Park, Norwich NR4 7UA and Department of Clinical Microbiology, University Hospitals of Leicester NHS Trust, Leicester LE1 5WW, UK
- *To whom correspondence should be addressed. +44 116 2231498; +44 116 2525030. Correspondence may also be addressed to Zixin Deng. +86 21 62933404; +86 21 62932418
955
Gene function prediction based on genomic context clustering and discriminative learning: an application to bacteriophages. BMC Bioinformatics 2007; 8 Suppl 4:S6. [PMID: 17570149 PMCID: PMC1892085 DOI: 10.1186/1471-2105-8-s4-s6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
BACKGROUND Existing methods for whole-genome comparisons require prior knowledge of related species and provide little automation in the function prediction process. Bacteriophage genomes are an example that cannot be easily analyzed by these methods. This work addresses these shortcomings and aims to provide an automated prediction system of gene function. RESULTS We have developed a novel system called SynFPS to perform gene function prediction over completed genomes. The prediction system is initialized by clustering a large collection of weakly related genomes into groups based on their resemblance in gene distribution. From each individual group, data are then extracted and used to train a Support Vector Machine that makes gene function predictions. Experiments were conducted with 9 different gene functions over 296 bacteriophage genomes. Cross validation results gave an average prediction accuracy of ~80%, which is comparable to other genomic-context based prediction methods. Functional predictions are also made on 3 uncharacterized genes and 12 genes that cannot be identified by sequence alignment. The software is publicly available at http://www.synteny.net/. CONCLUSION The proposed system employs genomic context to predict gene function and detect gene correspondence in whole-genome comparisons. Although our experimental focus is on bacteriophages, the method may be extended to other microbial genomes as they share a number of similar characteristics with phage genomes such as gene order conservation.
956
Houwing S, Kamminga LM, Berezikov E, Cronembold D, Girard A, van den Elst H, Filippov DV, Blaser H, Raz E, Moens CB, Plasterk RHA, Hannon GJ, Draper BW, Ketting RF. A role for Piwi and piRNAs in germ cell maintenance and transposon silencing in Zebrafish. Cell 2007; 129:69-82. [PMID: 17418787 DOI: 10.1016/j.cell.2007.03.026] [Citation(s) in RCA: 811] [Impact Index Per Article: 45.1] [Received: 11/03/2006] [Revised: 01/24/2007] [Accepted: 03/20/2007] [Indexed: 11/23/2022]
Abstract
Piwi proteins specify an animal-specific subclass of the Argonaute family that, in vertebrates, is specifically expressed in germ cells. We demonstrate that zebrafish Piwi (Ziwi) is expressed in both the male and the female gonad and is a component of a germline-specifying structure called nuage. Loss of Ziwi function results in a progressive loss of germ cells due to apoptosis during larval development. In animals that have reduced Ziwi function, germ cells are maintained but display abnormal levels of apoptosis in adults. In mammals, Piwi proteins associate with approximately 29-nucleotide-long, testis-specific RNA molecules called piRNAs. Here we show that zebrafish piRNAs are present in both ovary and testis. Many of these are derived from transposons, implicating a role for piRNAs in the silencing of repetitive elements in vertebrates. Furthermore, we show that piRNAs are Dicer independent and that their 3' end likely carries a 2'O-Methyl modification.
Affiliation(s)
- Saskia Houwing
- Hubrecht Laboratory, Uppsalalaan 8, Utrecht, Netherlands
957
Rausch C, Hoof I, Weber T, Wohlleben W, Huson DH. Phylogenetic analysis of condensation domains in NRPS sheds light on their functional evolution. BMC Evol Biol 2007; 7:78. [PMID: 17506888 PMCID: PMC1894796 DOI: 10.1186/1471-2148-7-78] [Citation(s) in RCA: 255] [Impact Index Per Article: 14.2] [Received: 12/19/2006] [Accepted: 05/16/2007] [Indexed: 10/25/2022]
Abstract
BACKGROUND Non-ribosomal peptide synthetases (NRPSs) are large multimodular enzymes that synthesize a wide range of biologically active natural peptide compounds, of which many are pharmacologically important. Peptide bond formation is catalyzed by the Condensation (C) domain. Various functional subtypes of the C domain exist: An LCL domain catalyzes a peptide bond between two L-amino acids, a DCL domain links an L-amino acid to a growing peptide ending with a D-amino acid, a Starter C domain (first denominated and classified as a separate subtype here) acylates the first amino acid with a beta-hydroxy-carboxylic acid (typically a beta-hydroxyl fatty acid), and Heterocyclization (Cyc) domains catalyze both peptide bond formation and subsequent cyclization of cysteine, serine or threonine residues. The homologous Epimerization (E) domain flips the chirality of the last amino acid in the growing peptide; Dual E/C domains catalyze both epimerization and condensation. RESULTS In this paper, we report on the reconstruction of the phylogenetic relationship of NRPS C domain subtypes and analyze in detail the sequence motifs of recently discovered subtypes (Dual E/C, DCL and Starter domains) and their characteristic sequence differences, mutually and in comparison with LCL domains. Based on their phylogeny and the comparison of their sequence motifs, LCL and Starter domains appear to be more closely related to each other than to other subtypes, though pronounced differences in some segments of the protein account for the unequal donor substrates (amino vs. beta-hydroxy-carboxylic acid). Furthermore, on the basis of phylogeny and the comparison of sequence motifs, we conclude that Dual E/C and DCL domains share a common ancestor. In the same way, the evolutionary origin of a C domain of unknown function in glycopeptide (GP) NRPSs can be determined to be an LCL domain. 
In the case of two GP C domains which are most similar to DCL but which have LCL activity, we postulate convergent evolution. CONCLUSION We systematize all C domain subtypes including the novel Starter C domain. With our results, it will be easier to decide the subtype of unknown C domains as we provide profile Hidden Markov Models (pHMMs) for the sequence motifs as well as for the entire sequences. The determined specificity conferring positions will be helpful for the mutation of one subtype into another, e.g. turning DCL to LCL, which can be a useful step for obtaining novel products.
Affiliation(s)
- Christian Rausch
- Center for Bioinformatics Tübingen (ZBIT), Eberhard-Karls-Universität Tübingen, Sand 14, 72076 Tübingen, Germany
- Ilka Hoof
- Center for Bioinformatics Tübingen (ZBIT), Eberhard-Karls-Universität Tübingen, Sand 14, 72076 Tübingen, Germany
- Center for Biological Sequence Analysis, BioCentrum, Danmarks Tekniske Universitet, Building 208, 2800 Lyngby, Denmark
- Tilmann Weber
- Department of Microbiology/Biotechnology, Eberhard-Karls-Universität Tübingen, Auf der Morgenstelle 28, 72076 Tübingen, Germany
- Wolfgang Wohlleben
- Department of Microbiology/Biotechnology, Eberhard-Karls-Universität Tübingen, Auf der Morgenstelle 28, 72076 Tübingen, Germany
- Daniel H Huson
- Center for Bioinformatics Tübingen (ZBIT), Eberhard-Karls-Universität Tübingen, Sand 14, 72076 Tübingen, Germany
958
Liang C, Wang G, Liu L, Ji G, Liu Y, Chen J, Webb JS, Reese G, Dean JFD. WebTraceMiner: a web service for processing and mining EST sequence trace files. Nucleic Acids Res 2007; 35:W137-42. [PMID: 17488839 PMCID: PMC1933163 DOI: 10.1093/nar/gkm299] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 12/05/2022]
Abstract
Expressed sequence tags (ESTs) remain a dominant approach for characterizing the protein-encoding portions of various genomes. Due to inherent deficiencies, they also present serious challenges for data quality control. Before GenBank submission, EST sequences are typically screened and trimmed of vector and adapter/linker sequences, as well as polyA/T tails. Removal of these sequences presents an obstacle for data validation of error-prone ESTs and impedes data mining of certain functional motifs, whose detection relies on accurate annotation of positional information for polyA tails added posttranscriptionally. As raw DNA sequence information is made increasingly available from public repositories, such as NCBI Trace Archive, new tools will be necessary to reanalyze and mine this data for new information. WebTraceMiner (www.conifergdb.org/software/wtm) was designed as a public sequence processing service for raw EST traces, with a focus on detection and mining of sequence features that help characterize 3′ and 5′ termini of cDNA inserts, including vector fragments, adapter/linker sequences, insert-flanking restriction endonuclease recognition sites and polyA or polyT tails. WebTraceMiner complements other public EST resources and should prove to be a unique tool to facilitate data validation and mining of error-prone ESTs (e.g. discovery of new functional motifs).
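One of the terminal features mentioned above, a polyA tail at the 3′ end of a read, can be located with a simple regular expression. The Python sketch below is a deliberately naive illustration and does not reflect how WebTraceMiner itself works: `find_polya_tail`, the `min_len` threshold, and the example read are hypothetical, and a practical tool would also need to tolerate base-calling errors inside the tail.

```python
import re


def find_polya_tail(seq, min_len=10):
    """Return (start, length) of an uninterrupted run of at least `min_len`
    A residues at the 3' end of `seq`, or None if no such run exists."""
    match = re.search(r"A{%d,}$" % min_len, seq.upper())
    if match is None:
        return None
    return match.start(), len(seq) - match.start()


# Hypothetical read: a 10-nt insert fragment followed by a 12-nt polyA tail.
read = "GATTACAGGC" + "A" * 12
tail = find_polya_tail(read)
print(tail)  # (10, 12)
```

Reporting the start position rather than trimming the tail keeps the positional information that the abstract notes is lost when sequences are trimmed before submission.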
Affiliation(s)
- Chun Liang
- Department of Botany, Miami University, Oxford, Ohio 45056, USA.
959
Abstract
MOTIVATION The genome sequencing revolution is approaching a landmark figure of 1000 completely sequenced genomes. Coupled with fast-declining, per-base sequencing costs, this influx of DNA sequence data has encouraged laboratory scientists to engage large datasets in comparative sequence analyses for making evolutionary, functional and translational inferences. However, the majority of the scientists at the forefront of experimental research are not bioinformaticians, so a gap exists between the user-friendly software needed and the scripting/programming infrastructure often employed for the analysis of large numbers of genes, long genomic segments and groups of sequences. We see an urgent need for the expansion of the fundamental paradigms under which biologist-friendly software tools are designed and developed to fulfill the needs of biologists to analyze large datasets by using sophisticated computational methods. We argue that the design principles need to be sensitive to the reality that comparatively small teams of biologists have historically developed some of the most popular biological software packages in molecular evolutionary analysis. Furthermore, biological intuitiveness and investigator empowerment need to take precedence over the current supposition that biologists should re-tool and become programmers when analyzing genome scale datasets.
Affiliation(s)
- Sudhir Kumar
- Center for Evolutionary Functional Genomics, The Biodesign Institute and School of Life Sciences, Arizona State University, Tempe, Arizona 85287-5301, USA.
960
Mitchell RAC, Dupree P, Shewry PR. A novel bioinformatics approach identifies candidate genes for the synthesis and feruloylation of arabinoxylan. PLANT PHYSIOLOGY 2007; 144:43-53. [PMID: 17351055 PMCID: PMC1913792 DOI: 10.1104/pp.106.094995] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.9] [Received: 12/19/2006] [Accepted: 03/06/2007] [Indexed: 05/14/2023]
Abstract
Arabinoxylans (AXs) are major components of graminaceous plant cell walls, including those in the grain and straw of economically important cereals. Despite some recent advances in identifying the genes encoding biosynthetic enzymes for a number of other plant cell wall polysaccharides, the genes encoding enzymes of the final stages of AX synthesis have not been identified. We have therefore adopted a novel bioinformatics approach based on estimation of differential expression of orthologous genes between taxonomic divisions of species. Over 3 million public domain cereal and dicot expressed sequence tags were mapped onto the complete sets of rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana) genes, respectively. It was assumed that genes in cereals involved in AX biosynthesis would be expressed at high levels and that their orthologs in dicotyledonous plants would be expressed at much lower levels. Considering all rice genes encoding putative glycosyl transferases (GTs) predicted to be integral membrane proteins, genes in the GT43, GT47, and GT61 families emerged as by far the strongest candidates. When the search was widened to all other rice or Arabidopsis genes predicted to encode integral membrane proteins, cereal genes in Pfam family PF02458 emerged as candidates for the feruloylation of AX. Our analysis, known activities, and recent findings elsewhere are most consistent with genes in the GT43 family encoding beta-1,4-xylan synthases, genes in the GT47 family encoding xylan alpha-1,2- or alpha-1,3-arabinosyl transferases, and genes in the GT61 family encoding feruloyl-AX beta-1,2-xylosyl transferases.
Affiliation(s)
- Rowan A C Mitchell
- Biomathematics and Bioinformatics Division, Rothamsted Research, Harpenden, Hertfordshire, United Kingdom.
961
Gruber AR, Neuböck R, Hofacker IL, Washietl S. The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures. Nucleic Acids Res 2007; 35:W335-8. [PMID: 17452347 PMCID: PMC1933143 DOI: 10.1093/nar/gkm222] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.2] [Indexed: 11/12/2022]
Abstract
Many non-coding RNA genes and cis-acting regulatory elements of mRNAs contain RNA secondary structures that are critical for their function. Such functional RNAs can be predicted on the basis of thermodynamic stability and evolutionary conservation. We present a web server that uses the RNAz algorithm to detect functional RNA structures in multiple alignments of nucleotide sequences. The server provides access to a complete and fully automatic analysis pipeline that allows users not only to analyze single alignments in a variety of formats, but also to conduct complex screens of large genomic regions. Results are presented on a website that is illustrated by various structure representations and can be downloaded for local viewing. The web server is available at: rna.tbi.univie.ac.at/RNAz.
Affiliation(s)
- Stefan Washietl
- *To whom correspondence should be addressed. +43-1-4277-52744; +43-1-4277-52793
962
Abstract
With the recent increase in the available number of high-quality, full-length mitochondrial sequences, it is now possible to construct and analyze a comprehensive human mitochondrial consensus sequence. Using a data set of 827 carefully selected sequences, it is shown that modern humans contain extremely low levels of divergence from the mitochondrial consensus sequence, differing by a mere 21.6 nt sites on average. Fully 84.1% of the mitochondrial genome was found to be invariant and ‘private’ mutations accounted for 43.8% of the variable sites. Ninety-eight percent of the variant sites had a primary nucleotide with an allele frequency of 0.90 or greater. Interestingly, the few truly ambiguous nucleotide sites could all be reliably assigned to either a purine or pyrimidine ancestral state. A comparison of this consensus sequence to several ancestral sequences derived from phylogenetic studies reveals a great deal of similarity, where, as expected, the most phylogenetically informative nucleotides in the ancestral studies tended to be the most variable nucleotides in the consensus. Allowing for this fact, the consensus approach provides variation data on the positions that do not contribute to phylogenetic reconstructions, and these data provide a baseline for measuring human mitochondrial variation in populations worldwide.
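The quantities reported in this abstract (a majority-rule consensus, the mean number of sites at which a sequence differs from it, and the invariant fraction of sites) can be computed as in the Python sketch below. The four short sequences are toy data, the function names are hypothetical, and the sketch assumes the sequences are already aligned to equal length; it is not the authors' pipeline.

```python
from collections import Counter


def consensus(seqs):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def mean_differences(seqs, cons):
    """Average number of sites at which a sequence differs from `cons`."""
    diffs = [sum(a != b for a, b in zip(s, cons)) for s in seqs]
    return sum(diffs) / len(seqs)


def invariant_fraction(seqs):
    """Fraction of alignment columns carrying a single nucleotide."""
    columns = list(zip(*seqs))
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


# Toy alignment (illustrative assumption, not mitochondrial data).
seqs = ["ACGTAC", "ACGTAT", "ACCTAC", "ACGTAC"]
cons = consensus(seqs)
print(cons)                          # ACGTAC
print(mean_differences(seqs, cons))  # 0.5
print(invariant_fraction(seqs))      # 4 of 6 columns are invariant
```

Both variable columns in this toy alignment are 'private' in the abstract's sense, since each variant appears in only one sequence.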
Affiliation(s)
- Robert W Carter
- FMS Foundation, 7160 Stone Hill Rd., Livonia, NY 14487, USA.
963
Elsik CG, Mackey AJ, Reese JT, Milshina NV, Roos DS, Weinstock GM. Creating a honey bee consensus gene set. Genome Biol 2007; 8:R13. [PMID: 17241472 PMCID: PMC1839126 DOI: 10.1186/gb-2007-8-1-r13] [Citation(s) in RCA: 260] [Impact Index Per Article: 14.4] [Received: 07/18/2006] [Revised: 10/06/2006] [Indexed: 11/21/2022] Open
Abstract
A high-quality consensus gene set for the honey bee (Apis mellifera) created using a new algorithm (GLEAN) is described. Background We wished to produce a single reference gene set for honey bee (Apis mellifera). Our motivation was twofold. First, we wished to obtain an improved set of gene models with increased coverage of known genes, while maintaining gene model quality. Second, we wished to provide a single official gene list that the research community could further utilize for consistent and comparable analyses and functional annotation. Results We created a consensus gene set for honey bee (Apis mellifera) using GLEAN, a new algorithm that uses latent class analysis to automatically combine disparate gene prediction evidence in the absence of known genes. The consensus gene models had increased representation of honey bee genes without sacrificing quality compared with any one of the input gene predictions. When compared with manually annotated gold standards, the consensus set of gene models was similar or superior in quality to each of the input sets. Conclusion Most eukaryotic genome projects produce multiple gene sets because of the variety of gene prediction programs. Each of the gene prediction programs has strengths and weaknesses, and so the multiplicity of gene sets offers users a more comprehensive collection of genes to use than is available from a single program. On the other hand, the availability of multiple gene sets is also a cause for uncertainty among users as regards which set they should use. GLEAN proved to be an effective method to combine gene lists into a single reference set.
Affiliation(s)
- Christine G Elsik
- Department of Animal Science, Texas A&M University, TAMU, College Station, Texas 77843, USA
- Aaron J Mackey
- Penn Genomics Institute, University of Pennsylvania, S. University Avenue, Philadelphia, Pennsylvania 19104, USA
- GlaxoSmithKline, S. Collegeville Road, Collegeville, Pennsylvania 19426, USA
- Justin T Reese
- Department of Animal Science, Texas A&M University, TAMU, College Station, Texas 77843, USA
- Natalia V Milshina
- Department of Animal Science, Texas A&M University, TAMU, College Station, Texas 77843, USA
- David S Roos
- Penn Genomics Institute, University of Pennsylvania, S. University Avenue, Philadelphia, Pennsylvania 19104, USA
- George M Weinstock
- Human Genome Sequencing Center, Baylor College of Medicine, Baylor Plaza, Houston, Texas 77030, USA
964
Farwick A, Jordan U, Fuellen G, Huchon D, Catzeflis F, Brosius J, Schmitz J. Automated scanning for phylogenetically informative transposed elements in rodents. Syst Biol 2007; 55:936-48. [PMID: 17345675 DOI: 10.1080/10635150601064806] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022] Open
Abstract
Transposed elements constitute an attractive, useful source of phylogenetic markers to elucidate the evolutionary history of their hosts. Frequent and successive amplifications over evolutionary time are important requirements for utilizing their presence or absence as landmarks of evolution. Although transposed elements are well distributed in rodent taxa, the generally high degree of genomic sequence divergence among species complicates our access to presence/absence data. With this in mind we developed a novel, high-throughput computational strategy, called CPAL (Conserved Presence/Absence Locus-finder), to identify genome-wide distributed, phylogenetically informative transposed elements flanked by highly conserved regions. From a total of 232 extracted chromosomal mouse loci we randomly selected 14 of these plus 2 others from previous test screens and attempted to amplify them via PCR in representative rodent species. All loci were amplifiable and ultimately contributed 31 phylogenetically informative markers distributed throughout the major groups of Rodentia.
Affiliation(s)
- Astrid Farwick
- Institute of Experimental Pathology, ZMBE, University of Münster, Von-Esmarch-Str. 56, 48149 Münster, Germany
965
Pang CNI, Hayen A, Wilkins MR. Surface accessibility of protein post-translational modifications. J Proteome Res 2007; 6:1833-45. [PMID: 17428077 DOI: 10.1021/pr060674u] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Indexed: 11/28/2022]
Abstract
Protein post-translational modifications are crucial to the function of many proteins. In this study, we have investigated the structural environment of 8378 incidences of 44 types of post-translational modifications with 19 different approaches. We show that modified amino acids likely to be involved in protein-protein interactions, such as ester-linked phosphorylation, methylarginine, acetyllysine, sulfotyrosine, hydroxyproline, and hydroxylysine, are clearly surface associated. Other modifications, including O-GlcNAc, phosphohistidine, 4-aspartylphosphate, methyllysine, and ADP-ribosylarginine, are either not surface associated or are in a protein's core. Artifactual modifications were found to be randomly distributed throughout the protein. We discuss how the surface accessibility of post-translational modifications can be important for protein-protein interactivity.
Affiliation(s)
- Chi Nam Ignatius Pang
- Systems Biology Group, School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia
966
Chaudhuri RR, Ren CP, Desmond L, Vincent GA, Silman NJ, Brehm JK, Elmore MJ, Hudson MJ, Forsman M, Isherwood KE, Guryčová D, Minton NP, Titball RW, Pallen MJ, Vipond R. Genome sequencing shows that European isolates of Francisella tularensis subspecies tularensis are almost identical to US laboratory strain Schu S4. PLoS One 2007; 2:e352. [PMID: 17406676 PMCID: PMC1832225 DOI: 10.1371/journal.pone.0000352] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 02/22/2007] [Accepted: 03/12/2007] [Indexed: 11/23/2022] Open
Abstract
Background Francisella tularensis causes tularaemia, a life-threatening zoonosis, and has potential as a biowarfare agent. F. tularensis subsp. tularensis, which causes the most severe form of tularaemia, is usually confined to North America. However, a handful of isolates from this subspecies was obtained in the 1980s from ticks and mites from Slovakia and Austria. Our aim was to uncover the origins of these enigmatic European isolates. Methodology/Principal Findings We determined the complete genome sequence of FSC198, a European isolate of F. tularensis subsp. tularensis, by whole-genome shotgun sequencing and compared it to that of the North American laboratory strain Schu S4. Apparent differences between the two genomes were resolved by re-sequencing discrepant loci in both strains. We found that the genome of FSC198 is almost identical to that of Schu S4, with only eight SNPs and three VNTR differences between the two sequences. Sequencing of these loci in two other European isolates of F. tularensis subsp. tularensis confirmed that all three European isolates are also closely related to, but distinct from Schu S4. Conclusions/Significance The data presented here suggest that the Schu S4 laboratory strain is the most likely source of the European isolates of F. tularensis subsp. tularensis and indicate that anthropogenic activities, such as movement of strains or animal vectors, account for the presence of these isolates in Europe. Given the highly pathogenic nature of this subspecies, the possibility that it has become established wild in the heartland of Europe carries significant public health implications.
Affiliation(s)
- Roy R. Chaudhuri
- Division of Immunity and Infection, University of Birmingham, Edgbaston, United Kingdom
- Chuan-Peng Ren
- Division of Immunity and Infection, University of Birmingham, Edgbaston, United Kingdom
- Leah Desmond
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Gemma A. Vincent
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Nigel J. Silman
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- John K. Brehm
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Michael J. Elmore
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Michael J. Hudson
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Mats Forsman
- Department of NBC Analysis, Division of NBC Defence, FOI Swedish Defence Research Agency, Umeå, Sweden
- Karen E. Isherwood
- Defence Science and Technology Laboratory, Porton Down, Salisbury, United Kingdom
- Darina Guryčová
- Department of Epidemiology, Medical Facility, Comenius University, Bratislava, Slovak Republic
- Nigel P. Minton
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
- Centre for Biomolecular Sciences, Institute of Infection, Immunity and Inflammation, University of Nottingham, Nottingham, United Kingdom
- Richard W. Titball
- Defence Science and Technology Laboratory, Porton Down, Salisbury, United Kingdom
- Mark J. Pallen
- Division of Immunity and Infection, University of Birmingham, Edgbaston, United Kingdom
- * To whom correspondence should be addressed. E-mail:
- Richard Vipond
- Health Protection Agency, Centre for Emergency Preparedness and Response, Porton Down, Salisbury, United Kingdom
967
Richardt S, Lang D, Reski R, Frank W, Rensing SA. PlanTAPDB, a phylogeny-based resource of plant transcription-associated proteins. PLANT PHYSIOLOGY 2007; 143:1452-66. [PMID: 17337525 PMCID: PMC1851845 DOI: 10.1104/pp.107.095760] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Indexed: 05/14/2023]
Abstract
Diversification of transcription-associated protein (TAP) families during land plant evolution is a key process yielding increased complexity of plant life. Understanding the evolutionary relationships between these genes is crucial to gain insight into plant evolution. We have determined a substantial set of TAPs that are focused on, but not limited to, land plants using PSI-BLAST searches and subsequent filtering and clustering steps. Phylogenies were created in an automated way using a combination of distance and maximum likelihood methods. Comparison of the data to previously published work confirmed their accuracy and usefulness for the majority of gene families. Evidence is presented that the flowering plant apical stem cell regulator WUSCHEL evolved from an ancestral homeobox gene that was already present after the water-to-land transition. The presence of distinct expanded gene families, such as COP1 and HIT in moss, is discussed within the evolutionary backdrop. Comparative analyses revealed that almost all angiosperm transcription factor families were already present in the earliest land plants, whereas many are missing among unicellular algae. A global analysis not only of transcription factors but also of transcriptional regulators and novel putative families is presented. A wealth of data about plant TAP families and all data accrued throughout their automated detection and analysis are made available via the PlanTAPDB Web interface. Evolutionary relationships of these genes are readily accessible to the nonexpert with a mouse click. Initial analyses of selected gene families revealed that PlanTAPDB can readily be used for knowledge discovery.
Affiliation(s)
- Sandra Richardt
- Plant Biotechnology, Faculty of Biology, University of Freiburg, D-79104 Freiburg, Germany
968
Ortutay C, Siermala M, Vihinen M. ImmTree: database of evolutionary relationships of genes and proteins in the human immune system. Immunome Res 2007; 3:4. [PMID: 17376226 PMCID: PMC1845140 DOI: 10.1186/1745-7580-3-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/12/2007] [Accepted: 03/21/2007] [Indexed: 11/10/2022] Open
Abstract
Background The immune system, which is a complex machinery, is based on the highly coordinated expression of a wide array of genes and proteins. The evolutionary history of the human immune system is not well characterised. Although several studies related to the development and evolution of immunological processes have been published, a full-scale genome-based analysis is still missing. A database focused on the evolutionary relationships of immune related genes would contribute to and facilitate research on immunology and evolutionary biology. Results An Internet resource called ImmTree was constructed for studying the evolution and evolutionary trees of the human immune system. ImmTree contains information about orthologs in 80 species collected from the HomoloGene, OrthoMCL and EGO databases. In addition to phylogenetic trees, the service provides data for the comparison of human-mouse ortholog pairs, including synonymous and non-synonymous mutation rates, Z values, and Ka/Ks quotients. A versatile search engine allows complex queries from the database. Currently, data is available for 847 human immune system related genes and proteins. Conclusion ImmTree provides a unique data set of genes and proteins from the human immune system, their phylogenetics, and information for comparisons of human-mouse ortholog pairs, synonymous and non-synonymous mutation rates, as well as other statistical information.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Markku Siermala
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Mauno Vihinen
- Institute of Medical Technology, FI-33014 University of Tampere, Finland
- Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland
969
Abstract
Polyadenylation of nascent transcripts is one of the key mRNA processing events in eukaryotic cells. A large number of human and mouse genes have alternative polyadenylation sites, or poly(A) sites, leading to mRNA variants with different protein products and/or 3′-untranslated regions (3′-UTRs). PolyA_DB 2 contains poly(A) sites identified for genes in several vertebrate species, including human, mouse, rat, chicken and zebrafish, using alignments between cDNA/ESTs and genome sequences. Several new features have been added to the database since its last release, including syntenic genome regions for human poly(A) sites in seven other vertebrates and cis-element information adjacent to poly(A) sites. Trace sequences are used to provide additional evidence for poly(A/T) tails in cDNA/ESTs. The updated database is intended to broaden poly(A) site coverage in vertebrate genomes, and provide means to assess the authenticity of poly(A) sites identified by bioinformatics. The URL for this database is .
Affiliation(s)
- Bin Tian
- To whom correspondence should be addressed. Tel: +1 973 972 3615; Fax: +1 973 972 5594;
970
Rambaldi D, Felice B, Praz V, Bucher P, Cittaro D, Guffanti A. Splicy: a web-based tool for the prediction of possible alternative splicing events from Affymetrix probeset data. BMC Bioinformatics 2007; 8 Suppl 1:S17. [PMID: 17430561 PMCID: PMC1885846 DOI: 10.1186/1471-2105-8-s1-s17] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/21/2022] Open
Abstract
Background The Affymetrix™ technology is nowadays a well-established method for the analysis of gene expression profiles in cancer research studies. However, changes in gene expression levels are not the only way to link genes and disease. Gene isoforms specifically linked with cancer or apoptosis are increasingly reported in the literature. Hence it is of great interest to associate the results of a gene expression study with up-to-date evidence on the transcript structure and its possible variants. Results We present here a web-based software tool, Splicy, whose primary task is to retrieve data on the mapping of Affymetrix™ probes to single exons of gene transcripts and to display this information graphically, projected onto the physical structure of the gene. Starting from a list of Affymetrix™ probesets, the program produces a series of graphical displays, each relating to a transcript associated with the gene targeted by a given probe. The information on the transcript-by-transcript and exon-by-exon mapping of probe pairs can be retrieved both graphically and in the form of tab-separated files. The mapping of single probes to NCBI RefSeq or EMBL cDNAs is handled by the ISREC mapping tables used in the CleanEx Expression Reference Database Project. We currently maintain these mappings for the most popular human and mouse Affymetrix™ chips, and Splicy can be queried for matches with human and mouse NCBI RefSeq or EMBL cDNAs. Conclusion Splicy generates probeset annotations and images describing the relation between the single probes and the intron/exon structure of the target transcript in all its known variants. We think that Splicy will give researchers a clearer picture of the possible transcript variants linked with a given gene and an additional view on the interpretation of microarray experiment data. Splicy is publicly available and was developed in the framework of a bioinformatics grant from the Italian Cancer Research Association.
Affiliation(s)
- Barbara Felice
- The IFOM-IEO Campus, Via Adamello, 16 – 20139 Milano, Italy
- Viviane Praz
- ISREC, Ch. des Boveresses 155, Epalinges, Switzerland
- Philip Bucher
- ISREC, Ch. des Boveresses 155, Epalinges, Switzerland
- Davide Cittaro
- The IFOM-IEO Campus, Via Adamello, 16 – 20139 Milano, Italy
- Alessandro Guffanti
- The IFOM-IEO Campus, Via Adamello, 16 – 20139 Milano, Italy
- CNR-ITB, Via Fantoli 16/15 – 20138 Milano, Italy
971
Gama-Carvalho M, Barbosa-Morais NL, Brodsky AS, Silver PA, Carmo-Fonseca M. Genome-wide identification of functionally distinct subsets of cellular mRNAs associated with two nucleocytoplasmic-shuttling mammalian splicing factors. Genome Biol 2007; 7:R113. [PMID: 17137510 PMCID: PMC1794580 DOI: 10.1186/gb-2006-7-11-r113] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.6] [Received: 07/31/2006] [Revised: 10/18/2006] [Accepted: 11/30/2006] [Indexed: 11/19/2022] Open
Abstract
A genome wide identification of mRNAs that were associated with the splicing factor subunit U2AF65 suggests that U2AF65 associates with specific subsets of spliced mRNAs and may be involved in novel cellular functions in addition to splicing. Background Pre-mRNA splicing is an essential step in gene expression that occurs co-transcriptionally in the cell nucleus, involving a large number of RNA binding protein splicing factors, in addition to core spliceosome components. Several of these proteins are required for the recognition of intronic sequence elements, transiently associating with the primary transcript during splicing. Some protein splicing factors, such as the U2 small nuclear RNP auxiliary factor (U2AF), are known to be exported to the cytoplasm, despite being implicated solely in nuclear functions. This observation raises the question of whether U2AF associates with mature mRNA-ribonucleoprotein particles in transit to the cytoplasm, participating in additional cellular functions. Results Here we report the identification of RNAs immunoprecipitated by a monoclonal antibody specific for the U2AF 65 kDa subunit (U2AF65) and demonstrate its association with spliced mRNAs. For comparison, we analyzed mRNAs associated with the polypyrimidine tract binding protein (PTB), a splicing factor that also binds to intronic pyrimidine-rich sequences but additionally participates in mRNA localization, stability, and translation. Our results show that 10% of cellular mRNAs expressed in HeLa cells associate differentially with U2AF65 and PTB. Among U2AF65-associated mRNAs there is a predominance of transcription factors and cell cycle regulators, whereas PTB-associated transcripts are enriched in mRNA species that encode proteins implicated in intracellular transport, vesicle trafficking, and apoptosis. 
Conclusion Our results show that U2AF65 associates with specific subsets of spliced mRNAs, strongly suggesting that it is involved in novel cellular functions in addition to splicing.
Affiliation(s)
- Margarida Gama-Carvalho
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisboa, Portugal
- Nuno L Barbosa-Morais
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisboa, Portugal
- Hutchison/MRC Research Centre, Department of Oncology, University of Cambridge, Hills Road, Cambridge CB2 0XZ, UK
- Alexander S Brodsky
- Department of Systems Biology, Harvard Medical School, 200 Longwood Ave, Alpert 536, Boston, MA 02115, USA
- Department of Molecular Biology, Cell Biology and Biochemistry and Center for Genomics & Proteomics, Brown University, 69 Brown Street, Providence, Rhode Island 02912, USA
- Pamela A Silver
- Department of Systems Biology, Harvard Medical School, 200 Longwood Ave, Alpert 536, Boston, MA 02115, USA
- Maria Carmo-Fonseca
- Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Av. Prof. Egas Moniz, 1649-028 Lisboa, Portugal
972
Tolentino HD, Matters MD, Walop W, Law B, Tong W, Liu F, Fontelo P, Kohl K, Payne DC. A UMLS-based spell checker for natural language processing in vaccine safety. BMC Med Inform Decis Mak 2007; 7:3. [PMID: 17295907 PMCID: PMC1805499 DOI: 10.1186/1472-6947-7-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 06/01/2006] [Accepted: 02/12/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The Institute of Medicine has identified patient safety as a key goal for health care in the United States. Detecting vaccine adverse events is an important public health activity that contributes to patient safety. Reports about adverse events following immunization (AEFI) from surveillance systems contain free-text components that can be analyzed using natural language processing. To extract Unified Medical Language System (UMLS) concepts from free text and classify AEFI reports based on concepts they contain, we first needed to clean the text by expanding abbreviations and shortcuts and correcting spelling errors. Our objective in this paper was to create a UMLS-based spelling error correction tool as a first step in the natural language processing (NLP) pipeline for AEFI reports. METHODS We developed spell checking algorithms using open source tools. We used de-identified AEFI surveillance reports to create free-text data sets for analysis. After expansion of abbreviated clinical terms and shortcuts, we performed spelling correction in four steps: (1) error detection, (2) word list generation, (3) word list disambiguation and (4) error correction. We then measured the performance of the resulting spell checker by comparing it to manual correction. RESULTS We used 12,056 words to train the spell checker and tested its performance on 8,131 words. During testing, sensitivity, specificity, and positive predictive value (PPV) for the spell checker were 74% (95% CI: 74-75), 100% (95% CI: 100-100), and 47% (95% CI: 46%-48%), respectively. CONCLUSION We created a prototype spell checker that can be used to process AEFI reports. We used the UMLS Specialist Lexicon as the primary source of dictionary terms and the WordNet lexicon as a secondary source. We used the UMLS as a domain-specific source of dictionary terms to compare potentially misspelled words in the corpus. 
The prototype sensitivity was comparable to currently available tools, but the specificity was much superior. The slow processing speed may be improved by trimming it down to the most useful component algorithms. Other investigators may find the methods we developed useful for cleaning text using lexicons specific to their area of interest.
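The three performance measures quoted in this abstract can be computed from a confusion matrix, as in the minimal sketch below. It assumes a "positive" means a word flagged as misspelled and that manual correction supplies the truth; the counts are hypothetical, chosen only to show how near-perfect specificity can coexist with a modest PPV when true errors are rare, and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, PPV) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true misspellings that were flagged
    specificity = tn / (tn + fp)  # share of correct words that were left alone
    ppv = tp / (tp + fp)          # share of flagged words that were truly misspelled
    return sensitivity, specificity, ppv

# Hypothetical counts: 100 true misspellings among 10,100 words.
sens, spec, ppv = diagnostic_metrics(tp=74, fp=83, tn=9917, fn=26)
print(f"sensitivity={sens:.2f} specificity={spec:.4f} ppv={ppv:.2f}")
```

Because correct words vastly outnumber misspellings in this toy corpus, even a false-positive rate under 1% produces more false alarms than true detections, which pulls the PPV below 50%.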
Affiliation(s)
- Herman D Tolentino
- Bacterial Vaccine-Preventable Diseases Branch, Epidemiology and Surveillance Division, National Immunization Program, Centers for Disease Control and Prevention, Atlanta GA, 30333, USA
- Public Health Informatics Fellowship Program, Office of Workforce and Career Development, Centers for Disease Control and Prevention, Atlanta GA, 30333, USA
- Michael D Matters
- Public Health Informatics Fellowship Program, Office of Workforce and Career Development, Centers for Disease Control and Prevention, Atlanta GA, 30333, USA
- Division for Heart Disease and Stroke Prevention, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta GA, 30341, USA
- Wikke Walop
- Immunization & Respiratory Infections Division, Centre for Infectious Disease Prevention & Control, Public Health Agency of Canada, Ottawa, Ontario K1A 0K9, Canada
- Barbara Law
- Immunization & Respiratory Infections Division, Centre for Infectious Disease Prevention & Control, Public Health Agency of Canada, Ottawa, Ontario K1A 0K9, Canada
- Wesley Tong
- Honours Biology and Pharmacology Programme, McMaster University, Hamilton, Ontario L8S 4L8, Canada
- Fang Liu
- Office of High Performance Computing and Communications, National Library of Medicine, National Institutes of Health, Bethesda MD, 20894, USA
- Paul Fontelo
- Office of High Performance Computing and Communications, National Library of Medicine, National Institutes of Health, Bethesda MD, 20894, USA
- Katrin Kohl
- Immunization Safety Office, Office of the Chief Science Officer, Centers for Disease Control and Prevention, Atlanta GA, 30333, USA
- Daniel C Payne
- Bacterial Vaccine-Preventable Diseases Branch, Epidemiology and Surveillance Division, National Immunization Program, Centers for Disease Control and Prevention, Atlanta GA, 30333, USA
973
Ortutay C, Siermala M, Vihinen M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics 2007; 59:333-48. [PMID: 17294181 DOI: 10.1007/s00251-007-0191-0] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.7] [Received: 09/27/2006] [Accepted: 01/08/2007] [Indexed: 12/27/2022]
Abstract
Many genes and proteins are required to carry out the processes of innate and adaptive immunity. For many studies, including systems biology, it is necessary to have a clear and comprehensive definition of the immune system, including the genes and proteins that take part in immunological processes. We have identified and cataloged a large portion of the human immunology-related genes, which we call the essential immunome. The 847 identified genes and proteins were annotated, and their chromosomal localizations were compared to the mouse genome. Relation to disease was also taken into account. We identified numerous pseudogenes, many of which are expressed, and found two putative new genes. We also carried out an evolutionary analysis of immune processes based on gene orthologs to gain an overview of the evolutionary past and molecular present of the human immune system. A list of genes and proteins was compiled. A comprehensive characterization of the member genes and proteins, including the corresponding pseudogenes, is presented. Immunome genes were found to have three types of emergence in independent studies of their ontologies, domains, and functions.
Affiliation(s)
- Csaba Ortutay
- Institute of Medical Technology, University of Tampere, 33014, Tampere, Finland
974
Chang WJ, Addis VM, Li AJ, Axelsson E, Ardell DH, Landweber LF. Intron Evolution and Information processing in the DNA polymerase alpha gene in spirotrichous ciliates: a hypothesis for interconversion between DNA and RNA deletion. Biol Direct 2007; 2:6. [PMID: 17270054 PMCID: PMC1805493 DOI: 10.1186/1745-6150-2-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Received: 12/20/2006] [Accepted: 02/01/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The somatic DNA molecules of spirotrichous ciliates are present as linear chromosomes containing mostly single-gene coding sequences with short 5' and 3' flanking regions. Only a few conserved motifs have been found in the flanking DNA. Motifs that may play roles in promoting and/or regulating transcription have not been consistently detected. Moreover, comparing subtelomeric regions of 1,356 end-sequenced somatic chromosomes failed to identify more putatively conserved motifs. RESULTS We sequenced and compared DNA and RNA versions of the DNA polymerase alpha (pol alpha) gene from nine diverged spirotrichous ciliates. We identified a G-C rich motif aaTACCGC(G/C/T) upstream from transcription start sites in all nine pol alpha orthologs. Furthermore, we consistently found likely polyadenylation signals, similar to the eukaryotic consensus AAUAAA, within 35 nt upstream of the polyadenylation sites. Numbers of introns differed among orthologs, suggesting independent gain or loss of some introns during the evolution of this gene. Finally, we discuss the occurrence of short direct repeats flanking some introns in the DNA pol alpha genes. These introns flanked by direct repeats resemble a class of DNA sequences called internal eliminated sequences (IES) that are deleted from ciliate chromosomes during development. CONCLUSION Our results suggest that conserved motifs are present at both 5' and 3' untranscribed regions of the DNA pol alpha genes in nine spirotrichous ciliates. We also show that several independent gains and losses of introns in the DNA pol alpha genes have occurred in the spirotrichous ciliate lineage. Finally, our statistical results suggest that proven introns might also function in an IES removal pathway. This could strengthen a recent hypothesis that introns evolve into IESs, explaining the scarcity of introns in spirotrichs. 
Alternatively, the analysis suggests that ciliates might occasionally use intron splicing to correct, at the RNA level, failures in IES excision during developmental DNA elimination.
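The two sequence features reported in this abstract (the upstream motif aaTACCGC(G/C/T) and an AAUAAA-like signal within 35 nt upstream of the polyadenylation site) can be scanned for with a few lines of Python. This is a minimal sketch, not the authors' method: the function names are hypothetical, the lowercase "aa" is treated as a plain AA, and only an exact AAUAAA match is tested although the abstract says "similar to" the consensus.

```python
import re

# Motif from the abstract, aaTACCGC(G/C/T); the leading lowercase bases are
# assumed here to be ordinary A's.
UPSTREAM_MOTIF = re.compile(r"AATACCGC[GCT]")
# Eukaryotic consensus polyadenylation signal AAUAAA, written in the DNA alphabet.
POLYA_SIGNAL = "AATAAA"


def find_upstream_motif(dna):
    """Return 0-based start positions of the motif in a DNA string."""
    return [m.start() for m in UPSTREAM_MOTIF.finditer(dna.upper())]


def has_polya_signal(transcript, polya_site, window=35):
    """True if an exact AAUAAA lies within `window` nt upstream of polya_site."""
    seq = transcript.upper().replace("U", "T")
    start = max(0, polya_site - window)
    return POLYA_SIGNAL in seq[start:polya_site]
```

A near-match search (for example allowing one mismatch) would be closer to "similar to the consensus", at the cost of more false positives.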
Affiliation(s)
- Wei-Jen Chang
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
  - Department of Biology, Hamilton College, Clinton, NY 13323, USA
- Victoria M Addis
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Anya J Li
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Elin Axelsson
  - Linnaeus Centre for Bioinformatics, Uppsala University, Box 598, SE 751 24 Uppsala, Sweden
- David H Ardell
  - Linnaeus Centre for Bioinformatics, Uppsala University, Box 598, SE 751 24 Uppsala, Sweden
- Laura F Landweber
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
|
975
|
Church SA, Livingstone K, Lai Z, Kozik A, Knapp SJ, Michelmore RW, Rieseberg LH. Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences. J Mol Evol 2007; 64:171-80. [PMID: 17200807 DOI: 10.1007/s00239-005-0299-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 12/10/2005] [Accepted: 10/03/2006] [Indexed: 10/23/2022]
Abstract
Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.
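The likelihood ratio test mentioned in this abstract compares the log-likelihood of a null model with that of a model allowing positive selection. The sketch below shows the arithmetic only. It assumes the two models are nested and differ by two free parameters (so the statistic is compared with a chi-square distribution with 2 degrees of freedom); the abstract does not state which models were compared, and the function names are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of
    freedom; for df = 2 this has the closed form exp(-x / 2)."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def selection_supported(lnl_null, lnl_alt, alpha=0.05):
    """True if the selection model fits significantly better (df = 2 assumed)."""
    return chi2_sf_df2(lrt_statistic(lnl_null, lnl_alt)) < alpha
```

With made-up log-likelihoods of -1000 (null) and -996 (alternative), the statistic is 8 and the test is significant at the 5% level; a gain of only one log-likelihood unit is not.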
Affiliation(s)
- Sheri A Church
  - Department of Biology, Indiana University, Bloomington, IN 47405, USA
|
976
|
Analysis of 13000 unique Citrus clusters associated with fruit quality, production and salinity tolerance. BMC Genomics 2007; 8:31. [PMID: 17254327 PMCID: PMC1796867 DOI: 10.1186/1471-2164-8-31] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.6] [Received: 07/19/2006] [Accepted: 01/25/2007] [Indexed: 12/19/2022] Open
Abstract
Background Improvement of Citrus, the most economically important fruit crop in the world, is extremely slow and inherently costly because of the long-term nature of tree breeding and an unusual combination of reproductive characteristics. Aside from disease resistance, major commercial traits in Citrus are improved fruit quality, higher yield and tolerance to environmental stresses, especially salinity. Results A normalized full-length and 9 standard cDNA libraries were generated, representing particular treatments and tissues from selected varieties (Citrus clementina and C. sinensis) and rootstocks (C. reshni and C. sinensis × Poncirus trifoliata) differing in fruit quality, resistance to abscission, and tolerance to salinity. The goal of this work was to provide a large expressed sequence tag (EST) collection enriched with transcripts related to these well-appreciated agronomical traits. Towards this end, more than 54000 ESTs derived from these libraries were analyzed and annotated. Assembly of 52626 useful sequences generated 15664 putative transcription units distributed in 7120 contigs and 8544 singletons. BLAST annotation produced significant hits for more than 80% of the hypothetical transcription units and suggested that 647 of these might be Citrus specific unigenes. The unigene set, composed of ~13000 putative different transcripts, including more than 5000 novel Citrus genes, was assigned putative functions based on similarity, GO annotations and protein domains. Conclusion Comparative genomics with Arabidopsis revealed the presence of putative conserved orthologs and single copy genes in Citrus and also the occurrence of both gene duplication events and increased number of genes for specific pathways. In addition, phylogenetic analysis performed on the ammonium transporter family and glycosyl transferase family 20 suggested the existence of Citrus paralogs. 
Analysis of the Citrus gene space showed that the most important metabolic pathways known to affect fruit quality were represented in the unigene set. Overall, the similarity analyses indicated that the sequences of the genes belonging to these varieties and rootstocks were essentially identical, suggesting that the differential behaviour of these species cannot be attributed to major sequence divergences. This Citrus EST assembly contributes both crucial information to discover genes of agronomical interest and tools for genetic and genomic analyses, such as the development of new markers and microarrays.
|
977
|
Burman C, Maqueira B, Coadwell J, Evans PD. Eleven new putative aminergic G-protein coupled receptors from Amphioxus (Branchiostoma floridae): identification, sequence analysis and phylogenetic relationship. Invertebrate Neuroscience 2007; 7:87-98. [PMID: 17225134 DOI: 10.1007/s10158-006-0041-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/06/2006] [Accepted: 12/19/2006] [Indexed: 11/29/2022]
Abstract
We have identified eleven novel aminergic-like G-protein coupled receptor (GPCR) sequences (named AmphiAmR1-11) by searching the genomic trace sequence database for the amphioxus species, Branchiostoma floridae. They share many of the structural motifs that have been used to characterize vertebrate and invertebrate aminergic GPCRs. A preliminary classification of these receptors has been carried out using both BLAST and Hidden Markov Model analyses. The amphioxus genome appears to express a number of D1-like dopamine receptor sequences, including one related to insect dopamine receptors. It also expresses a number of receptors that resemble invertebrate octopamine/tyramine receptors and others that resemble vertebrate alpha-adrenergic receptors. Amphioxus also expresses receptors that resemble vertebrate histamine receptors. Several of the novel receptor sequences have been identified in amphioxus cDNA libraries from a number of tissues.
Affiliation(s)
- Chloe Burman
  - The Inositide Laboratory, The Babraham Institute, Cambridge, UK
|
978
|
Batley J, Jewell E, Edwards D. Automated discovery of single nucleotide polymorphism and simple sequence repeat molecular genetic markers. Methods Mol Biol 2007; 406:473-94. [PMID: 18287708 DOI: 10.1007/978-1-59745-535-0_23] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023]
Abstract
Molecular genetic markers represent one of the most powerful tools for the analysis of genomes. Molecular marker technology has developed rapidly over the last decade, and two forms of sequence-based markers, simple sequence repeats (SSRs), also known as microsatellites, and single nucleotide polymorphisms (SNPs), now predominate applications in modern genetic analysis. The availability of large sequence data sets permits mining for SSRs and SNPs, which may then be applied to genetic trait mapping and marker-assisted selection. Here, we describe Web-based automated methods for the discovery of these SSRs and SNPs from sequence data. SSRPrimer enables the real-time discovery of SSRs within submitted DNA sequences, with the concomitant design of PCR primers for SSR amplification. Alternatively, users may browse the SSR Taxonomy Tree to identify predetermined SSR amplification primers for any species represented within the GenBank database. SNPServer uses a redundancy-based approach to identify SNPs within DNA sequence data. Following submission of a sequence of interest, SNPServer uses BLAST to identify similar sequences, CAP3 to cluster and assemble these sequences, and then the SNP discovery software autoSNP to detect SNPs and insertion/deletion (indel) polymorphisms.
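As a rough illustration of what mining a sequence for SSRs involves, the following sketch finds perfect tandem repeats with a regular expression. It is not the SSRPrimer algorithm, which the abstract does not describe; the function name, the unit sizes (2 to 6 nt) and the minimum of four repeats are all assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    so ACACACAC is reported as (AC)4 rather than (ACAC)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAAAAAA, which would otherwise
            # be reported as a repeat of the unit "AA".
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits
```

Primer design around each hit, as offered by SSRPrimer, would be a separate step and is not attempted here.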
Affiliation(s)
- Jacqueline Batley
  - Australian Centre for Plant Functional Genomics, School of Land, Crop and Food Sciences and ARC Centre of Excellence for Integrative Legume Research, CILR, The University of Queensland, Brisbane, Australia
|
979
|
Kim KH, Cho Y, La Rota M, Cramer RA, Lawrence CB. Functional analysis of the Alternaria brassicicola non-ribosomal peptide synthetase gene AbNPS2 reveals a role in conidial cell wall construction. Molecular Plant Pathology 2007; 8:23-39. [PMID: 20507476 DOI: 10.1111/j.1364-3703.2006.00366.x] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 05/29/2023]
Abstract
SUMMARY Alternaria brassicicola is a necrotrophic pathogen causing black spot disease on virtually all cultivated Brassica crops worldwide. In many plant pathosystems fungal secondary metabolites derived from non-ribosomal peptide synthetases (NPSs) are phytotoxic virulence factors or are antibiotics thought to be important for niche competition with other micro-organisms. However, many of the functions of NPS genes and their products are largely unknown. In this study, we investigated the function of one of the A. brassicicola NPS genes, AbNPS2. The predicted amino acid sequence of AbNPS2 showed high sequence similarity with A. brassicae AbrePsy1, Cochliobolus heterostrophus NPS4 and a Stagonospora nodorum NPS. The AbNPS2 open reading frame was predicted to be 22 kb in length and encodes a large protein (7195 amino acids) showing typical NPS modular organization. Gene expression analysis of AbNPS2 in wild-type fungus indicated that it is expressed almost exclusively in conidia and conidiophores, broadly in the reproductive developmental phase. AbNPS2 gene disruption mutants showed abnormal spore cell wall morphology and a decreased hydrophobicity phenotype. Conidia of abnps2 mutants displayed an aberrantly inflated cell wall and an increase in lipid bodies compared with wild-type. Further phenotypic analyses of abnps2 mutants showed decreased spore germination rates both in vitro and in vivo, and a marked reduction in sporulation in vivo compared with wild-type fungus. Moreover, virulence tests on Brassicas with abnps2 mutants revealed a significant reduction in lesion size compared with wild-type but only when aged spores were used in experiments. Collectively, these results indicate that AbNPS2 plays an important role in development and virulence.
Affiliation(s)
- Kwang-Hyung Kim
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA
|
980
|
Crabtree J, Angiuoli SV, Wortman JR, White OR. Sybil: methods and software for multiple genome comparison and visualization. Methods Mol Biol 2007; 408:93-108. [PMID: 18314579 DOI: 10.1007/978-1-59745-547-3_6] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Indexed: 12/12/2022]
Abstract
With the successful completion of genome sequencing projects for a variety of model organisms, the selection of candidate organisms for future sequencing efforts has been guided increasingly by a desire to enable comparative genomics. This trend has both depended on and encouraged the development of software tools that can elucidate and capitalize on the similarities and differences between genomes. "Sybil," one such tool, is a primarily web-based software package whose primary goal is to facilitate the analysis and visualization of comparative genome data, with a particular emphasis on protein and gene cluster data. Herein, a two-phase protein clustering algorithm, used to generate protein clusters suitable for analysis through Sybil, and a method for creating graphical displays of protein or gene clusters that span multiple genomes are described. When combined, these two relatively simple techniques provide the user of the Sybil software (The Institute for Genomic Research [TIGR] Bioinformatics Department) with a browsable graphical display of his or her "input" genomes, showing which genes are conserved based on the parameters supplied to the protein clustering algorithm. For any given protein cluster the graphical display consists of a local alignment of the genomes in which the clustered genes are located. The genomes are arranged in a vertical stack, as in a multiple alignment, and shaded areas are used to connect genes in the same cluster, thus displaying conservation at the protein level in the context of the underlying genomic sequences. The authors have found this display (and slight variants thereof) useful for a variety of annotation and comparison tasks, ranging from identifying "missed" gene models or single-exon discrepancies between orthologous genes, to finding large or small regions of conserved gene synteny, and investigating the properties of the breakpoints between such regions.
|
981
|
Stajich JE, Dietrich FS, Roy SW. Comparative genomic analysis of fungal genomes reveals intron-rich ancestors. Genome Biol 2007; 8:R223. [PMID: 17949488 PMCID: PMC2246297 DOI: 10.1186/gb-2007-8-10-r223] [Citation(s) in RCA: 105] [Impact Index Per Article: 5.8] [Received: 12/19/2006] [Revised: 10/12/2007] [Accepted: 10/19/2007] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Eukaryotic protein-coding genes are interrupted by spliceosomal introns, which are removed from transcripts before protein translation. Many facets of spliceosomal intron evolution, including age, mechanisms of origins, the role of natural selection, and the causes of the vast differences in intron number between eukaryotic species, remain debated. Genome sequencing and comparative analysis have made possible whole genome analysis of intron evolution to address these questions. RESULTS We analyzed intron positions in 1,161 sets of orthologous genes across 25 eukaryotic species. We find strong support for an intron-rich fungus-animal ancestor, with more than four introns per kilobase, comparable to the highest known modern intron densities. Indeed, the fungus-animal ancestor is estimated to have had more introns than any of the extant fungi in this study. Thus, subsequent fungal evolution has been characterized by widespread and recurrent intron loss occurring in all fungal clades. These results reconcile three previously proposed methods for estimation of ancestral intron number, which previously gave very different estimates of ancestral intron number for eight eukaryotic species, as well as a fourth more recent method. We do not find a clear inverse correspondence between rates of intron loss and gain, contrary to the predictions of selection-based proposals for interspecific differences in intron number. CONCLUSION Our results underscore the high intron density of eukaryotic ancestors and the widespread importance of intron loss through eukaryotic evolution.
Affiliation(s)
- Jason E Stajich
  - Department of Molecular Genetics and Microbiology, Center for Genome Technology, Institute for Genome Science and Policy, Duke University, Durham, NC 27710, USA
  - Miller Institute for Basic Research and Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California, Berkeley, CA 94720-3102, USA
- Fred S Dietrich
  - Department of Molecular Genetics and Microbiology, Center for Genome Technology, Institute for Genome Science and Policy, Duke University, Durham, NC 27710, USA
- Scott W Roy
  - Department of Molecular Genetics and Microbiology, Center for Genome Technology, Institute for Genome Science and Policy, Duke University, Durham, NC 27710, USA
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
982
|
Designing primers for whole genome PCR scanning using the software package GenoFrag: a software package for the design of primers dedicated to whole-genome scanning by LR-PCR. Methods Mol Biol 2007; 402:349-68. [PMID: 17951805 DOI: 10.1007/978-1-59745-528-2_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
Abstract
Whole-genome polymerase chain reaction (PCR) scanning (WGPS) is based on the PCR amplification of small-sized chromosomes (e.g., bacterial chromosomes) by long-range PCR with a set of primers designed using a reference strain and applied to amplify several other strains. Such an approach to genome variability has specific requirements for the selection of primers and the design of primer pairs for the optimal coverage of the chromosome. To facilitate such analysis, we have developed GenoFrag, a software package for the design of primers optimized for whole-genome scanning by long-range PCR. GenoFrag works in a two-step procedure: first, a list of primers is selected according to the basic criteria, and second, the list of primer candidates is used for the coverage of the whole chromosome. These two steps are presented here with a part of the algorithm scripts developed for this software. Examples of what can be done using GenoFrag are illustrated by results obtained from the online version of the software. GenoFrag has already been validated in long-range (LR)-PCR experiments on several bacterial species. It is a robust and reliable tool for primer design for WGPS.
|
983
|
Abstract
The BioPerl toolkit provides a library of hundreds of routines for processing sequence, annotation, alignment, and sequence analysis reports. It often serves as a bridge between different computational biology applications assisting the user to construct analysis pipelines. This chapter illustrates how BioPerl facilitates tasks such as writing scripts summarizing information from BLAST reports or extracting key annotation details from a GenBank sequence record.
|
984
|
Abstract
cTrans is a comprehensive utility used to generate polypeptide databases from cDNA sequences. It integrates four main functions: retrieving sequences of the species of interest from packages downloaded from dbEST of GenBank; format conversion; checking for and deleting vector and adaptor contamination; and translating the cDNA sequences in all six frames and selecting specific translations for database construction according to a user-defined length threshold. In addition, the utility is also applicable to cDNA sequences produced by users themselves.
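The last of the four functions, six-frame translation followed by length-based selection, can be sketched as follows. This is an illustration under stated assumptions rather than the cTrans implementation: the standard genetic code is used, the "user-defined length threshold" is interpreted as a minimum peptide length, translations are split at stop codons, and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(cdna):
    """Frames +1, +2, +3 on the given strand, then on its reverse complement."""
    seq = cdna.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[off:]) for strand in (seq, rev) for off in range(3)]


def select_peptides(cdna, min_length=50):
    """Keep stop-free segments of at least min_length residues (assumed rule)."""
    kept = []
    for frame in six_frame_translations(cdna):
        kept.extend(p for p in frame.split("*") if len(p) >= min_length)
    return kept
```

For example, `six_frame_translations("ATGGCCTAA")` yields the three forward frames `MA*`, `WP`, `GL` and the three reverse-complement frames `LGH`, `*A`, `RP`.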
Affiliation(s)
- Haibin Xu
  - The Applied Plant Genomics Laboratory & National Key Laboratory for Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Jiangsu, P.R. China
|
985
|
Jones M, Blaxter M. TaxMan: a taxonomic database manager. BMC Bioinformatics 2006; 7:536. [PMID: 17176465 PMCID: PMC1766369 DOI: 10.1186/1471-2105-7-536] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/11/2006] [Accepted: 12/18/2006] [Indexed: 11/10/2022] Open
Abstract
Background Phylogenetic analysis of large, multiple-gene datasets, assembled from public sequence databases, is rapidly becoming a popular way to approach difficult phylogenetic problems. Supermatrices (concatenated multiple sequence alignments of multiple genes) can yield more phylogenetic signal than individual genes. However, manually assembling such datasets for a large taxonomic group is time-consuming and error-prone. Additionally, sequence curation, alignment and assessment of the results of phylogenetic analysis are made particularly difficult by the potential for a given gene in a given species to be unrepresented, or to be represented by multiple or partial sequences. We have developed a software package, TaxMan, that largely automates the processes of sequence acquisition, consensus building, alignment and taxon selection to facilitate this type of phylogenetic study. Results TaxMan uses freely available tools to allow rapid assembly, storage and analysis of large, aligned DNA and protein sequence datasets for user-defined sets of species and genes. The user provides GenBank format files and a list of gene names and synonyms for the loci to analyse. Sequences are extracted from the GenBank files on the basis of annotation and sequence similarity. Consensus sequences are built automatically. Alignment is carried out (where possible, at the protein level) and aligned sequences are stored in a database. TaxMan can automatically determine the best subset of taxa to examine phylogeny at a given taxonomic level. By using the stored aligned sequences, large concatenated multiple sequence alignments can be generated rapidly for a subset and output in analysis-ready file formats. Trees resulting from phylogenetic analysis can be stored and compared with a reference taxonomy. Conclusion TaxMan allows rapid automated assembly of multigene datasets of aligned sequences for large taxonomic groups. 
By extracting sequences on the basis of both annotation and BLAST similarity, it ensures that all available sequence data can be brought to bear on a phylogenetic problem, but remains fast enough to cope with many thousands of records. By automatically assisting in the selection of the best subset of taxa to address a particular phylogenetic problem, TaxMan greatly speeds up the process of generating multiple sequence alignments for phylogenetic analysis. Our results indicate that an automated phylogenetic workbench can be a useful tool when correctly guided by user knowledge.
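The supermatrix idea described in this abstract, concatenating per-gene alignments while tolerating genes that are unrepresented in some species, can be sketched in a few lines. This is a generic illustration and not TaxMan code: the function name and the nested-dictionary input layout are assumptions, and missing genes are simply padded with gap characters.

```python
def build_supermatrix(alignments, gap="-"):
    """Concatenate per-gene alignments into one row per taxon.

    alignments maps gene name -> {taxon name -> aligned sequence}. A taxon
    lacking a gene receives a run of gap characters as wide as that gene's
    alignment, so every row of the supermatrix has the same length.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    rows = {taxon: [] for taxon in taxa}
    for gene in sorted(alignments):
        aln = alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences for %s are not aligned" % gene)
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```

Genes are concatenated in alphabetical order here so that the column layout is reproducible; a real pipeline would also record the column range of each gene for partitioned analyses.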
Affiliation(s)
- Martin Jones
  - Institute of Evolutionary Biology, King's Buildings, Ashworth Laboratories, West Mains Road, Edinburgh EH9 3JT, UK
- Mark Blaxter
  - Institute of Evolutionary Biology, King's Buildings, Ashworth Laboratories, West Mains Road, Edinburgh EH9 3JT, UK
|
986
|
Krause L, McHardy AC, Nattkemper TW, Pühler A, Stoye J, Meyer F. GISMO--gene identification using a support vector machine for ORF classification. Nucleic Acids Res 2006; 35:540-9. [PMID: 17175534 PMCID: PMC1802617 DOI: 10.1093/nar/gkl1083] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/25/2022] Open
Abstract
We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, short genes and genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are biologically active genes. The source code for GISMO is freely available under the GPL license.
Affiliation(s)
- Lutz Krause
  - Center for Biotechnology, Bielefeld University (CeBiTec), D-33594 Bielefeld, Germany
|
987
|
Bazykin GA, Dushoff J, Levin SA, Kondrashov AS. Bursts of nonsynonymous substitutions in HIV-1 evolution reveal instances of positive selection at conservative protein sites. Proc Natl Acad Sci U S A 2006; 103:19396-401. [PMID: 17164328 PMCID: PMC1698441 DOI: 10.1073/pnas.0609484103] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022] Open
Abstract
The fixation of a new allele can be driven by Darwinian positive selection or can be due to random genetic drift. Identifying instances of positive selection is a difficult task, because its impact is routinely obscured by the action of negative selection. The nature of the genetic code dictates that positive selection in favor of an amino acid replacement should often cause a burst of two or three nucleotide substitutions at a single codon site, because a large fraction of amino acid replacements cannot be achieved after just one nucleotide substitution. Here, we study pairs of successive nonsynonymous substitutions at one codon in the course of evolution of HIV-1 genes within HIV-1 populations inhabiting infected individuals. Such pairs are more numerous and more clumped than expected if different substitutions were independent and than what is observed for pairs of successive synonymous substitutions. Bursts of nonsynonymous substitutions in HIV-1 evolution cannot be explained by mutational biases and must, therefore, be due to positive selection. Both reversals, exact or imprecise, of fixed deleterious mutations and acquisitions of amino acids with new properties are responsible for the bursts. Temporal clumping is strongest at codon sites with a low overall rate of nonsynonymous evolution, implying that a substantial fraction of replacements of conservative amino acids are driven by positive selection. We identified many conservative sites of HIV-1 proteins that occasionally experience positive selection.
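The premise that many amino acid replacements cannot be reached by a single nucleotide substitution follows directly from the structure of the genetic code and can be checked by enumeration. The sketch below assumes the standard genetic code, counts unordered amino acid pairs, and ignores codon usage and mutational bias; it illustrates the premise only and is not the authors' analysis. Function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def single_step_pairs():
    """Unordered amino acid pairs connected by one nucleotide substitution."""
    pairs = set()
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                neighbour = codon[:pos] + base + codon[pos + 1:]
                other = CODON_TABLE[neighbour]
                if other != "*" and other != aa:
                    pairs.add(frozenset((aa, other)))
    return pairs


def fraction_needing_multiple_steps():
    """Share of the 190 unordered amino acid pairs with no single-step path."""
    n = len(set(AMINO) - {"*"})
    total = n * (n - 1) // 2
    return 1.0 - len(single_step_pairs()) / total
```

For instance, phenylalanine and leucine are one substitution apart (TTT to TTA), whereas methionine (ATG) and tryptophan (TGG) differ at two codon positions, so a Met-to-Trp replacement needs at least two substitutions at the same codon.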
Affiliation(s)
- Georgii A. Bazykin
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544
- Jonathan Dushoff
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544
- Simon A. Levin
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544
  - To whom correspondence may be addressed.
- Alexey S. Kondrashov
  - Life Sciences Institute, University of Michigan, 210 Washtenaw Avenue, Ann Arbor, MI 48109-2216
  - To whom correspondence may be addressed.
|
988
|
Fast sequence evolution of Hox and Hox-derived genes in the genus Drosophila. BMC Evol Biol 2006; 6:106. [PMID: 17163987 PMCID: PMC1764764 DOI: 10.1186/1471-2148-6-106] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/13/2006] [Accepted: 12/12/2006] [Indexed: 12/02/2022] Open
Abstract
Background It is expected that genes that are expressed early in development and have a complex expression pattern are under strong purifying selection and thus evolve slowly. Hox genes fulfill these criteria and thus should have a low evolutionary rate. However, some observations point to a completely different scenario. Hox genes are usually highly conserved inside the homeobox, but very variable outside it. Results We have measured the rates of nucleotide divergence and indel fixation of three Hox genes, labial (lab), proboscipedia (pb) and abdominal-A (abd-A), and compared them with those of three genes derived by duplication from Hox3, bicoid (bcd), zerknüllt (zen) and zerknüllt-related (zen2), and 15 non-Hox genes in sets of orthologous sequences of three species of the genus Drosophila. These rates were compared to test the hypothesis that Hox genes evolve slowly. Our results show that the evolutionary rate of Hox genes is higher than that of non-Hox genes when both amino acid differences and indels are taken into account: 43.39% of the amino acid sequence is altered in Hox genes, versus 30.97% in non-Hox genes and 64.73% in Hox-derived genes. Microsatellites scattered along the coding sequence of Hox genes explain partially, but not fully, their fast sequence evolution. Conclusion These results show that Hox genes evolve faster than other developmental genes, and emphasize the need to take into account indels in addition to nucleotide substitutions in order to accurately estimate evolutionary rates.
|
989
|
Giles SS, Stajich JE, Nichols C, Gerrald QD, Alspaugh JA, Dietrich F, Perfect JR. The Cryptococcus neoformans catalase gene family and its role in antioxidant defense. Eukaryotic Cell 2006; 5:1447-59. [PMID: 16963629 PMCID: PMC1563583 DOI: 10.1128/ec.00098-06] [Citation(s) in RCA: 80] [Impact Index Per Article: 4.2] [Indexed: 02/06/2023]
Abstract
In the present study, we sought to elucidate the contribution of the Cryptococcus neoformans catalase gene family to antioxidant defense. We employed bioinformatics techniques to identify four members of the C. neoformans catalase gene family and created mutants lacking single or multiple catalase genes. Based on a phylogenetic analysis, CAT1 and CAT3 encode putative spore-specific catalases, CAT2 encodes a putative peroxisomal catalase, and CAT4 encodes a putative cytosolic catalase. Only Cat1 exhibited detectable biochemical activity in vitro, and Cat1 activity was constitutive in the yeast form of this organism. Although they were predicted to be important in spores, neither CAT1 nor CAT3 was essential for mating or spore viability. Consistent with previous studies of Saccharomyces cerevisiae, the single (cat1, cat2, cat3, and cat4) and quadruple (cat1 cat2 cat3 cat4) catalase mutant strains exhibited no oxidative-stress phenotypes under conditions in which either exogenous or endogenous levels of reactive oxygen species were elevated. In addition, there were no significant differences in the mean times to mortality between groups of mice infected with C. neoformans catalase mutant strains (the cat1 and cat1 cat2 cat3 cat4 mutants) and those infected with wild-type strain H99. We conclude from the results of this study that C. neoformans possesses a robust antioxidant system, composed of functionally overlapping and compensatory components that provide protection against endogenous and exogenous oxidative stresses.
Affiliation(s)
- Steven S Giles
  - Department of Cell Biology, Duke University Medical Center, Durham, NC 27710, USA
|
990
|
Abstract
Here we introduce a computer database that allows for the rapid retrieval of physicochemical properties, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes information about a protein or a list of proteins. We applied PIGOK analyzing Schizosaccharomyces pombe proteins displaying differential expression under oxidative stress and identified their biological functions and pathways. The database is available on the Internet at http://pc4-133.ludwig.ucl.ac.uk/pigok.html.
Affiliation(s)
- Richard J Jacob
- Department of Biochemistry, University College London, Gower Street, London WC1E 6BT, United Kingdom
|
991
|
del Val C, Kuryshev VY, Glatting KH, Ernst P, Hotz-Wagenblatt A, Poustka A, Suhai S, Wiemann S. CAFTAN: a tool for fast mapping, and quality assessment of cDNAs. BMC Bioinformatics 2006; 7:473. [PMID: 17064411 PMCID: PMC1636072 DOI: 10.1186/1471-2105-7-473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2006] [Accepted: 10/25/2006] [Indexed: 11/10/2022] Open
Abstract
Background The German cDNA Consortium has been cloning full-length cDNAs and exploiting them in protein localization experiments and cellular assays. However, the efficient use of large cDNA resources requires strategies capable of rapidly separating truly useful cDNAs from biological and experimental noise. To this end we have developed a new high-throughput analysis tool, CAFTAN, which simplifies these efforts and thus fills the gap between large-scale cDNA collections and their systematic annotation and application in functional genomics. Results CAFTAN is built around the mapping of cDNAs to the genome assembly and the subsequent analysis of their genomic context. It uses sequence features such as the presence and type of polyA signals, inner and flanking repeats, the GC content, and splice site types. All these features are evaluated in individual tests and classify cDNAs according to their sequence quality and their likelihood of having been generated from fully processed mRNAs. Additionally, CAFTAN compares the coordinates of mapped cDNAs with the genomic coordinates of reference sets from publicly available resources (e.g., VEGA, ENSEMBL). This provides detailed information about overlapping exons and the structural classification of cDNAs with respect to the reference set of splice variants. The evaluation of CAFTAN showed that it is able to correctly classify more than 85% of 5950 selected "known protein-coding" VEGA cDNAs as high-quality multi- or single-exon cDNAs. It identified as good 80.6% of the single-exon cDNAs and 85% of the multiple-exon cDNAs. The program is written in Perl in a modular way, so that the strategy can be adapted to other tasks such as EST annotation, or extended by adding new classification rules and new organism databases as they become available. We think that it is a very useful program for the annotation and research of unfinished genomes.
Conclusion CAFTAN is a high-throughput sequence analysis tool that performs a fast and reliable quality prediction of cDNAs. Several thousand cDNAs can be analyzed in a short time, giving the curator or scientist a quick first overview of the quality and the existing annotation of a set of cDNAs. It supports the rejection of low-quality cDNAs and helps in the selection of likely novel splice variants and/or completely novel transcripts for new experiments.
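Two of the sequence features named in the abstract, GC content and the presence of a polyA signal, can be illustrated with a short sketch. This is not CAFTAN's code (CAFTAN is written in Perl); the function names, the 50-base search window, and the choice to look only for the hexamers AATAAA and ATTAAA are assumptions made for this example.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_polya_signal(seq, window=50, signals=("AATAAA", "ATTAAA")):
    """Return (signal, position) of the 3'-most signal in the last `window` bases.

    The window size and the signal list are illustrative assumptions,
    not CAFTAN parameters. Returns None when no signal is found.
    """
    seq = seq.upper()
    offset = max(0, len(seq) - window)
    tail = seq[offset:]
    best = None  # (signal, position within the tail)
    for signal in signals:
        pos = tail.rfind(signal)
        if pos != -1 and (best is None or pos > best[1]):
            best = (signal, pos)
    return None if best is None else (best[0], best[1] + offset)


# Hypothetical toy cDNA end, for illustration only.
cdna = "GCGCATGCCGTA" + "AATAAA" + "CTTTGTCAAAAAAAA"
signal_hit = find_polya_signal(cdna)
```

A real classifier would combine many such tests; the sketch only shows how individual features can be computed independently before being combined.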
Affiliation(s)
- Coral del Val
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Dept. Computer Science and Artificial Intelligence, ETSI Informatics University of Granada, C/Daniel Saucedo Aranda s/n 18071, Granada, Spain
- Vladimir Yurjevich Kuryshev
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Karl-Heinz Glatting
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Peter Ernst
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Agnes Hotz-Wagenblatt
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Annemarie Poustka
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Sandor Suhai
- DKFZ, German Cancer Research Center, Division Molecular Biophysics, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
- Stefan Wiemann
- DKFZ, German Cancer Research Center, Division of Molecular Genome Analysis, Im Neuenheimer Feld 580, D-69120 Heidelberg, Germany
|
992
|
Naito K, Cho E, Yang G, Campbell MA, Yano K, Okumoto Y, Tanisaka T, Wessler SR. Dramatic amplification of a rice transposable element during recent domestication. Proc Natl Acad Sci U S A 2006; 103:17620-5. [PMID: 17101970 PMCID: PMC1693796 DOI: 10.1073/pnas.0605421103] [Citation(s) in RCA: 166] [Impact Index Per Article: 8.7] [Received: 06/28/2006] [Indexed: 11/18/2022] Open
Abstract
Despite the prevalence of transposable elements in the genomes of higher eukaryotes, it is virtually unknown how they amplify to very high copy numbers without killing their host. Here, we report the discovery of four rice strains in which a miniature inverted-repeat transposable element (mPing) has amplified from approximately 50 to approximately 1,000 copies. We characterized 280 of the insertions and found that 70% were within 5 kb of coding regions but that insertions into exons and introns were significantly underrepresented. Further analyses of gene expression and transposable-element activity demonstrate that the ability of mPing to attain high copy numbers is due to three factors: (i) the rapid selection against detrimental insertions, (ii) the neutral or minimal effect of the remaining insertions on gene transcription, and (iii) the continued mobility of mPing elements in strains that already have >1,000 copies. The rapid increase in mPing copy number documented in this study represents a potentially valuable source of population diversity in self-fertilizing plants like rice.
Affiliation(s)
- Ken Naito
- Department of Plant Biology, University of Georgia, Athens, GA 30602
- Division of Agronomy and Horticulture Science, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
- Eunyoung Cho
- Department of Plant Biology, University of Georgia, Athens, GA 30602
- Guojun Yang
- Department of Plant Biology, University of Georgia, Athens, GA 30602
- Matthew A Campbell
- Department of Plant Biology, University of Georgia, Athens, GA 30602
- Kentaro Yano
- Division of Agronomy and Horticulture Science, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
- Yutaka Okumoto
- Division of Agronomy and Horticulture Science, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
- Takatoshi Tanisaka
- Division of Agronomy and Horticulture Science, Graduate School of Agriculture, Kyoto University, Kitashirakawa, Sakyo-ku, Kyoto 606-8502, Japan
- Susan R. Wessler
- Department of Plant Biology, University of Georgia, Athens, GA 30602
|
993
|
Abstract
POGs/PlantRBP (http://plantrbp.uoregon.edu/) is a relational database that integrates data from rice, Arabidopsis, and maize by placing the complete Arabidopsis and rice proteomes and available maize sequences into 'putative orthologous groups' (POGs). Annotation efforts will focus on predicted RNA binding proteins (RBPs), i.e., those with known RNA binding domains or otherwise implicated in RNA function. POGs form the heart of the database and were assigned using a mutual-best-hit strategy after performing BLAST comparisons of the predicted Arabidopsis and rice proteomes. Each POG entry includes orthologs in Arabidopsis and rice, annotated with domain organization, gene models, phylogenetic trees, and multiple intracellular targeting predictions. A graphical display maps maize sequences onto their most similar rice gene model. The database can be queried using any combination of gene name, accession, domain, and predicted intracellular location, or using BLAST. Useful features of the database include the ability to search for proteins with both a specified domain content and intracellular location; the concurrent display of mutual best hits and phylogenetic trees, which facilitates evaluation of POG assignments; the association of maize sequences with POGs; and the display of targeting predictions and domain organization for all POG members, which reveals the consistency, or lack thereof, of those predictions.
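The mutual-best-hit assignment mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the POGs/PlantRBP implementation: the function names and the input layout (tuples of query, subject, and bit score, as might be parsed from tabular BLAST output) are invented for the example.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bit_score) tuples; this
    layout is an assumption for the example, not a POGs/PlantRBP format.
    """
    best = {}
    for query, subject, score in hits:
        # Keep only the top-scoring subject per query (first one wins ties).
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each protein is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical identifiers and scores, for illustration only.
arabidopsis_vs_rice = [("At1", "Os1", 250.0), ("At1", "Os2", 90.0), ("At2", "Os2", 120.0)]
rice_vs_arabidopsis = [("Os1", "At1", 245.0), ("Os2", "At1", 95.0), ("Os2", "At2", 80.0)]
pairs = mutual_best_hits(arabidopsis_vs_rice, rice_vs_arabidopsis)
```

In the toy data, At2's best hit is Os2, but Os2's best hit is At1, so that pair is not reciprocal and only the At1/Os1 pair is kept.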
Affiliation(s)
- Alice Barkan
- To whom correspondence should be addressed. Tel: +1 541 346 5145; Fax: +1 541 346 5891;
|
994
|
Kazakov AE, Cipriano MJ, Novichkov PS, Minovitsky S, Vinogradov DV, Arkin A, Mironov AA, Gelfand MS, Dubchak I. RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res 2006; 35:D407-12. [PMID: 17142223 PMCID: PMC1669780 DOI: 10.1093/nar/gkl865] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.7] [Indexed: 11/25/2022] Open
Abstract
RegTransBase is a manually curated database of regulatory interactions in prokaryotes that captures the knowledge in public scientific literature using a controlled vocabulary. Although several databases describing interactions between regulatory proteins and their binding sites are already being maintained, they either focus mostly on the model organisms Escherichia coli and Bacillus subtilis or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains the following types of experimental data: the activation or repression of transcription by an identified direct regulator, determining the transcriptional regulatory function of a protein (or RNA) directly binding to DNA (RNA), mapping or prediction of a binding site for a regulatory protein and characterization of regulatory mutations. Currently, RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWM) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at .
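To illustrate how a position weight matrix can be used to identify candidate regulatory sites, the sketch below builds a log-odds matrix from a few aligned sites and scans a sequence with it. It is an assumption-laden example rather than RegTransBase's procedure: the function names, the pseudocount, the uniform background frequency, the threshold, and the example sites are all hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length DNA sites."""
    length = len(sites[0])
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases get a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((c / total) / background) for b, c in counts.items()})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (offset, score) for each window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical aligned binding sites; not taken from RegTransBase.
sites = ["TTGACA", "TTGACT", "TTGATA", "TAGACA"]
pwm = build_pwm(sites)
hits = scan("GGCTTGACAGG", pwm, threshold=5.0)
```

The sketch assumes input restricted to the characters A, C, G, and T; ambiguity codes would need separate handling.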
Affiliation(s)
- Alexei E. Kazakov
- Institute for Information Transmission Problems, RAS. Bolshoi Karetny pereulok 19, Moscow, 127994, Russia
- Michael J. Cipriano
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Pavel S. Novichkov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Simon Minovitsky
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Dmitry V. Vinogradov
- Institute for Information Transmission Problems, RAS. Bolshoi Karetny pereulok 19, Moscow, 127994, Russia
- Adam Arkin
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Howard Hughes Medical Institute, 4000 Jones Bridge Road, Chevy Chase, MD 20815-6789, USA
- Department of Bioengineering, University of California, Berkeley, CA, 94710, USA
- Virtual Institute of Microbial Stress and Survival, Berkeley, CA, 94710, USA
- Andrey A. Mironov
- Institute for Information Transmission Problems, RAS. Bolshoi Karetny pereulok 19, Moscow, 127994, Russia
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Vorobievy Gory 1-73, Moscow 119992, Russia
- State Research Center GosNIIGenetika. 1-j Dorozhny proezd 1, Moscow, 117545, Russia
- Mikhail S. Gelfand
- Institute for Information Transmission Problems, RAS. Bolshoi Karetny pereulok 19, Moscow, 127994, Russia
- Faculty of Bioengineering and Bioinformatics, Moscow State University, Vorobievy Gory 1-73, Moscow 119992, Russia
- State Research Center GosNIIGenetika. 1-j Dorozhny proezd 1, Moscow, 117545, Russia
- Inna Dubchak
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
- Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
- To whom correspondence should be addressed. Tel: +1 510 495 2419; Fax: +1 510 486 5614;
|
995
|
Beuming T, Shi L, Javitch JA, Weinstein H. A comprehensive structure-based alignment of prokaryotic and eukaryotic neurotransmitter/Na+ symporters (NSS) aids in the use of the LeuT structure to probe NSS structure and function. Mol Pharmacol 2006; 70:1630-42. [PMID: 16880288 DOI: 10.1124/mol.106.026120] [Citation(s) in RCA: 230] [Impact Index Per Article: 12.1] [Indexed: 11/22/2022] Open
Abstract
The recently elucidated crystal structure of a prokaryotic member of the neurotransmitter/sodium symporter (NSS) family (Yamashita et al., 2005) is a major advance toward understanding structure-function relationships in this important class of transporters. To aid in the generalization of these results, we present here a comprehensive sequence alignment of all known prokaryotic and eukaryotic NSS proteins, based on the crystal structure of the leucine transporter from Aquifex aeolicus (LeuT). Regions of low sequence identity between prokaryotic and eukaryotic transporters were aligned with the aid of a number of bioinformatics tools, and the resulting alignments were validated by comparison with experimental data. In a number of regions, including the transmembrane segments 4, 5, and 9 as well as extracellular loops 2, 3, and 4, our alignment differs from the one proposed previously [Nature (Lond) 437: 215-223, 2005]. Important similarities and differences among the sequences of NSS proteins in regions likely to determine selectivity in substrate binding and mechanisms of transport regulation are discussed in the context of the LeuT structure and the alignment.
Affiliation(s)
- Thijs Beuming
- Department of Physiology and Biophysics, and HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York 10021, USA
|
996
|
Ibrahim AEK, Thorne NP, Baird K, Barbosa-Morais NL, Tavaré S, Collins VP, Wyllie AH, Arends MJ, Brenton JD. MMASS: an optimized array-based method for assessing CpG island methylation. Nucleic Acids Res 2006; 34:e136. [PMID: 17041235 PMCID: PMC1635254 DOI: 10.1093/nar/gkl551] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 01/17/2006] [Revised: 05/10/2006] [Accepted: 07/14/2006] [Indexed: 12/31/2022] Open
Abstract
We describe an optimized microarray method for identifying genome-wide CpG island methylation called microarray-based methylation assessment of single samples (MMASS) which directly compares methylated to unmethylated sequences within a single sample. To improve previous methods we used bioinformatic analysis to predict an optimized combination of methylation-sensitive enzymes that had the highest utility for CpG-island probes and different methods to produce unmethylated representations of test DNA for more sensitive detection of differential methylation by hybridization. Subtraction or methylation-dependent digestion with McrBC was used with optimized (MMASS-v2) or previously described (MMASS-v1, MMASS-sub) methylation-sensitive enzyme combinations and compared with a published McrBC method. Comparison was performed using DNA from the cell line HCT116. We show that the distribution of methylation microarray data is inherently skewed and requires exogenous spiked controls for normalization and that analysis of digestion of methylated and unmethylated control sequences together with linear fit models of replicate data showed superior statistical power for the MMASS-v2 method. Comparison with previous methylation data for HCT116 and validation of CpG islands from PXMP4, SFRP2, DCC, RARB and TSEN2 confirmed the accuracy of MMASS-v2 results. The MMASS-v2 method offers improved sensitivity and statistical power for high-throughput microarray identification of differential methylation.
Affiliation(s)
- Ashraf E K Ibrahim
- Department of Pathology, Division of Molecular Histopathology, Addenbrooke's Hospital Hills Road, Cambridge CB2 2XZ, UK.
|
997
|
Huang J, Gutteridge A, Honda W, Kanehisa M. MIMOX: a web tool for phage display based epitope mapping. BMC Bioinformatics 2006; 7:451. [PMID: 17038191 PMCID: PMC1618411 DOI: 10.1186/1471-2105-7-451] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.0] [Received: 06/21/2006] [Accepted: 10/12/2006] [Indexed: 11/22/2022] Open
Abstract
Background Phage display is widely used in basic research such as the exploration of protein-protein interaction sites and networks, and in applied research such as the development of new drugs, vaccines, and diagnostics. It has also become a promising method for epitope mapping. Research on new algorithms that assist and automate phage display based epitope mapping has attracted many groups. However, most of the existing tools have not been implemented as online services, making them less convenient for the community to access, utilize, and evaluate. Results We present MIMOX, a free web tool that helps to map the native epitope of an antibody based on one or more user-supplied mimotopes and the antigen structure. MIMOX was coded in Perl using modules from the Bioperl project. It has two sections. In the first section, MIMOX provides a simple interface for ClustalW to align a set of mimotopes. It also provides a simple statistical method to derive the consensus sequence and embeds JalView as a Java applet to view and manage the alignment. In the second section, MIMOX can map a single mimotope, or a consensus sequence of a set of mimotopes, onto the corresponding antigen structure and search for all of the clusters of residues that could represent the native epitope. NACCESS is used to evaluate the surface accessibility of the candidate clusters, and Jmol is embedded to view them interactively in their 3D context. Initial case studies show that MIMOX can reproduce mappings from existing tools such as FINDMAP and 3DEX, as well as providing novel, rational results. Conclusion A web-based tool called MIMOX has been developed for phage display based epitope mapping. As a publicly available online service in this area, it is convenient for the community to access, utilize, and evaluate, complementing other existing programs. MIMOX is freely available at .
Affiliation(s)
- Jian Huang
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- School of Life Science and Technology, University of Electronic Science and Technology of China, China
- Alex Gutteridge
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Wataru Honda
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
- Minoru Kanehisa
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
|
998
|
Kim SH, Elango N, Warden C, Vigoda E, Yi SV. Heterogeneous genomic molecular clocks in primates. PLoS Genet 2006; 2:e163. [PMID: 17029560 PMCID: PMC1592237 DOI: 10.1371/journal.pgen.0020163] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.7] [Received: 06/01/2006] [Accepted: 08/10/2006] [Indexed: 12/22/2022] Open
Abstract
Using data from primates, we show that molecular clocks in sites that have been part of a CpG dinucleotide in the recent past (CpG sites) and in non-CpG sites are of markedly different nature, reflecting differences in their molecular origins. Notably, single nucleotide substitutions at non-CpG sites show clear generation-time dependency, indicating that most of these substitutions occur by errors during DNA replication. On the other hand, substitutions at CpG sites occur relatively constantly over time, as expected from their primary origin due to methylation. Therefore, molecular clocks are heterogeneous even within a genome. Furthermore, we propose that varying frequencies of CpG dinucleotides in different genomic regions may have contributed significantly to conflicting earlier results on the rate constancy of the mammalian molecular clock. Our conclusion that different regions of genomes follow different molecular clocks should be considered when inferring divergence times using molecular data and in phylogenetic analysis.
Affiliation(s)
- Seong-Ho Kim
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Navin Elango
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Charles Warden
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Eric Vigoda
- College of Computing, Georgia Institute of Technology, Atlanta, Georgia, United States of America
- Soojin V Yi
- School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
|
999
|
Boehm AM, Sickmann A. A comprehensive dictionary of protein accession codes for complete protein accession identifier alias resolving. Proteomics 2006; 6:4223-6. [PMID: 16888720 DOI: 10.1002/pmic.200600018] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
In mass spectrometry-based proteomics, protein identification results usually consist of peptide sequences and database-dependent accession identifiers of the matching proteins. Often, certain annotations are available only in particular databases, which in turn must be queried with a particular identifier. In order to simplify and unify the tracing of identified proteins back to their original annotation information, a system capable of set-oriented mapping of the different accession identifiers of proteins derived from multiple sequence database sources has been developed. This allows unification of the access to protein information and tracing to other online resources providing additional information, as well as resolving cross-references of protein identifications. The interface of seqDB is available via http://www.protein-ms.de following the link to seqDB.
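One way to picture set-oriented alias resolving is to group identifiers that are linked by cross-references into alias sets. The sketch below does this with a union-find structure. It is only an illustration and not the seqDB implementation: the function name, the identifier strings, and the assumption that cross-references can be chained transitively are all hypothetical.

```python
from collections import defaultdict


def build_alias_groups(cross_references):
    """Group identifiers linked by cross-references into alias sets.

    `cross_references` is an iterable of (identifier, identifier) pairs.
    Links are treated as transitive, which is a simplifying assumption.
    Returns a dict mapping each identifier to the frozenset of its aliases.
    """
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    for left, right in cross_references:
        parent[find(left)] = find(right)

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return {item: frozenset(members) for members in groups.values() for item in members}


# Hypothetical identifiers from three made-up databases.
refs = [("dbA:0001", "dbB:X1"), ("dbB:X1", "dbC:77"), ("dbA:0002", "dbB:X9")]
aliases = build_alias_groups(refs)
```

Looking up any identifier then returns every known alias in one step, so an annotation held under one database's accession can be reached from another's.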
Affiliation(s)
- Andreas M Boehm
- Rudolf-Virchow-Center for Experimental Biomedicine, Julius-Maximilians-Universität Würzburg, Germany
|
1000
|
Kamatani T, Yamamoto T. Comparison of codon usage and tRNAs in mitochondrial genomes of Candida species. Biosystems 2006; 90:362-70. [PMID: 17123703 DOI: 10.1016/j.biosystems.2006.09.039] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 06/22/2006] [Revised: 09/30/2006] [Accepted: 09/30/2006] [Indexed: 10/24/2022]
Abstract
To gain insight into the nature of the mitochondrial genomes (mtDNA) of different Candida species, the synonymous codon usage bias of mitochondrial protein coding genes and the tRNAs in C. albicans, C. parapsilosis, C. stellata, C. glabrata and the closely related yeast Saccharomyces cerevisiae were analyzed. Common features of the mtDNA in Candida species are a strong A+T pressure on protein coding genes and the encoding of too few mitochondrial tRNA species to perform protein synthesis. The wobble site of the anticodon is always U for the NNR (NNA and NNG) codon families, which are dominated by A-ending codons; always G for the NNY (NNC and NNU) codon families, which are dominated by U-ending codons; and always U for the NNN (NNA, NNU, NNC and NNG) codon families, which are dominated by A-ending and U-ending codons. Patterns of synonymous codon usage of Candida species can be classified into three groups. (1) Optimal codon-anticodon usage: Glu, Lys, Leu (translated by anticodon UAA), Gln, Arg (translated by anticodon UCU) and Trp are encoded by NNR codons. NNA, whose corresponding tRNA is encoded in the mtDNA, is used preferentially. (2) Non-optimal codon-anticodon usage: Cys, Asp, Phe, His, Asn, Ser (translated by anticodon GCU) and Tyr are encoded by NNY codons. The NNU codon, whose corresponding tRNA is not encoded in the mtDNA, is used preferentially. (3) Combined codon-anticodon usage: Ala, Gly, Leu (translated by anticodon UAG), Pro, Ser (translated by anticodon UGA), Thr and Val are encoded by NNN codons. NNA (tRNA encoded in the mtDNA) and NNU (tRNA not encoded in the mtDNA) are used preferentially. In conclusion, we propose that in Candida species, codons containing A or U at the third position are used preferentially, regardless of whether the corresponding tRNAs are encoded in the mtDNA. These results might be useful in understanding the common features of the mtDNA in Candida species and patterns of synonymous codon usage.
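The preference for A- or U-ending codons described here can be quantified by counting third-position bases. The following is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the function names are hypothetical, the toy sequence is not a real Candida gene, and start and stop codons are counted like any other codon.

```python
from collections import Counter


def third_position_usage(cds):
    """Count the third-position base of each complete codon in a coding sequence."""
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codon[2] for codon in codons)


def au3_fraction(cds):
    """Return the fraction of codons ending in A or U."""
    usage = third_position_usage(cds)
    total = sum(usage.values())
    return (usage["A"] + usage["U"]) / total if total else 0.0


# Hypothetical toy sequence: ATG TTA TTT GGA AAA TAA.
toy_cds = "ATGTTATTTGGAAAATAA"
fraction = au3_fraction(toy_cds)
```

For the toy sequence, five of the six codons end in A or U, so the fraction is 5/6. A per-amino-acid breakdown would be needed to reproduce the three usage groups described in the abstract.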
|