201
Abstract
About one quarter to one third of all bacterial genes encode proteins of the inner or outer bacterial membrane. These proteins perform essential physiological functions, such as the import or export of metabolites, the homeostasis of metal ions, the extrusion of toxic substances or antibiotics, and the generation or conversion of energy. The last years have witnessed completion of a plethora of whole-genome sequences of bacteria important for biotechnology or medicine, which is the foundation for proteome and other functional genome analyses. In this review, we discuss the challenges in membrane proteome analysis, starting from sample preparation and leading to MS-data analysis and quantification. The current state of available proteomics technologies as well as their advantages and disadvantages will be described with a focus on shotgun proteomics. Then, we will briefly introduce the most abundant proteins and protein families present in bacterial membranes before bacterial membrane proteomics studies of the last years will be presented. It will be shown how these works enlarged our knowledge about the physiological adaptations that take place in bacteria during fine chemical production, bioremediation, protein overexpression, and during infections. Furthermore, several examples from literature demonstrate the suitability of membrane proteomics for the identification of antigens and different pathogenic strains, as well as the elucidation of membrane protein structure and function.
Affiliation(s)
- Ansgar Poetsch
- Lehrstuhl für Biochemie der Pflanzen, Ruhr Universität Bochum, Bochum, Germany.
202
Wurm Y, Uva P, Ricci F, Wang J, Jemielity S, Iseli C, Falquet L, Keller L. Fourmidable: a database for ant genomics. BMC Genomics 2009; 10:5. [PMID: 19126223 PMCID: PMC2639375 DOI: 10.1186/1471-2164-10-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 11/24/2008] [Accepted: 01/06/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Fourmidable is an infrastructure to curate and share the emerging genetic, molecular, and functional genomic data and protocols for ants. DESCRIPTION The Fourmidable assembly pipeline groups nucleotide sequences into clusters before independently assembling each cluster. Subsequently, assembled sequences are annotated via Interproscan and BLAST against general and insect-specific databases. Gene-specific information can be retrieved using gene identifiers, searching for similar sequences or browsing through inferred Gene Ontology annotations. The database will readily scale as ultra-high throughput sequence data and sequences from additional species become available. CONCLUSION Fourmidable currently houses EST data from two ant species and microarray gene expression data for one of these. Fourmidable is publicly available at http://fourmidable.unil.ch.
Affiliation(s)
- Yannick Wurm
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland
- Paolo Uva
- Istituto di Ricerche di Biologia Molecolare, Merck Research Laboratories, 00040 Pomezia, Rome, Italy
- Frédéric Ricci
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland
- John Wang
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland
- Stephanie Jemielity
- Institute for Infectious Diseases, University of Bern, CH-3010 Bern, Switzerland
- Christian Iseli
- Ludwig Institute for Cancer Research, CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Laurent Falquet
- Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland
- Laurent Keller
- Department of Ecology and Evolution, Biophore, University of Lausanne, CH-1015 Lausanne, Switzerland
203
Beeby M, Bobik TA, Yeates TO. Exploiting genomic patterns to discover new supramolecular protein assemblies. Protein Sci 2009; 18:69-79. [PMID: 19177352 PMCID: PMC2708037 DOI: 10.1002/pro.1] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/30/2008] [Revised: 09/19/2008] [Accepted: 09/22/2008] [Indexed: 01/29/2023]
Abstract
Bacterial microcompartments are supramolecular protein assemblies that function as bacterial organelles by compartmentalizing particular enzymes and metabolic intermediates. The outer shells of these microcompartments are assembled from multiple paralogous structural proteins. Because the paralogs are required to assemble together, their genes are often transcribed together from the same operon, giving rise to a distinctive genomic pattern: multiple, typically small, paralogous proteins encoded in close proximity on the bacterial chromosome. To investigate the generality of this pattern in supramolecular assemblies, we employed a comparative genomics approach to search for protein families that show the same kind of genomic pattern as that exhibited by bacterial microcompartments. The results indicate that a variety of large supramolecular assemblies fit the pattern, including bacterial gas vesicles, bacterial pili, and small heat-shock protein complexes. The search also retrieved several widely distributed protein families of presently unknown function. The proteins from one of these families were characterized experimentally and found to show a behavior indicative of supramolecular assembly. We conclude that cotranscribed paralogs are a common feature of diverse supramolecular assemblies, and a useful genomic signature for discovering new kinds of large protein assemblies from genomic data.
Affiliation(s)
- Morgan Beeby
- UCLA-DOE Institute for Genomics and Proteomics, University of California Los Angeles, Los Angeles, California 90095
- Thomas A Bobik
- Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa 50011
- Todd O Yeates
- UCLA-DOE Institute for Genomics and Proteomics, University of California Los Angeles, Los Angeles, California 90095
- Department of Chemistry and Biochemistry, University of California Los Angeles, California 90095-1569
- Molecular Biology Institute, Paul D. Boyer Hall, Los Angeles, California 90095-1570
204
Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R. The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res 2009; 37:D396-403. [PMID: 18957448 PMCID: PMC2686469 DOI: 10.1093/nar/gkn803] [Citation(s) in RCA: 448] [Impact Index Per Article: 29.9] [Received: 09/12/2008] [Revised: 10/09/2008] [Accepted: 10/10/2008] [Indexed: 11/25/2022]
Abstract
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set.
Affiliation(s)
- Daniel Barrell
- Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
205
Tweedie S, Ashburner M, Falls K, Leyland P, McQuilton P, Marygold S, Millburn G, Osumi-Sutherland D, Schroeder A, Seal R, Zhang H. FlyBase: enhancing Drosophila Gene Ontology annotations. Nucleic Acids Res 2009; 37:D555-9. [PMID: 18948289 PMCID: PMC2686450 DOI: 10.1093/nar/gkn788] [Citation(s) in RCA: 604] [Impact Index Per Article: 40.3] [Received: 09/19/2008] [Revised: 10/08/2008] [Accepted: 10/09/2008] [Indexed: 11/13/2022]
Abstract
FlyBase (http://flybase.org) is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project--a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.
Affiliation(s)
- Susan Tweedie
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK.
206
Bhasi A, Philip P, Manikandan V, Senapathy P. ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Res 2009; 37:D703-11. [PMID: 18984624 PMCID: PMC2686582 DOI: 10.1093/nar/gkn746] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 08/15/2008] [Revised: 10/02/2008] [Accepted: 10/03/2008] [Indexed: 11/27/2022]
Abstract
We have developed ExDom, a unique database for the comparative analysis of the exon-intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon-intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon-intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon-intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/.
Affiliation(s)
- Ashwini Bhasi
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Philge Philip
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Vinu Manikandan
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Periannan Senapathy
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
207
Abstract
Proteolytic enzymes play an essential role in many biological and pathological processes. Taking advantage of the recent availability of several mammalian genome sequences and by using a set of computational approaches, we have annotated and compared the degradome or complete repertoire of proteases of different mammalian species including human, mouse, rat, and chimpanzee. These studies have allowed us to expand our knowledge about the complexity, evolution, and diversity of proteolytic systems, which represent about 2% of the studied genomes. In this chapter, we review the genomic and computational methodologies used in this degradomic analysis and summarize the main findings derived from comparison of mammalian degradomes.
Affiliation(s)
- Gonzalo R Ordóñez
- Departamento de Bioquímica y Biología Molecular, Facultad de Medicina, Instituto Universitario de Oncología, Universidad de Oviedo, Oviedo, Spain
208
Protein Sequence Databases. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
209
Wichadakul D, McDermott J, Samudrala R. Prediction and integration of regulatory and protein-protein interactions. Methods Mol Biol 2009; 541:101-43. [PMID: 19381527 DOI: 10.1007/978-1-59745-243-4_6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/18/2023]
Abstract
Knowledge of transcriptional regulatory interactions (TRIs) is essential for exploring functional genomics and systems biology in any organism. While several results from genome-wide analysis of transcriptional regulatory networks are available, they are limited to model organisms such as yeast (1) and worm (2). Beyond these networks, experiments on TRIs study only individual genes and proteins of specific interest. In this chapter, we present a method for the integration of various data sets to predict TRIs for 54 organisms in the Bioverse (3). We describe how to compile and handle various formats and identifiers of data sets from different sources and how to predict TRIs using a homology-based approach, utilizing the compiled data sets. Integrated data sets include experimentally verified TRIs, binding sites of transcription factors, promoter sequences, protein subcellular localization, and protein families. Predicted TRIs expand the networks of gene regulation for a large number of organisms. The integration of experimentally verified and predicted TRIs with other known protein-protein interactions (PPIs) gives insight into specific pathways, network motifs, and the topological dynamics of an integrated network with gene expression under different conditions, essential for exploring functional genomics and systems biology.
210
McDowall MD, Scott MS, Barton GJ. PIPs: human protein-protein interaction prediction database. Nucleic Acids Res 2009; 37:D651-6. [PMID: 18988626 PMCID: PMC2686497 DOI: 10.1093/nar/gkn870] [Citation(s) in RCA: 203] [Impact Index Per Article: 13.5] [Received: 08/13/2008] [Revised: 09/25/2008] [Accepted: 10/18/2008] [Indexed: 12/14/2022]
Abstract
The PIPs database (http://www.compbio.dundee.ac.uk/www-pips) is a resource for studying protein-protein interactions in human. It contains predictions of >37,000 high probability interactions of which >34,000 are not reported in the interaction databases HPRD, BIND, DIP or OPHID. The interactions in PIPs were calculated by a Bayesian method that combines information from expression, orthology, domain co-occurrence, post-translational modifications and sub-cellular location. The predictions also take account of the topology of the predicted interaction network. The web interface to PIPs ranks predictions according to their likelihood of interaction broken down by the contribution from each information source and with easy access to the evidence that supports each prediction. Where data exists in OPHID, HPRD, DIP or BIND for a protein pair this is also reported in the output tables returned by a search. A network browser is included to allow convenient browsing of the interaction network for any protein in the database. The PIPs database provides a new resource on protein-protein interactions in human that is straightforward to browse, or can be exploited completely, for interaction network modelling.
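The abstract above says only that a Bayesian method combines several evidence sources. As a rough illustration of how such a combination can work, the sketch below multiplies per-source likelihood ratios into prior odds, which assumes the sources are conditionally independent (a naive Bayes simplification that the abstract does not state). The function names, the prior, and all numeric values are hypothetical and are not taken from PIPs.

```python
def posterior_odds(prior_odds, likelihood_ratios):
    """Combine evidence as prior odds times the product of likelihood ratios.

    Assumes (hypothetically) that the evidence sources are conditionally
    independent given whether the protein pair interacts.
    """
    odds = prior_odds
    for ratio in likelihood_ratios.values():
        odds *= ratio
    return odds


def odds_to_probability(odds):
    """Convert odds in favour of an interaction into a probability."""
    return odds / (1.0 + odds)


# Made-up likelihood ratios for one protein pair, keyed by evidence source.
example_evidence = {"expression": 10.0, "orthology": 50.0, "localization": 4.0}
example_odds = posterior_odds(prior_odds=0.001, likelihood_ratios=example_evidence)
example_probability = odds_to_probability(example_odds)
```

Keeping the likelihood ratios in a dictionary keyed by source makes it easy to report each source's contribution separately, in the spirit of the per-source breakdown that the abstract attributes to the web interface.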
Affiliation(s)
- Geoffrey J. Barton
- School of Life Sciences Research, College of Life Sciences, University of Dundee, Dow Street, Dundee, DD1 5EH, UK
211
Fey P, Gaudet P, Curk T, Zupan B, Just EM, Basu S, Merchant SN, Bushmanova YA, Shaulsky G, Kibbe WA, Chisholm RL. dictyBase--a Dictyostelium bioinformatics resource update. Nucleic Acids Res 2009; 37:D515-9. [PMID: 18974179 PMCID: PMC2686522 DOI: 10.1093/nar/gkn844] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.5] [Received: 09/10/2008] [Revised: 10/14/2008] [Accepted: 10/15/2008] [Indexed: 12/14/2022]
Abstract
dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations.
Affiliation(s)
- Petra Fey
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Pascale Gaudet
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Tomaz Curk
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Blaz Zupan
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Eric M. Just
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Siddhartha Basu
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Sohel N. Merchant
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Yulia A. Bushmanova
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Gad Shaulsky
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Warren A. Kibbe
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Rex L. Chisholm
- dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
212
Ogata Y, Sakurai N, Aoki K, Suzuki H, Okazaki K, Saito K, Shibata D. KAGIANA: an excel-based tool for retrieving summary information on Arabidopsis genes. Plant & Cell Physiology 2009; 50:173-7. [PMID: 19043069 PMCID: PMC2638708 DOI: 10.1093/pcp/pcn179] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 10/22/2008] [Accepted: 11/17/2008] [Indexed: 05/21/2023]
Abstract
Various public databases provide Arabidopsis gene information via the internet. It is useful to abstract information obtained from such databases. We have developed the KAGIANA tool, which allows a user to retrieve summary information obtained from selective databases and to access pages for a gene of interest in those databases. The tool is based on Microsoft Excel and provides several macro programs for gene expression analyses. It can assist plant biologists in accessing omics information for plant biology. The KAGIANA tool is freely available at http://pmnedo.kazusa.or.jp/kagiana/.
Affiliation(s)
- Yoshiyuki Ogata
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Nozomu Sakurai
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Koh Aoki
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Hideyuki Suzuki
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Koei Okazaki
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- Kazuki Saito
- Graduate School of Pharmaceutical Science, Chiba University, Yayoi-cho 1-33, Inage-ku, Chiba, 263-8522 Japan
- Daisuke Shibata
- Kazusa DNA Research Institute, Kazusa-Kamatari 2-6-7, Kisarazu, Chiba, 292-0818 Japan
- *Corresponding author: E-mail, ; Fax, +81-438-52-3948
213
Abstract
Set enrichment analytical methods have become commonplace tools applied to the analysis and interpretation of biological data. The statistical techniques are used to identify categorical biases within lists of genes, proteins, or metabolites. The goal is to discover the shared functions or properties of the biological items represented within the lists. Application of these methods can provide great biological insight, including the discovery of participation in the same biological activity or pathway, shared interacting genes or regulators, common cellular compartmentalization, or association with disease. The methods require ordered or unordered lists of biological items as input, understanding of the reference set from which the lists were selected, categorical classifiers describing the items, and a statistical algorithm to assess bias of each classifier. Due to the complexity of most algorithms and the number of calculations performed, computer software is almost always used for execution of the algorithm, as well as for presentation of the results. This chapter will provide an overview of the statistical methods used to perform an enrichment analysis. Guidelines for assembly of the requisite information will be presented, with a focus on careful definition of the sets used by the statistical algorithms. The need for multiple test correction when working with large libraries of classifiers is emphasized, and we outline several options for performing the corrections. Finally, interpreting the results of such analysis will be discussed along with examples of recent research utilizing the techniques.
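To make the ingredients listed in this abstract concrete (an input list, a reference set, categorical classifiers, a statistical test, and multiple test correction), here is a minimal sketch using a one-sided hypergeometric test followed by a Benjamini-Hochberg adjustment. The abstract does not say which test or correction the chapter recommends, so both choices are assumptions, as are all function names and the toy gene identifiers.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a reference
    set of N items, K of which carry the category of interest."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrichment(gene_list, reference, categories):
    """Test each category for over-representation in gene_list.

    Items outside the reference set are ignored, so every count is taken
    relative to the same, explicitly defined background.
    Returns {category: (raw_p, adjusted_p)}.
    """
    ref = set(reference)
    hits = set(gene_list) & ref
    names = sorted(categories)
    pvals = []
    for name in names:
        members = set(categories[name]) & ref
        overlap = len(hits & members)
        pvals.append(hypergeom_sf(overlap, len(ref), len(members), len(hits)))
    adjusted = benjamini_hochberg(pvals)
    return {name: (p, q) for name, p, q in zip(names, pvals, adjusted)}


# Toy data: 20 reference genes, two hypothetical categories, 5 selected genes.
reference = [f"g{i}" for i in range(20)]
categories = {"A": ["g0", "g1", "g2", "g3", "g4"], "B": ["g10", "g11", "g12"]}
result = enrichment(["g0", "g1", "g2", "g3", "g10"], reference, categories)
```

Restricting both the input list and the category members to the reference set reflects the abstract's emphasis on careful definition of the sets; a mismatched background is a common source of misleading p-values.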
Affiliation(s)
- Charles A Tilford
- Research & Development, Bristol-Myers Squibb Company, Pennington, NJ, USA
214
Letunic I, Doerks T, Bork P. SMART 6: recent updates and new developments. Nucleic Acids Res 2009. [PMID: 18978020 DOI: 10.1093/nar] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 04/16/2023]
Abstract
Simple modular architecture research tool (SMART) is an online tool (http://smart.embl.de/) for the identification and annotation of protein domains. It provides a user-friendly platform for the exploration and comparative study of domain architectures in both proteins and genes. The current release of SMART contains manually curated models for 784 protein domains. Recent developments were focused on further data integration and improving user friendliness. The underlying protein database based on completely sequenced genomes was greatly expanded and now includes 630 species, compared to 191 in the previous release. As an initial step towards integrating information on biological pathways into SMART, our domain annotations were extended with data on metabolic pathways and links to several pathways resources. The interaction network view was completely redesigned and is now available for more than 2 million proteins. In addition to the standard web access to the database, users can now query SMART using distributed annotation system (DAS) or through a simple object access protocol (SOAP) based web service.
215
Park H, Huxley-Jones J, Boot-Handford RP, Bishop PN, Attwood TK, Bella J. LRRCE: a leucine-rich repeat cysteine capping motif unique to the chordate lineage. BMC Genomics 2008; 9:599. [PMID: 19077264 PMCID: PMC2637281 DOI: 10.1186/1471-2164-9-599] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Received: 07/09/2008] [Accepted: 12/12/2008] [Indexed: 01/27/2023]
Abstract
Background The small leucine-rich repeat proteins and proteoglycans (SLRPs) form an important family of regulatory molecules that participate in many essential functions. They typically control the correct assembly of collagen fibrils, regulate mineral deposition in bone, and modulate the activity of potent cellular growth factors through many signalling cascades. SLRPs belong to the group of extracellular leucine-rich repeat proteins that are flanked at both ends by disulphide-bonded caps that protect the hydrophobic core of the terminal repeats. A capping motif specific to SLRPs has been recently described in the crystal structures of the core proteins of decorin and biglycan. This motif, designated as LRRCE, differs in both sequence and structure from other, more widespread leucine-rich capping motifs. To investigate if the LRRCE motif is a common structural feature found in other leucine-rich repeat proteins, we have defined characteristic sequence patterns and used them in genome-wide searches. Results The LRRCE motif is a structural element exclusive to the main group of SLRPs. It appears to have evolved during early chordate evolution and is not found in protein sequences from non-chordate genomes. Our search has expanded the family of SLRPs to include new predicted protein sequences, mainly in fishes but with intriguing putative orthologs in mammals. The chromosomal locations of the newly predicted SLRP genes would support the large-scale genome or gene duplications that are thought to have occurred during vertebrate evolution. From this expanded list we describe a new class of SLRP sequences that could be representative of an ancestral SLRP gene. Conclusion Given its exclusivity the LRRCE motif is a useful annotation tool for the identification and classification of new SLRP sequences in genome databases. The expanded list of members of the SLRP family offers interesting insights into early vertebrate evolution and suggests an early chordate evolutionary origin for the LRRCE capping motif.
Affiliation(s)
- Hosil Park
- Faculty of Life Sciences, Wellcome Trust Centre for Cell-Matrix Research, University of Manchester, Manchester, UK.
216
GeneDistiller--distilling candidate genes from linkage intervals. PLoS One 2008; 3:e3874. [PMID: 19057649 PMCID: PMC2587712 DOI: 10.1371/journal.pone.0003874] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.6] [Received: 07/24/2008] [Accepted: 11/10/2008] [Indexed: 11/19/2022]
Abstract
BACKGROUND Linkage studies often yield intervals containing several hundred positional candidate genes. Different manual or automatic approaches exist for the determination of the gene most likely to cause the disease. While the manual search is very flexible and takes advantage of the researchers' background knowledge and intuition, it may be very cumbersome to collect and study the relevant data. Automatic solutions on the other hand usually focus on certain models, remain "black boxes" and do not offer the same degree of flexibility. METHODOLOGY We have developed a web-based application that combines the advantages of both approaches. Information from various data sources such as gene-phenotype associations, gene expression patterns and protein-protein interactions was integrated into a central database. Researchers can select which information for the genes within a candidate interval or for single genes shall be displayed. Genes can also interactively be filtered, sorted and prioritised according to criteria derived from the background knowledge and preconception of the disease under scrutiny. CONCLUSIONS GeneDistiller provides knowledge-driven, fully interactive and intuitive access to multiple data sources. It displays maximum relevant information, while saving the user from drowning in the flood of data. A typical query takes less than two seconds, thus allowing an interactive and explorative approach to the hunt for the candidate gene. ACCESS GeneDistiller can be freely accessed at http://www.genedistiller.org.
|
217
|
Jung K, Park J, Choi J, Park B, Kim S, Ahn K, Choi J, Choi D, Kang S, Lee YH. SNUGB: a versatile genome browser supporting comparative and functional fungal genomics. BMC Genomics 2008; 9:586. [PMID: 19055845 PMCID: PMC2649115 DOI: 10.1186/1471-2164-9-586] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 05/15/2008] [Accepted: 12/04/2008] [Indexed: 12/24/2022] Open
Abstract
Background Since the full genome sequences of Saccharomyces cerevisiae were released in 1996, genome sequences of over 90 fungal species have become publicly available. The heterogeneous formats of genome sequences archived in different sequencing centers hampered the integration of the data for efficient and comprehensive comparative analyses. The Comparative Fungal Genomics Platform (CFGP) was developed to archive these data via a single standardized format that can support multifaceted and integrated analyses of the data. To facilitate efficient data visualization and utilization within and across species based on the architecture of CFGP and associated databases, a new genome browser was needed. Results The Seoul National University Genome Browser (SNUGB) integrates various types of genomic information derived from 98 fungal/oomycete (137 datasets) and 34 plant and animal (38 datasets) species, graphically presents germane features and properties of each genome, and supports comparison between genomes. The SNUGB provides three different forms of the data presentation interface, including diagram, table, and text, and six different display options to support visualization and utilization of the stored information. Information for individual species can be quickly accessed via a new tool named the taxonomy browser. In addition, SNUGB offers four useful data annotation/analysis functions, including 'BLAST annotation.' The modular design of SNUGB makes its adoption to support other comparative genomic platforms easy and facilitates continuous expansion. Conclusion The SNUGB serves as a powerful platform supporting comparative and functional genomics within the fungal kingdom and also across other kingdoms. All data and functions are available at the web site .
Affiliation(s)
- Kyongyong Jung
- Fungal Bioinformatics Laboratory, Seoul National University, Seoul, Korea.
|
218
|
Comparative genomic analysis of carbon and nitrogen assimilation mechanisms in three indigenous bioleaching bacteria: predictions and validations. BMC Genomics 2008; 9:581. [PMID: 19055775 PMCID: PMC2607301 DOI: 10.1186/1471-2164-9-581] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.5] [Received: 08/11/2008] [Accepted: 12/03/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Carbon and nitrogen fixation are essential pathways for autotrophic bacteria living in extreme environments. These bacteria can use carbon dioxide directly from the air as their sole carbon source and can use different sources of nitrogen such as ammonia, nitrate, nitrite, or even nitrogen from the air. To have a better understanding of how these processes occur and to determine how we can make them more efficient, a comparative genomic analysis of three bioleaching bacteria isolated from mine sites in Chile was performed. This study demonstrated that there are important differences in the carbon dioxide and nitrogen fixation mechanisms among bioleaching bacteria that coexist in mining environments. RESULTS In this study, we proved that both Acidithiobacillus ferrooxidans and Acidithiobacillus thiooxidans incorporate CO2 via the Calvin-Benson-Bassham cycle; however, the former bacterium has two copies of the Rubisco type I gene whereas the latter has only one copy. In contrast, we demonstrated that Leptospirillum ferriphilum utilizes the reductive tricarboxylic acid cycle for carbon fixation. Although all the species analyzed in our study can incorporate ammonia by an ammonia transporter, we demonstrated that Acidithiobacillus thiooxidans could also assimilate nitrate and nitrite but only Acidithiobacillus ferrooxidans could fix nitrogen directly from the air. CONCLUSION The current study utilized genomic and molecular evidence to verify carbon and nitrogen fixation mechanisms for three bioleaching bacteria and provided an analysis of the potential regulatory pathways and functional networks that control carbon and nitrogen fixation in these microorganisms.
|
219
|
Coates BS, Sumerford DV, Hellmich RL, Lewis LC. Mining an Ostrinia nubilalis midgut expressed sequence tag (EST) library for candidate genes and single nucleotide polymorphisms (SNPs). INSECT MOLECULAR BIOLOGY 2008; 17:607-620. [PMID: 19133073 DOI: 10.1111/j.1365-2583.2008.00833.x] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 05/27/2023]
Abstract
Genes expressed in lepidopteran midgut tissues are involved in digestion and Bacillus thuringiensis (Bt) toxin resistance traits. Five hundred and thirty five unique transcripts were annotated from 1745 high quality O. nubilalis larval midgut expressed sequence tags (ESTs). Full-length cDNA sequences of 12 putative serine proteinase genes and 3 partial O. nubilalis aminopeptidase N protein genes, apn1, apn3, and apn4, were obtained; these genes may have roles in plant feeding and Bt toxin resistance traits of Ostrinia larvae. The EST library was not normalized, so insert frequencies reflect transcript levels under the initial treatment conditions, and redundancy of inserts from highly expressed transcripts allowed prediction of putative single nucleotide polymorphisms (SNPs). Ten di-, tri- or tetranucleotide repeat unit microsatellite loci were identified, and minisatellite repeats were observed within the C-termini of two encoded serine proteinases. Molecular markers showed polymorphism at 28 SNP loci and one microsatellite locus, and Mendelian inheritance indicated that markers were applicable to genome mapping applications. This O. nubilalis larval midgut EST collection is a resource for gene discovery, expression information, and allelic variation for use in genetic marker development.
Affiliation(s)
- B S Coates
- USDA-ARS, Corn Insect and Crop Genetics Research Unit, Genetics Laboratory, Iowa State University, Ames, Iowa 50011, USA.
|
220
|
Mabey Gilsenan JE, Atherton G, Bartholomew J, Giles PF, Attwood TK, Denning DW, Bowyer P. Aspergillus genomes and the Aspergillus cloud. Nucleic Acids Res 2008; 37:D509-14. [PMID: 19039001 PMCID: PMC2686514 DOI: 10.1093/nar/gkn876] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022] Open
Abstract
Aspergillus Genomes is a public resource for viewing annotated genes predicted by various Aspergillus sequencing projects. It has arisen from the union of two significant resources: the Aspergillus/Aspergillosis website and the Central Aspergillus Data REpository (CADRE). The former has primarily served the medical community, providing information about Aspergillus and associated diseases to medics, patients and scientists; the latter has focused on the fungal genomic community, providing a central repository for sequences and annotation extracted from Aspergillus Genomes. By merging these databases, genomes benefit from extensive cross-linking with medical information to create a unique resource, spanning genomics and clinical aspects of the genus. Aspergillus Genomes is accessible from http://www.aspergillus-genomes.org.uk.
Affiliation(s)
- Jane E Mabey Gilsenan
- School of Medicine, The University Hospital of South Manchester (Wythenshawe), Manchester M23 9LT, UK.
|
221
|
Droc G, Périn C, Fromentin S, Larmande P. OryGenesDB 2008 update: database interoperability for functional genomics of rice. Nucleic Acids Res 2008; 37:D992-5. [PMID: 19036791 PMCID: PMC2686528 DOI: 10.1093/nar/gkn821] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.1] [Indexed: 11/17/2022] Open
Abstract
OryGenesDB (http://orygenesdb.cirad.fr/index.html) is a database developed for rice reverse genetics. OryGenesDB contains FSTs (flanking sequence tags) of various mutagens and functional genomics data, collected from both international insertion collections and the literature. The current release of OryGenesDB contains 171 000 FSTs, and annotations divided among 10 specific categories, totaling 78 annotation layers. Several additional tools have been added to the main interface; these tools enable the user to retrieve FSTs and design probes to analyze insertion lines. The major innovation of OryGenesDB 2008, besides updating the data and tools, is a new tool, Orylink, which was developed to speed up rice functional genomics by taking advantage of the resources developed in two related databases, Oryza Tag Line and GreenPhylDB. Orylink was designed to field complex queries across these three databases and store both the queries and their results in an intuitive manner. Orylink offers a simple and powerful virtual workbench for functional genomics. Alternatively, the Web services developed for Orylink can be used independently of its Web interface, increasing the interoperability between these different bioinformatics applications.
Affiliation(s)
- Gaëtan Droc
- CIRAD Dept BIOS UMR DAP - TA40/03, 34398 Montpellier, France
|
222
|
Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C, Madera M, Chothia C, Gough J. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res 2008; 37:D380-6. [PMID: 19036790 PMCID: PMC2686452 DOI: 10.1093/nar/gkn762] [Citation(s) in RCA: 330] [Impact Index Per Article: 20.6] [Indexed: 01/04/2023] Open
Abstract
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.
Affiliation(s)
- Derek Wilson
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, Department of Computer Science, University of Bristol, The Merchant Venturers Building, Bristol BS8 1UB, UK.
|
223
|
Osorio H, Martínez V, Nieto PA, Holmes DS, Quatrini R. Microbial iron management mechanisms in extremely acidic environments: comparative genomics evidence for diversity and versatility. BMC Microbiol 2008; 8:203. [PMID: 19025650 PMCID: PMC2631029 DOI: 10.1186/1471-2180-8-203] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 05/28/2008] [Accepted: 11/24/2008] [Indexed: 01/17/2023] Open
Abstract
BACKGROUND Iron is an essential nutrient but can be toxic at high intracellular concentrations, and organisms have evolved tightly regulated mechanisms for iron uptake and homeostasis. Information on iron management mechanisms is available for organisms living at circumneutral pH. However, very little is known about how acidophilic bacteria, especially those used for industrial copper bioleaching, cope with environmental iron loads that can be 10^18 times the concentration found in pH neutral environments. This study was motivated by the need to fill this lacuna in knowledge. An understanding of how microorganisms thrive in acidic ecosystems with high iron loads requires a comprehensive investigation of the strategies to acquire iron and to coordinate this acquisition with utilization, storage and oxidation of iron through metal responsive regulation. In silico prediction of iron management genes and Fur regulation was carried out for three Acidithiobacilli: Acidithiobacillus ferrooxidans (iron and sulfur oxidizer), A. thiooxidans and A. caldus (sulfur oxidizers) that can live between pH 1 and pH 5, and for three strict iron oxidizers of the Leptospirillum genus that live at pH 1 or below. RESULTS Acidithiobacilli have predicted FeoB-like Fe(II) and Nramp-like Fe(II)-Mn(II) transporters. They also have 14 different TonB dependent ferri-siderophore transporters of diverse siderophore affinity, although they do not produce classical siderophores. Instead they have predicted novel mechanisms for dicitrate synthesis and possibly also for phosphate-chelation mediated iron uptake. It is hypothesized that the unexpectedly large number and diversity of Fe(III)-uptake systems confers versatility to this group of acidophiles, especially in higher pH environments (pH 4-5) where soluble iron may not be abundant. In contrast, Leptospirilla have only a FtrI-Fet3P-like permease and three TonB dependent ferri-dicitrate siderophore systems.
This paucity of iron uptake systems could reflect their obligatory occupation of extremely low pH environments where high concentrations of soluble iron may always be available and where oxidized sulfur species might not compromise iron speciation dynamics. Presence of bacterioferritin in the Acidithiobacilli, polyphosphate accumulation functions and variants of FieF-like diffusion facilitators in both Acidithiobacilli and Leptospirilla, indicate that they may remove or store iron under conditions of variable availability. In addition, the Fe(II)-oxidizing capacity of both A. ferrooxidans and Leptospirilla could itself be a way to evade iron stress imposed by readily available Fe(II) ions at low pH. Fur regulatory sites have been predicted for a number of gene clusters including iron related and non-iron related functions in both the Acidithiobacilli and Leptospirilla, laying the foundation for the future discovery of iron regulated and iron-phosphate coordinated regulatory control circuits. CONCLUSION In silico analyses of the genomes of acidophilic bacteria are beginning to tease apart the mechanisms that mediate iron uptake and homeostasis in low pH environments. Initial models pinpoint significant differences in abundance and diversity of iron management mechanisms between Leptospirilla and Acidithiobacilli, and begin to reveal how these two groups respond to iron cycling and iron fluctuations in naturally acidic environments and in industrial operations. Niche partitions and ecological successions between acidophilic microorganisms may be partially explained by these observed differences. Models derived from these analyses pave the way for improved hypothesis testing and well directed experimental investigation. In addition, aspects of these models should challenge investigators to evaluate alternative iron management strategies in non-acidophilic model organisms.
Affiliation(s)
- Héctor Osorio
- Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
- Depto. de Ciencias Biologicas, Facultad de Ciencias de la Salud, Universidad Andres Bello, Santiago, Chile
| | - Verónica Martínez
- Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
| | - Pamela A Nieto
- Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
| | - David S Holmes
- Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
- Depto. de Ciencias Biologicas, Facultad de Ciencias de la Salud, Universidad Andres Bello, Santiago, Chile
| | - Raquel Quatrini
- Center for Bioinformatics and Genome Biology, Fundación Ciencia para la Vida, MIFAB, Santiago, Chile
|
224
|
Holzmann J, Frank P, Löffler E, Bennett KL, Gerner C, Rossmanith W. RNase P without RNA: identification and functional reconstitution of the human mitochondrial tRNA processing enzyme. Cell 2008; 135:462-74. [PMID: 18984158 DOI: 10.1016/j.cell.2008.09.013] [Citation(s) in RCA: 432] [Impact Index Per Article: 27.0] [Received: 05/02/2008] [Revised: 07/17/2008] [Accepted: 09/02/2008] [Indexed: 11/26/2022]
Abstract
tRNAs are synthesized as immature precursors, and on their way to functional maturity, extra nucleotides at their 5' ends are removed by an endonuclease called RNase P. All RNase P enzymes characterized so far are composed of an RNA plus one or more proteins, and tRNA 5' end maturation is considered a universal ribozyme-catalyzed process. Using a combinatorial purification/proteomics approach, we identified the components of human mitochondrial RNase P and reconstituted the enzymatic activity from three recombinant proteins. We thereby demonstrate that human mitochondrial RNase P is a protein enzyme that does not require a trans-acting RNA component for catalysis. Moreover, the mitochondrial enzyme turns out to be an unexpected type of patchwork enzyme, composed of a tRNA methyltransferase, a short-chain dehydrogenase/reductase-family member, and a protein of hitherto unknown functional and evolutionary origin, possibly representing the enzyme's metallonuclease moiety. Apparently, animal mitochondria lost the seemingly ubiquitous RNA world remnant after reinventing RNase P from preexisting components.
Affiliation(s)
- Johann Holzmann
- Center for Anatomy & Cell Biology, Medical University of Vienna, 1090 Vienna, Austria
|
225
|
The UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program. J Proteomics 2008; 72:567-73. [PMID: 19084081 DOI: 10.1016/j.jprot.2008.11.010] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.1] [Received: 09/18/2008] [Revised: 11/04/2008] [Accepted: 11/10/2008] [Indexed: 11/21/2022]
Abstract
The UniProt knowledgebase, UniProtKB, is the main product of the UniProt consortium. It consists of two sections, UniProtKB/Swiss-Prot, the manually curated section, and UniProtKB/TrEMBL, the computer translation of the EMBL/GenBank/DDBJ nucleotide sequence database. Taken together, these two sections cover all the proteins characterized or inferred from all publicly available nucleotide sequences. The Plant Proteome Annotation Program (PPAP) of UniProtKB/Swiss-Prot focuses on the manual annotation of plant-specific proteins and protein families. Our major effort is currently directed towards the two model plants Arabidopsis thaliana and Oryza sativa. In UniProtKB/Swiss-Prot, redundancy is minimized by merging all data from different sources in a single entry. The proposed protein sequence is frequently modified after comparison with ESTs, full length transcripts or homologous proteins from other species. The information present in manually curated entries allows the reconstruction of all described isoforms. The annotation also includes proteomics data such as PTM and protein identification MS experimental results. UniProtKB and the other products of the UniProt consortium are accessible online at www.uniprot.org.
|
226
|
Vizcaíno JA, Mueller M, Hermjakob H, Martens L. Charting online OMICS resources: A navigational chart for clinical researchers. Proteomics Clin Appl 2008; 3:18-29. [PMID: 21136933 DOI: 10.1002/prca.200800082] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 03/30/2008] [Indexed: 12/22/2022]
Abstract
The life sciences have sprouted several popular and successful OMICS technologies that span all levels of biological information transfer. Ever since the start of the Human Genome Project, the then revolutionary idea to make all resulting data publicly available has been central to all of the efforts across OMICS technologies. As a result, a great variety of publicly available data repositories and resources is currently available to the research community. This widespread availability of data does come at the price of increased confusion on the part of the users, especially for those that see the OMICS technologies as tools to help unravel a larger biological or clinical question. We therefore provide a comprehensive overview of the available resources across OMICS fields, with a special emphasis on those databases that are relevant to the study of proteins. Additionally, we also describe various integrative systems that have been established, and highlight new developments in the field that can revolutionize the way in which live data integration is achieved over the internet.
Affiliation(s)
- Juan Antonio Vizcaíno
- EMBL Outstation, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
227
|
Grasso LC, Maindonald J, Rudd S, Hayward DC, Saint R, Miller DJ, Ball EE. Microarray analysis identifies candidate genes for key roles in coral development. BMC Genomics 2008; 9:540. [PMID: 19014561 PMCID: PMC2629781 DOI: 10.1186/1471-2164-9-540] [Citation(s) in RCA: 108] [Impact Index Per Article: 6.8] [Received: 08/15/2008] [Accepted: 11/14/2008] [Indexed: 01/25/2023] Open
Abstract
BACKGROUND Anthozoan cnidarians are amongst the simplest animals at the tissue level of organization, but are surprisingly complex and vertebrate-like in terms of gene repertoire. As major components of tropical reef ecosystems, the stony corals are anthozoans of particular ecological significance. To better understand the molecular bases of both cnidarian development in general and coral-specific processes such as skeletogenesis and symbiont acquisition, microarray analysis was carried out through the period of early development - when skeletogenesis is initiated, and symbionts are first acquired. RESULTS Of 5081 unique peptide coding genes, 1084 were differentially expressed (P ≤ 0.05) in comparisons between four different stages of coral development, spanning key developmental transitions. Genes of likely relevance to the processes of settlement, metamorphosis, calcification and interaction with symbionts were characterised further and their spatial expression patterns investigated using whole-mount in situ hybridization. CONCLUSION This study is the first large-scale investigation of developmental gene expression for any cnidarian, and has provided candidate genes for key roles in many aspects of coral biology, including calcification, metamorphosis and symbiont uptake. One surprising finding is that some of these genes have clear counterparts in higher animals but are not present in the closely-related sea anemone Nematostella. Secondly, coral-specific processes (i.e. traits which distinguish corals from their close relatives) may be analogous to similar processes in distantly related organisms. This first large-scale application of microarray analysis demonstrates the potential of this approach for investigating many aspects of coral biology, including the effects of stress and disease.
Affiliation(s)
- Lauretta C Grasso
- Centre for the Molecular Genetics of Development, Research School of Biological Sciences, Australian National University, Canberra, Australia.
|
228
|
Taji T, Sakurai T, Mochida K, Ishiwata A, Kurotani A, Totoki Y, Toyoda A, Sakaki Y, Seki M, Ono H, Sakata Y, Tanaka S, Shinozaki K. Large-scale collection and annotation of full-length enriched cDNAs from a model halophyte, Thellungiella halophila. BMC PLANT BIOLOGY 2008; 8:115. [PMID: 19014467 PMCID: PMC2621223 DOI: 10.1186/1471-2229-8-115] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 07/25/2008] [Accepted: 11/12/2008] [Indexed: 05/15/2023]
Abstract
BACKGROUND Thellungiella halophila (also known as Thellungiella salsuginea) is a model halophyte with a small plant size, short life cycle, and small genome. It easily undergoes genetic transformation by the floral dipping method used with its close relative, Arabidopsis thaliana. Thellungiella genes exhibit high sequence identity (approximately 90% at the cDNA level) with Arabidopsis genes. Furthermore, Thellungiella not only shows tolerance to extreme salinity stress, but also to chilling, freezing, and ozone stress, supporting the use of Thellungiella as a good genomic resource in studies of abiotic stress tolerance. RESULTS We constructed a full-length enriched Thellungiella (Shan Dong ecotype) cDNA library from various tissues and whole plants subjected to environmental stresses, including high salinity, chilling, freezing, and abscisic acid treatment. We randomly selected about 20,000 clones and sequenced them from both ends to obtain a total of 35 171 sequences. CAP3 software was used to assemble the sequences and cluster them into 9569 nonredundant cDNA groups. We named these cDNAs "RTFL" (RIKEN Thellungiella Full-Length) cDNAs. Information on functional domains and Gene Ontology (GO) terms for the RTFL cDNAs were obtained using InterPro. The 8289 genes assigned to InterPro IDs were classified according to the GO terms using Plant GO Slim. Categorical comparison between the whole Arabidopsis genome and Thellungiella genes showing low identity to Arabidopsis genes revealed that the population of Thellungiella transport genes is approximately 1.5 times the size of the corresponding Arabidopsis genes. This suggests that these genes regulate a unique ion transportation system in Thellungiella. CONCLUSION As the number of Thellungiella halophila (Thellungiella salsuginea) expressed sequence tags (ESTs) was 9388 in July 2008, the number of ESTs has increased to approximately four times the original value as a result of this effort. 
Our sequences will thus contribute to correct future annotation of the Thellungiella genome sequence. The full-length enriched cDNA clones will enable the construction of overexpressing mutant plants by introduction of the cDNAs driven by a constitutive promoter, the complementation of Thellungiella mutants, and the determination of promoter regions in the Thellungiella genome.
Affiliation(s)
- Teruaki Taji
- Faculty of Applied Bioscience, Tokyo University of Agriculture, 1-1-1 Sakuragaoka, Setagaya-ku, Tokyo 156-8502, Japan
- Laboratory of Plant Molecular Biology, RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
| | - Tetsuya Sakurai
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Keiichi Mochida
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Atsushi Ishiwata
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Atsushi Kurotani
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yasushi Totoki
- RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- MetaSystems Research Team, RIKEN Advanced Science Institute, Yokohama, 230-0045, Japan
| | - Atsushi Toyoda
- RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Yoshiyuki Sakaki
- RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Motoaki Seki
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
| | - Hirokazu Ono
- Faculty of Applied Bioscience, Tokyo University of Agriculture, 1-1-1 Sakuragaoka, Setagaya-ku, Tokyo 156-8502, Japan
| | - Yoichi Sakata
- Faculty of Applied Bioscience, Tokyo University of Agriculture, 1-1-1 Sakuragaoka, Setagaya-ku, Tokyo 156-8502, Japan
| | - Shigeo Tanaka
- Faculty of Applied Bioscience, Tokyo University of Agriculture, 1-1-1 Sakuragaoka, Setagaya-ku, Tokyo 156-8502, Japan
| | - Kazuo Shinozaki
- Laboratory of Plant Molecular Biology, RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba, Ibaraki 305-0074, Japan
- RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
|
229
|
Shionyu M, Yamaguchi A, Shinoda K, Takahashi KI, Go M. AS-ALPS: a database for analyzing the effects of alternative splicing on protein structure, interaction and network in human and mouse. Nucleic Acids Res 2008; 37:D305-9. [PMID: 19015123 PMCID: PMC2686549 DOI: 10.1093/nar/gkn869] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022] Open
Abstract
We have constructed a database, AS-ALPS (alternative splicing-induced alteration of protein structure), which provides information that would be useful for analyzing the effects of alternative splicing (AS) on protein structure, interactions with other bio-molecules and protein interaction networks in human and mouse. Several AS events have been revealed to contribute to the diversification of protein structure, which results in diversification of interaction partners or affinities, which in turn contributes to regulation of bio-molecular networks. Most AS variants, however, are only known at the sequence level. It is important to determine the effects of AS on protein structure and interaction, and to provide candidates for experimental targets that are relevant to network regulation by AS. For this purpose, the three-dimensional (3D) structures of proteins are valuable sources of information; however, these have not been fully exploited in any other AS-related databases. AS-ALPS is the only AS-related database that describes the spatial relationships between protein regions altered by AS ('AS regions') and both the proteins' hydrophobic cores and sites of inter-molecular interactions. This information makes it possible to infer whether protein structural stability and/or protein interaction are affected by each AS event. AS-ALPS can be freely accessed at http://as-alps.nagahama-i-bio.ac.jp and http://genomenetwork.nig.ac.jp/as-alps/.
Affiliation(s)
- Masafumi Shionyu
- Department of Bioscience, Faculty of Bioscience, Nagahama Institute of Bio-Science and Technology, 1266 Tamura-cho, Nagahama, Shiga 526-0829, Japan
|
230
|
Tremblay PL, Hallenbeck PC. Of blood, brains and bacteria, the Amt/Rh transporter family: emerging role of Amt as a unique microbial sensor. Mol Microbiol 2008; 71:12-22. [PMID: 19007411 DOI: 10.1111/j.1365-2958.2008.06514.x] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 12/16/2022]
Abstract
Members of the Amt/Rh family of transporters are found almost ubiquitously in all forms of life. However, the molecular state of the substrate (NH₃ or NH₄⁺) has been the subject of active debate. At least for bacterial Amt proteins, the model emerging from computational, X-ray crystal and mutational analysis is that NH₄⁺ is deprotonated at the exterior, conducted through the membrane as NH₃, and reprotonated at the cytoplasmic interface. A proton concomitantly is transferred from the exterior to the interior, although the mechanism is unclear. Here we discuss recent evidence indicating that an important function of at least some eukaryotic and bacterial Amts is to act as ammonium sensors and regulate cellular metabolism in response to changes in external ammonium concentrations. This is now well documented in the regulation of yeast pseudohyphal development and filamentous growth. As well, membrane sequestration of GlnK, a PII signal transduction protein, by AmtB has been shown to regulate nitrogenase in some diazotrophs, and nitrogen metabolism in some gram-positive bacteria. Formation of GlnK-AmtB membrane complexes might have other, as yet undiscovered, regulatory roles. This possibility is emphasized by the discovery in some genomes of genes for chimeric Amts with fusions to various regulatory elements.
Affiliation(s)
- Pier-Luc Tremblay
- Département de microbiologie et immunologie, Université de Montréal, Montréal, Québec H3C 3J7, Canada
|
231
|
Gagné JP, Isabelle M, Lo KS, Bourassa S, Hendzel MJ, Dawson VL, Dawson TM, Poirier GG. Proteome-wide identification of poly(ADP-ribose) binding proteins and poly(ADP-ribose)-associated protein complexes. Nucleic Acids Res 2008; 36:6959-76. [PMID: 18981049 PMCID: PMC2602769 DOI: 10.1093/nar/gkn771] [Citation(s) in RCA: 314] [Impact Index Per Article: 19.6] [Indexed: 12/16/2022] Open
Abstract
Poly(ADP-ribose) (pADPr) is a polymer assembled from the enzymatic polymerization of the ADP-ribosyl moiety of NAD by poly(ADP-ribose) polymerases (PARPs). The dynamic turnover of pADPr within the cell is essential for a number of cellular processes including progression through the cell cycle, DNA repair and the maintenance of genomic integrity, and apoptosis. In spite of the considerable advances in the knowledge of the physiological conditions modulated by poly(ADP-ribosyl)ation reactions, and notwithstanding the fact that pADPr can play a role of mediator in a wide spectrum of biological processes, few pADPr binding proteins have been identified so far. In this study, refined in silico prediction of pADPr binding proteins and large-scale mass spectrometry-based proteome analysis of pADPr binding proteins were used to establish a comprehensive repertoire of pADPr-associated proteins. Visualization and modeling of these pADPr-associated proteins in networks not only reflect the widespread involvement of poly(ADP-ribosyl)ation in several pathways but also identify protein targets that could shed new light on the regulatory functions of pADPr in normal physiological conditions as well as after exposure to genotoxic stimuli.
Affiliation(s)
- Jean-Philippe Gagné
- Laval University Medical Research Center, CHUQ, Faculty of Medicine, Laval University, Québec, Canada
|
232
|
Uribe P, Fuentes D, Valdés J, Shmaryahu A, Zúñiga A, Holmes D, Valenzuela PDT. Preparation and analysis of an expressed sequence tag library from the toxic dinoflagellate Alexandrium catenella. Mar Biotechnol (NY) 2008; 10:692-700. [PMID: 18478293 DOI: 10.1007/s10126-008-9107-8] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Received: 06/12/2007] [Revised: 04/10/2008] [Accepted: 04/10/2008] [Indexed: 05/26/2023]
Abstract
Dinoflagellates of the genus Alexandrium are photosynthetic microalgae of great importance because of the impact of some toxic species on the shellfish aquaculture industry. Alexandrium catenella is the species responsible for paralytic shellfish poisoning in Chile and other geographical areas. We have constructed a cDNA library from midexponential cells of A. catenella grown in culture free of associated bacteria and sequenced 10,850 expressed sequence tags (ESTs) that were assembled into 1,021 contigs and 5,475 singletons for a total of 6,496 unigenes. Approximately 41.6% of the unigenes showed similarity to genes with predicted function. A significant number of unigenes showed similarity with genes from other dinoflagellates, plants, and other protists. Among the identified genes, the most highly expressed are those coding for proteins of luminescence, carbohydrate metabolism, and photosynthesis. The sequences of 9,847 ESTs have been deposited in GenBank (accession numbers EX 454357-464203).
Affiliation(s)
- Paulina Uribe
- Fundación Ciencia para la Vida, Av. Zañartu 1482, Nuñoa, Santiago, Chile
|
233
|
Yang X, Kalluri UC, Jawdy S, Gunter LE, Yin T, Tschaplinski TJ, Weston DJ, Ranjan P, Tuskan GA. The F-box gene family is expanded in herbaceous annual plants relative to woody perennial plants. Plant Physiol 2008; 148:1189-200. [PMID: 18775973 PMCID: PMC2577272 DOI: 10.1104/pp.108.121921] [Citation(s) in RCA: 102] [Impact Index Per Article: 6.4] [Received: 04/28/2008] [Accepted: 08/24/2008] [Indexed: 05/20/2023]
Abstract
F-box proteins are generally responsible for substrate recognition in the Skp1-Cullin-F-box complexes that are involved in protein degradation via the ubiquitin-26S proteasome pathway. In plants, F-box genes influence a variety of biological processes, such as leaf senescence, branching, self-incompatibility, and responses to biotic and abiotic stresses. The number of F-box genes in Populus (Populus trichocarpa; approximately 320) is less than half that found in Arabidopsis (Arabidopsis thaliana; approximately 660) or Oryza (Oryza sativa; approximately 680), even though the total number of genes in Populus is equivalent to that in Oryza and 1.5 times that in Arabidopsis. We performed comparative genomics analysis between the woody perennial plant Populus and the herbaceous annual plants Arabidopsis and Oryza in order to explicate the functional implications of this large gene family. Our analyses reveal interspecific differences in genomic distribution, orthologous relationship, intron evolution, protein domain structure, and gene expression. The set of F-box genes shared by these species appear to be involved in core biological processes essential for plant growth and development; lineage-specific differences primarily occurred because of an expansion of the F-box genes via tandem duplications in Arabidopsis and Oryza. The number of F-box genes in the newly sequenced woody species Vitis (Vitis vinifera; 156) and Carica (Carica papaya; 139) is similar to that in Populus, supporting the hypothesis that the F-box gene family is expanded in herbaceous annual plants relative to woody perennial plants. This study provides insights into the relationship between the structure and composition of the F-box gene family in herbaceous and woody species and their associated developmental and physiological features.
Affiliation(s)
- Xiaohan Yang
- Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA
|
234
|
Abstract
Simple modular architecture research tool (SMART) is an online tool (http://smart.embl.de/) for the identification and annotation of protein domains. It provides a user-friendly platform for the exploration and comparative study of domain architectures in both proteins and genes. The current release of SMART contains manually curated models for 784 protein domains. Recent developments were focused on further data integration and improving user friendliness. The underlying protein database based on completely sequenced genomes was greatly expanded and now includes 630 species, compared to 191 in the previous release. As an initial step towards integrating information on biological pathways into SMART, our domain annotations were extended with data on metabolic pathways and links to several pathways resources. The interaction network view was completely redesigned and is now available for more than 2 million proteins. In addition to the standard web access to the database, users can now query SMART using distributed annotation system (DAS) or through a simple object access protocol (SOAP) based web service.
|
235
|
Price MN, Dehal PS, Arkin AP. FastBLAST: homology relationships for millions of proteins. PLoS One 2008; 3:e3589. [PMID: 18974889 PMCID: PMC2571987 DOI: 10.1371/journal.pone.0003589] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 08/28/2008] [Accepted: 10/10/2008] [Indexed: 11/18/2022] Open
Abstract
Background All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/Principal Findings We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database (“NR”), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/Significance FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast.
Affiliation(s)
- Morgan N Price
- Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
|
236
|
Ding G, Lorenz P, Kreutzer M, Li Y, Thiesen HJ. SysZNF: the C2H2 zinc finger gene database. Nucleic Acids Res 2008; 37:D267-73. [PMID: 18974185 PMCID: PMC2686507 DOI: 10.1093/nar/gkn782] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/23/2022] Open
Abstract
C2H2 zinc finger (C2H2-ZNF) genes are one of the largest and most complex gene super-families in metazoan genomes, with hundreds of members in the human and mouse genome. The ongoing investigation of this huge gene family requires computational support to catalog genotype phenotype comparisons of C2H2-ZNF genes between related species and finally to extend the worldwide knowledge on the evolution of C2H2-ZNF genes in general. Here, we systematically collected all the C2H2-ZNF genes in the human and mouse genome and constructed a database named SysZNF to deposit available datasets related to these genes. In the database, each C2H2-ZNF gene entry consists of physical location, gene model (including different transcript forms), Affymetrix gene expression probes, protein domain structures, homologs (and synteny between human and mouse), PubMed references as well as links to relevant public databases. The clustered organization of the C2H2-ZNF genes is highlighted. The database can be searched using text strings or sequence information. The data are also available for batch download from the web site. Moreover, the graphical gene model/protein view system, sequence retrieval system and some other tools embedded in SysZNF facilitate the research on the C2H2 type ZNF genes under an integrative view. The database can be accessed from the URL http://epgd.biosino.org/SysZNF.
Affiliation(s)
- Guohui Ding
- Bioinformatics Center, Key Lab of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, P. R. China
|
237
|
Stage- and gender-specific proteomic analysis of Brugia malayi excretory-secretory products. PLoS Negl Trop Dis 2008; 2:e326. [PMID: 18958170 PMCID: PMC2569413 DOI: 10.1371/journal.pntd.0000326] [Citation(s) in RCA: 121] [Impact Index Per Article: 7.6] [Received: 07/29/2008] [Accepted: 10/01/2008] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION While we lack a complete understanding of the molecular mechanisms by which parasites establish and achieve protection from host immune responses, it is accepted that many of these processes are mediated by products, primarily proteins, released from the parasite. Parasitic nematodes occur in different life stages and anatomical compartments within the host. Little is known about the composition and variability of products released at different developmental stages and their contribution to parasite survival and progression of the infection. METHODOLOGY/PRINCIPAL FINDINGS To gain a deeper understanding of these aspects, we collected and analyzed through 1D-SDS PAGE and LC-MS/MS the Excretory-Secretory Products (ESP) of adult female, adult male and microfilariae of the filarial nematode Brugia malayi, one of the etiological agents of human lymphatic filariasis. This proteomic analysis led to the identification of 228 proteins. The list includes 76 proteins with unknown function as well as proteins with potential immunoregulatory properties, such as protease inhibitors, cytokine homologues and carbohydrate-binding proteins. Larval and adult ESP differed in composition. Only 32 proteins were shared between all three stages/genders. Consistent with this observation, different gene ontology profiles were associated with the different ESP. CONCLUSIONS/SIGNIFICANCE A comparative analysis of the proteins released in vitro by different forms of a parasitic nematode dwelling in the same host is presented. The catalog of secreted proteins reflects different stage- and gender-specific processes and different strategies of immune evasion, providing valuable insights into the contribution of each form of the parasite to establishing the host-parasite interaction.
|
238
|
Vinogradov AE. Modularity of cellular networks shows general center-periphery polarization. Bioinformatics 2008; 24:2814-7. [PMID: 18953046 DOI: 10.1093/bioinformatics/btn555] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Abstract
Modular biology is supposed to be a bridge from molecular to systems biology. Using a new approach, it is shown here that the protein interaction networks of the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli consist of two large-scale modularity layers, central and peripheral, separated by a zone of depressed modularity. This finding, based on the analysis of network topology, is further supported by the discovery that many more Gene Ontology categories (terms) and KEGG biochemical pathways are overrepresented in the central and peripheral layers than in the intermediate zone. The categories of the central layer are mostly related to nuclear information processing, regulation and cell cycle, whereas the peripheral layer deals with various metabolic and energetic processes, transport and cell communication. A similar center-periphery polarization of modularity is found in the protein domain networks ('built-in interactome') and in a powergrid (as a non-biological example). These data suggest a 'polarized modularity' model of cellular networks where the central layer seems to be regulatory and to use information storage of the nucleus, whereas the peripheral layer seems devoted to more specialized tasks and environmental interactions, with a complex 'bus' between the layers.
|
239
|
Klimke W, Agarwala R, Badretdin A, Chetvernin S, Ciufo S, Fedorov B, Kiryutin B, O'Neill K, Resch W, Resenchuk S, Schafer S, Tolstoy I, Tatusova T. The National Center for Biotechnology Information's Protein Clusters Database. Nucleic Acids Res 2008; 37:D216-23. [PMID: 18940865 PMCID: PMC2686591 DOI: 10.1093/nar/gkn734] [Citation(s) in RCA: 219] [Impact Index Per Article: 13.7] [Indexed: 01/21/2023] Open
Abstract
Rapid increases in DNA sequencing capabilities have led to a vast increase in the data generated from prokaryotic genomic studies, which has been a boon to scientists studying micro-organism evolution and to those who wish to understand the biological underpinnings of microbial systems. The NCBI Protein Clusters Database (ProtClustDB) has been created to efficiently maintain and keep the deluge of data up to date. ProtClustDB contains both curated and uncurated clusters of proteins grouped by sequence similarity. The May 2008 release contains a total of 285 386 clusters derived from over 1.7 million proteins encoded by 3806 nt sequences from the RefSeq collection of complete chromosomes and plasmids from four major groups: prokaryotes, bacteriophages and the mitochondrial and chloroplast organelles. There are 7180 clusters containing 376 513 proteins with curated gene and protein functional annotation. PubMed identifiers and external cross references are collected for all clusters and provide additional information resources. A suite of web tools is available to explore more detailed information, such as multiple alignments, phylogenetic trees and genomic neighborhoods. ProtClustDB provides an efficient method to aggregate gene and protein annotation for researchers and is available at http://www.ncbi.nlm.nih.gov/sites/entrez?db=proteinclusters.
Affiliation(s)
- William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Department of Health and Human Services, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA.
|
240
|
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJA, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. InterPro: the integrative protein signature database. Nucleic Acids Res 2008; 37:D211-5. [PMID: 18940856 PMCID: PMC2686546 DOI: 10.1093/nar/gkn785] [Citation(s) in RCA: 1438] [Impact Index Per Article: 89.9] [Indexed: 11/13/2022] Open
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
Affiliation(s)
- Sarah Hunter
- EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
241
|
Global gene expression profiles for life stages of the deadly amphibian pathogen Batrachochytrium dendrobatidis. Proc Natl Acad Sci U S A 2008; 105:17034-9. [PMID: 18852473 DOI: 10.1073/pnas.0804173105] [Citation(s) in RCA: 84] [Impact Index Per Article: 5.3] [Indexed: 11/18/2022] Open
Abstract
Amphibians around the world are being threatened by an emerging pathogen, the chytrid fungus Batrachochytrium dendrobatidis (Bd). Despite intensive ecological study in the decade since Bd was discovered, little is known about the mechanism by which Bd kills frogs. Here, we compare patterns of global gene expression in controlled laboratory conditions for the two phases of the life cycle of Bd: the free-living zoospore and the substrate-embedded sporangia. We find zoospores to be transcriptionally less complex than sporangia. Several transcripts more abundant in zoospores provide clues about how this motile life stage interacts with its environment. Genes with higher levels of expression in sporangia provide new hypotheses about the molecular pathways involved in metabolic activity, flagellar function, and pathogenicity in Bd. We highlight expression patterns for a group of fungalysin metallopeptidase genes, a gene family thought to be involved in pathogenicity in another group of fungal pathogens that similarly cause cutaneous infection of vertebrates. Finally we discuss the challenges inherent in developing a molecular toolkit for chytrids, a basal fungal lineage separated by vast phylogenetic distance from other well characterized fungi.
|
242
|
Carver T, Berriman M, Tivey A, Patel C, Böhme U, Barrell BG, Parkhill J, Rajandream MA. Artemis and ACT: viewing, annotating and comparing sequences stored in a relational database. Bioinformatics 2008; 24:2672-6. [PMID: 18845581 PMCID: PMC2606163 DOI: 10.1093/bioinformatics/btn529] [Citation(s) in RCA: 480] [Impact Index Per Article: 30.0] [Indexed: 11/13/2022]
Abstract
MOTIVATION Artemis and Artemis Comparison Tool (ACT) have become mainstream tools for viewing and annotating sequence data, particularly for microbial genomes. Since its first release, Artemis has been continuously developed and supported with additional functionality for editing and analysing sequences based on feedback from an active user community of laboratory biologists and professional annotators. Nevertheless, its utility has been somewhat restricted by its limitation to reading and writing from flat files. Therefore, a new version of Artemis has been developed, which reads from and writes to a relational database schema, and allows users to annotate more complex, often large and fragmented, genome sequences. RESULTS Artemis and ACT have now been extended to read and write directly to the Generic Model Organism Database (GMOD, http://www.gmod.org) Chado relational database schema. In addition, a Gene Builder tool has been developed to provide structured forms and tables to edit coordinates of gene models and edit functional annotation, based on standard ontologies, controlled vocabularies and free text. AVAILABILITY Artemis and ACT are freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute web sites: http://www.sanger.ac.uk/Software/Artemis/ http://www.sanger.ac.uk/Software/ACT/
Affiliation(s)
- Tim Carver
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
243
|
Zerlotini A, Heiges M, Wang H, Moraes RLV, Dominitini AJ, Ruiz JC, Kissinger JC, Oliveira G. SchistoDB: a Schistosoma mansoni genome resource. Nucleic Acids Res 2008; 37:D579-82. [PMID: 18842636 PMCID: PMC2686589 DOI: 10.1093/nar/gkn681] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 11/14/2022] Open
Abstract
SchistoDB (http://schistoDB.net/) is a genomic database for the parasitic organism Schistosoma mansoni, one of the major causative agents of schistosomiasis worldwide. It currently incorporates sequences and annotation for S. mansoni in a single user-friendly database. Several genome-scale analyses are available, as well as ESTs, oligonucleotides, metabolic pathways and drugs. In this article, we describe the data sets and their analyses, how to query the database, and the tools available on the website.
Affiliation(s)
- Adhemar Zerlotini
- Laboratory of Cellular and Molecular Parasitology, Instituto René Rachou - FIOCRUZ, Belo Horizonte, MG, Brazil
|
244
|
Floris M, Orsini M, Thanaraj TA. Splice-mediated Variants of Proteins (SpliVaP) - data and characterization of changes in signatures among protein isoforms due to alternative splicing. BMC Genomics 2008; 9:453. [PMID: 18831736 PMCID: PMC2573899 DOI: 10.1186/1471-2164-9-453] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/11/2008] [Accepted: 10/02/2008] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Mammalian genes are often alternatively spliced; the resulting alternate transcripts often encode protein isoforms that differ in amino acid sequence. Changes among the protein isoforms can alter the cellular properties of proteins. The effect can range from a subtle modulation to a complete loss of function. RESULTS (i) We examined human splice-mediated protein isoforms (as extracted from a manually curated data set, and from a computationally predicted data set) for differences in the annotation for protein signatures (Pfam domains and PRINTS fingerprints) and we characterized the differences and their effects on protein functionalities. An important question addressed relates to the extent of protein isoforms that may lack any known function in the cell. (ii) We present a database that reports differences in protein signatures among human splice-mediated protein isoform sequences. CONCLUSION (i) Characterization: The work points to distinct sets of alternatively spliced genes with varying degrees of annotation for the splice-mediated protein isoforms. Protein molecular functions seen to be often affected are those that relate to: binding, catalytic, transcription regulation, structural molecule, transporter, motor, and antioxidant; and the processes that are often affected are nucleic acid binding, signal transduction, and protein-protein interactions. Signatures are often included/excluded and truncated in length among protein isoforms; truncation is seen as the predominant type of change. Analysis points to the following novel aspects: (a) Analysis using data from the manually curated Vega indicates that one in 8.9 genes can lead to a protein isoform of no "known" function; and one in 18 expressed protein isoforms can be such an "orphan" isoform; the corresponding numbers as seen with the computationally predicted ASD data set are: one in 4.9 genes and one in 9.8 isoforms. (b) When swapping of signatures occurs, it is often between those of the same functional classification. (c) Pfam domains can occur in varying lengths, and PRINTS fingerprints can occur with varying numbers of constituent motifs among isoforms; since such variation is seen in a large number of genes, it could be a general mechanism to modulate protein function. (ii) DATA The reported resource (at http://www.bioinformatica.crs4.org/tools/dbs/splivap/) gives the community access to data on splice-mediated protein isoforms (with value-added annotation such as association with diseases) through changes in protein signatures.
Affiliation(s)
- Matteo Floris
- CRS4-Bioinformatica, Parco Scientifico e Technologico, POLARIS, Edificio 3, 09010 PULA (CA), Sardinia, Italy.
|
245
|
Nagaraj SH, Gasser RB, Ranganathan S. Needles in the EST haystack: large-scale identification and analysis of excretory-secretory (ES) proteins in parasitic nematodes using expressed sequence tags (ESTs). PLoS Negl Trop Dis 2008; 2:e301. [PMID: 18820748 PMCID: PMC2553489 DOI: 10.1371/journal.pntd.0000301] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Received: 05/06/2008] [Accepted: 08/27/2008] [Indexed: 11/28/2022] Open
Abstract
Background: Parasitic nematodes of humans, other animals and plants continue to impose a significant public health and economic burden worldwide, due to the diseases they cause. Promising antiparasitic drug and vaccine candidates have been discovered from excreted or secreted (ES) proteins released from the parasite and exposed to the immune system of the host. Mining the entire expressed sequence tag (EST) data available from parasitic nematodes represents an approach to discover such ES targets.
Methods and Findings: In this study, we predicted, using EST2Secretome, a novel, high-throughput, computational workflow system, 4,710 ES proteins from 452,134 ESTs derived from 39 different species of nematodes, parasitic in animals (including humans) or plants. In total, 2,632, 786, and 1,292 ES proteins were predicted for animal-, human-, and plant-parasitic nematodes, respectively. Subsequently, we systematically analysed ES proteins using computational methods. Of these 4,710 proteins, 2,490 (52.8%) had orthologues in Caenorhabditis elegans, whereas 621 (13.8%) appeared to be novel, currently having no significant match to any molecule available in public databases. Of the C. elegans homologues, 267 had strong “loss-of-function” phenotypes by RNA interference (RNAi) in this nematode. We could functionally classify 1,948 (41.3%) sequences using the Gene Ontology (GO) terms, establish pathway associations for 573 (12.2%) sequences using Kyoto Encyclopaedia of Genes and Genomes (KEGG), and identify protein interaction partners for 1,774 (37.6%) molecules. We also mapped 758 (16.1%) proteins to protein domains including the nematode-specific protein family “transthyretin-like” and “chromadorea ALT,” considered as vaccine candidates against filariasis in humans.
Conclusions: We report the large-scale analysis of ES proteins inferred from EST data for a range of parasitic nematodes. This set of ES proteins provides an inventory of known and novel members of ES proteins as a foundation for studies focused on understanding the biology of parasitic nematodes and their interactions with their hosts, as well as for the development of novel drugs or vaccines for parasite intervention and control.
Excretory-secretory (ES) proteins are an important class of proteins in many organisms, spanning from bacteria to human beings, and are potential drug targets for several diseases. In this study, we first developed a software platform, EST2Secretome, composed of carefully selected computational tools to identify and analyse ES proteins from expressed sequence tags (ESTs). By employing EST2Secretome, we analysed 4,710 ES proteins derived from 0.5 million ESTs for 39 economically important and disease-causing parasites from the phylum Nematoda. Several known and novel ES proteins that were either parasite- or nematode-specific were discovered, focussing on those that are either absent from or very divergent from similar molecules in their animal or plant hosts. In addition, we found many nematode-specific protein families of domains “transthyretin-like” and “chromadorea ALT,” considered vaccine candidates for filariasis in humans. We report numerous C. elegans homologues with loss-of-function RNAi phenotypes essential for parasite survival and therefore potential targets for parasite intervention. Overall, by developing freely available software to analyse large-scale EST data, we enabled researchers working on parasites for neglected tropical diseases to select specific genes and/or proteins to carry out directed functional assays for demystifying the molecular complexities of host–parasite interactions in a cell.
Affiliation(s)
- Shivashankar H Nagaraj
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
246
Davis AP, Murphy CG, Saraceni-Richards CA, Rosenstein MC, Wiegers TC, Mattingly CJ. Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks. Nucleic Acids Res 2008; 37:D786-92. [PMID: 18782832 PMCID: PMC2686584 DOI: 10.1093/nar/gkn580] [Citation(s) in RCA: 215] [Impact Index Per Article: 13.4] [Indexed: 11/12/2022]
Abstract
The Comparative Toxicogenomics Database (CTD) is a curated database that promotes understanding about the effects of environmental chemicals on human health. Biocurators at CTD manually curate chemical–gene interactions, chemical–disease relationships and gene–disease relationships from the literature. This strategy allows data to be integrated to construct chemical–gene–disease networks. CTD is unique in numerous respects: curation focuses on environmental chemicals; interactions are manually curated; interactions are constructed using controlled vocabularies and hierarchies; additional gene attributes (such as Gene Ontology, taxonomy and KEGG pathways) are integrated; data can be viewed from the perspective of a chemical, gene or disease; results and batch queries can be downloaded and saved; and most importantly, CTD acts as both a knowledgebase (by reporting data) and a discovery tool (by generating novel inferences). Over 116 000 interactions between 3900 chemicals and 13 300 genes have been curated from 270 species, and 5900 gene–disease and 2500 chemical–disease direct relationships have been captured. By integrating these data, 350 000 gene–disease relationships and 77 000 chemical–disease relationships can be inferred. This wealth of chemical–gene–disease information yields testable hypotheses for understanding the effects of environmental chemicals on human health. CTD is freely available at http://ctd.mdibl.org.
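The integration step described above can be illustrated with a small sketch. It assumes that a gene–disease relationship is inferred whenever a gene and a disease are both linked to the same chemical; that join rule, the function name `infer_gene_disease`, and the toy records are illustrative assumptions, not CTD's actual implementation or data.

```python
from collections import defaultdict


def infer_gene_disease(chem_gene, chem_disease):
    """Infer gene-disease pairs that share at least one chemical.

    chem_gene: iterable of (chemical, gene) curated interactions.
    chem_disease: iterable of (chemical, disease) curated relationships.
    Returns a dict mapping (gene, disease) to the set of linking chemicals.
    """
    # Index diseases by chemical so each interaction is joined in one lookup.
    diseases_by_chem = defaultdict(set)
    for chem, disease in chem_disease:
        diseases_by_chem[chem].add(disease)

    inferred = defaultdict(set)
    for chem, gene in chem_gene:
        for disease in diseases_by_chem.get(chem, ()):
            inferred[(gene, disease)].add(chem)
    return dict(inferred)


# Hypothetical toy records (placeholders, not taken from CTD).
chem_gene = [("chemA", "GENE1"), ("chemA", "GENE2"), ("chemB", "GENE1")]
chem_disease = [("chemA", "diseaseX"), ("chemB", "diseaseX"), ("chemB", "diseaseY")]

inferred = infer_gene_disease(chem_gene, chem_disease)
for (gene, disease), chems in sorted(inferred.items()):
    print(gene, disease, sorted(chems))
```

Keeping the set of linking chemicals for each inferred pair preserves the evidence behind the inference, which is what makes the result a testable hypothesis and not a bare assertion.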
Affiliation(s)
- Allan Peter Davis
- Department of Bioinformatics, The Mount Desert Island Biological Laboratory, Salisbury Cove, ME 04672, USA
247
Hydrogenomics of the extremely thermophilic bacterium Caldicellulosiruptor saccharolyticus. Appl Environ Microbiol 2008; 74:6720-9. [PMID: 18776029 DOI: 10.1128/aem.00968-08] [Citation(s) in RCA: 125] [Impact Index Per Article: 7.8] [Indexed: 11/20/2022]
Abstract
Caldicellulosiruptor saccharolyticus is an extremely thermophilic, gram-positive anaerobe which ferments cellulose-, hemicellulose- and pectin-containing biomass to acetate, CO2, and hydrogen. Its broad substrate range, high hydrogen-producing capacity, and ability to coutilize glucose and xylose make this bacterium an attractive candidate for microbial bioenergy production. Here, the complete genome sequence of C. saccharolyticus, consisting of a 2,970,275-bp circular chromosome encoding 2,679 predicted proteins, is described. Analysis of the genome revealed that C. saccharolyticus has an extensive polysaccharide-hydrolyzing capacity for cellulose, hemicellulose, pectin, and starch, coupled to a large number of ABC transporters for monomeric and oligomeric sugar uptake. The components of the Embden-Meyerhof and nonoxidative pentose phosphate pathways are all present; however, there is no evidence that an Entner-Doudoroff pathway is present. Catabolic pathways for a range of sugars, including rhamnose, fucose, arabinose, glucuronate, fructose, and galactose, were identified. These pathways lead to the production of NADH and reduced ferredoxin. NADH and reduced ferredoxin are subsequently used by two distinct hydrogenases to generate hydrogen. Whole-genome transcriptome analysis revealed that there is significant upregulation of the glycolytic pathway and an ABC-type sugar transporter during growth on glucose and xylose, indicating that C. saccharolyticus coferments these sugars unimpeded by glucose-based catabolite repression. The capacity to simultaneously process and utilize a range of carbohydrates associated with biomass feedstocks is a highly desirable feature of this lignocellulose-utilizing, biofuel-producing bacterium.
248
Cao PJ, Bartley LE, Jung KH, Ronald PC. Construction of a rice glycosyltransferase phylogenomic database and identification of rice-diverged glycosyltransferases. MOLECULAR PLANT 2008; 1:858-77. [PMID: 19825588 DOI: 10.1093/mp/ssn052] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.2] [Indexed: 05/18/2023]
Abstract
Glycosyltransferases (GTs; EC 2.4.x.y) constitute a large group of enzymes that form glycosidic bonds through transfer of sugars from activated donor molecules to acceptor molecules. GTs are critical to the biosynthesis of plant cell walls, among other diverse functions. Based on the Carbohydrate-Active enZymes (CAZy) database and sequence similarity searches, we have identified 609 potential GT genes (loci) corresponding to 769 transcripts (gene models) in rice (Oryza sativa), the reference monocotyledonous species. Using domain composition and sequence similarity, these rice GTs were classified into 40 CAZy families plus an additional unknown class. We found that two Pfam domains of unknown function, PF04577 and PF04646, are associated with GT families GT61 and GT31, respectively. To facilitate functional analysis of this important and large gene family, we created a phylogenomic Rice GT Database (http://ricephylogenomics.ucdavis.edu/cellwalls/gt/). Through the database, several classes of functional genomic data, including mutant lines and gene expression data, can be displayed for each rice GT in the context of a phylogenetic tree, allowing for comparative analysis both within and between GT families. Comprehensive digital expression analysis of public gene expression data revealed that most (approximately 80%) rice GTs are expressed. Based on analysis with Inparanoid, we identified 282 'rice-diverged' GTs that lack orthologs in sequenced dicots (Arabidopsis thaliana, Populus trichocarpa, Medicago truncatula, and Ricinus communis). Combining these analyses, we identified 33 rice-diverged GT genes (45 gene models) that are highly expressed in above-ground, vegetative tissues. From the literature and this analysis, 21 of these loci are excellent targets for functional examination toward understanding and manipulating grass cell wall qualities. Study of the remainder may reveal aspects of hormone and protein metabolism that are critical for rice biology. This list of 33 genes and the Rice GT Database will facilitate the study of GTs and cell wall synthesis in rice and other plants.
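The narrowing from all GT loci to rice-diverged loci to a highly expressed shortlist can be read as two set operations. The following is a minimal sketch under that reading; the function name `shortlist_rice_diverged` and the locus identifiers are hypothetical, and the sketch does not parse Inparanoid output or any real expression data.

```python
def shortlist_rice_diverged(gt_loci, loci_with_dicot_orthologs, highly_expressed_loci):
    """Return (rice_diverged, shortlist) as sets of locus identifiers.

    rice_diverged: GT loci that have no ortholog in the compared dicots.
    shortlist: rice-diverged loci that are also highly expressed.
    """
    gt_loci = set(gt_loci)
    # Set difference: drop every GT locus that has a dicot ortholog.
    rice_diverged = gt_loci - set(loci_with_dicot_orthologs)
    # Set intersection: keep only the rice-diverged loci that are highly expressed.
    shortlist = rice_diverged & set(highly_expressed_loci)
    return rice_diverged, shortlist


# Hypothetical locus identifiers (placeholders, not real rice loci).
gt_loci = ["locus1", "locus2", "locus3", "locus4"]
with_orthologs = ["locus1", "locus9"]
highly_expressed = ["locus1", "locus3", "locus4", "locus8"]

rice_diverged, shortlist = shortlist_rice_diverged(gt_loci, with_orthologs, highly_expressed)
print(sorted(rice_diverged))
print(sorted(shortlist))
```

In the study's terms, `gt_loci` would hold the 609 loci, `rice_diverged` the 282 loci lacking dicot orthologs, and `shortlist` the 33 highly expressed genes.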
Affiliation(s)
- Pei-Jian Cao
- Department of Plant Pathology, University of California, Davis, CA 95616, USA
249
Chatr-aryamontri A, Kerrien S, Khadake J, Orchard S, Ceol A, Licata L, Castagnoli L, Costa S, Derow C, Huntley R, Aranda B, Leroy C, Thorneycroft D, Apweiler R, Cesareni G, Hermjakob H. MINT and IntAct contribute to the Second BioCreative challenge: serving the text-mining community with high quality molecular interaction data. Genome Biol 2008; 9 Suppl 2:S5. [PMID: 18834496 PMCID: PMC2559989 DOI: 10.1186/gb-2008-9-s2-s5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
Background: In the absence of consolidated pipelines to archive biological data electronically, information dispersed in the literature must be captured by manual annotation. Unfortunately, manual annotation is time consuming and the coverage of published interaction data is therefore far from complete. The use of text-mining tools to identify relevant publications and to assist in the initial information extraction could help to improve the efficiency of the curation process and, as a consequence, the database coverage of data available in the literature. The 2006 BioCreative competition was aimed at evaluating text-mining procedures in comparison with manual annotation of protein-protein interactions.
Results: To aid the BioCreative protein-protein interaction task, IntAct and MINT (Molecular INTeraction) provided both the training and the test datasets. Data from both databases are comparable because they were curated according to the same standards. During the manual curation process, the major cause of data loss in mining the articles for information was ambiguity in the mapping of the gene names to stable UniProtKB database identifiers. It was also observed that most of the information about interactions was contained only within the full-text of the publication; hence, text mining of protein-protein interaction data will require the analysis of the full-text of the articles and cannot be restricted to the abstract.
Conclusion: The development of text-mining tools to extract protein-protein interaction information may increase the literature coverage achieved by manual curation. To support the text-mining community, databases will highlight those sentences within the articles that describe the interactions. These will supply data-miners with a high quality dataset for algorithm development. Furthermore, the dictionary of terms created by the BioCreative competitors could enrich the synonym list of the PSI-MI (Proteomics Standards Initiative-Molecular Interactions) controlled vocabulary, which is used by both databases to annotate their data content.
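The gene-name ambiguity identified as the main cause of data loss can be made concrete with a short sketch. It assumes a flat list of (gene name, accession) pairs and treats a name as ambiguous when it resolves to more than one accession; the function name `split_by_ambiguity` and the accessions are hypothetical placeholders, not real UniProtKB entries or either database's curation tooling.

```python
from collections import defaultdict


def split_by_ambiguity(name_accession_pairs):
    """Separate gene names that map to one accession from those mapping to several.

    Names are compared case-insensitively. Returns (unique, ambiguous), where
    unique maps name -> accession and ambiguous maps name -> sorted accessions.
    """
    index = defaultdict(set)
    for name, accession in name_accession_pairs:
        index[name.lower()].add(accession)

    unique = {n: next(iter(accs)) for n, accs in index.items() if len(accs) == 1}
    ambiguous = {n: sorted(accs) for n, accs in index.items() if len(accs) > 1}
    return unique, ambiguous


# Hypothetical names and accessions (placeholders only).
pairs = [
    ("abc1", "ACC0001"),
    ("ABC1", "ACC0001"),  # same name in a different case, same accession
    ("xyz2", "ACC0002"),
    ("xyz2", "ACC0003"),  # one name, two accessions: ambiguous
]

unique, ambiguous = split_by_ambiguity(pairs)
print(unique)
print(ambiguous)
```

Names that land in `ambiguous` are the ones a curator or a text-mining tool would need extra context, such as the source organism, to resolve.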
Affiliation(s)
- Andrew Chatr-aryamontri
- Department of Biology, University of Rome, Tor Vergata, Via della Ricerca Scientifica, 00133 Rome, Italy
250
Mulvenna J, Hamilton B, Nagaraj SH, Smyth D, Loukas A, Gorman JJ. Proteomics analysis of the excretory/secretory component of the blood-feeding stage of the hookworm, Ancylostoma caninum. Mol Cell Proteomics 2008; 8:109-21. [PMID: 18753127 DOI: 10.1074/mcp.m800206-mcp200] [Citation(s) in RCA: 146] [Impact Index Per Article: 9.1] [Indexed: 01/07/2023]
Abstract
Hookworms are blood-feeding intestinal parasites of mammalian hosts, and hookworm infection is one of the major human ailments, affecting approximately 600 million people worldwide. These parasites form an intimate association with the host and are able to avoid vigorous immune responses in many ways, including skewing of the response phenotype to promote parasite survival and longevity. The primary interface between the parasite and the host is the excretory/secretory component, a complex mixture of proteins, carbohydrates, and lipids secreted from the surface or oral openings of the parasite. The composition of this complex mixture is for the most part unknown but is likely to contain proteins important for the parasitic lifestyle and hence suitable as drug or vaccine targets. Using a strategy combining the traditional technology of one-dimensional SDS-PAGE and the newer fractionation technology of OFFGEL electrophoresis, we identified 105 proteins from the excretory/secretory products of the blood-feeding stage of the dog hookworm, Ancylostoma caninum. Highly represented among the identified proteins were lectins, including three C-type lectins and three beta-galactoside-specific S-type galectins, as well as a number of proteases belonging to the three major classes found in nematodes: aspartic, cysteine, and metalloproteases. Interestingly, 28% of the identified proteins were homologous to activation-associated secreted proteins, a family of cysteine-rich secreted proteins belonging to the sterol carrier protein/Tpx-1/Ag5/PR-1/Sc-7 (TAPS) superfamily. Thirty-four of these proteins were identified, suggesting an important role in host-parasite interactions. Other protein families identified included hyaluronidases, lysozyme-like proteins, and transthyretin-like proteins. This work identified a suite of proteins important for the parasitic lifestyle and provides new insight into the biology of hookworm infection.
Affiliation(s)
- Jason Mulvenna
- Helminth Biology Laboratory, Division of Infectious Diseases, Queensland Institute of Medical Research, Brisbane, Queensland 4006, Australia.