301
|
Christie KR, Hong EL, Cherry JM. Functional annotations for the Saccharomyces cerevisiae genome: the knowns and the known unknowns. Trends Microbiol 2009; 17:286-94. [PMID: 19577472 PMCID: PMC3057094 DOI: 10.1016/j.tim.2009.04.005] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 11/04/2008] [Revised: 04/20/2009] [Accepted: 04/24/2009] [Indexed: 11/27/2022]
Abstract
The quest to characterize each of the genes of the yeast Saccharomyces cerevisiae has propelled the development and application of novel high-throughput (HTP) experimental techniques. To handle the enormous amount of information generated by these techniques, new bioinformatics tools and resources are needed. Gene Ontology (GO) annotations curated by the Saccharomyces Genome Database (SGD) have facilitated the development of algorithms that analyze HTP data and help predict functions for poorly characterized genes in S. cerevisiae and other organisms. Here, we describe how published results are incorporated into GO annotations at SGD and why researchers can benefit from using these resources wisely to analyze their HTP data and predict gene functions.
Affiliation(s)
- Karen R Christie
- Department of Genetics, Stanford University Medical School, Stanford, CA 94305-5120, USA
|
302
|
Antezana E, Egaña M, Blondé W, Illarramendi A, Bilbao I, De Baets B, Stevens R, Mironov V, Kuiper M. The Cell Cycle Ontology: an application ontology for the representation and integrated analysis of the cell cycle process. Genome Biol 2009; 10:R58. [PMID: 19480664 PMCID: PMC2718524 DOI: 10.1186/gb-2009-10-5-r58] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 12/20/2008] [Revised: 04/17/2009] [Accepted: 05/29/2009] [Indexed: 01/26/2023] Open
Abstract
A software resource for the analysis of cell cycle related molecular networks. The Cell Cycle Ontology is an application ontology that automatically captures and integrates detailed knowledge on the cell cycle process. Cell Cycle Ontology is enabled by semantic web technologies, and is accessible via the web for browsing, visualizing, advanced querying, and computational reasoning. Cell Cycle Ontology facilitates a detailed analysis of cell cycle-related molecular network components. Through querying and automated reasoning, it may provide new hypotheses to help steer a systems biology approach to biological network building.
Affiliation(s)
- Erick Antezana
- Department of Plant Systems Biology, VIB, Technologiepark 927, B-9052 Gent, Belgium.
|
303
|
Chen X, Jorgenson E, Cheung ST. New tools for functional genomic analysis. Drug Discov Today 2009; 14:754-60. [PMID: 19477290 DOI: 10.1016/j.drudis.2009.05.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Received: 11/25/2008] [Revised: 04/29/2009] [Accepted: 05/13/2009] [Indexed: 12/28/2022]
Abstract
For the past decade, the development of genomic technology has revolutionized modern biological research and drug discovery. Functional genomic analyses enable biologists to analyze genetic events on a global scale, and they have been widely used in gene discovery, biomarker determination, disease classification, and drug target identification. In this article, we provide an overview of the current and emerging tools involved in genomic studies, including expression arrays, microRNA arrays, array CGH, ChIP-on-chip, methylation arrays, mutation analysis, genome-wide association studies, proteomic analysis, integrated functional genomic analysis and related bioinformatic and biostatistical analyses. Using human liver cancer as an example, we provide further information on how these genomic approaches can be applied in cancer research.
Affiliation(s)
- Xin Chen
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, United States.
|
304
|
Hu Z, Hung JH, Wang Y, Chang YC, Huang CL, Huyck M, DeLisi C. VisANT 3.5: multi-scale network visualization, analysis and inference based on the gene ontology. Nucleic Acids Res 2009; 37:W115-21. [PMID: 19465394 PMCID: PMC2703932 DOI: 10.1093/nar/gkp406] [Citation(s) in RCA: 153] [Impact Index Per Article: 10.2] [Indexed: 12/17/2022] Open
Abstract
Despite its wide usage in biological databases and applications, the role of the gene ontology (GO) in network analysis is usually limited to functional annotation of genes or gene sets with auxiliary information on correlations ignored. Here, we report on new capabilities of VisANT--an integrative software platform for the visualization, mining, analysis and modeling of the biological networks--which extend the application of GO in network visualization, analysis and inference. The new VisANT functions can be classified into three categories. (i) Visualization: a new tree-based browser allows visualization of GO hierarchies. GO terms can be easily dropped into the network to group genes annotated under the term, thereby integrating the hierarchical ontology with the network. This facilitates multi-scale visualization and analysis. (ii) Flexible annotation schema: in addition to conventional methods for annotating network nodes with the most specific functional descriptions available, VisANT also provides functions to annotate genes at any customized level of abstraction. (iii) Finding over-represented GO terms and expression-enriched GO modules: two new algorithms have been implemented as VisANT plugins. One detects over-represented GO annotations in any given sub-network and the other finds the GO categories that are enriched in a specified phenotype or perturbed dataset. Both algorithms take account of network topology (i.e. correlations between genes based on various sources of evidence). VisANT is freely available at http://visant.bu.edu.
Affiliation(s)
- Zhenjun Hu
- Center for Advanced Genomic Technology, Program in Bioinformatics, Boston University, Boston, MA 02215, USA.
|
305
|
Kaczanowski S, Siedlecki P, Zielenkiewicz P. The High Throughput Sequence Annotation Service (HT-SAS) - the shortcut from sequence to true Medline words. BMC Bioinformatics 2009; 10:148. [PMID: 19445703 PMCID: PMC2694793 DOI: 10.1186/1471-2105-10-148] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/24/2008] [Accepted: 05/16/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step which allows facts to become knowledge. Currently there are limited ways to automatically annotate such data, especially utilizing information deposited in published literature. RESULTS To aid researchers in describing results from high-throughput experiments we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using binomial statistics approximation. We tested our automatic approach with a protein test set from SGD to determine its accuracy and usefulness. We also applied the automatic annotation service to improve annotations of proteins from Plasmodium berghei expressed exclusively during the blood stage. CONCLUSION Using HT-SAS we created new, or enriched already established, annotations for over 20% of proteins from Plasmodium berghei expressed in the blood stage, deposited in PlasmoDB. Our tests show this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. Our service should be useful for manual curators, as a complement to manually curated information sources and for researchers working with protein datasets, especially from poorly characterized organisms.
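The word over-representation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the HT-SAS implementation: it uses an exact binomial upper tail rather than the approximation the authors mention, and the function names, the `alpha` threshold, and the background frequencies are hypothetical. No multiple-testing correction is applied.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def overrepresented_words(word_counts, n_abstracts, background_freq, alpha=0.01):
    """Flag words seen in more abstracts than the background rate predicts.

    word_counts: word -> number of abstracts in the protein's pool containing it
    n_abstracts: size of the pool
    background_freq: word -> assumed fraction of all abstracts containing it
    alpha: hypothetical significance threshold
    """
    hits = {}
    for word, k in word_counts.items():
        p = background_freq.get(word)
        if p is None:
            continue  # no background estimate available, so skip the word
        p_value = binomial_tail(k, n_abstracts, p)
        if p_value < alpha:
            hits[word] = p_value
    return hits


if __name__ == "__main__":
    counts = {"kinase": 8, "protein": 9}
    background = {"kinase": 0.05, "protein": 0.8}
    print(overrepresented_words(counts, 10, background))
```

In this toy input, "kinase" is flagged because it is rare in the background but frequent in the pool, while "protein" is common everywhere and is not flagged.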
Affiliation(s)
- Szymon Kaczanowski
- Bioinformatics Department, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, ul Pawinskiego 5a, Warszawa 02-106, Poland.
|
306
|
Finding disease-specific coordinated functions by multi-function genes: insight into the coordination mechanisms in diseases. Genomics 2009; 94:94-100. [PMID: 19427897 DOI: 10.1016/j.ygeno.2009.05.001] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/19/2009] [Accepted: 05/04/2009] [Indexed: 12/31/2022]
Abstract
We developed an approach using multi-function disease genes to find function pairs whose co-deregulation might induce a disease. Analyzing cancer genes, we found many cancer-specific coordinated function pairs co-deregulated by dysfunction of multi-function genes and other molecular changes in cancer. Studying two subtypes of cardiomyopathy, we found they show certain consistency at the functional coordination level. Our approach can also provide important information for finding novel disease genes as well as their mechanisms in diseases.
|
307
|
Blomster HA, Hietakangas V, Wu J, Kouvonen P, Hautaniemi S, Sistonen L. Novel proteomics strategy brings insight into the prevalence of SUMO-2 target sites. Mol Cell Proteomics 2009; 8:1382-90. [PMID: 19240082 DOI: 10.1074/mcp.m800551-mcp200] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 01/13/2023] Open
Abstract
Small ubiquitin-like modifier (SUMO) is covalently conjugated to its target proteins, thereby altering their activity. The mammalian SUMO protein family includes four members (SUMO-1-4), of which SUMO-2 and SUMO-3 are conjugated in a stress-inducible manner. The vast majority of known SUMO substrates are recognized by the single SUMO E2-conjugating enzyme Ubc9 binding to a consensus tetrapeptide (PsiKXE, where Psi stands for a large hydrophobic amino acid) or extended motifs that contain phosphorylated or negatively charged amino acids, called PDSM (phosphorylation-dependent sumoylation motif) and NDSM (negatively charged amino acid-dependent sumoylation motif), respectively. We identified 382 SUMO-2 targets using a novel method based on SUMO protease treatment that improves separation of SUMO substrates on SDS-PAGE before LC-ESI-MS/MS. We also implemented a software tool, SUMOFI (SUMO motif finder), to facilitate identification of motifs for SUMO substrates from a user-provided set of proteins and to classify the substrates according to the type of SUMO-targeting consensus site. Surprisingly, more than half of the substrates lacked any known consensus site, suggesting that numerous SUMO substrates are recognized by a yet unknown consensus site-independent mechanism. Gene ontology analysis revealed that substrates in distinct functional categories display strikingly different prevalences of NDSM sites. Given that different types of motifs are bound by Ubc9 using alternative mechanisms, our data suggest that the preference of SUMO-2 targeting mechanism depends on the biological function of the substrate.
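As an illustration of the consensus tetrapeptide described in this abstract, the following is a minimal sketch of scanning a protein sequence for PsiKXE sites. It is not SUMOFI and does not cover the extended PDSM or NDSM motifs. The residue set chosen for Psi (V, I, L, M, F) and the function name are assumptions made for this example.

```python
import re

# Assumption: "large hydrophobic amino acid" (Psi) is taken to be V, I, L, M or F.
HYDROPHOBIC = "VILMF"

# Lookahead so that overlapping candidate sites are all reported.
CONSENSUS = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z]E))")


def find_consensus_sites(sequence: str):
    """Return (1-based position of the lysine, matched tetrapeptide) pairs."""
    seq = sequence.upper()
    # m.start() is the 0-based index of Psi; the lysine follows it directly.
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    for position, motif in find_consensus_sites("MLKEEVKQE"):
        print(position, motif)
```

A sequence with no reported site here may still be a SUMO substrate; the abstract notes that more than half of the identified substrates lacked any known consensus site.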
Affiliation(s)
- Henri A Blomster
- Department of Biology, Abo Akademi University and University of Turku, FI-20521 Turku, Finland
|
308
|
Lindeberg M, Biehl BS, Glasner JD, Perna NT, Collmer A, Collmer CW. Gene Ontology annotation highlights shared and divergent pathogenic strategies of type III effector proteins deployed by the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic Escherichia coli strains. BMC Microbiol 2009; 9 Suppl 1:S4. [PMID: 19278552 PMCID: PMC2654664 DOI: 10.1186/1471-2180-9-s1-s4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Indexed: 01/07/2023] Open
Abstract
Genome-informed identification and characterization of Type III effector repertoires in various bacterial strains and species is revealing important insights into the critical roles that these proteins play in the pathogenic strategies of diverse bacteria. However, non-systematic discipline-specific approaches to their annotation impede analysis of the accumulating wealth of data and inhibit easy communication of findings among researchers working on different experimental systems. The development of Gene Ontology (GO) terms to capture biological processes occurring during the interaction between organisms creates a common language that facilitates cross-genome analyses. The application of these terms to annotate type III effector genes in different bacterial species – the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic strains of Escherichia coli – illustrates how GO can effectively describe fundamental similarities and differences among different gene products deployed as part of diverse pathogenic strategies. The GO annotations for the P. syringae pv tomato DC3000 effector AvrPtoB and the E. coli effector Tir are described in depth, with special emphasis given to GO's capability for capturing information about interacting proteins and taxa. GO-highlighted similarities in biological process and molecular function for effectors from additional pathosystems are also discussed.
Affiliation(s)
- Magdalen Lindeberg
- Department of Plant Pathology, Cornell University, Ithaca, NY 14850, USA.
|
309
|
Castelo R, Roverato A. Reverse Engineering Molecular Regulatory Networks from Microarray Data with qp-Graphs. J Comput Biol 2009; 16:213-27. [DOI: 10.1089/cmb.2008.08tt] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.5] [Indexed: 11/12/2022] Open
Affiliation(s)
- Robert Castelo
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
- Research Program on Biomedical Informatics, Institut Municipal d'Investigació Mèdica, Barcelona, Spain
- Alberto Roverato
- Department of Statistical Science, Università di Bologna, Bologna, Italy
|
310
|
Barrell D, Dimmer E, Huntley RP, Binns D, O'Donovan C, Apweiler R. The GOA database in 2009--an integrated Gene Ontology Annotation resource. Nucleic Acids Res 2009; 37:D396-403. [PMID: 18957448 PMCID: PMC2686469 DOI: 10.1093/nar/gkn803] [Citation(s) in RCA: 448] [Impact Index Per Article: 29.9] [Received: 09/12/2008] [Revised: 10/09/2008] [Accepted: 10/10/2008] [Indexed: 11/25/2022] Open
Abstract
The Gene Ontology Annotation (GOA) project at the EBI (http://www.ebi.ac.uk/goa) provides high-quality electronic and manual associations (annotations) of Gene Ontology (GO) terms to UniProt Knowledgebase (UniProtKB) entries. Annotations created by the project are collated with annotations from external databases to provide an extensive, publicly available GO annotation resource. Currently covering over 160 000 taxa, with greater than 32 million annotations, GOA remains the largest and most comprehensive open-source contributor to the GO Consortium (GOC) project. Over the last five years, the group has augmented the number and coverage of their electronic pipelines and a number of new manual annotation projects and collaborations now further enhance this resource. A range of files facilitate the download of annotations for particular species, and GO term information and associated annotations can also be viewed and downloaded from the newly developed GOA QuickGO tool (http://www.ebi.ac.uk/QuickGO), which allows users to precisely tailor their annotation set.
Affiliation(s)
- Daniel Barrell
- Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
311
|
Kohro T, Yamazaki T. Mechanism of Statin-Induced Myopathy Investigated Using Microarray Technology. J Atheroscler Thromb 2009; 16:30-2. [DOI: 10.5551/jat.e812] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022] Open
|
312
|
Gatti DM, Sypa M, Rusyn I, Wright FA, Barry WT. SAFEGUI: resampling-based tests of categorical significance in gene expression data made easy. Bioinformatics 2008; 25:541-2. [PMID: 19098030 PMCID: PMC2642635 DOI: 10.1093/bioinformatics/btn655] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
SUMMARY A large number of websites and applications perform significance testing for gene categories/pathways in microarray data. Many of these packages fail to account for expression correlation between transcripts, with a resultant inflation in Type I error. Array permutation and other resampling-based approaches have been proposed as solutions to this problem. SAFEGUI provides a user-friendly graphical interface for the assessment of categorical significance in microarray studies, while properly accounting for the effects of correlations among genes. SAFEGUI incorporates both permutation and more recently proposed bootstrap algorithms that are demonstrated to be more powerful in detecting differential expression across categories of genes. AVAILABILITY http://cebc.unc.edu/software/.
Affiliation(s)
- Daniel M Gatti
- Department of Environmental Sciences & Engineering, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
|
313
|
Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2008; 37:1-13. [PMID: 19033363 PMCID: PMC2615629 DOI: 10.1093/nar/gkn923] [Citation(s) in RCA: 10869] [Impact Index Per Article: 679.3] [Indexed: 12/04/2022] Open
Abstract
Functional analysis of large gene lists, derived in most cases from emerging high-throughput genomic, proteomic and bioinformatics scanning approaches, is still a challenging and daunting task. The gene-annotation enrichment analysis is a promising high-throughput strategy that increases the likelihood for investigators to identify biological processes most pertinent to their study. Approximately 68 bioinformatics enrichment tools that are currently available in the community are collected in this survey. Tools are uniquely categorized into three major classes, according to their underlying enrichment algorithms. The comprehensive collections, unique tool classifications and associated questions/issues will provide a more comprehensive and up-to-date view regarding the advantages, pitfalls and recent trends in a simpler tool-class level rather than by a tool-by-tool approach. Thus, the survey will help tool designers/developers and experienced end users understand the underlying algorithms and pertinent details of particular tool categories/tools, enabling them to make the best choices for their particular research interests.
Affiliation(s)
- Da Wei Huang
- Laboratory of Immunopathogenesis and Bioinformatics, Clinical Services Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702, USA
|
314
|
All dosage compensation is local: Gene-by-gene regulation of sex-biased expression on the chicken Z chromosome. Heredity (Edinb) 2008; 102:312-20. [DOI: 10.1038/hdy.2008.116] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.9] [Indexed: 01/29/2023] Open
|
316
|
Behfar A, Faustino RS, Arrell DK, Dzeja PP, Perez-Terzic C, Terzic A. Guided stem cell cardiopoiesis: discovery and translation. J Mol Cell Cardiol 2008; 45:523-9. [PMID: 18835562 DOI: 10.1016/j.yjmcc.2008.09.122] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 02/25/2008] [Revised: 08/06/2008] [Accepted: 09/08/2008] [Indexed: 01/01/2023]
Abstract
Over 1000 patients have participated worldwide in clinical trials exploring the therapeutic value of bone marrow-derived cells in ischemic heart disease. Meta-analysis evaluation of this global effort indicates that adult stem cell therapy is in general safe, but yields a rather modest level of improvement in cardiac function and structural remodeling in the setting of acute myocardial infarction or chronic heart failure. Although promising, the potential of translating adult stem cell-based therapy from bench to bedside has yet to be fully realized. Inter-trial and inter-patient variability contribute to disparity in the regenerative potential of transplanted stem cells, with unpredictable efficacy on follow-up. Strategies that mimic the natural embryonic program for uniform recruitment of cardiogenic progenitors from adult sources are currently being tested to secure a consistent outcome. Guided cardiopoiesis has been implemented with mesenchymal stem cells obtained from bone marrow of healthy volunteers, using a cocktail of secreted proteins that recapitulate components of the endodermal secretome critical for cardiogenic induction of embryonic mesoderm. With appropriate validation of this newly derived cardiopoietic phenotype, the next generation of trials should achieve demonstrable benefit across patient populations.
Affiliation(s)
- Atta Behfar
- Marriott Heart Disease Research Program, Division of Cardiovascular Diseases, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA
|
317
|
Hsing M, Byler KG, Cherkasov A. The use of Gene Ontology terms for predicting highly-connected 'hub' nodes in protein-protein interaction networks. BMC Syst Biol 2008; 2:80. [PMID: 18796161 PMCID: PMC2553323 DOI: 10.1186/1752-0509-2-80] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 05/01/2008] [Accepted: 09/16/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Protein-protein interactions mediate a wide range of cellular functions and responses and have been studied rigorously through recent large-scale proteomics experiments and bioinformatics analyses. One of the most important findings of those endeavours was the observation that 'hub' proteins participate in significant numbers of protein interactions and play critical roles in the organization and function of cellular protein interaction networks (PINs). It has also been demonstrated that such hub proteins may constitute an important pool of attractive drug targets. Thus, it is crucial to be able to identify hub proteins based not only on experimental data but also by means of bioinformatics predictions. RESULTS A hub protein classifier has been developed based on the available interaction data and Gene Ontology (GO) annotations for proteins in the Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens genomes. In particular, by utilizing the machine learning method of boosting trees we were able to create a predictive bioinformatics tool for the identification of proteins that are likely to play the role of a hub in protein interaction networks. Testing the developed hub classifier on external sets of experimental protein interaction data in Methicillin-resistant Staphylococcus aureus (MRSA) 252 and Caenorhabditis elegans demonstrated that our approach can predict hub proteins with a high degree of accuracy. A practical application of the developed bioinformatics method has been illustrated by the effective protein bait selection for large-scale pull-down experiments that aim to map complete protein-protein interaction networks for several species. CONCLUSION The successful development of an accurate hub classifier demonstrated that highly-connected proteins tend to share certain relevant functional properties reflected in their Gene Ontology annotations. It is anticipated that the developed bioinformatics hub classifier will represent a useful tool for the theoretical prediction of highly-interacting proteins, the study of cellular network organizations, and the identification of prospective drug targets - even in those organisms that currently lack large-scale protein interaction data.
Affiliation(s)
- Michael Hsing
- Faculty of Graduate Studies, Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada.
|