1
Chen J, Goudey B, Geard N, Verspoor K. Integration of background knowledge for automatic detection of inconsistencies in gene ontology annotation. Bioinformatics 2024; 40:i390-i400. [PMID: 38940182 PMCID: PMC11256942 DOI: 10.1093/bioinformatics/btae246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024]
Abstract
MOTIVATION: Biological background knowledge plays an important role in the manual quality assurance (QA) of biological database records. One such QA task is the detection of inconsistencies in literature-based Gene Ontology Annotation (GOA). This manual verification ensures the accuracy of the GO annotations based on a comprehensive review of the literature used as evidence, Gene Ontology (GO) terms, and annotated genes in GOA records. While automatic approaches for the detection of semantic inconsistencies in GOA have been developed, they operate within predetermined contexts, lacking the ability to leverage broader evidence, especially relevant domain-specific background knowledge. This paper investigates various types of background knowledge that could improve the detection of prevalent inconsistencies in GOA. In addition, the paper proposes several approaches to integrate background knowledge into the automatic GOA inconsistency detection process.
RESULTS: We have extended a previously developed GOA inconsistency dataset with several kinds of GOA-related background knowledge, including GeneRIF statements, biological concepts mentioned within evidence texts, GO hierarchy and existing GO annotations of the specific gene. We have proposed several effective approaches to integrate background knowledge as part of the automatic GOA inconsistency detection process. The proposed approaches can improve automatic detection of self-consistency and several of the most prevalent types of inconsistencies. This is the first study to explore the advantages of utilizing background knowledge and to propose a practical approach to incorporate knowledge in automatic GOA inconsistency detection. We establish a new benchmark for performance on this task. Our methods may be applicable to various tasks that involve incorporating biological background knowledge.
AVAILABILITY AND IMPLEMENTATION: https://github.com/jiyuc/de-inconsistency.
Affiliation(s)
- Jiyu Chen
  - School of Computing and Information Systems, The University of Melbourne, Parkville 3010, VIC, Australia
  - Data61, The Commonwealth Scientific and Industrial Research Organisation, Marsfield 2122, NSW, Australia
- Benjamin Goudey
  - School of Computing and Information Systems, The University of Melbourne, Parkville 3010, VIC, Australia
- Nicholas Geard
  - School of Computing and Information Systems, The University of Melbourne, Parkville 3010, VIC, Australia
- Karin Verspoor
  - School of Computing Technologies, RMIT University, Melbourne, Victoria 3000, Australia
2
Feuermann M, Gaudet P. Interpreting Gene Ontology Annotations Derived from Sequence Homology Methods. Methods Mol Biol 2024; 2836:285-298. [PMID: 38995546 DOI: 10.1007/978-1-0716-4007-4_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
The Gene Ontology (GO) project describes the functions of the gene products of organisms from all kingdoms of life in a standardized way, enabling powerful analyses of experiments involving genome-wide analysis. The scientific literature is used to convert experimental results into GO annotations that systematically classify gene products' functions. However, to address the fact that only a minor fraction of all genes has been characterized experimentally, multiple predictive methods to assign GO annotations have been developed since the inception of GO. Sequence homologies between novel genes and genes with known functions help to approximate the roles of these non-characterized genes. Here we describe the main sequence homology methods to produce annotations: pairwise comparison (BLAST), protein profile models (InterPro), and phylogenetic-based annotation (PAINT). Some of these methods can be implemented with genome analysis pipelines (BLAST and InterPro2GO), while PAINT is curated by the GO consortium.
Affiliation(s)
- Marc Feuermann
  - SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
- Pascale Gaudet
  - SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
3
de Crécy-lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022; 2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]
Abstract
Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By different estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue funded by the National Science Foundation was held during 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.
Affiliation(s)
- Valérie de Crécy-lagard
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Cecilia Arighi
  - Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
- Jill Babor
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Ian Blaby
  - US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Crysten Blaby-Haas
  - Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
- Alan J Bridge
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Stephen K Burley
  - RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
- Stacey Cleveland
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Lucy J Colwell
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
- Ana Conesa
  - Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
- Christian Dallago
  - TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
- Antoine Danchin
  - School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
- Anita de Waard
  - Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
- Adam Deutschbauer
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Raquel Dias
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Yousong Ding
  - Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
- Gang Fang
  - NYU-Shanghai, Shanghai 200120, China
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
- John Gerlt
  - Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Joshua Goldford
  - Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Mark Gorelik
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Benjamin M Gyori
  - Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Christopher Henry
  - Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
- Geoffrey Hutinet
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Marshall Jaroch
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Peter D Karp
  - Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Aron Marchler-Bauer
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Maria-Jesus Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Claire McWhite
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
- Gaurav D Moghe
  - Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
- Paul Monaghan
  - Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
- Anne Morgat
  - Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
- Christopher J Mungall
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Darren A Natale
  - Georgetown University Medical Center, Washington, DC 20007, USA
- William C Nelson
  - Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
- Seán O’Donoghue
  - School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
- Christine Orengo
  - Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Predrag Radivojac
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
- Colbie Reed
  - Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
- Dmitri Rodionov
  - Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
- Irina A Rodionova
  - Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
- Jeffrey D Rudolf
  - Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
- Lana Saleh
  - New England Biolabs, Ipswich, MA 01938, USA
- Gloria Sheynkman
  - Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
- Francoise Thibaud-Nissen
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
- Paul D Thomas
  - Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
- Peter Uetz
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- David Vallenet
  - LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
- Erica Watson Carter
  - Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Elisha M Wood-Charlson
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jin Xu
  - Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
4
Thomas PD, Ebert D, Muruganujan A, Mushayahama T, Albou L, Mi H. PANTHER: Making genome-scale phylogenetics accessible to all. Protein Sci 2022; 31:8-22. [PMID: 34717010 PMCID: PMC8740835 DOI: 10.1002/pro.4218] [Citation(s) in RCA: 586] [Impact Index Per Article: 293.0] [Received: 08/26/2021] [Revised: 10/24/2021] [Accepted: 10/26/2021] [Indexed: 02/03/2023]
Abstract
Phylogenetics is a powerful tool for analyzing protein sequences, by inferring their evolutionary relationships to other proteins. However, phylogenetics analyses can be challenging: they are computationally expensive and must be performed carefully in order to avoid systematic errors and artifacts. Protein Analysis THrough Evolutionary Relationships (PANTHER; http://pantherdb.org) is a publicly available, user-focused knowledgebase that stores the results of an extensive phylogenetic reconstruction pipeline that includes computational and manual processes and quality control steps. First, fully reconciled phylogenetic trees (including ancestral protein sequences) are reconstructed for a set of "reference" protein sequences obtained from fully sequenced genomes of organisms across the tree of life. Second, the resulting phylogenetic trees are manually reviewed and annotated with function evolution events: inferred gains and losses of protein function along branches of the phylogenetic tree. Here, we describe in detail the current contents of PANTHER, how those contents are generated, and how they can be used in a variety of applications. The PANTHER knowledgebase can be downloaded or accessed via an extensive API. In addition, PANTHER provides software tools to facilitate the application of the knowledgebase to common protein sequence analysis tasks: exploring an annotated genome by gene function; performing "enrichment analysis" of lists of genes; annotating a single sequence or large batch of sequences by homology; and assessing the likelihood that a genetic variant at a particular site in a protein will have deleterious effects.
Affiliation(s)
- Paul D. Thomas
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Dustin Ebert
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Anushya Muruganujan
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Tremayne Mushayahama
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Laurent-Philippe Albou
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
- Huaiyu Mi
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California, Los Angeles, California, USA
5
Rutherford KM, Harris MA, Oliferenko S, Wood V. JaponicusDB: rapid deployment of a model organism database for an emerging model species. Genetics 2021; 220:6481558. [PMID: 35380656 PMCID: PMC9209809 DOI: 10.1093/genetics/iyab223] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/22/2021] [Accepted: 11/09/2021] [Indexed: 02/03/2023]
Abstract
The fission yeast Schizosaccharomyces japonicus has recently emerged as a powerful system for studying the evolution of essential cellular processes, drawing on similarities as well as key differences between S. japonicus and the related, well-established model Schizosaccharomyces pombe. We have deployed the open-source, modular code and tools originally developed for PomBase, the S. pombe model organism database (MOD), to create JaponicusDB (www.japonicusdb.org), a new MOD dedicated to S. japonicus. By providing a central resource with ready access to a growing body of experimental data, ontology-based curation, seamless browsing and querying, and the ability to integrate new data with existing knowledge, JaponicusDB supports fission yeast biologists to a far greater extent than any other source of S. japonicus data. JaponicusDB thus enables S. japonicus researchers to realize the full potential of studying a newly emerging model species and illustrates the widely applicable power and utility of harnessing reusable PomBase code to build a comprehensive, community-maintainable repository of species-relevant knowledge.
Affiliation(s)
- Kim M Rutherford
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Midori A Harris
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
- Snezhana Oliferenko
  - The Francis Crick Institute, London NW1 1AT, UK
  - Randall Centre for Cell and Molecular Biophysics, School of Basic and Medical Biosciences, King’s College London, London SE1 1UL, UK
  - Corresponding author
- Valerie Wood
  - Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
  - Corresponding author
6
Liu L, Zhu S. Computational Methods for Prediction of Human Protein-Phenotype Associations: A Review. Phenomics (Cham, Switzerland) 2021; 1:171-185. [PMID: 36939789 PMCID: PMC9590544 DOI: 10.1007/s43657-021-00019-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/23/2021] [Revised: 06/05/2021] [Accepted: 06/16/2021] [Indexed: 12/01/2022]
Abstract
Deciphering the relationship between human proteins (genes) and phenotypes is one of the fundamental tasks in phenomics research. The Human Phenotype Ontology (HPO) builds upon a standardized logical vocabulary to describe the abnormal phenotypes encountered in human diseases and paves the way towards the computational analysis of their genetic causes. To date, many computational methods have been proposed to predict the HPO annotations of proteins. In this paper, we conduct a comprehensive review of the existing approaches to predicting HPO annotations of novel proteins, identifying missing HPO annotations, and prioritizing candidate proteins with respect to a certain HPO term. For each topic, we first give the formalized description of the problem, and then systematically revisit the published literatures highlighting their advantages and disadvantages, followed by the discussion on the challenges and promising future directions. In addition, we point out several potential topics to be worthy of exploration including the selection of negative HPO annotations and detecting HPO misannotations. We believe that this review will provide insight to the researchers in the field of computational phenotype analyses in terms of comprehending and developing novel prediction algorithms.
Affiliation(s)
- Lizhi Liu
  - School of Computer Science, Fudan University, Shanghai, 200433 China
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, 200433 China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, 200433 China
  - MOE Frontiers Center for Brain Science, Fudan University, Shanghai, 200433 China
  - Zhangjiang Fudan International Innovation Center, Shanghai, 200433 China
  - Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, 200433 China
7
Wei X, Zhang C, Freddolino PL, Zhang Y. Detecting Gene Ontology misannotations using taxon-specific rate ratio comparisons. Bioinformatics 2021; 36:4383-4388. [PMID: 32470107 DOI: 10.1093/bioinformatics/btaa548] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 09/03/2019] [Revised: 03/24/2020] [Accepted: 05/26/2020] [Indexed: 02/05/2023]
Abstract
MOTIVATION: Many protein function databases are built on automated or semi-automated curations and can contain various annotation errors. The correction of such misannotations is critical to improving the accuracy and reliability of the databases.
RESULTS: We proposed a new approach to detect potentially incorrect Gene Ontology (GO) annotations by comparing the ratio of annotation rates (RAR) for the same GO term across different taxonomic groups, where those with a relatively low RAR usually correspond to incorrect annotations. As an illustration, we applied the approach to 20 commonly studied species in two recent UniProt-GOA releases and identified 250 potential misannotations in the 2018-11-6 release, where only 25% of them were corrected in the 2019-6-3 release. Importantly, 56% of the misannotations are 'Inferred from Biological aspect of Ancestor (IBA)' which is in contradiction with previous observations that attributed misannotations mainly to 'Inferred from Sequence or structural Similarity (ISS)', probably reflecting an error source shift due to the new developments of function annotation databases. The results demonstrated a simple but efficient misannotation detection approach that is useful for large-scale comparative protein function studies.
AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/RAR.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
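As a rough illustration of the idea in the abstract above, the following Python sketch computes a ratio of annotation rates for one GO term across taxonomic groups and flags the groups whose ratio is low. It is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact RAR formula, so the ratio is taken here to be a group's annotation rate divided by the pooled rate of all other groups. The function names, the example counts and the 0.05 threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

# taxon -> (proteins annotated with the GO term, total proteins in the taxon)
# Totals are assumed to be greater than zero.
Counts = Dict[str, Tuple[int, int]]


def annotation_rate_ratios(counts: Counts) -> Dict[str, float]:
    """Assumed RAR: a taxon's annotation rate over the pooled rate elsewhere."""
    ratios: Dict[str, float] = {}
    for taxon, (annotated, total) in counts.items():
        rate = annotated / total
        # Pool every other taxon to get the comparison rate.
        other_annotated = sum(a for t, (a, _) in counts.items() if t != taxon)
        other_total = sum(n for t, (_, n) in counts.items() if t != taxon)
        other_rate = other_annotated / other_total if other_total else 0.0
        ratios[taxon] = rate / other_rate if other_rate > 0 else float("inf")
    return ratios


def flag_low_ratio(counts: Counts, threshold: float = 0.05) -> List[str]:
    """Return taxa that carry the annotation but at an unusually low rate."""
    ratios = annotation_rate_ratios(counts)
    return sorted(
        t for t, r in ratios.items() if counts[t][0] > 0 and r < threshold
    )


# Made-up counts for a single, plant-typical GO term.
example: Counts = {
    "plants": (400, 20000),
    "fungi": (2, 10000),
    "mammals": (1, 20000),
}
ratios = annotation_rate_ratios(example)
flagged = flag_low_ratio(example)
print(ratios)
print(flagged)
```

In this toy input the term is common in one group and nearly absent in the others, so the few annotations in the low-rate groups are the ones a curator would be asked to re-check.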
Affiliation(s)
- Xiaoqiong Wei
  - State Key Laboratory of Biotherapy and Cancer Center/Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
  - Department of Computational Medicine and Bioinformatics
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
8
The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res 2021; 49:D325-D334. [PMID: 33290552 PMCID: PMC7779012 DOI: 10.1093/nar/gkaa1113] [Citation(s) in RCA: 1973] [Impact Index Per Article: 657.7] [Received: 09/15/2020] [Revised: 10/22/2020] [Accepted: 12/02/2020] [Indexed: 12/28/2022]
Abstract
The Gene Ontology Consortium (GOC) provides the most comprehensive resource currently available for computable knowledge regarding the functions of genes and gene products. Here, we report the advances of the consortium over the past two years. The new GO-CAM annotation framework was notably improved, and we formalized the model with a computational schema to check and validate the rapidly increasing repository of 2838 GO-CAMs. In addition, we describe the impacts of several collaborations to refine GO and report a 10% increase in the number of GO annotations, a 25% increase in annotated gene products, and over 9,400 new scientific articles annotated. As the project matures, we continue our efforts to review older annotations in light of newer findings, and, to maintain consistency with other ontologies. As a result, 20 000 annotations derived from experimental data were reviewed, corresponding to 2.5% of experimental GO annotations. The website (http://geneontology.org) was redesigned for quick access to documentation, downloads and tools. To maintain an accurate resource and support traceability and reproducibility, we have made available a historical archive covering the past 15 years of GO data with a consistent format and file structure for both the ontology and annotations.
9
Abstract
MOTIVATION: With the ever-increasing number and diversity of sequenced species, the challenge to characterize genes with functional information is even more important. In most species, this characterization almost entirely relies on automated electronic methods. As such, it is critical to benchmark the various methods. The Critical Assessment of protein Function Annotation algorithms (CAFA) series of community experiments provide the most comprehensive benchmark, with a time-delayed analysis leveraging newly curated experimentally supported annotations. However, the definition of a false positive in CAFA has not fully accounted for the open world assumption (OWA), leading to a systematic underestimation of precision. The main reason for this limitation is the relative paucity of negative experimental annotations.
RESULTS: This article introduces a new, OWA-compliant, benchmark based on a balanced test set of positive and negative annotations. The negative annotations are derived from expert-curated annotations of protein families on phylogenetic trees. This approach results in a large increase in the average information content of negative annotations. The benchmark has been tested using the naïve and BLAST baseline methods, as well as two orthology-based methods. This new benchmark could complement existing ones in future CAFA experiments.
AVAILABILITY AND IMPLEMENTATION: All data, as well as code used for analysis, is available from https://lab.dessimoz.org/20_not.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
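The underestimation of precision described in the abstract above can be made concrete with a small Python sketch. It is a simplified illustration and not the benchmark's actual metric: it assumes that, under a closed-world reading, every predicted term missing from the positive set counts as a false positive, whereas under an open-world reading only predictions contradicted by a known negative annotation do. The function name and the placeholder term labels are hypothetical.

```python
from typing import Optional, Set


def precision(
    predicted: Set[str],
    positives: Set[str],
    negatives: Optional[Set[str]] = None,
) -> float:
    """Precision of predicted terms for one protein.

    negatives=None : closed world, anything not known positive is wrong.
    negatives=set  : open world, only known negatives count as wrong and
                     terms of unknown status are left unscored.
    """
    true_pos = len(predicted & positives)
    if negatives is None:
        false_pos = len(predicted - positives)
    else:
        false_pos = len(predicted & negatives)
    scored = true_pos + false_pos
    return true_pos / scored if scored else 0.0


# Placeholder labels, not real GO identifiers.
predicted = {"term_A", "term_B", "term_C", "term_D"}
positives = {"term_A", "term_B"}   # experimentally supported
negatives = {"term_C"}             # curated NOT annotation
# term_D has unknown status: it may simply not have been tested yet.

closed_world = precision(predicted, positives)
open_world = precision(predicted, positives, negatives)
print(closed_world, open_world)
```

The unknown term lowers the closed-world score even though nothing shows it to be wrong, which is the effect the abstract attributes to ignoring the open world assumption.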
Affiliation(s)
- Alex Warwick Vesztrocy
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
- Christophe Dessimoz
  - Department of Genetics, Evolution and Environment, University College London, London, WC1E 6BT, UK
  - SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
  - Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland
  - Department of Computer Science, University College London, London, WC1E 6BT, UK
  - Centre for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland
10
Walls RL, Cooper L, Elser J, Gandolfo MA, Mungall CJ, Smith B, Stevenson DW, Jaiswal P. The Plant Ontology Facilitates Comparisons of Plant Development Stages Across Species. Frontiers in Plant Science 2019; 10:631. [PMID: 31214208 PMCID: PMC6558174 DOI: 10.3389/fpls.2019.00631] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/20/2018] [Accepted: 04/26/2019] [Indexed: 06/09/2023]
Abstract
The Plant Ontology (PO) is a community resource consisting of standardized terms, definitions, and logical relations describing plant structures and development stages, augmented by a large database of annotations from genomic and phenomic studies. This paper describes the structure of the ontology and the design principles we used in constructing PO terms for plant development stages. It also provides details of the methodology and rationale behind our revision and expansion of the PO to cover development stages for all plants, particularly the land plants (bryophytes through angiosperms). As a case study to illustrate the general approach, we examine variation in gene expression across embryo development stages in Arabidopsis and maize, demonstrating how the PO can be used to compare patterns of expression across stages and in developmentally different species. Although many genes appear to be active throughout embryo development, we identified a small set of uniquely expressed genes for each stage of embryo development and also between the two species. Evaluating the different sets of genes expressed during embryo development in Arabidopsis or maize may inform future studies of the divergent developmental pathways observed in monocotyledonous versus dicotyledonous species. The PO and its annotation database (http://www.planteome.org) make plant data for any species more discoverable and accessible through common formats, thus providing support for applications in plant pathology, image analysis, and comparative development and evolution.
Affiliation(s)
- Ramona L. Walls
  - CyVerse, Bio5 Institute, The University of Arizona, Tucson, AZ, United States
- Laurel Cooper
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Justin Elser
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Maria Alejandra Gandolfo
  - Liberty Hyde Bailey Hortorium, Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Christopher J. Mungall
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Barry Smith
  - Department of Philosophy, University at Buffalo, Buffalo, NY, United States
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
11
Foulger RE, Denny P, Hardy J, Martin MJ, Sawford T, Lovering RC. Using the Gene Ontology to Annotate Key Players in Parkinson's Disease. Neuroinformatics 2018; 14:297-304. [PMID: 26825309 PMCID: PMC4896971 DOI: 10.1007/s12021-015-9293-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 12/19/2022]
Abstract
The Gene Ontology (GO) is widely recognised as the gold standard bioinformatics resource for summarizing functional knowledge of gene products in a consistent and computable, information-rich language. GO describes cellular and organismal processes across all species, yet until now there has been a considerable gene annotation deficit within the neurological and immunological domains, both of which are relevant to Parkinson’s disease. Here we introduce the Parkinson’s disease GO Annotation Project, funded by Parkinson’s UK and supported by the GO Consortium, which is addressing this deficit by providing GO annotation to Parkinson’s-relevant human gene products, principally through expert literature curation. We discuss the steps taken to prioritise proteins, publications and cellular processes for annotation, examples of how GO annotations capture Parkinson’s-relevant information, and the advantages that a topic-focused annotation approach offers to users. Building on the existing GO resource, this project collates a vast amount of Parkinson’s-relevant literature into a set of high-quality annotations to be utilized by the research community.
Affiliation(s)
- R E Foulger
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK.
| | - P Denny
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| | - J Hardy
- Department of Molecular Neuroscience, Institute of Neurology, University College London, London, UK
| | - M J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - T Sawford
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, UK
| | - R C Lovering
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK
| |
|
12
|
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol Biol Evol 2017; 34:2115-2122. [PMID: 28460117 PMCID: PMC5850834 DOI: 10.1093/molbev/msx148] [Citation(s) in RCA: 1619] [Impact Index Per Article: 269.8] [Indexed: 12/20/2022] Open
Abstract
Orthology assignment is ideally suited for functional inference. However, because predicting orthology is computationally intensive at large scale, and most pipelines are relatively inaccessible (e.g., new assignments only available through database updates), less precise homology-based functional transfer is still the default for (meta-)genome annotation. We, therefore, developed eggNOG-mapper, a tool for functional annotation of large sets of sequences based on fast orthology assignments using precomputed clusters and phylogenies from the eggNOG database. To validate our method, we benchmarked Gene Ontology (GO) predictions against two widely used homology-based approaches: BLAST and InterProScan. Orthology filters applied to BLAST results reduced the rate of false positive assignments by 11%, and increased the ratio of experimentally validated terms recovered over all terms assigned per protein by 15%. Compared with InterProScan, eggNOG-mapper achieved similar proteome coverage and precision while predicting, on average, 41 more terms per protein and increasing the rate of experimentally validated terms recovered over total term assignments per protein by 35%. eggNOG-mapper predictions scored within the top-5 methods in the three GO categories using the CAFA2 NK-partial benchmark. Finally, we evaluated eggNOG-mapper for functional annotation of metagenomics data, yielding better performance than InterProScan. eggNOG-mapper runs ∼15× faster than BLAST and at least 2.5× faster than InterProScan. The tool is available standalone and as an online service at http://eggnog-mapper.embl.de.
Affiliation(s)
- Jaime Huerta-Cepas
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Kristoffer Forslund
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Luis Pedro Coelho
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Damian Szklarczyk
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich, Switzerland
| | - Lars Juhl Jensen
- The Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Christian von Mering
- Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; Bioinformatics/Systems Biology Group, Swiss Institute of Bioinformatics (SIB), Zurich, Switzerland
| | - Peer Bork
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany; Germany Molecular Medicine Partnership Unit (MMPU), University Hospital Heidelberg and European Molecular Biology Laboratory, Heidelberg, Germany; Max Delbrück Centre for Molecular Medicine, Berlin, Germany; Department of Bioinformatics, Biocenter University of Würzburg, Würzburg, Germany
| |
|
13
|
Roncaglia P, van Dam TJP, Christie KR, Nacheva L, Toedt G, Huynen MA, Huntley RP, Gibson TJ, Lomax J. The Gene Ontology of eukaryotic cilia and flagella. Cilia 2017; 6:10. [PMID: 29177046 PMCID: PMC5688719 DOI: 10.1186/s13630-017-0054-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/03/2017] [Accepted: 10/30/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Recent research into ciliary structure and function provides important insights into inherited diseases termed ciliopathies and other cilia-related disorders. This wealth of knowledge needs to be translated into a computational representation to be fully exploitable by the research community. To this end, members of the Gene Ontology (GO) and SYSCILIA Consortia have worked together to improve representation of ciliary substructures and processes in GO. METHODS Members of the SYSCILIA and Gene Ontology Consortia suggested additions and changes to GO, to reflect new knowledge in the field. The project initially aimed to improve coverage of ciliary parts, and was then broadened to cilia-related biological processes. Discussions were documented in a public tracker. We engaged the broader cilia community via direct consultation and by referring to the literature. Ontology updates were implemented via ontology editing tools. RESULTS So far, we have created or modified 127 GO terms representing parts and processes related to eukaryotic cilia/flagella or prokaryotic flagella. A growing number of biological pathways are known to involve cilia, and we continue to incorporate this knowledge in GO. The resulting expansion in GO allows more precise representation of experimentally derived knowledge, and SYSCILIA and GO biocurators have created 199 annotations to 50 human ciliary proteins. The revised ontology was also used to curate mouse proteins in a collaborative project. The revised GO and annotations, used in comparative 'before and after' analyses of representative ciliary datasets, improve enrichment results significantly. CONCLUSIONS Our work has resulted in a broader and deeper coverage of ciliary composition and function. These improvements in ontology and protein annotation will benefit all users of GO enrichment analysis tools, as well as the ciliary research community, in areas ranging from microscopy image annotation to interpretation of high-throughput studies. We welcome feedback to further enhance the representation of cilia biology in GO.
Affiliation(s)
- Paola Roncaglia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- The Gene Ontology Consortium, http://geneontology.org
| | - Teunis J. P. van Dam
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, PO Box 9101, 6500 HB Nijmegen, The Netherlands
- Theoretical Biology and Bioinformatics, Department of Biology, Faculty of Science, Utrecht University, Utrecht, The Netherlands
| | - Karen R. Christie
- The Gene Ontology Consortium, http://geneontology.org
- The Jackson Laboratory, 600 Main Street, Bar Harbor, ME 04609 USA
| | - Lora Nacheva
- Fakultät Biowissenschaften, Universität Heidelberg, Im Neuenheimer Feld 234, 69120 Heidelberg, Germany
| | - Grischa Toedt
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Martijn A. Huynen
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, PO Box 9101, 6500 HB Nijmegen, The Netherlands
| | - Rachael P. Huntley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- Present Address: Centre for Cardiovascular Genetics, University College London, London, WC1E 6JF UK
| | - Toby J. Gibson
- Structural and Computational Biology Unit, European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany
| | - Jane Lomax
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD UK
- The Gene Ontology Consortium, http://geneontology.org
- Present Address: SciBite Limited, BioData Innovation Centre, Wellcome Genome Campus, Cambridge, CB10 1DR UK
| |
|
14
|
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, von Mering C, Bork P. Fast Genome-Wide Functional Annotation through Orthology Assignment by eggNOG-Mapper. Mol Biol Evol 2017. [PMID: 28460117 DOI: 10.1093/molbev/msx148] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
|
15
|
Abstract
The overarching goal of the Gene Ontology (GO) Consortium is to provide researchers in biology and biomedicine with all current functional information concerning genes and the cellular context under which these occur. When the GO was started in the 1990s, surprisingly little attention had been given to how functional information about genes was to be uniformly captured, structured in a computable form, and made accessible to biologists. Because knowledge of gene, protein, ncRNA, and molecular complex roles is continuously accumulating and changing, the GO needed to be a dynamic resource, accurately tracking ongoing research results over time. Here I describe the progress that has been made over the years towards this goal, and the work that still remains to be done for the GO Consortium to realize its goal of offering the most comprehensive and up-to-date resource for information on gene function.
Affiliation(s)
- Suzanna E Lewis
- Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.
| |
|
16
|
Abstract
The Gene Ontology (GO) is a framework designed to represent biological knowledge about gene products' biological roles and the cellular location in which they act. Biocuration is a complex process: the body of scientific literature is large and selection of appropriate GO terms can be challenging. Both these issues are compounded by the fact that our understanding of biology is still incomplete; hence it is important to appreciate that GO is inherently an evolving model. In this chapter, we describe how biocurators create GO annotations from experimental findings from research articles. We describe the current best practices for high-quality literature curation and how GO curators succeed in modeling biology using a relatively simple framework. We also highlight a number of difficulties when translating experimental assays into GO annotations.
Affiliation(s)
- Sylvain Poux
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211, Geneva 4, Switzerland
| | - Pascale Gaudet
- CALIPHO group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211, Geneva 4, Switzerland; Department of Human Protein Sciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland.
| |
|
17
|
Falda M, Lavezzo E, Fontana P, Bianco L, Berselli M, Formentin E, Toppo S. Eliciting the Functional Taxonomy from protein annotations and taxa. Sci Rep 2016; 6:31971. [PMID: 27534507 PMCID: PMC4989186 DOI: 10.1038/srep31971] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/12/2016] [Accepted: 08/01/2016] [Indexed: 11/30/2022] Open
Abstract
Advances in omics technologies have triggered the production of an enormous volume of data coming from thousands of species. Meanwhile, joint international efforts like the Gene Ontology (GO) consortium have worked to provide functional information for a vast number of proteins. With these data available, we have developed FunTaxIS, a tool that is the first attempt to infer functional taxonomy (i.e. how functions are distributed over taxa) by combining functional and taxonomic information. FunTaxIS is able to define a taxon-specific functional space by exploiting annotation frequencies in order to establish whether a function can or cannot be used to annotate a certain species. The tool generates constraints between GO terms and taxa and then propagates these relations over the taxonomic tree and the GO graph. Since these constraints cover nearly the whole taxonomy, it is possible to obtain the mapping of a function over the taxonomy. FunTaxIS can be used to make functional comparative analyses among taxa, to detect improper associations between taxa and functions, and to discover how functional knowledge is either distributed or missing. A benchmark test set based on six different model species has been devised to gain useful insights into the generated taxonomic rules.
Affiliation(s)
- Marco Falda
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
| | - Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
| | - Paolo Fontana
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, 38010, Italy
| | - Luca Bianco
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, 38010, Italy
| | - Michele Berselli
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
| | - Elide Formentin
- Department of Biology, University of Padova, Padova, 35131, Italy
| | - Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, 35131, Italy
| |
|
18
|
Poley JD, Sutherland BJG, Jones SRM, Koop BF, Fast MD. Sex-biased gene expression and sequence conservation in Atlantic and Pacific salmon lice (Lepeophtheirus salmonis). BMC Genomics 2016; 17:483. [PMID: 27377915 PMCID: PMC4932673 DOI: 10.1186/s12864-016-2835-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 03/10/2016] [Accepted: 06/13/2016] [Indexed: 12/28/2022] Open
Abstract
BACKGROUND Salmon lice, Lepeophtheirus salmonis (Copepoda: Caligidae), are highly important ectoparasites of farmed and wild salmonids, and cause multi-million dollar losses to the salmon aquaculture industry annually. Salmon lice display extensive sexual dimorphism in ontogeny, morphology, physiology, behavior, and more. Therefore, the identification of transcripts with differential expression between males and females (sex-biased transcripts) may help elucidate the relationship between sexual selection and sexually dimorphic characteristics. RESULTS Sex-biased transcripts were identified from transcriptome analyses of three L. salmonis populations, including both Atlantic and Pacific subspecies. A total of 35-43% of all quality-filtered transcripts were sex-biased in L. salmonis, with male-biased transcripts exhibiting higher fold change than female-biased transcripts. For Gene Ontology and functional analyses, a consensus-based approach was used to identify concordantly differentially expressed sex-biased transcripts across the three populations. A total of 127 male-specific transcripts (i.e. those without detectable expression in any female) were identified, and were enriched with reproductive functions (e.g. seminal fluid and male accessory gland proteins). Other sex-biased transcripts involved in morphogenesis, feeding, energy generation, and sensory and immune system development and function were also identified. Interestingly, as observed in model systems, male-biased L. salmonis transcripts were more frequently without annotation compared to female-biased or unbiased transcripts, suggesting higher rates of sequence divergence in male-biased transcripts. CONCLUSIONS Transcriptome differences between male and female L. salmonis described here provide key insights into the molecular mechanisms controlling sexual dimorphism in L. salmonis. This analysis offers targets for parasite control and provides a foundation for further analyses exploring critical topics such as the interaction between sex and drug resistance, sex-specific factors in host-parasite relationships, and reproductive roles within L. salmonis.
Affiliation(s)
- Jordan D Poley
- Department of Pathology & Microbiology, Atlantic Veterinary College, University of Prince Edward Island, 550 University Ave, Charlottetown, PE, C1A 4P3, Canada
| | - Ben J G Sutherland
- Department of Biology, Centre for Biomedical Research, University of Victoria, 3800 Finnerty Rd, Victoria, BC, V8W 3N5, Canada; Present address: Département de biologie, Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, 1030 Avenue de la Médecine, Québec, QC, Canada
| | - Simon R M Jones
- Pacific Biological Station, 3190 Hammond Bay Road, Nanaimo, BC, V9T 6N7, Canada
| | - Ben F Koop
- Department of Biology, Centre for Biomedical Research, University of Victoria, 3800 Finnerty Rd, Victoria, BC, V8W 3N5, Canada
| | - Mark D Fast
- Department of Pathology & Microbiology, Atlantic Veterinary College, University of Prince Edward Island, 550 University Ave, Charlottetown, PE, C1A 4P3, Canada.
| |
|
19
|
Lavezzo E, Falda M, Fontana P, Bianco L, Toppo S. Enhancing protein function prediction with taxonomic constraints--The Argot2.5 web server. Methods 2015; 93:15-23. [PMID: 26318087 DOI: 10.1016/j.ymeth.2015.08.021] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 04/27/2015] [Revised: 08/14/2015] [Accepted: 08/25/2015] [Indexed: 10/23/2022] Open
Abstract
Argot2.5 (Annotation Retrieval of Gene Ontology Terms) is a web server designed to predict protein function. It is an updated version of the previous Argot2, enriched with new features to enhance its usability and overall performance. The algorithmic strategy exploits the grouping of Gene Ontology terms by means of semantic similarity to infer protein function. The tool has been tested on two independent benchmarks and compared to Argot2, PANNZER, and a baseline method relying on BLAST; it achieved better performance thanks to key changes at critical steps of the working pipeline. The most effective changes are: (a) the selection of the input data from sequence similarity searches performed against a clustered version of the UniProt databank, together with a remodeling of the weights given to Pfam hits, and (b) the application of taxonomic constraints to filter out annotations that cannot be applied to proteins belonging to the species under investigation. The taxonomic rules are derived from our in-house tool, FunTaxIS, which extends those provided by the Gene Ontology consortium. The web server is free for academic users and is available online at http://www.medcomp.medicina.unipd.it/Argot2-5/.
Affiliation(s)
- Enrico Lavezzo
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Marco Falda
- Department of Molecular Medicine, University of Padova, Padova, Italy
| | - Paolo Fontana
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, Italy
| | - Luca Bianco
- Istituto Agrario San Michele all'Adige Research and Innovation Centre, Foundation Edmund Mach, Trento, Italy
| | - Stefano Toppo
- Department of Molecular Medicine, University of Padova, Padova, Italy.
| |
|
20
|
Dahdul WM, Cui H, Mabee PM, Mungall CJ, Osumi-Sutherland D, Walls RL, Haendel MA. Nose to tail, roots to shoots: spatial descriptors for phenotypic diversity in the Biological Spatial Ontology. J Biomed Semantics 2014; 5:34. [PMID: 25140222 PMCID: PMC4137724 DOI: 10.1186/2041-1480-5-34] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/02/2013] [Accepted: 06/16/2014] [Indexed: 01/12/2023] Open
Abstract
BACKGROUND Spatial terminology is used in anatomy to indicate precise, relative positions of structures in an organism. While these terms are often standardized within specific fields of biology, they can differ dramatically across taxa. Such differences in usage can impair our ability to unambiguously refer to anatomical position when comparing anatomy or phenotypes across species. We developed the Biological Spatial Ontology (BSPO) to standardize the description of spatial and topological relationships across taxa to enable the discovery of comparable phenotypes. RESULTS BSPO currently contains 146 classes and 58 relations representing anatomical axes, gradients, regions, planes, sides, and surfaces. These concepts can be used at multiple biological scales and in a diversity of taxa, including plants, animals and fungi. The BSPO is used to provide a source of anatomical location descriptors for logically defining anatomical entity classes in anatomy ontologies. Spatial reasoning is further enhanced in anatomy ontologies by integrating spatial relations such as dorsal_to into class descriptions (e.g., 'dorsolateral placode' dorsal_to some 'epibranchial placode'). CONCLUSIONS The BSPO is currently used by projects that require standardized anatomical descriptors for phenotype annotation and ontology integration across a diversity of taxa. Anatomical location classes are also useful for describing phenotypic differences, such as morphological variation in position of structures resulting from evolution within and across species.
Affiliation(s)
- Wasila M Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- National Evolutionary Synthesis Center, Durham, NC, USA
| | - Hong Cui
- School of Information Resource and Library Science, University of Arizona, Tucson, AZ, USA
| | - Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
| | | | | | - Ramona L Walls
- The iPlant Collaborative, Bio5 Institute, University of Arizona, Tucson, AZ, USA
| | - Melissa A Haendel
- Library and Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, OR, USA
| |
|
21
|
Huntley RP, Sawford T, Martin MJ, O'Donovan C. Understanding how and why the Gene Ontology and its annotations evolve: the GO within UniProt. Gigascience 2014; 3:4. [PMID: 24641996 PMCID: PMC3995153 DOI: 10.1186/2047-217x-3-4] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Received: 11/29/2013] [Accepted: 03/10/2014] [Indexed: 11/01/2022] Open
Abstract
The Gene Ontology Consortium (GOC) is a major bioinformatics project that provides structured controlled vocabularies to classify gene product function and location. GOC members create annotations to gene products using the Gene Ontology (GO) vocabularies, thus providing an extensive, publicly available resource. The GO and its annotations to gene products are now an integral part of functional analysis, and statistical tests using GO data are becoming routine for researchers to include when publishing functional information. While many helpful articles about the GOC are available, there are certain updates to the ontology and annotation sets that sometimes go unobserved. Here we describe some of the ways in which GO can change that should be carefully considered by all users of GO as they may have a significant impact on the resulting gene product annotations, and therefore the functional description of the gene product, or the interpretation of analyses performed on GO datasets. GO annotations for gene products change for many reasons, and while these changes generally improve the accuracy of the representation of the underlying biology, they do not necessarily imply that previous annotations were incorrect. We additionally describe the quality assurance mechanisms we employ to improve the accuracy of annotations, which necessarily changes the composition of the annotation sets we provide. We use the Universal Protein Resource (UniProt) for illustrative purposes of how the GO Consortium, as a whole, manages these changes.
Affiliation(s)
- Rachael P Huntley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
| | | | | | | |
|
22
|
Rutherford KM, Harris MA, Lock A, Oliver SG, Wood V. Canto: an online tool for community literature curation. Bioinformatics 2014; 30:1791-2. [PMID: 24574118 PMCID: PMC4058955 DOI: 10.1093/bioinformatics/btu103] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.4] [Indexed: 11/25/2022]
Abstract
Motivation: Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Availability: Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). Contact: helpdesk@pombase.org
Affiliation(s)
- Kim M Rutherford
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UKCambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Midori A Harris
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UKCambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Antonia Lock
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Stephen G Oliver
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Valerie Wood
- Cambridge Systems Biology Centre, Department of Biochemistry, University of Cambridge, Sanger Building, 80 Tennis Court Road, Cambridge CB2 1GA and Department of Genetics, Evolution and Environment, and UCL Cancer Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
|
23
|
Chen B, Wild DJ. Practice and Challenges of Building a Semantic Framework for Chemogenomics Research. Mol Inform 2013; 32:1000-8. [PMID: 27481145 DOI: 10.1002/minf.201300078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2013] [Accepted: 09/23/2013] [Indexed: 11/07/2022]
Abstract
Effective discovery of new drugs for complex diseases demands an integrative analysis of big data aggregated from diverse sources in chemical and biological domains, to help better understand the mechanism of drug actions and to quickly translate discovery to clinical applications. Conventional approaches face critical challenges in integrating these huge heterogeneous datasets and in rapidly transforming data into knowledge. Semantic technologies, which aim to facilitate the building of a common framework that allows data sharing and utilization across applications and domains on the web, have developed quickly and have had a broad impact in life science. Chemogenomics serves as a bridge connecting various chemical and biological data; building a semantic framework for chemogenomics research could therefore not only facilitate the development of this field but also advance its intersection with other domains. During the last few years, such a framework has been developed and applied to real problems. In this review, we describe the major techniques needed to build a semantic framework and discuss the challenges such a framework faces in making a broader impact.
Affiliation(s)
- Bin Chen
- School of Informatics and Computing, Indiana University, Bloomington, IN. Present address: School of Medicine, Stanford University, Stanford, CA.
| | - David J Wild
- School of Informatics and Computing, Indiana University, Bloomington, IN.
|
24
|
Khodiyar VK, Howe D, Talmud PJ, Breckenridge R, Lovering RC. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development. F1000Res 2013; 2:242. [PMID: 24627794 DOI: 10.12688/f1000research.2-242.v1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Accepted: 11/11/2013] [Indexed: 12/17/2022] Open
Abstract
For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region; the left-right organizer node in the mouse and human, and the Kupffer's vesicle in the zebrafish. In the zebrafish, laterality cues from the Kupffer's vesicle determine asymmetry in the developing heart, the direction of 'heart jogging' and the direction of 'heart looping'. 'Heart jogging' is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward 'jog'. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well defined orthologs in mouse and human and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish 'heart jogging orthologs' are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.
Affiliation(s)
- Varsha K Khodiyar
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
| | - Doug Howe
- The Zebrafish Model Organism Database, University of Oregon, Eugene, OR, 97403-5291, USA
| | - Philippa J Talmud
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
| | - Ross Breckenridge
- Centre for Metabolism and Experimental Therapeutics, University College London, London, WC1E 6JF, UK
| | - Ruth C Lovering
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
|
25
|
Khodiyar VK, Howe D, Talmud PJ, Breckenridge R, Lovering RC. From zebrafish heart jogging genes to mouse and human orthologs: using Gene Ontology to investigate mammalian heart development. F1000Res 2013; 2:242. [PMID: 24627794 PMCID: PMC3931453 DOI: 10.12688/f1000research.2-242.v2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Accepted: 02/10/2014] [Indexed: 01/15/2023] Open
Abstract
For the majority of organs in developing vertebrate embryos, left-right asymmetry is controlled by a ciliated region; the left-right organizer node in the mouse and human, and the Kupffer’s vesicle in the zebrafish. In the zebrafish, laterality cues from the Kupffer’s vesicle determine asymmetry in the developing heart, the direction of ‘heart jogging’ and the direction of ‘heart looping’. ‘Heart jogging’ is the term given to the process by which the symmetrical zebrafish heart tube is displaced relative to the dorsal midline, with a leftward ‘jog’. Heart jogging is not considered to occur in mammals, although a leftward shift of the developing mouse caudal heart does occur prior to looping, which may be analogous to zebrafish heart jogging. Previous studies have characterized 30 genes involved in zebrafish heart jogging, the majority of which have well defined orthologs in mouse and human and many of these orthologs have been associated with early mammalian heart development. We undertook manual curation of a specific set of genes associated with heart development and we describe the use of Gene Ontology term enrichment analyses to examine the cellular processes associated with heart jogging. We found that the human, mouse and zebrafish ‘heart jogging orthologs’ are involved in similar organ developmental processes across the three species, such as heart, kidney and nervous system development, as well as more specific cellular processes such as cilium development and function. The results of these analyses are consistent with a role for cilia in the determination of left-right asymmetry of many internal organs, in addition to their known role in zebrafish heart jogging. This study highlights the importance of model organisms in the study of human heart development, and emphasises both the conservation and divergence of developmental processes across vertebrates, as well as the limitations of this approach.
Affiliation(s)
- Varsha K Khodiyar
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
| | - Doug Howe
- The Zebrafish Model Organism Database, University of Oregon, Eugene, OR, 97403-5291, USA
| | - Philippa J Talmud
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
| | - Ross Breckenridge
- Centre for Metabolism and Experimental Therapeutics, University College London, London, WC1E 6JF, UK
| | - Ruth C Lovering
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, WC1E 6JF, UK
|
26
|
Costa M, Reeve S, Grumbling G, Osumi-Sutherland D. The Drosophila anatomy ontology. J Biomed Semantics 2013; 4:32. [PMID: 24139062 PMCID: PMC4015547 DOI: 10.1186/2041-1480-4-32] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 07/01/2013] [Accepted: 10/11/2013] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND Anatomy ontologies are query-able classifications of anatomical structures. They provide a widely-used means for standardising the annotation of phenotypes and expression in both human-readable and programmatically accessible forms. They are also frequently used to group annotations in biologically meaningful ways. Accurate annotation requires clear textual definitions for terms, ideally accompanied by images. Accurate grouping and fruitful programmatic usage requires high-quality formal definitions that can be used to automate classification and check for errors. The Drosophila anatomy ontology (DAO) consists of over 8000 classes with broad coverage of Drosophila anatomy. It has been used extensively for annotation by a range of resources, but until recently it was poorly formalised and had few textual definitions. RESULTS We have transformed the DAO into an ontology rich in formal and textual definitions in which the majority of classifications are automated and extensive error checking ensures quality. Here we present an overview of the content of the DAO, the patterns used in its formalisation, and the various uses it has been put to. CONCLUSIONS As a result of the work described here, the DAO provides a high-quality, queryable reference for the wild-type anatomy of Drosophila melanogaster and a set of terms to annotate data related to that anatomy. Extensive, well referenced textual definitions make it both a reliable and useful reference and ensure accurate use in annotation. Wide use of formal axioms allows a large proportion of classification to be automated and the use of consistency checking to eliminate errors. This increased formalisation has resulted in significant improvements to the completeness and accuracy of classification. The broad use of both formal and informal definitions make further development of the ontology sustainable and scalable. 
The patterns of formalisation used in the DAO are likely to be useful to developers of other anatomy ontologies.
Affiliation(s)
- Marta Costa
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Simon Reeve
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Gary Grumbling
- FlyBase, Department of Biology, Indiana University, 1001 E 3rd Street, Bloomington, IN, 47405-7005, USA
|
27
|
Osumi-Sutherland D, Marygold SJ, Millburn GH, McQuilton PA, Ponting L, Stefancsik R, Falls K, Brown NH, Gkoutos GV. The Drosophila phenotype ontology. J Biomed Semantics 2013; 4:30. [PMID: 24138933 PMCID: PMC3816596 DOI: 10.1186/2041-1480-4-30] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/01/2013] [Accepted: 10/11/2013] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Phenotype ontologies are queryable classifications of phenotypes. They provide a widely-used means for annotating phenotypes in a form that is human-readable, programmatically accessible and that can be used to group annotations in biologically meaningful ways. Accurate manual annotation requires clear textual definitions for terms. Accurate grouping and fruitful programmatic usage require high-quality formal definitions that can be used to automate classification. The Drosophila phenotype ontology (DPO) has been used to annotate over 159,000 phenotypes in FlyBase to date, but until recently lacked textual or formal definitions. RESULTS We have composed textual definitions for all DPO terms and formal definitions for 77% of them. Formal definitions reference terms from a range of widely-used ontologies including the Phenotype and Trait Ontology (PATO), the Gene Ontology (GO) and the Cell Ontology (CL). We also describe a generally applicable system, devised for the DPO, for recording and reasoning about the timing of death in populations. As a result of the new formalisations, 85% of classifications in the DPO are now inferred rather than asserted, with much of this classification leveraging the structure of the GO. This work has significantly improved the accuracy and completeness of classification and made further development of the DPO more sustainable. CONCLUSIONS The DPO provides a set of well-defined terms for annotating Drosophila phenotypes and for grouping and querying the resulting annotation sets in biologically meaningful ways. Such queries have already resulted in successful function predictions from phenotype annotation. Moreover, such formalisations make extended queries possible, including cross-species queries via the external ontologies used in formal definitions. The DPO is openly available under an open source license in both OBO and OWL formats. 
There is good potential for it to be used more broadly by the Drosophila community, which may ultimately result in its extension to cover a broader range of phenotypes.
Affiliation(s)
| | - Steven J Marygold
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Gillian H Millburn
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Peter A McQuilton
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Laura Ponting
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Raymund Stefancsik
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
| | - Kathleen Falls
- The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA, USA
| | - Nicholas H Brown
- FlyBase, Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
- Gurdon Institute & Department of Physiology, Development and Neuroscience, University of Cambridge, Tennis Court Road, Cambridge, UK
| | - Georgios V Gkoutos
- Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
|
28
|
Roncaglia P, Martone ME, Hill DP, Berardini TZ, Foulger RE, Imam FT, Drabkin H, Mungall CJ, Lomax J. The Gene Ontology (GO) Cellular Component Ontology: integration with SAO (Subcellular Anatomy Ontology) and other recent developments. J Biomed Semantics 2013; 4:20. [PMID: 24093723 PMCID: PMC3852282 DOI: 10.1186/2041-1480-4-20] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 06/21/2013] [Accepted: 09/24/2013] [Indexed: 12/31/2022] Open
Abstract
Background The Gene Ontology (GO) (http://www.geneontology.org/) contains a set of terms for describing the activity and actions of gene products across all kingdoms of life. Each of these activities is executed in a location within a cell or in the vicinity of a cell. In order to capture this context, the GO includes a sub-ontology called the Cellular Component (CC) ontology (GO-CCO). The primary use of this ontology is for GO annotation, but it has also been used for phenotype annotation, and for the annotation of images. Another ontology with similar scope to the GO-CCO is the Subcellular Anatomy Ontology (SAO), part of the Neuroscience Information Framework Standard (NIFSTD) suite of ontologies. The SAO also covers cell components, but in the domain of neuroscience. Description Recently, the GO-CCO was enriched in content and links to the Biological Process and Molecular Function branches of GO as well as to other ontologies. This was achieved in several ways. We carried out an amalgamation of SAO terms with GO-CCO ones; as a result, nearly 100 new neuroscience-related terms were added to the GO. The GO-CCO also contains relationships to GO Biological Process and Molecular Function terms, as well as connecting to external ontologies such as the Cell Ontology (CL). Terms representing protein complexes in the Protein Ontology (PRO) reference GO-CCO terms for their species-generic counterparts. GO-CCO terms can also be used to search a variety of databases. Conclusions In this publication we provide an overview of the GO-CCO, its overall design, and some recent extensions that make use of additional spatial information. One of the most recent developments of the GO-CCO was the merging in of the SAO, resulting in a single unified ontology designed to serve the needs of GO annotators as well as the specific needs of the neuroscience community.
Affiliation(s)
- Paola Roncaglia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK.
|
29
|
Primmer CR, Papakostas S, Leder EH, Davis MJ, Ragan MA. Annotated genes and nonannotated genomes: cross-species use of Gene Ontology in ecology and evolution research. Mol Ecol 2013; 22:3216-41. [DOI: 10.1111/mec.12309] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 11/07/2012] [Revised: 02/22/2013] [Accepted: 02/26/2013] [Indexed: 02/01/2023]
Affiliation(s)
- C. R. Primmer
- Department of Biology; University of Turku; 20014 Turku Finland
| | - S. Papakostas
- Department of Biology; University of Turku; 20014 Turku Finland
| | - E. H. Leder
- Department of Biology; University of Turku; 20014 Turku Finland
| | - M. J. Davis
- Institute for Molecular Bioscience; The University of Queensland; Brisbane Qld 4072 Australia
| | - M. A. Ragan
- Institute for Molecular Bioscience; The University of Queensland; Brisbane Qld 4072 Australia
|
30
|
Cooper L, Walls RL, Elser J, Gandolfo MA, Stevenson DW, Smith B, Preece J, Athreya B, Mungall CJ, Rensing S, Hiss M, Lang D, Reski R, Berardini TZ, Li D, Huala E, Schaeffer M, Menda N, Arnaud E, Shrestha R, Yamazaki Y, Jaiswal P. The plant ontology as a tool for comparative plant anatomy and genomic analyses. PLANT & CELL PHYSIOLOGY 2013; 54:e1. [PMID: 23220694 PMCID: PMC3583023 DOI: 10.1093/pcp/pcs163] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 05/07/2023]
Abstract
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Affiliation(s)
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
| | - Ramona L. Walls
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
| | - Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Maria A. Gandolfo
- L.H. Bailey Hortorium, Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Dennis W. Stevenson
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Barry Smith
- Department of Philosophy, University at Buffalo, 126 Park Hall, Buffalo, NY 14260, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
| | - Balaji Athreya
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
| | - Christopher J. Mungall
- Berkeley Bioinformatics Open-Source Projects, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720, USA
| | - Stefan Rensing
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Manuel Hiss
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Daniel Lang
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
| | - Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
- FRIAS - Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany
| | - Tanya Z. Berardini
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Donghui Li
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Mary Schaeffer
- Agriculture Research Services, United States Department of Agriculture, Columbia, MO 65211, USA
- Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
| | - Naama Menda
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 148533, USA
| | - Elizabeth Arnaud
- Bioversity International, via dei Tre Denari, 174/a, Maccarese, Rome, Italy
| | - Rosemary Shrestha
- Genetic Resources Program, Centro Internacional de Mejoramiento de Maiz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
| | - Yukiko Yamazaki
- Center for Genetic Resource Information, National Institute of Genetics, Mishima, Shizuoka, 411-8540 Japan
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
- *Corresponding author: Fax, +1-541-737-3573
|
31
|
Abstract
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.
|
32
|
Walls RL, Athreya B, Cooper L, Elser J, Gandolfo MA, Jaiswal P, Mungall CJ, Preece J, Rensing S, Smith B, Stevenson DW. Ontologies as integrative tools for plant science. AMERICAN JOURNAL OF BOTANY 2012; 99:1263-75. [PMID: 22847540 PMCID: PMC3492881 DOI: 10.3732/ajb.1200222] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 05/23/2023]
Abstract
PREMISE OF THE STUDY Bio-ontologies are essential tools for accessing and analyzing the rapidly growing pool of plant genomic and phenomic data. Ontologies provide structured vocabularies to support consistent aggregation of data and a semantic framework for automated analyses and reasoning. They are a key component of the semantic web. METHODS This paper provides background on what bio-ontologies are, why they are relevant to botany, and the principles of ontology development. It includes an overview of ontologies and related resources that are relevant to plant science, with a detailed description of the Plant Ontology (PO). We discuss the challenges of building an ontology that covers all green plants (Viridiplantae). KEY RESULTS Ontologies can advance plant science in four key areas: (1) comparative genetics, genomics, phenomics, and development; (2) taxonomy and systematics; (3) semantic applications; and (4) education. CONCLUSIONS Bio-ontologies offer a flexible framework for comparative plant biology, based on common botanical understanding. As genomic and phenomic data become available for more species, we anticipate that the annotation of data with ontology terms will become less centralized, while at the same time, the need for cross-species queries will become more common, causing more researchers in plant science to turn to ontologies.
Affiliation(s)
- Ramona L. Walls
- New York Botanical Garden, 2900 Southern Blvd., Bronx, New York 10458-5126 USA
| | - Balaji Athreya
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, Oregon 97331-2902 USA
| | - Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, Oregon 97331-2902 USA
| | - Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, Oregon 97331-2902 USA
| | - Maria A. Gandolfo
- L.H. Bailey Hortorium, Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, New York 14853 USA
| | - Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, Oregon 97331-2902 USA
| | - Christopher J. Mungall
- Berkeley Bioinformatics Open-Source Projects, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, California 94720 USA
| | - Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, Oregon 97331-2902 USA
| | - Stefan Rensing
- Faculty of Biology, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Barry Smith
- Department of Philosophy, University at Buffalo, 126 Park Hall, Buffalo, New York 14260 USA
| | - Dennis W. Stevenson
- New York Botanical Garden, 2900 Southern Blvd., Bronx, New York 10458-5126 USA
|
33
|
Burge S, Kelly E, Lonsdale D, Mutowo-Muellenet P, McAnulla C, Mitchell A, Sangrador-Vegas A, Yong SY, Mulder N, Hunter S. Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2012; 2012:bar068. [PMID: 22301074 PMCID: PMC3270475 DOI: 10.1093/database/bar068] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Indexed: 01/08/2023]
Abstract
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings are comprised of the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models. Database URL: http://www.ebi.ac.uk/interpro. The complete InterPro2GO mappings are available at: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/interpro2go
Affiliation(s)
- Sarah Burge
- EMBL-EBI, The Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK
|
34
|
Mungall CJ, Torniai C, Gkoutos GV, Lewis SE, Haendel MA. Uberon, an integrative multi-species anatomy ontology. Genome Biol 2012; 13:R5. [PMID: 22293552 PMCID: PMC3334586 DOI: 10.1186/gb-2012-13-1-r5] [Citation(s) in RCA: 409] [Impact Index Per Article: 34.1] [Received: 12/02/2011] [Accepted: 01/31/2012] [Indexed: 01/20/2023] Open
Abstract
We present Uberon, an integrated cross-species ontology consisting of over 6,500 classes representing a variety of anatomical entities, organized according to traditional anatomical classification criteria. The ontology represents structures in a species-neutral way and includes extensive associations to existing species-centric anatomical ontologies, allowing integration of model organism and human data. Uberon provides a necessary bridge between anatomical structures in different taxa for cross-species inference. It uses novel methods for representing taxonomic variation, and has proved to be essential for translational phenotype analyses. Uberon is available at http://uberon.org
Affiliation(s)
- Christopher J Mungall
- Genomics Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road MS 64-121, Berkeley, CA 94720, USA.
|
35
|
Dimmer EC, Huntley RP, Alam-Faruque Y, Sawford T, O'Donovan C, Martin MJ, Bely B, Browne P, Mun Chan W, Eberhardt R, Gardner M, Laiho K, Legge D, Magrane M, Pichler K, Poggioli D, Sehra H, Auchincloss A, Axelsen K, Blatter MC, Boutet E, Braconi-Quintaje S, Breuza L, Bridge A, Coudert E, Estreicher A, Famiglietti L, Ferro-Rojas S, Feuermann M, Gos A, Gruaz-Gumowski N, Hinz U, Hulo C, James J, Jimenez S, Jungo F, Keller G, Lemercier P, Lieberherr D, Masson P, Moinat M, Pedruzzi I, Poux S, Rivoire C, Roechert B, Schneider M, Stutz A, Sundaram S, Tognolli M, Bougueleret L, Argoud-Puy G, Cusin I, Duek-Roggli P, Xenarios I, Apweiler R. The UniProt-GO Annotation database in 2011. Nucleic Acids Res 2011; 40:D565-70. [PMID: 22123736 PMCID: PMC3245010 DOI: 10.1093/nar/gkr1048] [Citation(s) in RCA: 310] [Impact Index Per Article: 23.8] [Indexed: 01/12/2023] Open
Abstract
The GO annotation dataset provided by the UniProt Consortium (GOA: http://www.ebi.ac.uk/GOA) is a comprehensive set of evidence-based associations between terms from the Gene Ontology resource and UniProtKB proteins. Currently supplying over 100 million annotations to 11 million proteins in more than 360 000 taxa, this resource has increased 2-fold over the last 2 years. It has benefited from a wealth of checks to improve annotation correctness and consistency, and it now supplies greater information content enabled by GO Consortium annotation format developments. Detailed, manual GO annotations obtained from the curation of peer-reviewed papers are directly contributed by all UniProt curators and supplemented with manual and electronic annotations from 36 model organism and domain-focused scientific resources. The inclusion of high-quality, automatic annotation predictions ensures the UniProt GO annotation dataset supplies functional information to a wide range of proteins, including those from poorly characterized, non-model organism species. UniProt GO annotations are freely available in a range of formats accessible by both file downloads and web-based views. In addition, the introduction of a new, normalized file format in 2010 has made for easier handling of the complete UniProt-GOA data set.
Affiliation(s)
- Emily C Dimmer
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
|
36
|
Abstract
The Gene Ontology (GO) (http://www.geneontology.org) is a community bioinformatics resource that represents gene product function through the use of structured, controlled vocabularies. The number of GO annotations of gene products has increased due to curation efforts among GO Consortium (GOC) groups, including focused literature-based annotation and ortholog-based functional inference. The GO ontologies continue to expand and improve as a result of targeted ontology development, including the introduction of computable logical definitions and development of new tools for the streamlined addition of terms to the ontology. The GOC continues to support its user community through the use of e-mail lists, social media and web-based resources.
|
37
|
Hunter S, Jones P, Mitchell A, Apweiler R, Attwood TK, Bateman A, Bernard T, Binns D, Bork P, Burge S, de Castro E, Coggill P, Corbett M, Das U, Daugherty L, Duquenne L, Finn RD, Fraser M, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, McMenamin C, Mi H, Mutowo-Muellenet P, Mulder N, Natale D, Orengo C, Pesseat S, Punta M, Quinn AF, Rivoire C, Sangrador-Vegas A, Selengut JD, Sigrist CJA, Scheremetjew M, Tate J, Thimmajanarthanan M, Thomas PD, Wu CH, Yeats C, Yong SY. InterPro in 2011: new developments in the family and domain prediction database. Nucleic Acids Res 2011; 40:D306-12. [PMID: 22096229 PMCID: PMC3245097 DOI: 10.1093/nar/gkr948] [Citation(s) in RCA: 800] [Impact Index Per Article: 61.5] [Indexed: 02/07/2023] Open
Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Affiliation(s)
- Sarah Hunter
- EMBL Outstation European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK.
|
38
|
Gaudet P, Livstone MS, Lewis SE, Thomas PD. Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium. Brief Bioinform 2011; 12:449-62. [PMID: 21873635 PMCID: PMC3178059 DOI: 10.1093/bib/bbr042] [Citation(s) in RCA: 598] [Impact Index Per Article: 46.0] [Indexed: 12/13/2022] Open
Abstract
The goal of the Gene Ontology (GO) project is to provide a uniform way to describe the functions of gene products from organisms across all kingdoms of life and thereby enable analysis of genomic data. Protein annotations are either based on experiments or predicted from protein sequences. Since most sequences have not been experimentally characterized, most available annotations need to be based on predictions. To make inferences as accurate as possible, the GO Consortium's Reference Genome Project is using an explicit evolutionary framework to infer annotations of proteins from a broad set of genomes from experimental annotations in a semi-automated manner. Most components in the pipeline, such as selection of sequences, building multiple sequence alignments and phylogenetic trees, retrieving experimental annotations and depositing inferred annotations, are fully automated. However, the most crucial step in our pipeline relies on software-assisted curation by an expert biologist. This curation tool, Phylogenetic Annotation and INference Tool (PAINT), helps curators to infer annotations among members of a protein family. PAINT allows curators to make precise assertions as to when functions were gained and lost during evolution and record the evidence (e.g. experimentally supported GO annotations and phylogenetic information including orthology) for those assertions. In this article, we describe how we use PAINT to infer protein function in a phylogenetic context with emphasis on its strengths, limitations and guidelines. We also discuss specific examples showing how PAINT annotations compare with those generated by other highly used homology-based methods.
Affiliation(s)
- Pascale Gaudet
- Swiss Institute for Bioinformatics, CMU, 1 Rue Michel Servet, 1211 Geneva 4, Switzerland.
|
39
|
Khodiyar VK, Hill DP, Howe D, Berardini TZ, Tweedie S, Talmud PJ, Breckenridge R, Bhattarcharya S, Riley P, Scambler P, Lovering RC. The representation of heart development in the gene ontology. Dev Biol 2011; 354:9-17. [PMID: 21419760 DOI: 10.1016/j.ydbio.2011.03.011] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Received: 11/25/2010] [Revised: 02/14/2011] [Accepted: 03/09/2011] [Indexed: 11/25/2022]
Abstract
An understanding of heart development is critical in any systems biology approach to cardiovascular disease. The interpretation of data generated from high-throughput technologies (such as microarray and proteomics) is also essential to this approach. However, characterizing the role of genes in the processes underlying heart development and cardiovascular disease involves the non-trivial task of data analysis and integration of previous knowledge. The Gene Ontology (GO) Consortium provides structured controlled biological vocabularies that are used to summarize previous functional knowledge for gene products across all species. One aspect of GO describes biological processes, such as development and signaling. In order to support high-throughput cardiovascular research, we have initiated an effort to fully describe heart development in GO; expanding the number of GO terms describing heart development from 12 to over 280. This new ontology describes heart morphogenesis, the differentiation of specific cardiac cell types, and the involvement of signaling pathways in heart development. This work also aligns GO with the current views of the heart development research community and its representation in the literature. This extension of GO allows gene product annotators to comprehensively capture the genetic program leading to the developmental progression of the heart. This will enable users to integrate heart development data across species, resulting in the comprehensive retrieval of information about this subject. The revised GO structure, combined with gene product annotations, should improve the interpretation of data from high-throughput methods in a variety of cardiovascular research areas, including heart development, congenital cardiac disease, and cardiac stem cell research. Additionally, we invite the heart development community to contribute to the expansion of this important dataset for the benefit of future research in this area.
Affiliation(s)
- Varsha K Khodiyar
- Cardiovascular GO Annotation Initiative, Centre for Cardiovascular Genetics, Rayne Institute, University College London, London, UK.
|