1
Chang A, Jeske L, Ulbrich S, Hofmann J, Koblitz J, Schomburg I, Neumann-Schaal M, Jahn D, Schomburg D. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic Acids Res 2021; 49:D498-D508. [PMID: 33211880 PMCID: PMC7779020 DOI: 10.1093/nar/gkaa1025] [Citation(s) in RCA: 279] [Impact Index Per Article: 93.0] [Received: 09/10/2020] [Revised: 10/14/2020] [Accepted: 10/26/2020] [Indexed: 12/31/2022]
Abstract
The BRENDA enzyme database (https://www.brenda-enzymes.org), established in 1987, has evolved into the main collection of functional enzyme and metabolism data. In 2018, BRENDA was selected as an ELIXIR Core Data Resource. BRENDA provides reliable data, continuous curation and updates of classified enzymes, and the integration of newly discovered enzymes. The main part contains >5 million data for ∼90 000 enzymes from ∼13 000 organisms, manually extracted from ∼157 000 primary literature references, combined with information from text and data mining, data integration, and prediction algorithms. Supplements comprise disease-related data, protein sequences, 3D structures, genome annotations, ligand information, and taxonomic, bibliographic, and kinetic data. BRENDA offers easy access to enzyme information, from quick to advanced searches, text- and structure-based queries for enzyme-ligand interactions, word maps, and visualization of enzyme data. The BRENDA Pathway Maps have been completely revised and updated for more interactive and intuitive usability. The new design of the Enzyme Summary Page provides improved access to each individual enzyme. A new protein structure 3D viewer was integrated. The prediction of the intracellular localization of eukaryotic enzymes has been implemented. The new EnzymeDetector combines BRENDA enzyme annotations with protein and genome databases for the detection of eukaryotic and prokaryotic enzymes.
Affiliation(s)
- Antje Chang
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Lisa Jeske
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Sandra Ulbrich
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Julia Hofmann
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Julia Koblitz
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
- Ida Schomburg
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Meina Neumann-Schaal
  - Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstrasse 7 B, 38124 Braunschweig, Germany
- Dieter Jahn
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Dietmar Schomburg
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
2
Cui H, Zhang L, Ford B, Cheng HL, Macklin JA, Reznicek A, Starr J. Measurement Recorder: developing a useful tool for making species descriptions that produces computable phenotypes. Database (Oxford) 2020; 2020:5995854. [PMID: 33216896 PMCID: PMC7678789 DOI: 10.1093/database/baaa079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/07/2020] [Revised: 08/24/2020] [Accepted: 08/27/2020] [Indexed: 12/31/2022]
Abstract
To use published phenotype information in computational analyses, there have been efforts to convert descriptions of phenotype characters from human languages to ontologized statements. This postpublication curation process is not only slow and costly but also burdened with significant intercurator variation (including curator-author variation), due to different interpretations of a character by various individuals. This problem is inherent in any human-based intellectual activity. To address it, having authors make scientific publications semantically clear (i.e. computable) at the time of publication is a critical step if we are to avoid postpublication curation. To help authors efficiently produce species phenotypes while also producing computable data, we are experimenting with an author-driven ontology development approach and developing and evaluating a series of ontology-aware software modules that would create publishable species descriptions readily usable in scientific computations. The first software module prototype, Measurement Recorder, developed to assist authors in defining continuous measurements, is reported in this paper. Two usability studies of the software were conducted with 22 undergraduate students majoring in information science and 32 majoring in biology. Results suggest that participants can use Measurement Recorder without training and find it easy to use after limited practice. Participants also appreciate the semantic enhancement features. Measurement Recorder's character reuse features facilitate character convergence among participants by 48% and have the potential to further reduce user errors in defining characters. A set of software design issues has also been identified and then corrected. Measurement Recorder enables authors to record measurements in a semantically clear manner and enriches the phenotype ontology along the way. Future work includes representing the semantic data as Resource Description Framework (RDF) knowledge graphs and characterizing the division of work between authors as domain knowledge providers and ontology engineers as knowledge formalizers in this new author-driven ontology development approach.
Affiliation(s)
- Hong Cui
  - School of Information, University of Arizona, Tucson, AZ 85705, USA
- Limin Zhang
  - School of Information, University of Arizona, Tucson, AZ 85705, USA
- Bruce Ford
  - Department of Biological Sciences, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
- Hsin-Liang Cheng
  - Curtis Laws Wilson Library, Missouri University of Science and Technology, Rolla, MO 65409, USA
- James A Macklin
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON K1A 0C6, Canada
- Anton Reznicek
  - LSA Herbarium, University of Michigan, Ann Arbor, MI 48019, USA
- Julian Starr
  - Department of Biology, University of Ottawa, Ottawa, ON K1N 6N5, Canada
3
Jeske L, Placzek S, Schomburg I, Chang A, Schomburg D. BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Res 2019; 47:D542-D549. [PMID: 30395242 PMCID: PMC6323942 DOI: 10.1093/nar/gky1048] [Citation(s) in RCA: 225] [Impact Index Per Article: 45.0] [Received: 09/14/2018] [Revised: 10/05/2018] [Accepted: 10/30/2018] [Indexed: 12/22/2022]
Abstract
The BRENDA enzyme database (www.brenda-enzymes.org), recently appointed an ELIXIR Core Data Resource, is the main enzyme and enzyme-ligand information system. The core database provides a comprehensive overview of enzymes. A collection of 4.3 million data for ∼84 000 enzymes, manually evaluated and extracted from ∼140 000 primary literature references, is combined with information obtained by text and data mining, data integration and prediction algorithms. Supplements comprise disease-related data, protein sequences, 3D structures, predicted enzyme locations and genome annotations. Major developments are a revised ligand summary page and a structure search that now includes similarity and isomer searches. BKMS-react, an integrated database containing known enzyme-catalyzed reactions, has been supplemented with further reactions and improved access to pathway connections. In addition to the existing enzyme word maps with graphical information on enzyme-specific terms, plant word maps have been developed. They give a graphical overview of terms, e.g. enzyme or plant pathogen information, connected to specific plants. An organism summary page showing all relevant information, e.g. taxonomy and synonyms linked to enzyme data, was implemented. Based on a decision by the IUBMB enzyme task force, the enzyme class EC 7 has been established for 'translocases', enzymes that catalyze the transport of ions or metabolites across cellular membranes.
Affiliation(s)
- Lisa Jeske
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Sandra Placzek
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Ida Schomburg
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Antje Chang
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
- Dietmar Schomburg
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Rebenring 56, 38106 Braunschweig, Germany
4
Dahdul W, Manda P, Cui H, Balhoff JP, Dececchi TA, Ibrahim N, Lapp H, Vision T, Mabee PM. Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems. Database (Oxford) 2018; 2018:5255130. [PMID: 30576485 PMCID: PMC6301375 DOI: 10.1093/database/bay110] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/14/2018] [Revised: 08/22/2018] [Accepted: 09/24/2018] [Indexed: 11/12/2022]
Abstract
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in the biological literature. Expressing these phenotypes as logical statements using ontologies would enable large-scale analysis of phenotypic information from diverse systems. However, considerable human effort is required to make these phenotype descriptions amenable to machine reasoning. Natural language processing tools have been developed to facilitate this task, and the training and evaluation of these tools depend on the availability of high-quality, manually annotated gold standard data sets. We describe the development of an expert-curated gold standard data set of annotated phenotypes for evolutionary biology. The gold standard was developed for the curation of complex comparative phenotypes for the Phenoscape project. It was created by consensus among three curators and consists of entity-quality expressions of varying complexity. We use the gold standard to evaluate annotations created by human curators and those generated by the Semantic CharaParser tool. Using four annotation accuracy metrics that can account for any level of relationship between terms from two phenotype annotations, we found that machine-human consistency, or similarity, was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information did not significantly increase the similarity of their annotations to the gold standard or have a significant effect on inter-curator consistency. We found that the similarity of machine annotations to the gold standard increased after new relevant ontology terms had been added. Evaluation by the original authors of the character descriptions indicated that the gold standard annotations came closer to representing their intended meaning than did either the curator or machine annotations. These findings point toward ways to better design software to augment human curators, and the use of the gold standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
Affiliation(s)
- Prashanti Manda
  - University of North Carolina at Greensboro, Greensboro, NC, USA
- Hong Cui
  - University of Arizona, Tucson, AZ, USA
- James P Balhoff
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- T Alexander Dececchi
  - University of South Dakota, Vermillion, SD, USA
  - Current affiliation: University of Pittsburgh at Johnstown, Johnstown, PA, USA
- Nizar Ibrahim
  - University of Chicago, Chicago, IL, USA
  - Current affiliation: University of Detroit Mercy, Detroit, MI, USA & University of Portsmouth, Portsmouth, UK
- Todd Vision
  - University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
5
Schomburg I, Jeske L, Ulbrich M, Placzek S, Chang A, Schomburg D. The BRENDA enzyme information system–From a database to an expert system. J Biotechnol 2017; 261:194-206. [DOI: 10.1016/j.jbiotec.2017.04.020] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 02/16/2017] [Revised: 04/11/2017] [Accepted: 04/18/2017] [Indexed: 02/06/2023]
6
Liang SH, Walther BA, Shieh BS. Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models. PeerJ 2017; 5:e3092. [PMID: 28316893 PMCID: PMC5354111 DOI: 10.7717/peerj.3092] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/05/2016] [Accepted: 02/14/2017] [Indexed: 11/20/2022]
Abstract
Background: Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values.
Methods: We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species that were transported to Taiwan; of these 283 species, 95 escaped to the field successfully (introduction success); of these 95 introduced species, 36 reproduced successfully in the field in Taiwan (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits that may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy.
Results: The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction.
Discussion: Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for the establishment model and the introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averages but either minimum or maximum values of species traits. Therefore, we suggest that in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies.
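The five performance measures named in the Methods are standard binary-classification metrics. As a minimal sketch (not the authors' code), the block below computes them in Python, assuming labels coded 1 = success (introduced or established) and 0 = failure, hard 0/1 predictions for the four threshold-based measures, and a continuous model score for AUROC. The helper names (`confusion_counts`, `threshold_metrics`, `auroc`) and the toy data are hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for labels coded 1 = success, 0 = failure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when the denominator class is empty.
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Specificity, precision, recall and accuracy from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "precision": _ratio(tp, tp + fp),    # positive predictive value
        "recall": _ratio(tp, tp + fn),       # true-positive rate (sensitivity)
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.4]
    predictions = [1 if s > 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(threshold_metrics(labels, predictions))
    print(auroc(labels, scores))
```

On the toy data, 10 of the 12 positive-negative pairs are ranked correctly, giving an AUROC of about 0.83. The pairwise loop is chosen for readability; a rank-based (Mann-Whitney U) formula gives the same value more efficiently on large data sets.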
Affiliation(s)
- Shih-Hsiung Liang
  - Department of Biotechnology, National Kaohsiung Normal University, Kaohsiung, Taiwan
- Bruno Andreas Walther
  - Master Program in Global Health and Development, College of Public Health, Taipei Medical University, Taipei, Taiwan
- Bao-Sen Shieh
  - Department of Biomedical Science and Environmental Biology, Kaohsiung Medical University, Kaohsiung, Taiwan
  - Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
7
Placzek S, Schomburg I, Chang A, Jeske L, Ulbrich M, Tillack J, Schomburg D. BRENDA in 2017: new perspectives and new tools in BRENDA. Nucleic Acids Res 2016; 45:D380-D388. [PMID: 27924025 PMCID: PMC5210646 DOI: 10.1093/nar/gkw952] [Citation(s) in RCA: 175] [Impact Index Per Article: 21.9] [Received: 09/12/2016] [Accepted: 10/17/2016] [Indexed: 01/11/2023]
Abstract
The BRENDA enzyme database (www.brenda-enzymes.org) has developed into the main enzyme and enzyme-ligand information system in its 30 years of existence. The information is manually extracted from primary literature and extended by text mining procedures, integration of external data and prediction algorithms. Approximately 3 million data from 83 000 enzymes and 137 000 literature references constitute the manually annotated core. Text mining procedures extend these data with information on occurrence, enzyme-disease relationships and kinetic data. Prediction algorithms contribute locations and genome annotations. External data and links complete the data with sequences and 3D structures. A total of 206 000 enzyme ligands provide functional and structural data. BRENDA offers a complex query engine that gives users efficient access to the data via different search methods and explorers. The new design of the BRENDA entry page and the enzyme summary pages improves user access and performance. New interactive and intuitive BRENDA pathway maps give an overview of biochemical processes and facilitate the visualization of enzyme, ligand and organism information in the biochemical context. SCOPe and CATH, databases for protein structure classification, are included. New online and video tutorials provide training for users. BRENDA is freely available for academic users.
Affiliation(s)
- Sandra Placzek
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Ida Schomburg
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Antje Chang
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Lisa Jeske
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Marcus Ulbrich
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Jana Tillack
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
- Dietmar Schomburg
  - Technische Universität Braunschweig, Braunschweig Integrated Centre of Systems Biology (BRICS), Rebenring 56, 38106 Braunschweig, Germany
8
Chen L, Peng S, Yang B. Predicting alien herb invasion with machine learning models: biogeographical and life-history traits both matter. Biol Invasions 2015. [DOI: 10.1007/s10530-015-0870-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
9
Chang A, Schomburg I, Placzek S, Jeske L, Ulbrich M, Xiao M, Sensen CW, Schomburg D. BRENDA in 2015: exciting developments in its 25th year of existence. Nucleic Acids Res 2014; 43:D439-46. [PMID: 25378310 PMCID: PMC4383907 DOI: 10.1093/nar/gku1068] [Citation(s) in RCA: 150] [Impact Index Per Article: 15.0] [Indexed: 01/13/2023]
Abstract
The BRENDA enzyme information system (http://www.brenda-enzymes.org/) has developed into an elaborate system of enzyme and enzyme-ligand information obtained from different sources, combined with flexible query systems and evaluation tools. The information is obtained by manual extraction from primary literature, text and data mining, data integration, and prediction algorithms. Approximately 300 million data include enzyme function and molecular data from more than 30 000 organisms. The manually derived core contains 3 million data from 77 000 enzymes annotated from 135 000 literature references. Each entry is connected to the literature reference and the source organism. The entries are complemented by information on occurrence, enzyme/disease relationships from text mining, sequences and 3D structures from other databases, and predicted enzyme location and genome annotation. Functional and structural data of more than 190 000 enzyme ligands are stored in BRENDA. New features improving the functionality and analysis tools were implemented. The human anatomy atlas CAVEman is linked to the BRENDA Tissue Ontology terms, providing a connection between anatomical and functional enzyme data. Word Maps for enzymes obtained from PubMed abstracts highlight the application and scientific relevance of enzymes. The EnzymeDetector genome annotation tool and the reaction database BKM-react, including reactions from BRENDA, KEGG and MetaCyc, were improved. The website was redesigned to provide new query options.
Affiliation(s)
- Antje Chang
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
- Ida Schomburg
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
- Sandra Placzek
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
- Lisa Jeske
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
- Marcus Ulbrich
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
- Mei Xiao
  - Department of Biochemistry & Molecular Biology, Faculty of Medicine, University of Calgary, 3330 Hospital Drive N.W., Calgary, Alberta T2N 4N1, Canada
- Christoph W Sensen
  - The Jackson Laboratory, 263 Farmington Avenue, Farmington, CT 06030, USA
- Dietmar Schomburg
  - Department of Bioinformatics and Biochemistry, Technische Universität Braunschweig, Langer Kamp 19 B, D-38106 Braunschweig, Germany
10
Standardization in enzymology—Data integration in the world's enzyme information system BRENDA. ACTA ACUST UNITED AC 2014. [DOI: 10.1016/j.pisc.2014.02.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022]
11
OralCard: A bioinformatic tool for the study of oral proteome. Arch Oral Biol 2013; 58:762-72. [DOI: 10.1016/j.archoralbio.2012.12.012] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/08/2012] [Revised: 11/26/2012] [Accepted: 12/30/2012] [Indexed: 10/27/2022]
12
De Filippo C, Ramazzotti M, Fontana P, Cavalieri D. Bioinformatic approaches for functional annotation and pathway inference in metagenomics data. Brief Bioinform 2013; 13:696-710. [PMID: 23175748 PMCID: PMC3505041 DOI: 10.1093/bib/bbs070] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023]
Abstract
Metagenomic approaches are increasingly recognized as a baseline for understanding the ecology and evolution of microbial ecosystems. The development of methods for pathway inference from metagenomics data is of paramount importance to link a phenotype to a cascade of events stemming from a series of connected sets of genes or proteins. Biochemical and regulatory pathways have until recently been conceived and modelled within one cell type, one organism, one species. This vision is being dramatically changed by the advent of whole-microbiome sequencing studies, revealing the role of symbiotic microbial populations in fundamental biochemical functions. The new landscape we face requires a clear picture of the potential of existing tools and the development of new tools to characterize, reconstruct and model biochemical and regulatory pathways as the result of the integration of function in complex symbiotic interactions of ontologically and evolutionarily distinct cell types.
13
Cooper L, Walls RL, Elser J, Gandolfo MA, Stevenson DW, Smith B, Preece J, Athreya B, Mungall CJ, Rensing S, Hiss M, Lang D, Reski R, Berardini TZ, Li D, Huala E, Schaeffer M, Menda N, Arnaud E, Shrestha R, Yamazaki Y, Jaiswal P. The plant ontology as a tool for comparative plant anatomy and genomic analyses. Plant Cell Physiol 2013; 54:e1. [PMID: 23220694 PMCID: PMC3583023 DOI: 10.1093/pcp/pcs163] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 05/07/2023]
Abstract
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies to provide translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Affiliation(s)
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
| | - Ramona L. Walls
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
| | - Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Maria A. Gandolfo
- L.H. Bailey Hortorium, Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Dennis W. Stevenson
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Barry Smith
- Department of Philosophy, University at Buffalo, 126 Park Hall, Buffalo, NY 14260, USA
- These authors contributed equally to the development of the Plant Ontology
| | - Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
| | - Balaji Athreya
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
| | - Christopher J. Mungall
- Berkeley Bioinformatics Open-Source Projects, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720, USA
| | - Stefan Rensing
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Manuel Hiss
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
| | - Daniel Lang
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
| | - Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
- FRIAS - Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany
| | - Tanya Z. Berardini
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Donghui Li
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
| | - Mary Schaeffer
- Agriculture Research Services, United States Department of Agriculture, Columbia, MO 65211, USA
- Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
| | - Naama Menda
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 148533, USA
- Elizabeth Arnaud
- Bioversity International, via dei Tre Denari, 174/a, Maccarese, Rome, Italy
- Rosemary Shrestha
- Genetic Resources Program, Centro Internacional de Mejoramiento de Maiz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
- Yukiko Yamazaki
- Center for Genetic Resource Information, National Institute of Genetics, Mishima, Shizuoka, 411-8540 Japan
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
- *Corresponding author: E-mail: ; Fax: +1-541-737-3573
14
Schomburg I, Chang A, Placzek S, Söhngen C, Rother M, Lang M, Munaretto C, Ulas S, Stelzer M, Grote A, Scheer M, Schomburg D. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA. Nucleic Acids Res 2013; 41:D764-72. [PMID: 23203881 PMCID: PMC3531171 DOI: 10.1093/nar/gks1049] [Citation(s) in RCA: 271] [Impact Index Per Article: 24.6] [Received: 09/13/2012] [Revised: 10/08/2012] [Accepted: 10/10/2012] [Indexed: 11/13/2022]
Abstract
The BRENDA (BRaunschweig ENzyme DAtabase) enzyme portal (http://www.brenda-enzymes.org) is the main information system for functional biochemical and molecular enzyme data and provides access to seven interconnected databases. BRENDA contains 2.7 million manually annotated data on enzyme occurrence, function, kinetics and molecular properties. Each entry is connected to a reference and the source organism. Enzyme ligands are stored with their structures and can be accessed via their names, synonyms or a structure search. FRENDA (Full Reference ENzyme DAta) and AMENDA (Automatic Mining of ENzyme DAta) are based on text mining methods and represent a complete survey of PubMed abstracts with information on enzymes in different organisms, tissues or organelles. The supplemental database DRENDA provides more than 910 000 new EC number-disease relations in more than 510 000 references from automatic search, together with a classification of enzyme-disease-related information. KENDA (Kinetic ENzyme DAta), a new amendment, extracts and displays kinetic values from PubMed abstracts. The integration of the EnzymeDetector offers an automatic comparison, evaluation and prediction of enzyme function annotations for prokaryotic genomes. The biochemical reaction database BKM-react contains non-redundant enzyme-catalysed and spontaneous reactions and was developed to facilitate and accelerate the construction of biochemical models.
Affiliation(s)
- Dietmar Schomburg
- Technische Universität Braunschweig, Dpt. for Bioinformatics and Biochemistry, Langer Kamp 19 B, 38106 Braunschweig, Germany
15
Schad E, Tompa P, Hegyi H. The relationship between proteome size, structural disorder and organism complexity. Genome Biol 2011; 12:R120. [PMID: 22182830 PMCID: PMC3334615 DOI: 10.1186/gb-2011-12-12-r120] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Received: 05/20/2011] [Revised: 10/25/2011] [Accepted: 12/19/2011] [Indexed: 11/22/2022]
Abstract
Background: Sequencing the genomes of the first few eukaryotes created the impression that gene number shows no correlation with organism complexity, often referred to as the G-value paradox. Several attempts have previously been made to resolve this paradox, citing multifunctionality of proteins, alternative splicing, microRNAs or non-coding DNA. As intrinsic protein disorder has been linked with complex responses to environmental stimuli and communication between cells, an additional possibility is that structural disorder may effectively increase the complexity of species.
Results: We revisited the G-value paradox by analyzing many new proteomes whose complexity, measured by their number of distinct cell types, is known. We found that complexity and proteome size, measured by the total number of amino acids, correlate significantly and have a power-function relationship. We systematically analyzed numerous other features in relation to complexity in several organisms and tissues and found that: the fraction of protein structural disorder increases significantly between prokaryotes and eukaryotes but does not increase further over the course of evolution; the number of predicted binding sites in disordered regions of a proteome increases with complexity; and the fraction of protein disorder, predicted binding sites, alternative splicing and protein-protein interactions all increase with the complexity of human tissues.
Conclusions: We conclude that complexity is a multi-parametric trait, determined by interaction potential, alternative splicing capacity, tissue-specific protein disorder and, above all, proteome size. The G-value paradox is only apparent when plants are grouped with metazoans, as plants have a different relationship between complexity and proteome size.
Affiliation(s)
- Eva Schad
- Institute of Enzymology, Research Center for Natural Sciences, Hungarian Academy of Sciences, Karolina út 29, Budapest, Hungary