1
Loarca J, Liou M, Dawson JC, Simon PW. Evaluation of shoot-growth variation in diverse carrot (Daucus carota L.) germplasm for genetic improvement of stand establishment. FRONTIERS IN PLANT SCIENCE 2024; 15:1342512. [PMID: 38708395 PMCID: PMC11066248 DOI: 10.3389/fpls.2024.1342512] [Received: 11/22/2023] [Accepted: 02/26/2024] [Indexed: 05/07/2024]
Abstract
Carrot (Daucus carota L.) is a high-value, nutritious, and colorful crop, but delivering carrots from seed to table can be a struggle for carrot growers. Weed competitive ability is a critical trait for crop success that carrot and its apiaceous relatives often lack owing to their characteristic slow shoot growth and erratic seedling emergence, even among genetically uniform lines. This study is the first field-based, multi-year experiment to evaluate shoot-growth trait variation over a 100-day growing season in a carrot diversity panel (N=695) that includes genetically diverse carrot accessions from the United States Department of Agriculture National Plant Germplasm System. We report phenotypic variability for shoot-growth characteristics, the first broad-sense heritability estimates for seedling emergence (0.68 < H² < 0.80) and early-season canopy coverage (0.61 < H² < 0.65), and consistent broad-sense heritability for late-season canopy height (0.76 < H² < 0.82), indicating quantitative inheritance and potential for improvement through plant breeding. Strong correlation between emergence and canopy coverage (0.62 < r < 0.72) suggests that improvement of seedling emergence has great potential to increase yield and weed competitive ability. Accessions with high emergence and vigorous canopy growth are of immediate use to breeders targeting stand establishment, weed tolerance, or weed-suppressive carrots, which is of particular advantage to the organic carrot production sector, reducing the costs and labor associated with herbicide application and weeding. We developed a standardized vocabulary and protocol to describe shoot growth and to facilitate collaboration and communication across carrot research groups.
Our study facilitates identification and utilization of carrot genetic resources, conservation of agrobiodiversity, and development of breeding stocks for weed-competitive ability, with the long-term goal of delivering improved carrot cultivars to breeders, growers, and consumers. Accession selection can be further optimized for efficient breeding by combining shoot growth data with phenological data in this study's companion paper to identify ideotypes based on global market needs.
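The H² ranges above are ratios of genetic to phenotypic variance. The abstract does not state the exact mixed model, so the sketch below assumes the common entry-mean formula for a multi-year trial, H² = σ²_G / (σ²_G + σ²_GY / y + σ²_e / (r·y)), with y years and r replicates; the function name and all variance components are hypothetical inputs, not values from the study.

```python
def entry_mean_h2(var_g: float, var_gy: float, var_e: float, years: int, reps: int) -> float:
    """Broad-sense heritability on an entry-mean basis (assumed model).

    var_g  : genotypic (accession) variance
    var_gy : genotype-by-year interaction variance
    var_e  : residual (plot-level) variance
    years  : number of years (environments)
    reps   : number of replicates per year
    """
    if years < 1 or reps < 1:
        raise ValueError("years and reps must be positive")
    # Phenotypic variance of an accession mean averaged over years and replicates
    var_p = var_g + var_gy / years + var_e / (years * reps)
    return var_g / var_p

# Hypothetical variance components, not taken from the study
h2 = entry_mean_h2(var_g=4.0, var_gy=2.0, var_e=4.0, years=2, reps=2)
print(round(h2, 3))  # 0.667
```

Adding years or replicates shrinks the interaction and error terms in the denominator, which is one reason multi-year trials give more stable H² estimates.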
Affiliation(s)
- Jenyne Loarca
- Vegetable Crops Research Unit, United States Department of Agriculture, Madison, WI, United States
- Department of Plant and Agroecosystem Sciences, University of Wisconsin–Madison, Madison, WI, United States
- Michael Liou
- Department of Statistics, University of Wisconsin–Madison, Madison, WI, United States
- Julie C. Dawson
- Department of Plant and Agroecosystem Sciences, University of Wisconsin–Madison, Madison, WI, United States
- Philipp W. Simon
- Vegetable Crops Research Unit, United States Department of Agriculture, Madison, WI, United States
- Department of Plant and Agroecosystem Sciences, University of Wisconsin–Madison, Madison, WI, United States

2
Loarca J, Liou M, Dawson JC, Simon PW. Advancing utilization of diverse global carrot (Daucus carota L.) germplasm with flowering habit trait ontology. FRONTIERS IN PLANT SCIENCE 2024; 15:1342513. [PMID: 38779064 PMCID: PMC11110672 DOI: 10.3389/fpls.2024.1342513] [Received: 11/22/2023] [Accepted: 02/26/2024] [Indexed: 05/25/2024]
Abstract
Biennial vegetable crops are challenging to breed due to long breeding cycle times. At the same time, it is important to preserve a strong biennial growth habit, avoiding premature flowering that renders the crop unmarketable. Gene banks carry important genetic variation which may be essential to improve crop resilience, but these collections are underutilized due to lack of characterization for key traits like bolting tendency for biennial vegetable crops. Due to concerns about introducing undesirable traits such as premature flowering into elite germplasm, many accessions may not be considered for other key traits that benefit growers, leaving crops more vulnerable to pests, diseases, and abiotic stresses. In this study, we develop a method for characterizing flowering to identify accessions that are predominantly biennial, which could be incorporated into biennial breeding programs without substantially increasing the risk of annual growth habits. This should increase the use of these accessions if they are also sources of other important traits such as disease resistance. We developed the CarrotOmics flowering habit trait ontology and evaluated flowering habit in the largest (N=695) and most diverse collection of cultivated carrots studied to date. Over 80% of accessions were collected from the Eurasian supercontinent, which includes the primary and secondary centers of carrot diversity. We successfully identified untapped genetic diversity in biennial carrot germplasm (n=197 with 0% plants flowering) and predominantly biennial germplasm (n=357 with <15% plants flowering). High broad-sense heritability for flowering habit (0.81 < H² < 0.93) indicates a strong genetic component of this trait, suggesting that these carrot accessions should be consistently biennial. Breeders can select biennial plants and eliminate annual plants from a predominantly biennial population.
The establishment of the predominantly biennial subcategory nearly doubles the availability of germplasm with commercial potential and accounts for 54% of the germplasm collection we evaluated. This subcollection is a useful source of genetic diversity for breeders. This method could also be applied to other biennial vegetable genetic resources and used to introduce higher levels of genetic diversity into commercial cultivars, reducing crop genetic vulnerability. We encourage breeders and researchers of biennial crops to optimize this strategy for their particular crop.
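The biennial and predominantly biennial categories above are defined by the percentage of plants in an accession that flowered. A minimal sketch of that binning follows; it assumes the two classes are mutually exclusive (exactly 0% versus above 0% but under 15%), and the function name and the label "other" for the remaining accessions are hypothetical placeholders, not CarrotOmics ontology terms.

```python
def flowering_class(n_flowering: int, n_plants: int) -> str:
    """Bin an accession by percent of plants flowering (assumed thresholds)."""
    if n_plants <= 0:
        raise ValueError("n_plants must be positive")
    if n_flowering == 0:
        return "biennial"
    pct = 100.0 * n_flowering / n_plants
    if pct < 15:
        return "predominantly biennial"
    return "other"  # hypothetical label for accessions at or above 15% flowering

# Hypothetical counts: (flowering plants, total plants) per accession
accessions = {"acc_A": (0, 40), "acc_B": (3, 40), "acc_C": (12, 40)}
for name, (flowering, total) in accessions.items():
    print(name, flowering_class(flowering, total))
```

Given the high heritability reported for flowering habit, a breeder working with a "predominantly biennial" accession would then remove the few annual plants, as the abstract describes.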
Affiliation(s)
- Jenyne Loarca
- Vegetable Crops Research Unit, United States Department of Agriculture, Madison, WI, United States
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Michael Liou
- Department of Statistics, University of Wisconsin-Madison, Madison, WI, United States
- Julie C. Dawson
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States
- Philipp W. Simon
- Vegetable Crops Research Unit, United States Department of Agriculture, Madison, WI, United States
- Department of Plant and Agroecosystem Sciences, University of Wisconsin-Madison, Madison, WI, United States

3
Jordão LSB, Morim MP, Baumgratz JFA, Simon MF, Eppinghaus ALC, Calfo VA. TypeTaxonScript: sugarifying and enhancing data structures in biological systematics and biodiversity research. Biol Methods Protoc 2024; 9:bpae017. [PMID: 38566774 PMCID: PMC10984730 DOI: 10.1093/biomethods/bpae017] [Received: 12/28/2023] [Revised: 02/19/2024] [Accepted: 03/12/2024] [Indexed: 04/04/2024]
Abstract
Object-oriented programming (OOP) embodies a software development paradigm grounded in representing real-world entities as objects, facilitating a more efficient and structured modelling approach. In this article, we explore the synergy between OOP principles and the TypeScript (TS) programming language to create a JSON-formatted database designed for storing arrays of biological features. This fusion of technologies fosters a controlled and modular code script, streamlining the integration, manipulation, expansion, and analysis of biological data, all while enhancing syntax for improved human readability, such as through the use of dot notation. We advocate for biologists to embrace Git technology, akin to the practices of programmers and coders, for initiating versioned and collaborative projects. Leveraging the widely accessible and acclaimed IDE, Visual Studio Code, provides an additional advantage. Not only does it support running a Node.js environment, which is essential for running TS, but it also efficiently manages GitHub versioning. We provide a use case involving taxonomic data structure, focusing on angiosperm legume plants. This method is characterized by its simplicity, as the tools employed are both fully accessible and free of charge, and it is widely adopted by communities of professional programmers. Moreover, we are dedicated to facilitating practical implementation and comprehension through a comprehensive tutorial, a readily available pre-built database at GitHub, and a new package at npm.
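The core idea here is to model taxonomic records as typed objects that serialize to JSON and are read back with dot notation. TypeTaxonScript itself is written in TypeScript; the Python sketch below only illustrates the same pattern with dataclasses, and the class and field names (`Taxon`, `Feature`, `to_json`, `from_json`) are hypothetical, not the package's API.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Feature:
    """One descriptive character of a taxon (hypothetical schema)."""
    name: str
    value: str

@dataclass
class Taxon:
    """A taxon record whose attributes are read with dot notation."""
    family: str
    genus: str
    species: str
    features: list[Feature] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict recursively converts nested dataclasses to plain dicts
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "Taxon":
        data = json.loads(text)
        feats = [Feature(**f) for f in data.pop("features")]
        return cls(**data, features=feats)

taxon = Taxon("Fabaceae", "Mimosa", "pudica", [Feature("leaf type", "bipinnate")])
print(taxon.genus, taxon.features[0].value)  # dot-notation access
restored = Taxon.from_json(taxon.to_json())
print(restored == taxon)  # True: the JSON round trip preserves the record
```

Declaring the fields once in a class is what gives the "controlled and modular" structure the abstract mentions: a record missing a required field fails at construction instead of silently entering the database.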
Affiliation(s)
- Lucas Sá Barreto Jordão
- Centro Nacional de Conservação da Flora—CNCFlora, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- Marli Pires Morim
- Diretoria de Pesquisa Científica—DIPEQ, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- José Fernando A Baumgratz
- Diretoria de Pesquisa Científica—DIPEQ, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- Marcelo Fragomeni Simon
- Embrapa Recursos Genéticos e Biotecnologia, Parque Estação Biológica–PqEB, Brasília, 70770-901, Brazil
- André L C Eppinghaus
- Centro Nacional de Conservação da Flora—CNCFlora, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- Vicente A Calfo
- Centro Nacional de Conservação da Flora—CNCFlora, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil

4
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. FRONTIERS IN PLANT SCIENCE 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially in the face of large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse and the potential for answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, as adding relevant metadata increases the understanding of how data was produced and increases its interoperability. Ontologies provide concepts for a particular domain as well as the relationships between concepts. By tagging data with ontology terms, data becomes both human- and machine-interpretable, allowing for increased reuse and interoperability. However, the task of identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data related to plant-specific experiments within metadata frameworks, such as Investigation-Study-Assay (ISA). We also outline repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
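Tagging data with ontology terms amounts to attaching a stable term identifier, not just a free-text label, to each variable. The sketch below assumes a simple column-to-term mapping, reports which columns are still unannotated, and pairs columns across two datasets that share a term; the `EX:` prefix, all term IDs, column names, and function names are hypothetical placeholders, not real Plant Ontology or Crop Ontology entries.

```python
from typing import NamedTuple

class OntologyTerm(NamedTuple):
    term_id: str  # compact identifier, e.g. PREFIX:0000000
    label: str    # human-readable label

# Hypothetical annotations for columns of a phenotyping table
annotations = {
    "plant_height_cm": OntologyTerm("EX:0000001", "plant height"),
    "leaf_count": OntologyTerm("EX:0000002", "leaf number"),
}

def unannotated_columns(columns: list[str], terms: dict[str, OntologyTerm]) -> list[str]:
    """Return the columns that lack an ontology term, in table order."""
    return [c for c in columns if c not in terms]

def shared_terms(a: dict[str, OntologyTerm], b: dict[str, OntologyTerm]) -> list[tuple[str, str]]:
    """Pair columns from two datasets that are tagged with the same term ID."""
    by_id = {t.term_id: col for col, t in b.items()}
    return [(col, by_id[t.term_id]) for col, t in a.items() if t.term_id in by_id]

columns = ["sample_id", "plant_height_cm", "leaf_count", "harvest_date"]
print(unannotated_columns(columns, annotations))  # ['sample_id', 'harvest_date']

other = {"height": OntologyTerm("EX:0000001", "plant height")}
print(shared_terms(annotations, other))  # [('plant_height_cm', 'height')]
```

The match in the last line relies only on the shared term ID, not on the column names, which is the machine-interpretable interoperability the review describes.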
Affiliation(s)
- Kathryn Dumschott
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
- Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany

5
Li Z, Hu Y, Ma X, Da L, She J, Liu Y, Yi X, Cao Y, Xu W, Jiao Y, Su Z. WheatCENet: A Database for Comparative Co-expression Networks Analysis of Allohexaploid Wheat and Its Progenitors. GENOMICS, PROTEOMICS & BIOINFORMATICS 2023; 21:324-336. [PMID: 35660007 PMCID: PMC10626052 DOI: 10.1016/j.gpb.2022.04.007] [Received: 08/12/2021] [Revised: 03/16/2022] [Accepted: 05/08/2022] [Indexed: 06/15/2023]
Abstract
Genetic and epigenetic changes after polyploidization events could result in variable gene expression and modified regulatory networks. Here, using large-scale transcriptome data, we constructed co-expression networks for diploid, tetraploid, and hexaploid wheat species, and built a platform for comparing co-expression networks of allohexaploid wheat and its progenitors, named WheatCENet. WheatCENet is a platform for searching and comparing specific functional co-expression networks, as well as identifying the related functions of the genes clustered therein. Functional annotations like pathways, gene families, protein-protein interactions, microRNAs (miRNAs), and several lines of epigenome data are integrated into this platform, and Gene Ontology (GO) annotation, gene set enrichment analysis (GSEA), motif identification, and other useful tools are also included. Using WheatCENet, we found that the network of WHEAT ABERRANT PANICLE ORGANIZATION 1 (WAPO1) has more co-expressed genes related to spike development in hexaploid wheat than its progenitors. We also found a novel motif of CCWWWWWWGG (CArG) specifically in the promoter region of WAPO-A1, suggesting that neofunctionalization of the WAPO-A1 gene affects spikelet development in hexaploid wheat. WheatCENet is useful for investigating co-expression networks and conducting other analyses, and thus facilitates comparative and functional genomic studies in wheat. WheatCENet is freely available at http://bioinformatics.cpolar.cn/WheatCENet and http://bioinformatics.cau.edu.cn/WheatCENet.
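The CArG motif above is written in IUPAC notation, where W stands for A or T. A minimal sketch of scanning a promoter sequence for it with a regular expression follows; the function name and the input sequence are made up for illustration. Because the complement of W is W, the reverse complement of CCWWWWWWGG is the same pattern, so scanning one strand locates sites on both.

```python
import re

# CArG box: CC, six A/T bases (IUPAC "W"), then GG.
# The lookahead lets overlapping sites be reported separately.
CARG = re.compile(r"(?=(CC[AT]{6}GG))")

def find_carg(seq: str) -> list[int]:
    """Return 0-based start positions of CArG boxes in a DNA sequence."""
    return [m.start() for m in CARG.finditer(seq.upper())]

promoter = "aaccatatatggttCCTTTAAAGGa"  # hypothetical sequence
print(find_carg(promoter))  # [2, 14]
```

A real promoter comparison would run this over the upstream regions of WAPO-A1 and its homoeologs, but those sequences are not given here.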
Affiliation(s)
- Zhongqiu Li
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yiheng Hu
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xuelian Ma
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Lingling Da
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jiajie She
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yue Liu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xin Yi
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China
- Yaxin Cao
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yuannian Jiao
- State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China

6
Iordache V, Neagoe A. Conceptual methodological framework for the resilience of biogeochemical services to heavy metals stress. JOURNAL OF ENVIRONMENTAL MANAGEMENT 2023; 325:116401. [PMID: 36279774 DOI: 10.1016/j.jenvman.2022.116401] [Received: 05/25/2022] [Revised: 09/21/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
The idea of linking stressors, service-providing units (SPUs), and ecosystem services (ES) is ubiquitous in the literature, although it is currently not applied in areas contaminated with heavy metals (HMs). This integrative literature review introduces the general form of a deterministic conceptual model, with a feedback loop, of the cross-scale effect of HMs on biogeochemical services provided by SPUs; introduces a cross-scale heuristic concept of resilience; and develops a method for applying the conceptual model. The objectives are (1) to identify clusters of existing research on HM effects on ES, biodiversity, and resilience to HM stress; (2) to map the scientific fields needed to implement the conceptual model, identify institutional constraints on interdisciplinary cooperation, and propose solutions to overcome them; (3) to describe how the complexity of the cause-effect chain is reflected in research hypotheses and objectives, and to extract methodological consequences; and (4) to describe how the conceptual model can be implemented. A nested CiteSpace analysis of 16,176 articles extracted from the Web of Science shows that, at the highest level of data aggregation, there is a clear separation between the topics of functional traits, stoichiometry, and regulating services and the typical issues of the literature on HMs, biodiversity, and ES. Most of the research agenda on resilience to HM stress focuses on microbial communities. General topics, such as the biodiversity-ecosystem function relationship in contaminated areas, and large-scale problems, such as watershed management, are no longer dominant in current research. The number of Web of Science domains that include the analyzed articles is large (26 to 87 domains with at least ten articles, depending on the subset), but thirteen domains account for 70-80% of the literature.
The complexity of approaches regarding the cause-effect chain, the stressors, the biological and ecological hierarchical level, and the management objectives was characterized by a detailed analysis of 60 selected reviews and 121 primary articles. Most primary articles address short causal chains, and the number of hypotheses or objectives per article tends to be low, pointing to the need for portfolios of complementary research projects in coherent interdisciplinary programs and innovation ecosystems to couple the ES and resilience problems in areas contaminated with HMs. The review provides triggers for developing innovation ecosystems, examples of complementary research hypotheses, and an example of technology transfer. Finally, it proposes operationalizing the conceptual methodological model in contaminated socio-ecological systems through a calibration phase, a sensitivity analysis phase, and a validation phase.
Affiliation(s)
- Virgil Iordache
- University of Bucharest, Department of Systems Ecology and Sustainability, and "Dan Manoleli" Research Centre for Ecological Services - CESEC, Romania.
- Aurora Neagoe
- University of Bucharest, "Dan Manoleli" Research Centre for Ecological Services - CESEC and "Dimitrie Brândză" Botanical Garden, Romania.

7
Eid R, Landès C, Pernet A, Benoît E, Santagostini P, Ghaziri AE, Bourbeillon J. DIVIS: a semantic DIstance to improve the VISualisation of heterogeneous phenotypic datasets. BioData Min 2022; 15:10. [PMID: 35379292 PMCID: PMC8981856 DOI: 10.1186/s13040-022-00293-y] [Received: 07/22/2021] [Accepted: 02/27/2022] [Indexed: 11/24/2022]
Abstract
Background: Thanks to the wider spread of high-throughput experimental techniques, biologists are accumulating large numbers of datasets, which often mix quantitative and qualitative variables and are not always complete, in particular when they concern phenotypic traits. To get a first insight into these datasets and reduce the size of the data matrices, scientists often rely on multivariate analysis techniques. However, such approaches are not always easily practicable, in particular with mixed datasets. Moreover, displaying large numbers of individuals leads to cluttered visualisations that are difficult to interpret. Results: We introduced a new methodology to overcome these limits. Its main feature is a new semantic distance tailored for both quantitative and qualitative variables, which allows for a realistic representation of the relationships between individuals (phenotypic descriptions in our case). This semantic distance is based on ontologies engineered to represent real-life knowledge regarding the underlying variables. For easier handling by biologists, we incorporated it into a complete tool, from raw data file to visualisation. Following the distance calculation, the tool (i) groups similar individuals, (ii) represents each group by emblematic individuals we call archetypes, and (iii) builds sparse visualisations based on these archetypes. Our approach was implemented as a Python pipeline and applied to a rosebush dataset including passport and phenotypic data. Conclusions: The introduction of our new semantic distance and of the archetype concept allowed us to build a comprehensive representation of an incomplete dataset characterised by a large proportion of qualitative data. The methodology described here could have wider use beyond information characterising organisms or species and beyond plant science. Indeed, the same approach could be applied to any mixed dataset.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13040-022-00293-y.
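The DIVIS distance combines quantitative and qualitative variables and uses ontologies for the qualitative ones. Its exact formula is not given in the abstract, so the sketch below swaps in a simpler Gower-style average: range-normalized differences for numbers, a normalized path length in a small hypothetical colour ontology for categories, and skipped variables when a value is missing. It illustrates the idea only; it is not the DIVIS implementation, and every name and value in it is hypothetical.

```python
# Hypothetical colour ontology as child -> parent links ("colour" is the root)
PARENT = {"warm": "colour", "pale": "colour",
          "red": "warm", "pink": "warm", "white": "pale", "cream": "pale"}

def lineage(term: str) -> list[str]:
    """Term followed by its ancestors up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

MAX_DEPTH = max(len(lineage(t)) - 1 for t in PARENT)

def ontology_distance(a: str, b: str) -> float:
    """Edges between a and b via their lowest common ancestor, scaled to [0, 1]."""
    la, lb = lineage(a), lineage(b)
    common = next(t for t in la if t in lb)  # lowest common ancestor
    edges = la.index(common) + lb.index(common)
    return edges / (2 * MAX_DEPTH)

def mixed_distance(x: dict, y: dict, ranges: dict) -> float:
    """Average per-variable distance, skipping variables missing in either record."""
    parts = []
    for var in x:
        if x[var] is None or y.get(var) is None:
            continue
        if var in ranges:  # quantitative: range-normalized absolute difference
            parts.append(abs(x[var] - y[var]) / ranges[var])
        else:              # qualitative: ontology-based distance
            parts.append(ontology_distance(x[var], y[var]))
    return sum(parts) / len(parts) if parts else float("nan")

ranges = {"petals": 40}  # hypothetical observed range of petal counts
a = {"petals": 5, "colour": "red"}
b = {"petals": 15, "colour": "pink"}
c = {"petals": None, "colour": "white"}
print(mixed_distance(a, b, ranges))  # 0.375
print(mixed_distance(a, c, ranges))  # 1.0
```

The ontology makes "red" and "pink" closer than "red" and "white", which a plain mismatch count on categories would treat as equally different.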
Affiliation(s)
- Rayan Eid
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Claudine Landès
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Alix Pernet
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Julie Bourbeillon
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France

8
Vanderbilt K, Gries C. Integrating long-tail data: How far are we? ECOL INFORM 2021. [DOI: 10.1016/j.ecoinf.2021.101372] [Indexed: 12/01/2022]

9
Dufayard JF, Bocs S, Guignon V, Larivière D, Louis A, Oubda N, Rouard M, Ruiz M, de Lamotte F. RapGreen, an interactive software and web package to explore and analyze phylogenetic trees. NAR Genom Bioinform 2021; 3:lqab088. [PMID: 34568824 PMCID: PMC8459725 DOI: 10.1093/nargab/lqab088] [Received: 06/09/2021] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 12/26/2022]
Abstract
RapGreen is a modular software package targeted at scientists handling large datasets for phylogenetic analysis. Its primary function is the graphical visualization and exploration of large trees. In addition, RapGreen offers a tree pattern search function to seek evolutionary scenarios among large collections of phylogenetic trees. Other functionalities include tree reconciliation with a given species tree (the detection of duplication or loss events during evolution) and tree rooting. Last but not least, RapGreen can integrate heterogeneous data while visualizing and otherwise analyzing phylogenetic trees.
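Reconciling a gene tree with a species tree is classically done by LCA mapping: each gene-tree node maps to the lowest common ancestor of its children's species, and a node is inferred as a duplication when it maps to the same species node as one of its children. The sketch below implements that textbook rule on toy trees; it is not RapGreen's code, and the trees, gene names, and species labels are hypothetical.

```python
# Hypothetical species tree ((A,B),C) as child -> parent links
SPECIES_PARENT = {"A": "AB", "B": "AB", "AB": "root", "C": "root"}
# Hypothetical gene -> species assignment
GENE_SPECIES = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

def ancestors(node: str) -> list[str]:
    """Node followed by its ancestors up to the species-tree root."""
    chain = [node]
    while chain[-1] in SPECIES_PARENT:
        chain.append(SPECIES_PARENT[chain[-1]])
    return chain

def lca(x: str, y: str) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    ys = set(ancestors(y))
    return next(n for n in ancestors(x) if n in ys)

def reconcile(gene_tree) -> tuple[str, int]:
    """Map a gene tree (nested 2-tuples of gene names) onto the species tree.

    Returns the species node of the gene-tree root and the number of inferred duplications.
    """
    if isinstance(gene_tree, str):  # leaf: its own species
        return GENE_SPECIES[gene_tree], 0
    left, right = gene_tree
    m_left, d_left = reconcile(left)
    m_right, d_right = reconcile(right)
    m = lca(m_left, m_right)
    is_dup = m in (m_left, m_right)  # same species node as a child => duplication
    return m, d_left + d_right + int(is_dup)

print(reconcile((("a1", "b1"), "c1")))          # ('root', 0)
print(reconcile((("a1", "b1"), ("a2", "c1"))))  # ('root', 1)
```

Counting losses as well would require walking the species-tree path between each node's mapping and its children's mappings; that step is omitted here.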
Affiliation(s)
- Jean-François Dufayard
- CIRAD, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Stéphanie Bocs
- CIRAD, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Valentin Guignon
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, France
- Delphine Larivière
- CIRAD, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Alexandra Louis
- IBENS, Institut de Biologie de l’ENS, Département de biologie, École normale supérieure, CNRS, INSERM, Université PSL, 75005 Paris, France
- Nicolas Oubda
- CIRAD, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Mathieu Rouard
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Bioversity International, Parc Scientifique Agropolis II, 34397, Montpellier, France
- Manuel Ruiz
- CIRAD, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- Frédéric de Lamotte
- French Institute of Bioinformatics (IFB) - South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, F-34398 Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398 Montpellier, France

10
Andrés-Hernández L, Halimi RA, Mauleon R, Mayes S, Baten A, King GJ. Challenges for FAIR-compliant description and comparison of crop phenotype data with standardized controlled vocabularies. Database (Oxford) 2021; 2021:baab028. [PMID: 33991093 PMCID: PMC8122365 DOI: 10.1093/database/baab028] [Received: 06/27/2020] [Revised: 04/14/2021] [Accepted: 04/30/2021] [Indexed: 12/04/2022]
Abstract
Crop phenotypic data underpin many pre-breeding efforts to characterize variation within germplasm collections. Although there has been an increase in the global capacity for accumulating and comparing such data, a lack of consistency in the systematic description of metadata often limits integration and sharing. We therefore aimed to understand some of the challenges facing findable, accessible, interoperable and reusable (FAIR) curation and annotation of phenotypic data from minor and underutilized crops. We used bambara groundnut (Vigna subterranea) as an exemplar underutilized crop to assess the ability of the Crop Ontology (CO) system to facilitate curation of trait datasets, so that they are accessible for comparative analysis. This involved generating a controlled vocabulary Trait Dictionary of 134 terms. Systematic quantification of syntactic and semantic cohesiveness of the full set of 28 crop-specific COs identified inconsistencies between trait descriptor names, a relative lack of cross-referencing to other ontologies and a flat ontological structure for classifying traits. We also evaluated the Minimal Information About a Phenotyping Experiment and FAIR compliance of bambara trait datasets curated within the CropStoreDB schema. We discuss specifications for a more systematic and generic approach to trait controlled vocabularies, which would benefit from representation of terms that adhere to Open Biological and Biomedical Ontologies principles. In particular, we focus on the benefits of reusing existing definitions within pre- and post-composed axioms from other domains in order to facilitate the curation and comparison of datasets from a wider range of crops. Database URL: https://www.cropstoredb.org/cs_bambara.html.
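One of the reported problems is inconsistency between trait descriptor names. The sketch below is a hypothetical heuristic, not the paper's cohesiveness measure: it normalizes case and separators and then uses the standard-library `difflib.SequenceMatcher` to flag pairs of trait names that are probably the same descriptor written differently. The function names, threshold, and trait list are illustrative assumptions.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

def normalize(name: str) -> str:
    """Lower-case and collapse runs of hyphens, underscores, and spaces to one space."""
    return re.sub(r"[\s_\-]+", " ", name.strip().lower())

def near_duplicates(names: list[str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Pairs of trait names whose normalized similarity ratio meets the threshold."""
    hits = []
    for a, b in combinations(names, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            hits.append((a, b, round(ratio, 2)))
    return hits

# Hypothetical trait names from two curators
traits = ["Plant height", "plant_height", "Pod length", "Seed coat colour", "seed coat color"]
for pair in near_duplicates(traits):
    print(pair)
```

Flagged pairs still need a curator's judgment; a shared Trait Dictionary with stable term IDs is what prevents such variants from arising in the first place.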
Affiliation(s)
- Liliana Andrés-Hernández
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Razlin Azman Halimi
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Ramil Mauleon
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Sean Mayes
- School of Biosciences, University of Nottingham, Sutton Bonington, Leicestershire, LE12 5RD, Nottingham, UK
- Abdul Baten
- Institute of Precision Medicine & Bioinformatics, Sydney Local Health District, Royal Prince Alfred Hospital, Missenden Road, Camperdown, NSW 2050, Australia
- Graham J King
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia

11
Arnaud E, Laporte MA, Kim S, Aubert C, Leonelli S, Miro B, Cooper L, Jaiswal P, Kruseman G, Shrestha R, Buttigieg PL, Mungall CJ, Pietragalla J, Agbona A, Muliro J, Detras J, Hualla V, Rathore A, Das RR, Dieng I, Bauchet G, Menda N, Pommier C, Shaw F, Lyon D, Mwanzia L, Juarez H, Bonaiuti E, Chiputwa B, Obileye O, Auzoux S, Yeumo ED, Mueller LA, Silverstein K, Lafargue A, Antezana E, Devare M, King B. The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. PATTERNS (NEW YORK, N.Y.) 2020; 1:100105. [PMID: 33205138 PMCID: PMC7660444 DOI: 10.1016/j.patter.2020.100105] [Received: 03/06/2020] [Revised: 05/28/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive full set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Affiliation(s)
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Soonho Kim
- Markets, Trade and Institutions Division (MTID), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Céline Aubert
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Sabina Leonelli
- Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences (Egenis), University of Exeter, Exeter, UK
- Berta Miro
- Agrifood Policy Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Gideon Kruseman
- Socio-Economics Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, Mexico
- Rosemary Shrestha
- Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, México
- Pier Luigi Buttigieg
- Helmholtz Metadata Collaboration, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
- Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Afolabi Agbona
- Cassava Breeding Program, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Jeffrey Detras
- Bioinformatics Cluster, Strategic Innovation Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
- Vilma Hualla
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
- Abhishek Rathore
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Roma Rani Das
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Ibnou Dieng
- Biometrics Unit, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
- Guillaume Bauchet
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Naama Menda
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Cyril Pommier
- BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
- Felix Shaw
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
- David Lyon
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Leroy Mwanzia
- Performance, Innovation and Strategic Analysis, International Center for Tropical Agriculture (CIAT), Regional Office for Africa, Nairobi, Kenya
- Henry Juarez
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
- Enrico Bonaiuti
- Monitoring, Evaluation and Learning Team, International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut, Lebanon
- Brian Chiputwa
- Research Methods Group (RMG), World Agroforestry (ICRAF), Nairobi, Kenya
- Olatunbosun Obileye
- Data Management Section, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
- Sandrine Auzoux
- UPR AIDA, The French Agricultural Research Centre for International Development (CIRAD), Sainte-Clotilde, Réunion, France
- Université de Montpellier, Montpellier, France
- Esther Dzalé Yeumo
- Unité Délégation à l’Information Scientifique et Technique - DIST, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
- Lukas A. Mueller
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Erick Antezana
- Bayer Crop Science SA-NV, Diegem, Belgium
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Medha Devare
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Brian King
- CGIAR Platform for Big Data in Agriculture, International Center for Tropical Agriculture (CIAT), Cali, Colombia
12
Abstract
Biodiversity research studies the variability and diversity of organisms, including variability within and between species, with particular focus on the functional diversity of traits and their relationship to the environment. Managing biodiversity data means dealing with its heterogeneous nature using semantics and tailored ontologies. These ontologies are themselves conceived differently, and combining them in semantically enabled applications requires an effective alignment between their concepts. This paper describes the ontology matching of biodiversity- and ecology-related ontologies. We illustrate the diverse challenges that ontologies of this kind pose for ontology matching in general. We introduce real use cases that require pairwise alignments between environment and trait ontologies. We describe our experience creating a new track at the Ontology Alignment Evaluation Initiative designed for this specific domain and report on the results obtained by state-of-the-art participating systems. The biodiversity and ecology use case turns out to be a strong one for ontology matching, introducing interesting new challenges. Even though most of the matching systems perform relatively well on the proposed matching tasks, there is still room for improvement. We highlight possible directions in that regard and elaborate on our plan to further develop the track.
13
The Spider Anatomy Ontology (SPD)—A Versatile Tool to Link Anatomy with Cross-Disciplinary Data. DIVERSITY 2019. [DOI: 10.3390/d11100202] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023]
Abstract
Spiders are a diverse group with high eco-morphological diversity, which complicates anatomical descriptions, especially with regard to terminology. New terms are constantly proposed, and the definitions and limits of anatomical concepts are regularly updated. It is therefore often challenging to find the correct terms, even for trained scientists, especially when the terminology has obstacles such as synonyms, disputed definitions, ambiguities, or homonyms. Here, we present the Spider Anatomy Ontology (SPD), which we developed by combining the functionality of a glossary (a controlled, defined vocabulary) with a network of formalized relations between terms that can be used to compute inferences. The SPD follows the guidelines of the Open Biomedical Ontologies and is available through the NCBO BioPortal (ver. 1.1). It consists of 757 valid terms and definitions, is rooted in the Common Anatomy Reference Ontology (CARO), and has cross-references to other ontologies, especially those of arthropods. The SPD offers a wealth of anatomical knowledge that can be used as a resource for any scientific study, for example to link images to phylogenetic datasets, compute structural complexity over phylogenies, and produce ancestral ontologies. By using a common reference in a standardized way, the SPD will help bridge diverse disciplines, such as genomics, taxonomy, systematics, evolution, ecology, and behavior.
14
Hong N, Wen A, Shen F, Sohn S, Wang C, Liu H, Jiang G. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2019; 2:570-579. [PMID: 32025655 PMCID: PMC6993992 DOI: 10.1093/jamiaopen/ooz056] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/25/2019] [Revised: 09/23/2019] [Accepted: 10/01/2019] [Indexed: 11/30/2022]
Abstract
Objective: To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data, leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification.
Methods: We established an FHIR-based clinical data normalization pipeline, known as NLP2FHIR, that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability, focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory, using Mayo Clinic’s unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects.
Results: A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. For each clinical resource, the elements that require integration of structured data were identified. Unstructured data modeling achieved F scores ranging from 0.69 to 0.99 across the various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory).
Conclusion: We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools for clinical data normalization that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
Affiliation(s)
- Na Hong
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Chen Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
15
Schneider FD, Fichtmueller D, Gossner MM, Güntsch A, Jochum M, König‐Ries B, Le Provost G, Manning P, Ostrowski A, Penone C, Simons NK. Towards an ecological trait‐data standard. Methods Ecol Evol 2019. [DOI: 10.1111/2041-210x.13288] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 11/27/2022]
Affiliation(s)
- Florian D. Schneider
- Unaffiliated, c/o Birgitta König‐Ries, Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- David Fichtmueller
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Martin M. Gossner
- Forest Entomology, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
- Anton Güntsch
- Botanic Garden and Botanical Museum Berlin, Freie Universität Berlin, Berlin, Germany
- Malte Jochum
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
- German Centre for Integrative Biodiversity Research (iDiv) Halle‐Jena‐Leipzig, Leipzig, Germany
- Institute of Biology, Leipzig University, Leipzig, Germany
- Birgitta König‐Ries
- Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- Gaëtane Le Provost
- Senckenberg Biodiversity and Climate Research Centre (BiK‐F), Frankfurt am Main, Germany
- Peter Manning
- Senckenberg Biodiversity and Climate Research Centre (BiK‐F), Frankfurt am Main, Germany
- Andreas Ostrowski
- Department of Mathematics and Computer Science, Friedrich‐Schiller‐Universität Jena, Jena, Germany
- Caterina Penone
- Institute of Plant Sciences, University of Bern, Bern, Switzerland
- Nadja K. Simons
- Department of Ecology and Ecosystem Management, Technische Universität München, Freising, Germany
- Ecological Networks, Department of Biology, Technische Universität Darmstadt, Darmstadt, Germany
16
Cooper L, Meier A, Laporte MA, Elser JL, Mungall C, Sinn BT, Cavaliere D, Carbon S, Dunn NA, Smith B, Qu B, Preece J, Zhang E, Todorovic S, Gkoutos G, Doonan JH, Stevenson DW, Arnaud E, Jaiswal P. The Planteome database: an integrated resource for reference ontologies, plant genomics and phenomics. Nucleic Acids Res 2018; 46:D1168-D1180. [PMID: 29186578 PMCID: PMC5753347 DOI: 10.1093/nar/gkx1152] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 09/22/2017] [Accepted: 11/21/2017] [Indexed: 01/08/2023]
Abstract
The Planteome project (http://www.planteome.org) provides a suite of reference and species-specific ontologies for plants and annotations to genes and phenotypes. Ontologies serve as common standards for semantic integration of a large and growing corpus of plant genomics, phenomics and genetics data. The reference ontologies include the Plant Ontology, Plant Trait Ontology and the Plant Experimental Conditions Ontology developed by the Planteome project, along with the Gene Ontology, Chemical Entities of Biological Interest, Phenotype and Attribute Ontology, and others. The project also provides access to species-specific Crop Ontologies developed by various plant breeding and research communities from around the world. We provide integrated data on plant traits, phenotypes, and gene function and expression from 95 plant taxa, annotated with reference ontology terms. The Planteome project is developing a plant gene annotation platform, Planteome Noctua, to facilitate community engagement. All the Planteome ontologies are publicly available and are maintained at the Planteome GitHub site (https://github.com/Planteome) for sharing, tracking revisions and new requests. The annotated data are freely accessible from the ontology browser (http://browser.planteome.org/amigo) and our data repository.
Affiliation(s)
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Austin Meier
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin L Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Chris Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Seth Carbon
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nathan A Dunn
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, NY 14260, USA
- Botong Qu
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Eugene Zhang
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Sinisa Todorovic
- School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR 97331, USA
- Georgios Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- John H Doonan
- National Plant Phenomics Centre, Institute of Biological, Environmental, and Rural Sciences, Aberystwyth University, Aberystwyth SY23 3DA, UK
- Elizabeth Arnaud
- Bioversity International, Parc Scientifique Agropolis II, 34397 Montpellier Cedex 5, France
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
17
Walls RL, Cooper L, Elser J, Gandolfo MA, Mungall CJ, Smith B, Stevenson DW, Jaiswal P. The Plant Ontology Facilitates Comparisons of Plant Development Stages Across Species. FRONTIERS IN PLANT SCIENCE 2019; 10:631. [PMID: 31214208 PMCID: PMC6558174 DOI: 10.3389/fpls.2019.00631] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/20/2018] [Accepted: 04/26/2019] [Indexed: 06/09/2023]
Abstract
The Plant Ontology (PO) is a community resource consisting of standardized terms, definitions, and logical relations describing plant structures and development stages, augmented by a large database of annotations from genomic and phenomic studies. This paper describes the structure of the ontology and the design principles we used in constructing PO terms for plant development stages. It also provides details of the methodology and rationale behind our revision and expansion of the PO to cover development stages for all plants, particularly the land plants (bryophytes through angiosperms). As a case study to illustrate the general approach, we examine variation in gene expression across embryo development stages in Arabidopsis and maize, demonstrating how the PO can be used to compare patterns of expression across stages and in developmentally different species. Although many genes appear to be active throughout embryo development, we identified a small set of uniquely expressed genes for each stage of embryo development and also between the two species. Evaluating the different sets of genes expressed during embryo development in Arabidopsis or maize may inform future studies of the divergent developmental pathways observed in monocotyledonous versus dicotyledonous species. The PO and its annotation database (http://www.planteome.org) make plant data for any species more discoverable and accessible through common formats, thus providing support for applications in plant pathology, image analysis, and comparative development and evolution.
Affiliation(s)
- Ramona L. Walls
- CyVerse, Bio5 Institute, The University of Arizona, Tucson, AZ, United States
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Maria Alejandra Gandolfo
- Liberty Hyde Bailey Hortorium, Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States
- Barry Smith
- Department of Philosophy, University at Buffalo, Buffalo, NY, United States
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
18
Neveu P, Tireau A, Hilgert N, Nègre V, Mineau‐Cesari J, Brichet N, Chapuis R, Sanchez I, Pommier C, Charnomordic B, Tardieu F, Cabrera‐Bosquet L. Dealing with multi-source and multi-scale information in plant phenomics: the ontology-driven Phenotyping Hybrid Information System. THE NEW PHYTOLOGIST 2019; 221:588-601. [PMID: 30152011 PMCID: PMC6585972 DOI: 10.1111/nph.15385] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/07/2018] [Accepted: 07/07/2018] [Indexed: 05/13/2023]
Abstract
Phenomic datasets need to be accessible to the scientific community. Their reanalysis requires tracing relevant information on thousands of plants, sensors and events. The open-source Phenotyping Hybrid Information System (PHIS) is proposed for plant phenotyping experiments in various categories of installations (field, glasshouse). It unambiguously identifies all objects and traits in an experiment and establishes their relations via ontologies and semantics that apply to both field and controlled conditions. For instance, the genotype is declared for a plant or plot and is associated with all objects related to it. Events such as successive plant positions, anomalies and annotations are associated with objects so they can be easily retrieved. Its ontology-driven architecture is a powerful tool for integrating and managing data from multiple experiments and platforms, for creating relationships between objects and enriching datasets with knowledge and metadata. It interoperates with external resources via web services, thereby allowing data integration into other systems; for example, modelling platforms or external databases. It has the potential for rapid diffusion because of its ability to integrate, manage and visualize multi-source and multi-scale data, but also because it is based on 10 yr of trial and error in our groups.
Affiliation(s)
- Pascal Neveu
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Anne Tireau
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Nadine Hilgert
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Vincent Nègre
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Jonathan Mineau‐Cesari
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Nicolas Brichet
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Romain Chapuis
- UE DIASCOPE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Isabelle Sanchez
- MISTEA, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
- Cyril Pommier
- INRA, UR1164 URGI – Research Unit in Genomics‐Info, INRA de Versailles‐Grignon, Route de Saint‐Cyr, Versailles 78026, France
- François Tardieu
- LEPSE, INRA, Montpellier SupAgro, Université de Montpellier, Montpellier 34060, France
19
Gkoutos GV, Schofield PN, Hoehndorf R. The anatomy of phenotype ontologies: principles, properties and applications. Brief Bioinform 2018; 19:1008-1021. [PMID: 28387809 PMCID: PMC6169674 DOI: 10.1093/bib/bbx035] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 01/01/2017] [Revised: 02/05/2017] [Indexed: 12/14/2022]
Abstract
The past decade has seen an explosion in the collection of genotype data in domains as diverse as medicine, ecology, livestock and plant breeding. Along with this comes the challenge of dealing with the related phenotype data, which is not only large but also highly multidimensional. Computational analysis of phenotypes has therefore become critical for our ability to understand the biological meaning of genomic data in the biological sciences. At the heart of computational phenotype analysis are the phenotype ontologies. A large number of these ontologies have been developed across many domains, and we are now at a point where the knowledge captured in the structure of these ontologies can be used for the integration and analysis of large interrelated data sets. The Phenotype And Trait Ontology framework provides a method for formal definitions of phenotypes and associated data sets and has proved to be key to our ability to develop methods for the integration and analysis of phenotype data. Here, we describe the development and products of the ontological approach to phenotype capture, the formal content of phenotype ontologies and how their content can be used computationally.
Affiliation(s)
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
20
Kissling WD, Walls R, Bowser A, Jones MO, Kattge J, Agosti D, Amengual J, Basset A, van Bodegom PM, Cornelissen JHC, Denny EG, Deudero S, Egloff W, Elmendorf SC, Alonso García E, Jones KD, Jones OR, Lavorel S, Lear D, Navarro LM, Pawar S, Pirzl R, Rüger N, Sal S, Salguero-Gómez R, Schigel D, Schulz KS, Skidmore A, Guralnick RP. Towards global data products of Essential Biodiversity Variables on species traits. Nat Ecol Evol 2018; 2:1531-1540. [PMID: 30224814 DOI: 10.1038/s41559-018-0667-3] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 02/25/2018] [Accepted: 07/16/2018] [Indexed: 02/03/2023]
Abstract
Essential Biodiversity Variables (EBVs) allow observation and reporting of global biodiversity change, but a detailed framework for the empirical derivation of specific EBVs has yet to be developed. Here, we re-examine and refine the previous candidate set of species traits EBVs and show how traits related to phenology, morphology, reproduction, physiology and movement can contribute to EBV operationalization. The selected EBVs express intra-specific trait variation and allow monitoring of how organisms respond to global change. We evaluate the societal relevance of species traits EBVs for policy targets and demonstrate how open, interoperable and machine-readable trait data enable the building of EBV data products. We outline collection methods, (meta)data standardization, reproducible workflows, semantic tools and licence requirements for producing species traits EBVs. Operationalization is critical for assessing progress towards biodiversity conservation and sustainable development goals and has wide implications for data-intensive science in ecology, biogeography, conservation and Earth observation.
Affiliation(s)
- W Daniel Kissling
- Department of Theoretical and Computational Ecology, Institute for Biodiversity and Ecosystem Dynamics (IBED), University of Amsterdam, Amsterdam, The Netherlands
- Anne Bowser
- Woodrow Wilson International Center for Scholars, Washington DC, USA
- Matthew O Jones
- University of Montana, W. A. Franke Department of Forestry and Conservation, Missoula, MT, USA
- Jens Kattge
- Max Planck Institute for Biogeochemistry, Jena, Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Josep Amengual
- Area de Conservacion, Seguimiento y Programas de la Red, Organismo Autonomo Parques Nacionales, Ministerio de Agricultura y Pesca, Madrid, Spain
- Alberto Basset
- Department of Biological and Environmental Sciences and Technologies, University of Salento, Lecce, Italy
- Peter M van Bodegom
- Institute of Environmental Sciences, Leiden University, Leiden, The Netherlands
- Johannes H C Cornelissen
- Systems Ecology, Department of Ecological Science, Vrije Universiteit, Amsterdam, The Netherlands
- Ellen G Denny
- USA National Phenology Network, University of Arizona, Tucson, AZ, USA
- Salud Deudero
- Instituto Español de Oceanografía, Centro Oceanográfico de Baleares, Palma de Mallorca, Spain
- Sarah C Elmendorf
- National Ecological Observatory Network, Battelle Ecology, Boulder, CO, USA
- Department of Ecology and Evolutionary Biology, University of Colorado, Boulder, CO, USA
- Katherine D Jones
- National Ecological Observatory Network, Battelle Ecology, Boulder, CO, USA
- Owen R Jones
- Department of Biology, University of Southern Denmark, Odense M, Denmark
- Sandra Lavorel
- Laboratoire d'Ecologie Alpine, CNRS - Université Grenoble Alpes, Grenoble, France
- Dan Lear
- Marine Biological Association of the United Kingdom, Plymouth, Devon, UK
- Laetitia M Navarro
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Institute of Biology, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Samraat Pawar
- Department of Life Sciences, Imperial College London, Ascot, Berkshire, UK
- Rebecca Pirzl
- CSIRO and Atlas of Living Australia, Canberra, Australian Capital Territory, Australia
- Nadja Rüger
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
- Smithsonian Tropical Research Institute, Ancon, Panama
- Sofia Sal
- Department of Life Sciences, Imperial College London, Ascot, Berkshire, UK
- Roberto Salguero-Gómez
- Department of Zoology, Oxford University, Oxford, UK
- Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK
- Centre for Biodiversity and Conservation Science, University of Queensland, St Lucia, Queensland, Australia
- Evolutionary Demography Laboratory, Max Planck Institute for Demographic Research, Rostock, Germany
- Dmitry Schigel
- Global Biodiversity Information Facility (GBIF), Secretariat, Copenhagen, Denmark
- Katja-Sabine Schulz
- Smithsonian Institution, National Museum of Natural History, Washington DC, USA
- Andrew Skidmore
- Department of Natural Resources, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Enschede, The Netherlands
- Department of Environmental Science, Macquarie University, New South Wales, Australia
- Robert P Guralnick
- Florida Museum of Natural History, University of Florida, Gainesville, FL, USA
21
Mohamad-Matrol AA, Chang SW, Abu A. Plant data visualisation using network graphs. PeerJ 2018; 6:e5579. [PMID: 30186704 PMCID: PMC6120445 DOI: 10.7717/peerj.5579] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2018] [Accepted: 08/12/2018] [Indexed: 11/20/2022]
Abstract
Background: The amount of plant data such as taxonomical classification, morphological characteristics, ecological attributes and geological distribution in textual and image forms has increased rapidly due to emerging research and technologies. Therefore, it is crucial for experts as well as the public to discern meaningful relationships from this vast amount of data using appropriate methods. The data are often presented in lengthy texts and tables, which make gaining new insights difficult. The study proposes a visual-based representation to display data to users in a meaningful way. This method emphasises the relationships between different data sets.
Method: This study involves four main steps which translate text-based results from Extensible Markup Language (XML) serialisation format into graphs. The four steps include: (1) conversion of ontological dataset as graph model data; (2) query from graph model data; (3) transformation of text-based results in XML serialisation format into a graphical form; and (4) display of results to the user via a graphical user interface (GUI). Ontological data for plants and samples of trees and shrubs were used as the dataset to demonstrate how plant-based data could be integrated into the proposed data visualisation.
Results: A visualisation system named plant visualisation system was developed. This system provides a GUI that enables users to perform the query process, as well as a graphical viewer to display the results of the query in the form of a network graph. The efficiency of the developed visualisation system was measured by performing two types of user evaluations: a usability heuristics evaluation, and a query and visualisation evaluation.
Discussion: The relationships between the data were visualised, enabling the users to easily infer the knowledge and correlations between data. The results from the user evaluation show that the proposed visualisation system is suitable for both expert and novice users, with or without computer skills. This technique demonstrates the practicability of using a computer-assisted tool by providing cognitive analysis for understanding relationships between data. Therefore, the results benefit not only botanists, but also novice users, especially those interested in learning more about plants.
Affiliation(s)
- Siow-Wee Chang
- Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Arpah Abu
- Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia; Centre of Research for Computational Sciences and Informatics for Biology, Bioindustry, Environment, Agriculture and Healthcare, University of Malaya, Kuala Lumpur, Malaysia
22
Sauquet H, Magallón S. Key questions and challenges in angiosperm macroevolution. New Phytol 2018; 219:1170-1187. [PMID: 29577323 DOI: 10.1111/nph.15104] [Received: 11/26/2017] [Accepted: 02/05/2018]
Abstract
The origin and rapid diversification of angiosperms (flowering plants) represent one of the most intriguing topics in evolutionary biology. Despite considerable progress made in complementary fields over the last two decades (paleobotany, phylogenetics, ecology, evo-devo, genomics), many important questions remain. For instance, what has been the impact of mass extinctions on angiosperm diversification? Are the angiosperms an adaptive radiation? Has morphological evolution in angiosperms been gradual or pulsed? We propose that the recent and ongoing revolution in macroevolutionary methods provides an unprecedented opportunity to explore long-standing questions that probably hold important clues to understand present-day biodiversity. We present six key questions that explore the origin and diversification of angiosperms. We also identify three key challenges to address these questions: (1) the development of new integrative models that include diversification, multiple intrinsic and environmental traits, biogeography and the fossil record all at once, whilst accounting for sampling bias and heterogeneity of macroevolutionary processes through time and among lineages; (2) the need for large and standardized synthetic databases of morphological variation; and (3) continuous effort on sampling the fossil record, but with a revolution in current paleobotanical practice.
Affiliation(s)
- Hervé Sauquet
- National Herbarium of New South Wales (NSW), Royal Botanic Gardens and Domain Trust, Sydney, NSW, 2000, Australia
- Laboratoire Écologie, Systématique, Évolution, Université Paris-Sud, CNRS, UMR 8079, Orsay, 91405, France
- Susana Magallón
- Instituto de Biología, Universidad Nacional Autónoma de México, Circuito Exterior, Ciudad Universitaria, Coyoacán, México City, 04510, México
23
Jackson LM, Fernando PC, Hanscom JS, Balhoff JP, Mabee PM. Automated Integration of Trees and Traits: A Case Study Using Paired Fin Loss Across Teleost Fishes. Syst Biol 2018; 67:559-575. [PMID: 29325126 PMCID: PMC6005059 DOI: 10.1093/sysbio/syx098] [Received: 01/19/2017] [Revised: 12/15/2017] [Accepted: 12/21/2017]
Abstract
Data synthesis required for large-scale macroevolutionary studies is challenging with the current tools available for integration. Using a classic question regarding the frequency of paired fin loss in teleost fishes as a case study, we sought to create automated methods to facilitate the integration of broad-scale trait data with a sizable species-level phylogeny. Similar to the evolutionary pattern previously described for limbs, pelvic and pectoral fin reduction and loss are thought to have occurred independently multiple times in the evolution of fishes. We developed a bioinformatics pipeline to identify the presence and absence of pectoral and pelvic fins of 12,582 species. To do this, we integrated a synthetic morphological supermatrix of phenotypic data for the pectoral and pelvic fins for teleost fishes from the Phenoscape Knowledgebase (two presence/absence characters for 3047 taxa) with a species-level tree for teleost fishes from the Open Tree of Life project (38,419 species). The integration method detailed herein harnessed a new combined approach by utilizing data based on ontological inference, as well as phylogenetic propagation, to reduce overall data loss. Using inference enabled by ontology-based annotations, missing data were reduced from 98.0% to 85.9%, and further reduced to 34.8% by phylogenetic data propagation. These methods allowed us to extend the data to an additional 11,293 species for a total of 12,582 species with trait data. The pectoral fin appears to have been independently lost in a minimum of 19 lineages and the pelvic fin in 48. Though interpretation is limited by lack of phylogenetic resolution at the species level, it appears that following loss, both pectoral and pelvic fins were regained several (3) to many (14) times respectively. Focused investigation into putative regains of the pectoral fin, all within one clade (Anguilliformes), showed that the pectoral fin was regained at least twice following loss. 
Overall, this study points to specific teleost clades where strategic phylogenetic resolution and genetic investigation will be necessary to understand the pattern and frequency of pectoral fin reversals.
Affiliation(s)
- Laura M Jackson
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- Pasan C Fernando
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- Josh S Hanscom
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, 100 Europa Drive Suite 540, Chapel Hill, NC 27517, USA
- Paula M Mabee
- Department of Biology, University of South Dakota, 414 East Clark St., Vermillion, SD 57069, USA
24
Singh G, Kuzniar A, van Mulligen EM, Gavai A, Bachem CW, Visser RGF, Finkers R. QTLTableMiner++: semantic mining of QTL tables in scientific articles. BMC Bioinformatics 2018; 19:183. [PMID: 29801439 PMCID: PMC5970438 DOI: 10.1186/s12859-018-2165-7] [Received: 02/01/2018] [Accepted: 04/25/2018]
Abstract
Background: A quantitative trait locus (QTL) is a genomic region that correlates with a phenotype. Most of the experimental information about QTL mapping studies is described in tables of scientific publications. Traditional text mining techniques aim to extract information from unstructured text rather than from tables. We present QTLTableMiner++ (QTM), a table mining tool that extracts and semantically annotates QTL information buried in (heterogeneous) tables of plant science literature. QTM is a command line tool written in the Java programming language. This tool takes scientific articles from the Europe PMC repository as input and extracts QTL tables using keyword matching and ontology-based concept identification. The tables are further normalized using rules derived from table properties such as captions, column headers and table footers. Furthermore, table columns are classified into three categories, namely column descriptors, properties and values, based on column headers and data types of cell entries. Abbreviations found in the tables are expanded using the Schwartz and Hearst algorithm. Finally, the content of QTL tables is semantically enriched with domain-specific ontologies (e.g. Crop Ontology, Plant Ontology and Trait Ontology) using the Apache Solr search platform, and the results are stored in a relational database and a text file.
Results: The performance of the QTM tool was assessed by precision and recall based on the information retrieved from two manually annotated corpora of open access articles, i.e. QTL mapping studies in tomato (Solanum lycopersicum) and in potato (S. tuberosum). In summary, QTM detected QTL statements in tomato with 74.53% precision and 92.56% recall and in potato with 82.82% precision and 98.94% recall.
Conclusion: QTM is a unique tool that aids in providing QTL information in machine-readable and semantically interoperable formats.
Affiliation(s)
- Gurnoor Singh
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Arnold Kuzniar
- Netherlands eScience Center (NLeSC), Amsterdam, The Netherlands
- Erik M van Mulligen
- Department of Medical Informatics, Erasmus Medical Center, Rotterdam, The Netherlands
- Anand Gavai
- Netherlands eScience Center (NLeSC), Amsterdam, The Netherlands
- Christian W Bachem
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Richard G F Visser
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Richard Finkers
- Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
25
Lu-Irving P, Marx HE, Dlugosch KM. Leveraging contemporary species introductions to test phylogenetic hypotheses of trait evolution. Curr Opin Plant Biol 2018; 42:95-102. [PMID: 29754025 DOI: 10.1016/j.pbi.2018.04.011] [Received: 11/20/2017] [Revised: 04/18/2018] [Accepted: 04/22/2018]
Abstract
Plant trait evolution is a topic of interest across disciplines and scales. Phylogenetic studies are powerful for generating hypotheses about the mechanisms that have shaped plant traits and their evolution. Introduced plants are a rich source of data on contemporary trait evolution. Introductions could provide especially useful tests of a variety of evolutionary hypotheses because the environments selecting on evolving traits are still present. We review phylogenetic and contemporary studies of trait evolution and identify areas of overlap and areas for further integration. Emerging tools which can promote integration include broadly focused repositories of trait data, and comparative models of trait evolution that consider both intra- and interspecific variation.
Affiliation(s)
- Patricia Lu-Irving
- Department of Ecology and Evolutionary Biology, University of Arizona, PO Box 210088, Tucson, AZ 85721, USA
- Hannah E Marx
- Department of Ecology and Evolutionary Biology, University of Arizona, PO Box 210088, Tucson, AZ 85721, USA
- Katrina M Dlugosch
- Department of Ecology and Evolutionary Biology, University of Arizona, PO Box 210088, Tucson, AZ 85721, USA
26
Chiu B, Pyysalo S, Vulić I, Korhonen A. Bio-SimVerb and Bio-SimLex: wide-coverage evaluation sets of word similarity in biomedicine. BMC Bioinformatics 2018; 19:33. [PMID: 29402212 PMCID: PMC5800055 DOI: 10.1186/s12859-018-2039-z] [Received: 07/04/2017] [Accepted: 01/24/2018]
Abstract
Background: Word representations support a variety of Natural Language Processing (NLP) tasks. The quality of these representations is typically assessed by comparing the distances in the induced vector spaces against human similarity judgements. Whereas comprehensive evaluation resources have recently been developed for the general domain, similar resources for biomedicine currently suffer from a lack of coverage, both in terms of word types included and with respect to the semantic distinctions. Notably, verbs have been excluded, although they are essential for the interpretation of biomedical language. Further, current resources do not discern between semantic similarity and semantic relatedness, although this has been proven to be an important predictor of the usefulness of word representations and their performance in downstream applications.
Results: We present two novel comprehensive resources targeting the evaluation of word representations in biomedicine. These resources, Bio-SimVerb and Bio-SimLex, address the previously mentioned problems, and can be used for evaluations of verb and noun representations respectively. In our experiments, we have computed the Pearson’s correlation between performances on intrinsic and extrinsic tasks using twelve popular state-of-the-art representation models (e.g. word2vec models). The intrinsic–extrinsic correlations using our datasets are notably higher than with previous intrinsic evaluation benchmarks such as UMNSRS and MayoSRS. In addition, when evaluating representation models for their abilities to capture verb and noun semantics individually, we show a considerable variation between performances across all models.
Conclusion: Bio-SimVerb and Bio-SimLex enable intrinsic evaluation of word representations. This evaluation can serve as a predictor of performance on various downstream tasks in the biomedical domain. The results on Bio-SimVerb and Bio-SimLex using standard word representation models highlight the importance of developing dedicated evaluation resources for NLP in biomedicine for particular word classes (e.g. verbs). These are needed to identify the most accurate methods for learning class-specific representations. Bio-SimVerb and Bio-SimLex are publicly available.
Affiliation(s)
- Billy Chiu
- Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Sampo Pyysalo
- Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Ivan Vulić
- Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
- Anna Korhonen
- Language Technology Laboratory, DTAL, University of Cambridge, 9 West Road, Cambridge, CB3 9DB, UK
27
Salhi A, Negrão S, Essack M, Morton MJL, Bougouffa S, Razali R, Radovanovic A, Marchand B, Kulmanov M, Hoehndorf R, Tester M, Bajic VB. DES-TOMATO: A Knowledge Exploration System Focused On Tomato Species. Sci Rep 2017; 7:5968. [PMID: 28729549 PMCID: PMC5519719 DOI: 10.1038/s41598-017-05448-0] [Received: 01/16/2017] [Accepted: 05/25/2017]
Abstract
Tomato is the most economically important horticultural crop used as a model to study plant biology and particularly fruit development. Knowledge obtained from tomato research initiated improvements in tomato and, being transferrable to other such economically important crops, has led to a surge of tomato-related research and published literature. We developed DES-TOMATO knowledgebase (KB) for exploration of information related to tomato. Information exploration is enabled through terms from 26 dictionaries and combination of these terms. To illustrate the utility of DES-TOMATO, we provide several examples of how one can efficiently use this KB to retrieve known or potentially novel information. DES-TOMATO is free for academic and nonprofit users and can be accessed at http://cbrc.kaust.edu.sa/des_tomato/, using any of the mainstream web browsers, including Firefox, Safari and Chrome.
Affiliation(s)
- Adil Salhi
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Sónia Negrão
- King Abdullah University of Science and Technology (KAUST), Division of Biological and Environmental Sciences and Engineering, Thuwal, 23955-6900, Saudi Arabia
- Magbubah Essack
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Mitchell J L Morton
- King Abdullah University of Science and Technology (KAUST), Division of Biological and Environmental Sciences and Engineering, Thuwal, 23955-6900, Saudi Arabia
- Salim Bougouffa
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Rozaimi Razali
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Aleksandar Radovanovic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Maxat Kulmanov
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, 23955-6900, Saudi Arabia
- Mark Tester
- King Abdullah University of Science and Technology (KAUST), Division of Biological and Environmental Sciences and Engineering, Thuwal, 23955-6900, Saudi Arabia
- Vladimir B Bajic
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal, 23955-6900, Saudi Arabia
- King Abdullah University of Science and Technology (KAUST), Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Thuwal, 23955-6900, Saudi Arabia
28
Willis CG, Ellwood ER, Primack RB, Davis CC, Pearson KD, Gallinat AS, Yost JM, Nelson G, Mazer SJ, Rossington NL, Sparks TH, Soltis PS. Old Plants, New Tricks: Phenological Research Using Herbarium Specimens. Trends Ecol Evol 2017; 32:531-546. [DOI: 10.1016/j.tree.2017.03.015] [Received: 12/05/2016] [Revised: 03/07/2017] [Accepted: 03/31/2017]
29
Hoehndorf R, Alshahrani M, Gkoutos GV, Gosline G, Groom Q, Hamann T, Kattge J, de Oliveira SM, Schmidt M, Sierra S, Smets E, Vos RA, Weiland C. The flora phenotype ontology (FLOPO): tool for integrating morphological traits and phenotypes of vascular plants. J Biomed Semantics 2016; 7:65. [PMID: 27842607 PMCID: PMC5109718 DOI: 10.1186/s13326-016-0107-8] [Received: 12/02/2015] [Accepted: 11/01/2016]
Abstract
Background: The systematic analysis of a large number of comparable plant trait data can support investigations into phylogenetics and ecological adaptation, with broad applications in evolutionary biology, agriculture, conservation, and the functioning of ecosystems. Floras, i.e., books collecting the information on all known plant species found within a region, are a potentially rich source of such plant trait data. Floras describe plant traits with a focus on morphology and other traits relevant for species identification in addition to other characteristics of plant species, such as ecological affinities, distribution, economic value, health applications, traditional uses, and so on. However, a key limitation in systematically analyzing information in Floras is the lack of a standardized vocabulary for the described traits as well as the difficulties in extracting structured information from free text.
Results: We have developed the Flora Phenotype Ontology (FLOPO), an ontology for describing traits of plant species found in Floras. We used the Plant Ontology (PO) and the Phenotype And Trait Ontology (PATO) to extract entity-quality relationships from digitized taxon descriptions in Floras, and used a formal ontological approach based on phenotype description patterns and automated reasoning to generate the FLOPO. The resulting ontology consists of 25,407 classes and is based on the PO and PATO. The classified ontology closely follows the structure of Plant Ontology in that the primary axis of classification is the observed plant anatomical structure, and more specific traits are then classified based on parthood and subclass relations between anatomical structures as well as subclass relations between phenotypic qualities.
Conclusions: The FLOPO is primarily intended as a framework based on which plant traits can be integrated computationally across all species and higher taxa of flowering plants. Importantly, it is not intended to replace established vocabularies or ontologies, but rather to serve as an overarching framework based on which different application- and domain-specific ontologies, thesauri and vocabularies of phenotypes observed in flowering plants can be integrated.
Affiliation(s)
- Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Mona Alshahrani
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), King Abdullah University of Science and Technology, 4700 KAUST, Thuwal, 23955–6900 Kingdom of Saudi Arabia
- Georgios V. Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, B15 2TT United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, B15 2TT United Kingdom
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 2AX United Kingdom
- George Gosline
- Royal Botanic Gardens, Kew, Richmond, Surrey, TW9 3AB United Kingdom
- Quentin Groom
- Botanic Garden Meise, Nieuwelaan 38, Meise, 1860 Belgium
- Thomas Hamann
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Jens Kattge
- Max Planck Institute for Biogeochemistry, Hans Knoell Str. 10, Jena, 07745 Germany
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, 04103 Germany
- Marco Schmidt
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
- Soraya Sierra
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Erik Smets
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Rutger A. Vos
- Naturalis Biodiversity Center, P.O. Box 9517, Leiden, 2300 RA The Netherlands
- Claus Weiland
- Senckenberg Biodiversity and Climate Research Centre (BiK-F), Senckenberganlage 25, Frankfurt am Main, 60325 Germany
30
Behavior change interventions: the potential of ontologies for advancing science and practice. J Behav Med 2016; 40:6-22. [PMID: 27481101 DOI: 10.1007/s10865-016-9768-0] [Received: 01/14/2016] [Accepted: 07/06/2016]
Abstract
A central goal of behavioral medicine is the creation of evidence-based interventions for promoting behavior change. Scientific knowledge about behavior change could be more effectively accumulated using "ontologies." In information science, an ontology is a systematic method for articulating a "controlled vocabulary" of agreed-upon terms and their inter-relationships. It involves three core elements: (1) a controlled vocabulary specifying and defining existing classes; (2) specification of the inter-relationships between classes; and (3) codification in a computer-readable format to enable knowledge generation, organization, reuse, integration, and analysis. This paper introduces ontologies, provides a review of current efforts to create ontologies related to behavior change interventions and suggests future work. This paper was written by behavioral medicine and information science experts and was developed in partnership between the Society of Behavioral Medicine's Technology Special Interest Group (SIG) and the Theories and Techniques of Behavior Change Interventions SIG. In recent years significant progress has been made in the foundational work needed to develop ontologies of behavior change. Ontologies of behavior change could facilitate a transformation of behavioral science from a field in which data from different experiments are siloed into one in which data across experiments could be compared and/or integrated. This could facilitate new approaches to hypothesis generation and knowledge discovery in behavioral science.
31
He F, Yoo S, Wang D, Kumari S, Gerstein M, Ware D, Maslov S. Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis. Plant J 2016; 86:472-480. [PMID: 27015116 DOI: 10.1111/tpj.13175] [Received: 10/16/2015] [Revised: 02/24/2016] [Accepted: 03/21/2016]
Abstract
Transcriptome data sets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by a lack of metadata or differences in annotation styles of different labs. In this study, we carefully selected and integrated 6057 Arabidopsis microarray expression samples from 304 experiments deposited to the Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI). Metadata such as tissue type, growth conditions and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated data set and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even in different growth conditions or developmental stages. Root has the most distinct transcriptome, compared with aerial tissues, but the transcriptome of cultured root is more similar to the transcriptome of aerial tissues, as the cultured root samples lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door for automatic metadata extraction and facilitating the re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions with sample metadata provided by the authors.
Affiliation(s)
- Fei He
- Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Shinjae Yoo
- Computational Science Center, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Institute of Advanced Computational Science at Stony Brook University, Stony Brook, NY, 11794, USA
- Daifeng Wang
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06520, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- USDA ARS NEA Plant, Soil & Nutrition Laboratory Research Unit, USDA-ARS, Ithaca, NY, 14853, USA
- Sergei Maslov
- Biology Department, Brookhaven National Laboratory, Upton, NY, 11973, USA
- Department of Bioengineering, Carl R. Woese Institute for Genomic Biology, Urbana, IL, 61801, USA
- National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
32
Abstract
The use of controlled, structured vocabularies (ontologies) has become a critical tool for scientists in the post-genomic era of massive datasets. Adoption and integration of common vocabularies and annotation practices enable cross-species comparative analyses and increase data sharing and reusability. The Plant Ontology (PO; http://www.plantontology.org/) describes plant anatomy, morphology, and the stages of plant development, and offers a database of plant genomics annotations associated with the PO terms. The scope of the PO has grown from its original design covering only rice, maize, and Arabidopsis, and now includes terms to describe all green plants from angiosperms to green algae. This chapter introduces how the PO and other related ontologies are constructed and organized, including languages and software used for ontology development, and provides an overview of the key features. Detailed instructions illustrate how to search and browse the PO database and access the associated annotation data. Users are encouraged to provide input on the ontology through the online term request form and contribute datasets for integration in the PO database.
33
Großkinsky DK, Svensgaard J, Christensen S, Roitsch T. Plant phenomics and the need for physiological phenotyping across scales to narrow the genotype-to-phenotype knowledge gap. JOURNAL OF EXPERIMENTAL BOTANY 2015; 66:5429-40. [PMID: 26163702 DOI: 10.1093/jxb/erv345] [Citation(s) in RCA: 107] [Impact Index Per Article: 11.9] [Indexed: 05/18/2023]
Abstract
Plants are affected by complex genome×environment×management interactions which determine phenotypic plasticity as a result of the variability of genetic components. Whereas great advances have been made in the cost-efficient and high-throughput analyses of genetic information and non-invasive phenotyping, the large-scale analyses of the underlying physiological mechanisms lag behind. The external phenotype is determined by the sum of the complex interactions of metabolic pathways and intracellular regulatory networks that is reflected in an internal, physiological, and biochemical phenotype. These various scales of dynamic physiological responses need to be considered, and genotyping and external phenotyping should be linked to the physiology at the cellular and tissue level. What is needed is high-dimensional physiological phenotyping across scales that integrates the precise characterization of the internal phenotype into high-throughput phenotyping of whole plants and canopies. By this means, complex traits can be broken down into individual components of physiological traits. Since the higher resolution of physiological phenotyping by 'wet chemistry' is inherently limited in throughput, high-throughput non-invasive phenotyping needs to be validated and verified across scales to be used as a proxy for the underlying processes. Armed with this interdisciplinary and multidimensional phenomics approach, plant physiology, non-invasive phenotyping, and functional genomics will complement each other, ultimately enabling the in silico assessment of responses under defined environments with advanced crop models. This will allow the generation of robust physiological predictors, including for complex traits, to bridge the knowledge gap between genotype and phenotype for applications in breeding, precision farming, and basic research.
Affiliation(s)
- Dominik K Großkinsky
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Højbakkegård Allé 13, 2630 Taastrup, Denmark
- Jesper Svensgaard
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Højbakkegård Allé 13, 2630 Taastrup, Denmark
- Svend Christensen
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Højbakkegård Allé 13, 2630 Taastrup, Denmark
- Thomas Roitsch
- Department of Plant and Environmental Sciences, Copenhagen Plant Science Centre, University of Copenhagen, Højbakkegård Allé 13, 2630 Taastrup, Denmark
- Global Change Research Centre, Czech Globe AS CR, v.v.i., Drásov 470, Cz-664 24 Drásov, Czech Republic
34
Rivers J, Warthmann N, Pogson BJ, Borevitz JO. Genomic breeding for food, environment and livelihoods. Food Secur 2015. [DOI: 10.1007/s12571-015-0431-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/24/2022]
35
Beneventano D, Bergamaschi S, Sorrentino S, Vincini M, Benedetti F. Semantic annotation of the CEREALAB database by the AGROVOC linked dataset. ECOL INFORM 2015. [DOI: 10.1016/j.ecoinf.2014.07.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
36
Oellrich A, Walls RL, Cannon EKS, Cannon SB, Cooper L, Gardiner J, Gkoutos GV, Harper L, He M, Hoehndorf R, Jaiswal P, Kalberer SR, Lloyd JP, Meinke D, Menda N, Moore L, Nelson RT, Pujar A, Lawrence CJ, Huala E. An ontology approach to comparative phenomics in plants. PLANT METHODS 2015; 11:10. [PMID: 25774204 PMCID: PMC4359497 DOI: 10.1186/s13007-015-0053-y] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 12/08/2014] [Accepted: 02/05/2015] [Indexed: 05/29/2023]
Abstract
BACKGROUND Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework.
RESULTS We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes.
CONCLUSIONS The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.
Affiliation(s)
- Anika Oellrich
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA UK
- Ramona L Walls
- iPlant Collaborative, University of Arizona, 1657 E. Helen St., Tucson, Arizona 85721 USA
- Ethalinda KS Cannon
- Department of Electrical and Computer Engineering, Iowa State University, 1018 Crop Informatics Lab, Ames, Iowa 50011 USA
- Steven B Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- Department of Agronomy, Agronomy Hall, Iowa State University, Ames, IA 50010 USA
- Laurel Cooper
- Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Jack Gardiner
- Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Georgios V Gkoutos
- Department of Computer Science, Aberystwyth University, Llandinam Building, Aberystwyth, SY23 3DB UK
- Lisa Harper
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- Mingze He
- Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division and Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 King Abdullah University of Science and Technology, P.O. Box 2882, Thuwal, 23955-6900 Kingdom of Saudi Arabia
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Scott R Kalberer
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- John P Lloyd
- Department of Plant Biology, Michigan State University, 220 Trowbridge Rd, East Lansing, MI 48824 USA
- David Meinke
- Department of Botany, Oklahoma State University, 301 Physical Sciences, Stillwater, OK 74078 USA
- Naama Menda
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853 USA
- Laura Moore
- Department of Botany and Plant Pathology, 2082 Cordley Hall, Oregon State University, Corvallis, OR 97331 USA
- Rex T Nelson
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Iowa State University, Crop Genome Informatics Lab, Iowa State University, Ames, IA 50011 USA
- Anuradha Pujar
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853 USA
- Carolyn J Lawrence
- Department of Agronomy, Agronomy Hall, Iowa State University, Ames, IA 50010 USA
- Department of Genetics, Development and Cell Biology, Roy J Carver Co-Laboratory, Iowa State University, Ames, IA 50010 USA
- Eva Huala
- Phoenix Bioinformatics, 643 Bair Island Rd Suite 403, Redwood City, CA 94063 USA
37
Thacker RW, Díaz MC, Kerner A, Vignes-Lebbe R, Segerdell E, Haendel MA, Mungall CJ. The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. J Biomed Semantics 2014; 5:39. [PMID: 25276334 PMCID: PMC4177528 DOI: 10.1186/2041-1480-5-39] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/01/2013] [Accepted: 07/22/2014] [Indexed: 12/31/2022]
Abstract
Background Porifera (sponges) are ancient basal metazoans that lack organs. They provide insight into key evolutionary transitions, such as the emergence of multicellularity and the nervous system. In addition, their ability to synthesize unusual compounds offers potential biotechnical applications. However, much of the knowledge of these organisms has not previously been codified in a machine-readable way using modern web standards. Results The Porifera Ontology is intended as a standardized coding system for sponge anatomical features currently used in systematics. The ontology is available from http://purl.obolibrary.org/obo/poro.owl, or from the project homepage http://porifera-ontology.googlecode.com/. The version referred to in this manuscript is permanently available from http://purl.obolibrary.org/obo/poro/releases/2014-03-06/. Conclusions By standardizing character representations, we hope to facilitate more rapid description and identification of sponge taxa, to allow integration with other evolutionary database systems, and to perform character mapping across the major clades of sponges to better understand the evolution of morphological features. Future applications of the ontology will focus on creating (1) ontology-based species descriptions; (2) taxonomic keys that use the nested terms of the ontology to more quickly facilitate species identifications; and (3) methods to map anatomical characters onto molecular phylogenies of sponges. In addition to modern taxa, the ontology is being extended to include features of fossil taxa.
Affiliation(s)
- Robert W Thacker
- Department of Biology, University of Alabama at Birmingham, Birmingham, USA
- Adeline Kerner
- CR2P, UMR 7207 CNRS-MNHN-UPMC, Département Histoire de la Terre, Muséum National d'Histoire Naturelle, Bâtiment de Géologie, CP48, 57 rue Cuvier, 75005 Paris, France
- Régine Vignes-Lebbe
- CR2P, UMR 7207 CNRS-MNHN-UPMC, Département Histoire de la Terre, Muséum National d'Histoire Naturelle, Bâtiment de Géologie, CP48, 57 rue Cuvier, 75005 Paris, France
- Erik Segerdell
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, USA
- Melissa A Haendel
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, USA
38
Dahdul WM, Cui H, Mabee PM, Mungall CJ, Osumi-Sutherland D, Walls RL, Haendel MA. Nose to tail, roots to shoots: spatial descriptors for phenotypic diversity in the Biological Spatial Ontology. J Biomed Semantics 2014; 5:34. [PMID: 25140222 PMCID: PMC4137724 DOI: 10.1186/2041-1480-5-34] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/02/2013] [Accepted: 06/16/2014] [Indexed: 01/12/2023]
Abstract
BACKGROUND Spatial terminology is used in anatomy to indicate precise, relative positions of structures in an organism. While these terms are often standardized within specific fields of biology, they can differ dramatically across taxa. Such differences in usage can impair our ability to unambiguously refer to anatomical position when comparing anatomy or phenotypes across species. We developed the Biological Spatial Ontology (BSPO) to standardize the description of spatial and topological relationships across taxa to enable the discovery of comparable phenotypes. RESULTS BSPO currently contains 146 classes and 58 relations representing anatomical axes, gradients, regions, planes, sides, and surfaces. These concepts can be used at multiple biological scales and in a diversity of taxa, including plants, animals and fungi. The BSPO is used to provide a source of anatomical location descriptors for logically defining anatomical entity classes in anatomy ontologies. Spatial reasoning is further enhanced in anatomy ontologies by integrating spatial relations such as dorsal_to into class descriptions (e.g., 'dorsolateral placode' dorsal_to some 'epibranchial placode'). CONCLUSIONS The BSPO is currently used by projects that require standardized anatomical descriptors for phenotype annotation and ontology integration across a diversity of taxa. Anatomical location classes are also useful for describing phenotypic differences, such as morphological variation in position of structures resulting from evolution within and across species.
Affiliation(s)
- Wasila M Dahdul
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- National Evolutionary Synthesis Center, Durham, NC, USA
- Hong Cui
- School of Information Resource and Library Science, University of Arizona, Tucson, AZ, USA
- Paula M Mabee
- Department of Biology, University of South Dakota, Vermillion, SD, USA
- Ramona L Walls
- The iPlant Collaborative, Bio5 Institute, University of Arizona, Tucson, AZ, USA
- Melissa A Haendel
- Library and Department of Medical Informatics & Epidemiology, Oregon Health & Science University, Portland, OR, USA
39
Hiss M, Laule O, Meskauskiene RM, Arif MA, Decker EL, Erxleben A, Frank W, Hanke ST, Lang D, Martin A, Neu C, Reski R, Richardt S, Schallenberg-Rüdinger M, Szövényi P, Tiko T, Wiedemann G, Wolf L, Zimmermann P, Rensing SA. Large-scale gene expression profiling data for the model moss Physcomitrella patens aid understanding of developmental progression, culture and stress conditions. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2014; 79:530-9. [PMID: 24889180 DOI: 10.1111/tpj.12572] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 08/01/2013] [Revised: 05/22/2014] [Accepted: 05/27/2014] [Indexed: 05/21/2023]
Abstract
The moss Physcomitrella patens is an important model organism for studying plant evolution, development, physiology and biotechnology. Here we have generated microarray gene expression data covering the principal developmental stages, culture forms and some environmental/stress conditions. Example analyses of developmental stages and growth conditions as well as abiotic stress treatments demonstrate that (i) growth stage is dominant over culture conditions, (ii) liquid culture is not stressful for the plant, (iii) low pH might aid protoplastation through reduced expression of cell wall structure genes, (iv) largely the same gene pool mediates response to dehydration and rehydration, and (v) AP2/EREBP transcription factors play important roles in stress response reactions. With regard to the AP2 gene family, phylogenetic analysis and comparison with Arabidopsis thaliana show commonalities as well as uniquely expressed family members under drought, light perturbations and protoplastation. Gene expression profiles for P. patens are available for the scientific community via the easy-to-use tool at https://www.genevestigator.com. Providing large-scale expression profiles further enhances the usability of this model organism, for example by enabling selection of control genes for quantitative real-time PCR. Now, gene expression levels across a broad range of conditions can be accessed online for P. patens.
Affiliation(s)
- Manuel Hiss
- Plant Cell Biology, Faculty of Biology, University of Marburg, Karl-von-Frisch-Strasse 8, 35043, Marburg, Germany; Faculty of Biology, University of Freiburg, Schänzlestrasse 1, 79104, Freiburg, Germany; FRISYS Freiburg Initiative for Systems Biology, University of Freiburg, 79104, Freiburg, Germany
40
Ramírez MJ, Michalik P. Calculating structural complexity in phylogenies using ancestral ontologies. Cladistics 2014; 30:635-649. [DOI: 10.1111/cla.12075] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Accepted: 02/20/2014] [Indexed: 01/29/2023]
Affiliation(s)
- Martín J. Ramírez
- Museo Argentino de Ciencias Naturales “Bernardino Rivadavia” - CONICET; Av. Angel Gallardo 470 C1405DJR Buenos Aires Argentina
- Peter Michalik
- Zoologisches Institut und Museum; Ernst-Moritz-Arndt-Universität; J.-S.-Bach-Str. 11/12 D-17489 Greifswald Germany
41
Alexandersson E, Jacobson D, Vivier MA, Weckwerth W, Andreasson E. Field-omics-understanding large-scale molecular data from field crops. FRONTIERS IN PLANT SCIENCE 2014; 5:286. [PMID: 24999347 PMCID: PMC4064663 DOI: 10.3389/fpls.2014.00286] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 02/27/2014] [Accepted: 06/02/2014] [Indexed: 05/19/2023]
Abstract
The recent advances in gene expression analysis as well as protein and metabolite quantification enable genome-scale capturing of complex biological processes at the molecular level in crop field trials. This opens up new possibilities for understanding the molecular and environmental complexity of field-based systems and thus for shedding light on the black box between genotype and environment, which in agriculture is always influenced by a multi-stress environment and includes management interventions. Nevertheless, combining different types of data obtained from the field and making biological sense out of large datasets remain challenging. Here we highlight the need to create a cross-disciplinary platform for innovative experimental design, sampling and subsequent analysis of large-scale molecular data obtained in field trials. For these reasons we put forward the term field-omics: "Field-omics strives to couple information from genomes, transcriptomes, proteomes, metabolomes and metagenomes to the long-established practice in crop science of conducting field trials as well as to adapt current strategies for recording and analysing field data to facilitate integration with '-omics' data."
Affiliation(s)
- Erik Alexandersson
- Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, Sweden
- *Correspondence: Erik Alexandersson, Department of Plant Protection Biology, Swedish University of Agricultural Sciences, PO Box 102, SE-23053 Alnarp, Sweden e-mail:
- Dan Jacobson
- Department of Viticulture and Oenology, Institute for Wine Biotechnology, Stellenbosch University, Stellenbosch, South Africa
- Melané A. Vivier
- Department of Viticulture and Oenology, Institute for Wine Biotechnology, Stellenbosch University, Stellenbosch, South Africa
- Wolfram Weckwerth
- Department of Ecogenomics and Systems Biology, University of Vienna, Vienna, Austria
- Erik Andreasson
- Department of Plant Protection Biology, Swedish University of Agricultural Sciences, Alnarp, Sweden
42
Akiyama K, Kurotani A, Iida K, Kuromori T, Shinozaki K, Sakurai T. RARGE II: an integrated phenotype database of Arabidopsis mutant traits using a controlled vocabulary. PLANT & CELL PHYSIOLOGY 2014; 55:e4. [PMID: 24272250 PMCID: PMC3894705 DOI: 10.1093/pcp/pct165] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 09/06/2013] [Accepted: 11/05/2013] [Indexed: 05/20/2023]
Abstract
Arabidopsis thaliana is one of the most popular experimental plants. However, only 40% of its genes have at least one experimental Gene Ontology (GO) annotation assigned. Systematic observation of mutant phenotypes is an important technique for elucidating gene functions. Indeed, several large-scale phenotypic analyses have been performed and have generated phenotypic data sets from many Arabidopsis mutant lines and overexpressing lines, which are freely available online. Since each Arabidopsis mutant line database uses individual phenotype expression, the differences in the structured term sets used by each database make it difficult to compare data sets and make it impossible to search across databases. Therefore, we obtained publicly available information for a total of 66,209 Arabidopsis mutant lines, including loss-of-function (RATM and TARAPPER) and gain-of-function (AtFOX and OsFOX) lines, and integrated the phenotype data by mapping the descriptions onto Plant Ontology (PO) and Phenotypic Quality Ontology (PATO) terms. This approach made it possible to manage the four different phenotype databases as one large data set. Here, we report a publicly accessible web-based database, the RIKEN Arabidopsis Genome Encyclopedia II (RARGE II; http://rarge-v2.psc.riken.jp/), in which all of the data described in this study are included. Using the database, we demonstrated consistency (in terms of protein function) with a previous study and identified the presumed function of an unknown gene. We provide examples of AT1G21600, which is a subunit in the plastid-encoded RNA polymerase complex, and AT5G56980, which is related to the jasmonic acid signaling pathway.
Affiliation(s)
- Kenji Akiyama
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Atsushi Kurotani
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Kei Iida
- Graduate School of Medicine, Kyoto University, Kyoto, Kyoto, 606-8501 Japan
- Takashi Kuromori
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Kazuo Shinozaki
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- Tetsuya Sakurai
- RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045 Japan
- *Corresponding author: E-mail, ; Fax, +81-45-503-9665
43
Gour P, Garg P, Jain R, Joseph SV, Tyagi AK, Raghuvanshi S. Manually curated database of rice proteins. Nucleic Acids Res 2013; 42:D1214-21. [PMID: 24214963 PMCID: PMC3964970 DOI: 10.1093/nar/gkt1072] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/13/2022]
Abstract
'Manually Curated Database of Rice Proteins' (MCDRP), available at http://www.genomeindia.org/biocuration, is a unique curated database based on published experimental data. Semantic integration of scientific data is essential to gain a higher level of understanding of biological systems. Since the majority of scientific data is available as published literature, text mining is an essential step before the data can be integrated and made available for computer-based search in various databases. However, text mining is a tedious exercise and, thus, there is a large gap between the data available in curated databases and in published literature. Moreover, data in an experiment can be perceived from several perspectives, which may not be reflected in text-based curation. In order to address such issues, we have demonstrated the feasibility of digitizing the experimental data itself by creating a database on rice proteins based on in-house developed data curation models. Using these models, data from individual experiments have been digitized with the help of universal ontologies. Currently, the database has data for over 1800 rice proteins curated from >4000 different experiments of over 400 research articles. Since every aspect of the experiment such as gene name, plant type, tissue and developmental stage has been digitized, experimental data can be rapidly accessed and integrated.
Affiliation(s)
- Pratibha Gour
- Department of Plant Molecular Biology, University of Delhi South Campus, Benito Juarez Road, New Delhi - 110021, India
44
Fuellen G, Boerries M, Busch H, de Grey A, Hahn U, Hiller T, Hoeflich A, Jansen L, Janssens GE, Kaleta C, Meinema AC, Schäuble S, Simm A, Schofield PN, Smith B, Sühnel J, Vera J, Wagner W, Wönne EC, Wuttke D. In silico approaches and the role of ontologies in aging research. Rejuvenation Res 2013; 16:540-6. [PMID: 24188080 DOI: 10.1089/rej.2013.1517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
The 2013 Rostock Symposium on Systems Biology and Bioinformatics in Aging Research was again dedicated to dissecting the aging process using in silico means. A particular focus was on ontologies, because these are a key technology to systematically integrate heterogeneous information about the aging process. Related topics were databases and data integration. Other talks tackled modeling issues and applications, the latter including talks focused on marker development and cellular stress as well as on diseases, in particular on diseases of kidney and skin.
Affiliation(s)
- Georg Fuellen
- Institute for Biostatistics and Informatics in Medicine and Aging Research, Department of Medicine, Rostock University, Rostock, Germany
45
Balhoff JP, Mikó I, Yoder MJ, Mullins PL, Deans AR. A semantic model for species description applied to the ensign wasps (hymenoptera: evaniidae) of New Caledonia. Syst Biol 2013; 62:639-59. [PMID: 23652347 PMCID: PMC3739881 DOI: 10.1093/sysbio/syt028] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 10/24/2012] [Revised: 02/14/2013] [Accepted: 04/23/2013] [Indexed: 12/01/2022]
Abstract
Taxonomic descriptions are unparalleled sources of knowledge of life's phenotypic diversity. As natural language prose, these data sets are largely refractory to computation and integration with other sources of phenotypic data. By formalizing taxonomic descriptions using ontology-based semantic representation, we aim to increase the reusability and computability of taxonomists' primary data. Here, we present a revision of the ensign wasp (Hymenoptera: Evaniidae) fauna of New Caledonia using this new model for species description. Descriptive matrices, specimen data, and taxonomic nomenclature are gathered in a unified Web-based application, mx, then exported as both traditional taxonomic treatments and semantic statements using the OWL Web Ontology Language. Character:character-state combinations are then annotated following the entity-quality phenotype model, originally developed to represent mutant model organism phenotype data; concepts of anatomy are drawn from the Hymenoptera Anatomy Ontology and linked to phenotype descriptors from the Phenotypic Quality Ontology. The resulting set of semantic statements is provided in Resource Description Framework format. Applying the model to real data, that is, specimens, taxonomic names, diagnoses, descriptions, and redescriptions, provides us with a foundation to discuss limitations and potential benefits such as automated data integration and reasoner-driven queries. Four species of ensign wasp are now known to occur in New Caledonia: Szepligetella levipetiolata, Szepligetella deercreeki Deans and Mikó sp. nov., Szepligetella irwini Deans and Mikó sp. nov., and the nearly cosmopolitan Evania appendigaster. A fifth species, Szepligetella sericea, including Szepligetella impressa, syn. nov., has not yet been collected in New Caledonia but can be found on islands throughout the Pacific and so is included in the diagnostic key.
Affiliation(s)
- James P. Balhoff
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- István Mikó
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Matthew J. Yoder
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Patricia L. Mullins
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
| | - Andrew R. Deans
- National Evolutionary Synthesis Center, Durham, NC 27705, USA; Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA; Insect Museum, Department of Entomology, North Carolina State University, Box 7613, Raleigh, NC 27695, USA; Department of Entomology, Pennsylvania State University, 501 ASI Building, University Park, PA 16802, USA; Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, MC 652 Champaign, IL 61820, USA; and Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011, USA
46
Abstract
In crop genetics and breeding research, phenotypic data are collected for each plant genotype, often in multiple locations and field conditions, in search of the genomic regions that confer improved traits. But what is happening to all of these phenotypic data? Currently, virtually none of the data generated from the hundreds of phenotypic studies conducted each year are being made publicly available as raw data; thus there is little we can learn from past experience when making decisions about how to breed better crops for the future. This ongoing loss of phenotypic information, particularly about crop productivity, must be stopped if we are to meet the considerable challenge of increasing food production sufficiently to meet the needs of a growing world population. Here I present a road map for developing and implementing an information network to share data on crop plant phenotypes.
Affiliation(s)
- Dani Zamir
- Faculty of Agriculture, The Hebrew University of Jerusalem, Rehovot, Israel.
47
The 3rd DBCLS BioHackathon: improving life science data integration with Semantic Web technologies. J Biomed Semantics 2013; 4:6. [PMID: 23398680 PMCID: PMC3598643 DOI: 10.1186/2041-1480-4-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 09/13/2012] [Accepted: 02/05/2013] [Indexed: 01/20/2023]
Abstract
BACKGROUND BioHackathon 2010 was the third in a series of meetings hosted by the Database Center for Life Sciences (DBCLS) in Tokyo, Japan. The overall goal of the BioHackathon series is to improve the quality and accessibility of life science research data on the Web by bringing together representatives from public databases, analytical tool providers, and cyber-infrastructure researchers to jointly tackle important challenges in the area of in silico biological research. RESULTS The theme of BioHackathon 2010 was the 'Semantic Web', and all attendees gathered with the shared goal of producing Semantic Web data from their respective resources, and/or consuming or interacting with those data using their tools and interfaces. We discussed topics including guidelines for designing semantic data and interoperability of resources. We consequently developed tools and clients for analysis and visualization. CONCLUSION We provide a meeting report from BioHackathon 2010, in which we describe the discussions, decisions, and breakthroughs made as we moved towards compliance with Semantic Web technologies, from source provider, through middleware, to the end-consumer.
48
Cooper L, Walls RL, Elser J, Gandolfo MA, Stevenson DW, Smith B, Preece J, Athreya B, Mungall CJ, Rensing S, Hiss M, Lang D, Reski R, Berardini TZ, Li D, Huala E, Schaeffer M, Menda N, Arnaud E, Shrestha R, Yamazaki Y, Jaiswal P. The plant ontology as a tool for comparative plant anatomy and genomic analyses. PLANT & CELL PHYSIOLOGY 2013; 54:e1. [PMID: 23220694 PMCID: PMC3583023 DOI: 10.1093/pcp/pcs163] [Citation(s) in RCA: 87] [Impact Index Per Article: 7.9] [Indexed: 05/07/2023]
Abstract
The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies to provide translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to >110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs.
Affiliation(s)
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
- Ramona L. Walls
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to this work
- These authors contributed equally to the development of the Plant Ontology
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
- Maria A. Gandolfo
- L.H. Bailey Hortorium, Department of Plant Biology, Cornell University, 412 Mann Library Building, Ithaca, NY 14853, USA
- These authors contributed equally to the development of the Plant Ontology
- Dennis W. Stevenson
- New York Botanical Garden, 2900 Southern Blvd., Bronx, NY 10458-5126, USA
- These authors contributed equally to the development of the Plant Ontology
- Barry Smith
- Department of Philosophy, University at Buffalo, 126 Park Hall, Buffalo, NY 14260, USA
- These authors contributed equally to the development of the Plant Ontology
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- Balaji Athreya
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- Christopher J. Mungall
- Berkeley Bioinformatics Open-Source Projects, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Mailstop 64-121, Berkeley, CA 94720, USA
- Stefan Rensing
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
- Manuel Hiss
- Faculty of Biology and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Schänzlestr. 1, D-79104 Freiburg, Germany
- Daniel Lang
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
- Ralf Reski
- Plant Biotechnology, Faculty of Biology, University of Freiburg, Germany
- FRIAS - Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany
- Tanya Z. Berardini
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Donghui Li
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Eva Huala
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA 94305, USA
- Mary Schaeffer
- Agricultural Research Service, United States Department of Agriculture, Columbia, MO 65211, USA
- Division of Plant Sciences, Department of Agronomy, University of Missouri, Columbia, MO 65211, USA
- Naama Menda
- Boyce Thompson Institute for Plant Research, 533 Tower Road, Ithaca, NY 14853, USA
- Elizabeth Arnaud
- Bioversity International, via dei Tre Denari, 174/a, Maccarese, Rome, Italy
- Rosemary Shrestha
- Genetic Resources Program, Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT), Apdo. Postal 6-641, 06600 Mexico, D.F., Mexico
- Yukiko Yamazaki
- Center for Genetic Resource Information, National Institute of Genetics, Mishima, Shizuoka, 411-8540 Japan
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, 2082 Cordley Hall, Corvallis, OR 97331-2902, USA
- These authors contributed equally to the development of the Plant Ontology
- *Corresponding author: E-mail: ; Fax: +1-541-737-3573
49
Abstract
The Gene Ontology (GO) Consortium (GOC, http://www.geneontology.org) is a community-based bioinformatics resource that classifies gene product function through the use of structured, controlled vocabularies. Over the past year, the GOC has implemented several processes to increase the quantity, quality and specificity of GO annotations. First, the number of manual, literature-based annotations has grown at an increasing rate. Second, as a result of a new ‘phylogenetic annotation’ process, manually reviewed, homology-based annotations are becoming available for a broad range of species. Third, the quality of GO annotations has been improved through a streamlined process for, and automated quality checks of, GO annotations deposited by different annotation groups. Fourth, the consistency and correctness of the ontology itself has increased by using automated reasoning tools. Finally, the GO has been expanded not only to cover new areas of biology through focused interaction with experts, but also to capture greater specificity in all areas of the ontology using tools for adding new combinatorial terms. The GOC works closely with other ontology developers to support integrated use of terminologies. The GOC supports its user community through the use of e-mail lists, social media and web-based resources.