1
Papoutsoglou EA, Athanasiadis IN, Visser RGF, Finkers R. The benefits and struggles of FAIR data: the case of reusing plant phenotyping data. Sci Data 2023; 10:457. [PMID: 37443110 PMCID: PMC10345100 DOI: 10.1038/s41597-023-02364-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Accepted: 07/03/2023] [Indexed: 07/15/2023]
Abstract
Plant phenotyping experiments are conducted under a variety of experimental parameters and settings for diverse purposes. The data they produce is heterogeneous, complicated, often poorly documented and, as a result, difficult to reuse. Meeting societal needs (nutrition, crop adaptation and stability) requires more efficient methods toward data integration and reuse. In this work, we examine what "making data FAIR" entails, and investigate the benefits and the struggles not only of reusing FAIR data, but also making data FAIR using genotype by environment and QTL by environment interactions for developmental traits in potato as a case study. We assume the role of a scientist discovering a phenotypic dataset on a FAIR data point, verifying the existence of related datasets with environmental data, acquiring both and integrating them. We report and discuss the challenges and the potential for reusability and reproducibility of FAIRifying existing datasets, using metadata standards such as MIAPPE, that were encountered in this process.
Affiliation(s)
- Evangelia A Papoutsoglou
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
  - Taxonic B.V., De Meern, The Netherlands
- Ioannis N Athanasiadis
  - Wageningen Data Competence Center and Geo-Information Science & Remote Sensing Lab, Wageningen University and Research, Wageningen, The Netherlands
- Richard G F Visser
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
- Richard Finkers
  - Plant Breeding, Wageningen University and Research, Wageningen, The Netherlands
  - GenNovation B.V., Wageningen, The Netherlands
2
Singh G, Papoutsoglou EA, Keijts-Lalleman F, Vencheva B, Rice M, Visser RG, Bachem CW, Finkers R. Extracting knowledge networks from plant scientific literature: potato tuber flesh color as an exemplary trait. BMC Plant Biol 2021; 21:198. [PMID: 33894758 PMCID: PMC8070292 DOI: 10.1186/s12870-021-02943-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2020] [Accepted: 03/29/2021] [Indexed: 06/12/2023]
Abstract
BACKGROUND: Scientific literature carries a wealth of information crucial for research, but only a fraction of it is present as structured information in databases and therefore can be analyzed using traditional data analysis tools. Natural language processing (NLP) is often and successfully employed to support humans by distilling relevant information from large corpora of free text and structuring it in a way that lends itself to further computational analyses. For this pilot, we developed a pipeline that uses NLP on biological literature to produce knowledge networks. We focused on the flesh color of potato, a well-studied trait with known associations, and we investigated whether these knowledge networks can assist us in formulating new hypotheses on the underlying biological processes.
RESULTS: We trained an NLP model based on a manually annotated corpus of 34 full-text potato articles, to recognize relevant biological entities and relationships between them in text (genes, proteins, metabolites and traits). This model detected the number of biological entities with a precision of 97.65% and a recall of 88.91% on the training set. We conducted a time series analysis on 4023 PubMed abstracts of plant genetics-based articles which focus on 4 major Solanaceous crops (tomato, potato, eggplant and capsicum), to determine that the networks contained both previously known and contemporaneously unknown leads to subsequently discovered biological phenomena relating to flesh color. A novel time-based analysis of these networks indicates a connection between our trait and a candidate gene (zeaxanthin epoxidase) already two years prior to explicit statements of that connection in the literature.
CONCLUSIONS: Our time-based analysis indicates that network-assisted hypothesis generation shows promise for knowledge discovery, data integration and hypothesis generation in scientific research.
Affiliation(s)
- Gurnoor Singh
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Mark Rice
  - IBM Netherlands, Amsterdam, The Netherlands
- Richard G.F. Visser
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Christian W.B. Bachem
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ The Netherlands
3
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. New Phytol 2020. [PMID: 32171029 DOI: 10.15454/1yxvzv] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/08/2023]
Abstract
Enabling data reuse and knowledge discovery is increasingly critical in modern science, and requires an effort towards standardising data publication practices. This is particularly challenging in the plant phenotyping domain, due to its complexity and heterogeneity. We have produced the MIAPPE 1.1 release, which enhances the existing MIAPPE standard in coverage, to support perennial plants, in structure, through an explicit data model, and in clarity, through definitions and examples. We evaluated MIAPPE 1.1 by using it to express several heterogeneous phenotyping experiments in a range of different formats, to demonstrate its applicability and the interoperability between the various implementations. Furthermore, the extended coverage is demonstrated by the fact that one of the datasets could not have been described under MIAPPE 1.0. MIAPPE 1.1 marks a major step towards enabling plant phenotyping data reusability, thanks to its extended coverage, and especially the formalisation of its data model, which facilitates its implementation in different formats. Community feedback has been critical to this development, and will be a key part of ensuring adoption of the standard.
Affiliation(s)
- Evangelia A Papoutsoglou
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Daniel Faria
  - BioData.pt, Instituto Gulbenkian de Ciência, 2780-156, Oeiras, Portugal
  - INESC-ID, 1000-029, Lisboa, Portugal
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Elizabeth Arnaud
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
- Ioannis N Athanasiadis
  - Geo-Information Science and Remote Sensing Laboratory, Wageningen University, Droevendaalsesteeg 3, Wageningen, 6708PB, the Netherlands
- Inês Chaves
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - Instituto de Biologia Experimental e Tecnológica (iBET), 2780-157, Oeiras, Portugal
- Frederik Coppens
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Bruno V Costa
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Hanna Ćwiek-Kupczyńska
  - Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
- Bert Droesbeke
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 71, Ghent, 9052, Belgium
  - VIB Center for Plant Systems Biology, Technologiepark 71, Ghent, 9052, Belgium
- Richard Finkers
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Kristina Gruden
  - Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
- Astrid Junker
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Graham J King
  - Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2577, Australia
- Paweł Krajewski
  - Institute of Plant Genetics, Polish Academy of Sciences, ul. Strzeszyńska 34, 60-479, Poznań, Poland
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Marie-Angélique Laporte
  - Bioversity International, Parc Scientifique Agropolis II, Montpellier Cedex 5, 34397, France
- Célia Michotey
  - Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
- Markus Oppermann
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Richard Ostler
  - Computational and Analytical Sciences, Rothamsted Research, Harpenden, AL5 2JQ, UK
- Hendrik Poorter
  - Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
  - Department of Biological Sciences, Macquarie University, North Ryde, NSW 2109, Australia
- Živa Ramšak
  - Department of Biotechnology and Systems Biology, National Institute of Biology, SI1000, Ljubljana, Slovenia
- Jochen C Reif
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Philippe Rocca-Serra
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Susanna-Assunta Sansone
  - Oxford e-Research Centre, Department of Engineering Science, University of Oxford, 7 Keble Road, Oxford, OX1 3QG, UK
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- François Tardieu
  - INRA, Laboratoire d'Ecophysiologie des Plantes sous Stress Environnementaux, UMR759, Montpellier, 34060, France
- Cristobal Uauy
  - Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich, NR4 7UH, UK
- Björn Usadel
  - Plant Sciences (IBG-2), Forschungszentrum Jülich GmbH, D-52425, Jülich, Germany
  - Institute for Biology I, BioSC, RWTH Aachen University, Worringer Weg 3, 52074, Aachen, Germany
- Richard G F Visser
  - Plant Breeding, Wageningen University & Research, PO Box 386, Wageningen, 6700AJ, the Netherlands
- Stephan Weise
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Seeland, Germany
- Célia M Miguel
  - Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa (ITQB NOVA) Avenida da República, 2780-157, Oeiras, Portugal
  - BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, 1749-016, Portugal
- Cyril Pommier
  - Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France
4
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek‐Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte M, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez‐Gonzalez R, Ramšak Ž, Reif JC, Rocca‐Serra P, Sansone S, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam‐Blondon A, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. New Phytol 2020; 227:260-273. [PMID: 32171029 PMCID: PMC7317793 DOI: 10.1111/nph.16544] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 11/22/2019] [Accepted: 02/24/2020] [Indexed: 05/21/2023]
5
Papoutsoglou EA, Faria D, Arend D, Arnaud E, Athanasiadis IN, Chaves I, Coppens F, Cornut G, Costa BV, Ćwiek-Kupczyńska H, Droesbeke B, Finkers R, Gruden K, Junker A, King GJ, Krajewski P, Lange M, Laporte MA, Michotey C, Oppermann M, Ostler R, Poorter H, Ramírez-Gonzalez R, Ramšak Ž, Reif JC, Rocca-Serra P, Sansone SA, Scholz U, Tardieu F, Uauy C, Usadel B, Visser RGF, Weise S, Kersey PJ, Miguel CM, Adam-Blondon AF, Pommier C. Enabling reusability of plant phenomic datasets with MIAPPE 1.1. New Phytol 2020. [PMID: 32171029 DOI: 10.15454/ah6u4a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2023]