1
Payumo J, Bello-Bravo J, Chennuru V, Mercene SA, Yim C, Duynslager L, Kanamarlapudi B, Posos-Parra O, Payumo S, Mota-Sanchez D. An Assessment Model for Agricultural Databases: The Arthropod Pesticide Resistance Database as a Case Study. Insects 2024; 15:747. [PMID: 39452324 PMCID: PMC11509053 DOI: 10.3390/insects15100747] [Received: 08/12/2024] [Revised: 09/19/2024] [Accepted: 09/24/2024] [Indexed: 10/26/2024]
Abstract
This paper presents a multi-method approach for evaluating the utility and impact of agricultural databases in the context of the rapidly expanding digital economy. Focusing on the Arthropod Pesticide Resistance Database, one of the most comprehensive global resources on arthropod pesticide resistance, we offer a framework for assessing the effectiveness of agricultural databases. Our approach provides practical guidance for developers, users, evaluators, and funders on how to measure the impact of these digital tools, using relevant metrics and data to validate their contributions. Additionally, we introduce an index-based method that evaluates impact across multiple dimensions, including data usage, accessibility, inclusivity, knowledge generation, innovation, research and policy development, and collaboration. The detailed methodology serves as both a reference and a model for evaluating the impact of other agricultural databases, ensuring they effectively support decision-making and foster innovation in the agricultural sector.
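The abstract does not give the formula behind its index-based method. As an illustration only, one common way to build a multi-dimensional index is to min-max normalize each indicator to [0, 1] against reference bounds and take a weighted mean. In the sketch below, the dimension names echo the abstract, but the function names, indicator values, bounds, and equal default weights are hypothetical and not taken from the paper.

```python
from typing import Mapping, Optional, Tuple


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1] relative to (lo, hi), clamping values outside the bounds."""
    if hi <= lo:
        raise ValueError("upper bound must exceed lower bound")
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def composite_index(
    indicators: Mapping[str, float],
    bounds: Mapping[str, Tuple[float, float]],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted mean of min-max-normalized indicators (hypothetical scheme)."""
    if weights is None:
        weights = {name: 1.0 for name in indicators}  # equal weights by default
    total_weight = sum(weights[name] for name in indicators)
    score = sum(
        weights[name] * min_max(value, *bounds[name])
        for name, value in indicators.items()
    )
    return score / total_weight


if __name__ == "__main__":
    # Hypothetical yearly indicators for three of the dimensions named in the abstract.
    indicators = {"data_usage": 12000, "knowledge_generation": 45, "collaboration": 8}
    bounds = {
        "data_usage": (0, 20000),
        "knowledge_generation": (0, 100),
        "collaboration": (0, 10),
    }
    print(round(composite_index(indicators, bounds), 3))  # prints 0.617
```

Equal weights keep the sketch neutral; an evaluator who considers, say, policy development more important than raw usage would pass an explicit `weights` mapping instead.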
Affiliation(s)
- Jane Payumo
  - Research Evaluation and Data Analytics, MSU AgBioResearch, Michigan State University, East Lansing, MI 48824, USA
- Julia Bello-Bravo
  - Department of Agricultural Sciences Education and Communication, Purdue University, Lafayette, IN 47907, USA
- Vineeth Chennuru
  - Research Evaluation and Data Analytics, MSU AgBioResearch, Michigan State University, East Lansing, MI 48824, USA
- Solo Arman Mercene
  - Department of Entomology, Michigan State University, East Lansing, MI 48824, USA
- Chaeyeon Yim
  - Research Evaluation and Data Analytics, MSU AgBioResearch, Michigan State University, East Lansing, MI 48824, USA
- Lee Duynslager
  - Department of Entomology, Michigan State University, East Lansing, MI 48824, USA
- Bhanu Kanamarlapudi
  - Research Evaluation and Data Analytics, MSU AgBioResearch, Michigan State University, East Lansing, MI 48824, USA
- Omar Posos-Parra
  - Department of Entomology, Michigan State University, East Lansing, MI 48824, USA
- Sky Payumo
  - Department of Entomology, Michigan State University, East Lansing, MI 48824, USA
- David Mota-Sanchez
  - Department of Entomology, Michigan State University, East Lansing, MI 48824, USA
2
Çelik Ş, Tutar H, Gönülal E, Er H. Prediction of fresh herbage yield using data mining techniques with limited plant quality parameters. Sci Rep 2024; 14:21396. [PMID: 39271726 PMCID: PMC11399138 DOI: 10.1038/s41598-024-72746-9] [Received: 06/07/2024] [Accepted: 09/10/2024] [Indexed: 09/15/2024]
Abstract
This study examined fresh herbage yield, fertilizer dosage, and plant characteristics of the Sorghum-Sudangrass hybrid grown in arid and semi-arid regions, together with their interrelationships. To this end, data from the Sorghum-Sudangrass hybrid were used to compare the predictive performance of several data mining techniques: CHAID, CART, MARS, and Bagging MARS. Plant traits were measured in Konya and Sanliurfa in 2021 and 2022. Mean values were: plant height 306.27 cm, stem diameter 9.47 mm, fresh herbage yield 10852.51 kg/da, crude protein ratio 9.66%, acid detergent fiber 33.39%, neutral detergent fiber 51.85%, acid detergent lignin 9.76%, dry matter digestibility 62.88%, dry matter intake 2.34%, and relative feed value 114.68. The predictive capacity of the fitted models was assessed with goodness-of-fit statistics: the coefficient of determination (R²), adjusted R², root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation ratio (SD ratio), and Akaike Information Criterion (AIC). MARS gave the lowest RMSE, MAPE, SD ratio, and AIC (246, 1.926, 0.085, and 845, respectively) and the highest R² (0.993) and adjusted R² (0.989), making it the best of the compared models for characterizing fresh herbage yield and a solid alternative to the other data mining techniques for forecasting fresh herbage production.
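The abstract ranks the models by six goodness-of-fit statistics without defining them. The sketch below computes them from observed and predicted yields using common textbook definitions, which are an assumption here: the paper may, for instance, count parameters differently in AIC or use another SD-ratio convention. Lower RMSE, MAPE, SD ratio, and AIC and higher R² and adjusted R² indicate a better fit, which is the ranking logic the abstract applies to MARS. The function name `fit_statistics` and the sample data are hypothetical.

```python
import math
from typing import Dict, Sequence


def fit_statistics(y: Sequence[float], y_hat: Sequence[float], p: int) -> Dict[str, float]:
    """Goodness-of-fit statistics for a model with p predictor terms plus an intercept.

    The formulas are common textbook definitions assumed here, not taken from the paper.
    """
    n = len(y)
    if n != len(y_hat) or n <= p + 1:
        raise ValueError("y and y_hat must have equal length n > p + 1")
    residuals = [obs - pred for obs, pred in zip(y, y_hat)]
    mean_y = sum(y) / n
    sse = sum(e * e for e in residuals)          # residual sum of squares
    sst = sum((obs - mean_y) ** 2 for obs in y)  # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    rmse = math.sqrt(sse / n)
    # MAPE in percent; undefined if any observed value is zero.
    mape = 100.0 / n * sum(abs(e / obs) for e, obs in zip(residuals, y))
    # SD ratio: sample SD of the residuals divided by sample SD of the observations.
    mean_e = sum(residuals) / n
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in residuals) / (n - 1))
    sd_y = math.sqrt(sst / (n - 1))
    # Least-squares AIC form n*ln(SSE/n) + 2k with k = p + 1 coefficients; needs SSE > 0.
    aic = n * math.log(sse / n) + 2 * (p + 1)
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "mape": mape,
            "sd_ratio": sd_e / sd_y, "aic": aic}


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in fit_statistics(observed, predicted, p=1).items():
        print(f"{name}: {value:.4f}")
```

Because AIC adds a penalty per estimated coefficient, it can favor a simpler model even when R² is slightly lower, which is why such studies report it alongside the error-based statistics.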
Affiliation(s)
- Şenol Çelik
  - Biometry and Genetic Unit, Department of Animal Science, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
- Halit Tutar
  - Department of Field Crops, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
- Erdal Gönülal
  - Bahri Dagdas International Agriculture Research Institute, 42000, Konya, Turkey
- Hasan Er
  - Department of Biosystems Engineering, Faculty of Agriculture, Bingol University, 12000, Bingöl, Turkey
3
Reiser L, Bakker E, Subramaniam S, Chen X, Sawant S, Khosa K, Prithvi T, Berardini TZ. The Arabidopsis Information Resource in 2024. Genetics 2024; 227:iyae027. [PMID: 38457127 PMCID: PMC11075553 DOI: 10.1093/genetics/iyae027] [Received: 11/03/2023] [Accepted: 02/07/2024] [Indexed: 03/09/2024]
Abstract
Since 1999, The Arabidopsis Information Resource (www.arabidopsis.org) has been curating data about the Arabidopsis thaliana genome. Its primary focus is integrating experimental gene function information from the peer-reviewed literature and codifying it as controlled vocabulary annotations. Our goal is to produce a "gold standard" functional annotation set that reflects the current state of knowledge about the Arabidopsis genome. At the same time, the resource serves as a nexus for community-based collaborations aimed at improving data quality, access, and reuse. For the past decade, our work has been made possible by subscriptions from our global user base. This update covers our ongoing biocuration work, some of our modernization efforts that contribute to the first major infrastructure overhaul since 2011, the introduction of JBrowse2, and the resource's role in community activities such as organizing the structural reannotation of the genome. For gene function assessment, we used gene ontology annotations as a metric to evaluate: (1) what is currently known about Arabidopsis gene function and (2) the set of "unknown" genes. Currently, 74% of the proteome has been annotated to at least one gene ontology term. Of those loci, half have experimental support for at least one of the following aspects: molecular function, biological process, or cellular component. Our work sheds light on the genes for which we have not yet identified any published experimental data and have no functional annotation. Drawing attention to these unknown genes highlights knowledge gaps and potential sources of novel discoveries.
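The two coverage figures in the abstract (74% of the proteome annotated to at least one GO term, and about half of those loci with experimental support, implying roughly 37% of the proteome overall) can be computed from a table of (locus, GO term, evidence code) rows. The sketch below assumes that input shape and counts a locus as experimentally supported if any of its annotations carries a GO experimental evidence code. Whether TAIR's count includes the high-throughput codes, and how it defines an "unknown" gene, are assumptions here; the function name and the locus/term rows in the usage example are illustrative only.

```python
from typing import Iterable, Set, Tuple

# GO experimental evidence codes, including the high-throughput (HT) variants.
# Including the HT codes in the count is an assumption of this sketch.
EXPERIMENTAL_CODES = frozenset(
    {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "HTP", "HDA", "HMP", "HGI", "HEP"}
)


def annotation_coverage(
    proteome: Set[str], annotations: Iterable[Tuple[str, str, str]]
) -> Tuple[float, float, Set[str]]:
    """Return (fraction of loci with >= 1 GO term,
    fraction of annotated loci with >= 1 experimental annotation,
    loci with no annotation at all)."""
    annotated: Set[str] = set()
    experimental: Set[str] = set()
    for locus, _go_term, evidence in annotations:
        if locus not in proteome:
            continue  # ignore rows for loci outside the proteome being assessed
        annotated.add(locus)
        if evidence in EXPERIMENTAL_CODES:
            experimental.add(locus)
    frac_annotated = len(annotated) / len(proteome)
    frac_experimental = len(experimental) / len(annotated) if annotated else 0.0
    return frac_annotated, frac_experimental, proteome - annotated


if __name__ == "__main__":
    loci = {"AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040"}
    rows = [
        ("AT1G01010", "GO:0003700", "IDA"),
        ("AT1G01010", "GO:0005634", "IEA"),  # electronic annotation: not experimental
        ("AT1G01020", "GO:0005737", "ISS"),  # sequence similarity: not experimental
        ("AT1G01030", "GO:0009908", "IMP"),
    ]
    print(annotation_coverage(loci, rows))
```

Counting at the locus level rather than per annotation matches the abstract's phrasing ("half have experimental support for at least one" aspect): a single experimental annotation is enough for a locus to count.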
4
Wright A, Wilkinson MD, Mungall C, Cain S, Richards S, Sternberg P, Provin E, Jacobs JL, Geib S, Raciti D, Yook K, Stein L, Molik DC. FAIR Header Reference genome: a TRUSTworthy standard. Brief Bioinform 2024; 25:bbae122. [PMID: 38555475 PMCID: PMC10981671 DOI: 10.1093/bib/bbae122] [Received: 12/22/2023] [Revised: 02/16/2024] [Accepted: 02/22/2024] [Indexed: 04/02/2024]
Abstract
The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR), in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.
Affiliation(s)
- Adam Wright
  - Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- Mark D Wilkinson
  - Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA/CSIC), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Pozuelo de Alarcón, Madrid, Spain
- Christopher Mungall
  - Biosystems Data Science, Lawrence Berkeley National Laboratory, Building: 977, 1 Cyclotron Rd, Berkeley, CA 94720, USA
- Scott Cain
  - Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- Stephen Richards
  - Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, MS: BCM226, Houston, TX 77030, USA
- Paul Sternberg
  - Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Ellen Provin
  - Department of Horticultural Studies, Texas A&M University, HFSB 204, TAMU 2133, College Station, TX 77848, USA
- Jonathan L Jacobs
  - American Type Culture Collection, 10801 University Blvd, Manassas, VA 20110, USA
- Scott Geib
  - Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, United States Department of Agriculture, Agricultural Research Service, 64 Nowelo St, Hilo, HI 96720, USA
- Daniela Raciti
  - Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Karen Yook
  - Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Lincoln Stein
  - Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- David C Molik
  - Arthropod-borne Animal Diseases Research Unit, Center for Grain and Animal Health Research, United States Department of Agriculture, Agricultural Research Service, 1515 College Ave, Manhattan, KS 66502, USA
5
Harrison PW, Amode MR, Austine-Orimoloye O, Azov A, Barba M, Barnes I, Becker A, Bennett R, Berry A, Bhai J, Bhurji SK, Boddu S, Branco Lins PR, Brooks L, Ramaraju S, Campbell L, Martinez MC, Charkhchi M, Chougule K, Cockburn A, Davidson C, De Silva N, Dodiya K, Donaldson S, El Houdaigui B, Naboulsi T, Fatima R, Giron CG, Genez T, Grigoriadis D, Ghattaoraya G, Martinez JG, Gurbich T, Hardy M, Hollis Z, Hourlier T, Hunt T, Kay M, Kaykala V, Le T, Lemos D, Lodha D, Marques-Coelho D, Maslen G, Merino G, Mirabueno L, Mushtaq A, Hossain S, Ogeh D, Sakthivel MP, Parker A, Perry M, Piližota I, Poppleton D, Prosovetskaia I, Raj S, Pérez-Silva J, Salam A, Saraf S, Saraiva-Agostinho N, Sheppard D, Sinha S, Sipos B, Sitnik V, Stark W, Steed E, Suner MM, Surapaneni L, Sutinen K, Tricomi FF, Urbina-Gómez D, Veidenberg A, Walsh TA, Ware D, Wass E, Willhoft N, Allen J, Alvarez-Jarreta J, Chakiachvili M, Flint B, Giorgetti S, Haggerty L, Ilsley G, Keatley J, Loveland J, Moore B, Mudge J, Naamati G, Tate J, Trevanion S, Winterbottom A, Frankish A, Hunt SE, Cunningham F, Dyer S, Finn R, Martin F, Yates A. Ensembl 2024. Nucleic Acids Res 2024; 52:D891-D899. [PMID: 37953337 PMCID: PMC10767893 DOI: 10.1093/nar/gkad1049] [Received: 09/15/2023] [Revised: 10/20/2023] [Accepted: 10/24/2023] [Indexed: 11/14/2023]
Abstract
Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.
Affiliation(s)
- Peter W Harrison
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- M Ridwan Amode
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Olanrewaju Austine-Orimoloye
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrey G Azov
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Matthieu Barba
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- If Barnes
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Arne Becker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Ruth Bennett
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrew Berry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jyothish Bhai
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Simarpreet Kaur Bhurji
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sanjay Boddu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Paulo R Branco Lins
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Lucy Brooks
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Shashank Budhanuru Ramaraju
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Lahcen I Campbell
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Manuel Carbajo Martinez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Mehrnaz Charkhchi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Kapeel Chougule
  - Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
- Alexander Cockburn
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Claire Davidson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Nishadi H De Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Kamalkumar Dodiya
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sarah Donaldson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Bilal El Houdaigui
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tamara El Naboulsi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Reham Fatima
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Carlos Garcia Giron
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Thiago Genez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Dionysios Grigoriadis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gurpreet S Ghattaoraya
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jose Gonzalez Martinez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tatiana A Gurbich
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Matthew Hardy
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Zoe Hollis
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Toby Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Mike Kay
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Vinay Kaykala
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Tuan Le
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Diana Lemos
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Disha Lodha
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Diego Marques-Coelho
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gareth Maslen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Gabriela Alejandra Merino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Louisse Paola Mirabueno
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Aleena Mushtaq
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Syed Nakib Hossain
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Denye N Ogeh
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Manoj Pandian Sakthivel
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Anne Parker
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Malcolm Perry
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Ivana Piližota
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Daniel Poppleton
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Irina Prosovetskaia
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Shriya Raj
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- José G Pérez-Silva
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Ahamed Imran Abdul Salam
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Shradha Saraf
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Nuno Saraiva-Agostinho
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Dan Sheppard
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Swati Sinha
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Botond Sipos
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Vasily Sitnik
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- William Stark
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Emily Steed
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Marie-Marthe Suner
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Likhitha Surapaneni
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Kyösti Sutinen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Francesca Floriana Tricomi
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- David Urbina-Gómez
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andres Veidenberg
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Thomas A Walsh
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Doreen Ware
  - Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, NY 11724, USA
  - USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Agricultural Research Service, Ithaca, NY 14853, USA
- Elizabeth Wass
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Natalie L Willhoft
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jamie Allen
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jorge Alvarez-Jarreta
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Marc Chakiachvili
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Bethany Flint
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Stefano Giorgetti
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Leanne Haggerty
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Garth R Ilsley
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jon Keatley
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jane E Loveland
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Benjamin Moore
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Jonathan M Mudge
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Guy Naamati
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- John Tate
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Stephen J Trevanion
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrea Winterbottom
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Adam Frankish
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sarah E Hunt
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Fiona Cunningham
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Sarah Dyer
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Robert D Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
- Andrew D Yates
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge, Cambridgeshire CB10 1SD, UK
6
Gupta P, Elser J, Hooks E, D’Eustachio P, Jaiswal P, Naithani S. Plant Reactome Knowledgebase: empowering plant pathway exploration and OMICS data analysis. Nucleic Acids Res 2024; 52:D1538-D1547. [PMID: 37986220 PMCID: PMC10767815 DOI: 10.1093/nar/gkad1052] [Received: 09/22/2023] [Revised: 10/20/2023] [Accepted: 10/23/2023] [Indexed: 11/22/2023]
Abstract
Plant Reactome (https://plantreactome.gramene.org) is a freely accessible, comprehensive plant pathway knowledgebase. It provides curated reference pathways from rice (Oryza sativa) and gene-orthology-based pathway projections to 129 additional species, spanning single-cell photoautotrophs, non-vascular plants, and higher plants, thus encompassing a wide-ranging taxonomic diversity. Currently, Plant Reactome houses a collection of 339 reference pathways, covering metabolic and transport pathways, hormone signaling, genetic regulations of developmental processes, and intricate transcriptional networks that orchestrate a plant's response to abiotic and biotic stimuli. Beyond being a mere repository, Plant Reactome serves as a dynamic data discovery platform. Users can analyze and visualize omics data, such as gene expression, gene-gene interaction, proteome, and metabolome data, all within the rich context of plant pathways. Plant Reactome is dedicated to fostering data interoperability, upholding global data standards, and embracing the tenets of the Findable, Accessible, Interoperable and Re-usable (FAIR) data policy.
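The abstract describes projecting curated rice reference pathways to 129 other species through gene orthology. A minimal sketch of that idea, under stated assumptions: a pathway is reduced to a set of rice gene IDs, orthology is a precomputed mapping from rice genes to target-species genes, and a pathway is projected only if a minimum fraction of its genes has at least one ortholog. The coverage threshold, the function name `project_pathway`, and all gene IDs are hypothetical; the knowledgebase's actual projection criteria are not stated in the abstract.

```python
from typing import Dict, Iterable, List, Optional, Set


def project_pathway(
    reference_genes: Iterable[str],
    orthologs: Dict[str, List[str]],
    min_coverage: float = 0.5,
) -> Optional[Set[str]]:
    """Project a reference pathway's genes onto a target species via an ortholog map.

    Returns the target-species gene set, or None when fewer than `min_coverage`
    of the reference genes have at least one ortholog (hypothetical threshold).
    """
    genes = list(reference_genes)
    if not genes:
        return None
    projected: Set[str] = set()
    covered = 0
    for gene in genes:
        targets = orthologs.get(gene, [])
        if targets:
            covered += 1
            projected.update(targets)  # one-to-many orthologs all carry over
    return projected if covered / len(genes) >= min_coverage else None


if __name__ == "__main__":
    # Hypothetical rice pathway genes and a rice-to-target ortholog map.
    pathway = ["OsA", "OsB", "OsC", "OsD"]
    ortholog_map = {"OsA": ["AtA1", "AtA2"], "OsC": ["AtC"]}
    print(sorted(project_pathway(pathway, ortholog_map)))
    print(project_pathway(pathway, ortholog_map, min_coverage=0.75))
```

A coverage threshold guards against projecting a pathway into a species where only one or two of its genes are conserved; lowering it trades precision for taxonomic breadth.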
Affiliation(s)
- Parul Gupta
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin Elser
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Elizabeth Hooks
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Pankaj Jaiswal
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sushma Naithani
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR 97331, USA

7
Wright A, Wilkinson MD, Mungall C, Cain S, Richards S, Sternberg P, Provin E, Jacobs JL, Geib S, Raciti D, Yook K, Stein L, Molik DC. FAIR Header Reference genome: A TRUSTworthy standard. bioRxiv 2023:2023.11.29.569306. [PMID: 38076838 PMCID: PMC10705436 DOI: 10.1101/2023.11.29.569306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR-bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability, and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability, and Technology (TRUST). The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility, and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR's design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.
Affiliation(s)
- Adam Wright
- Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- Mark D Wilkinson
- Departamento de Biotecnología-Biología Vegetal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Centro de Biotecnología y Genómica de Plantas (CBGP, UPM-INIA/CSIC), Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA/CSIC), Pozuelo de Alarcón, Madrid, Spain
- Chris Mungall
- Biosystems Data Science, Lawrence Berkeley National Laboratory, Building 977, 1 Cyclotron Rd, Berkeley, CA 94720, USA
- Scott Cain
- Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- Stephen Richards
- Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, MS: BCM226, Houston, TX 77030, USA
- Paul Sternberg
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Ellen Provin
- Department of Horticultural Studies, Texas A&M University, HFSB 204, TAMU 2133, College Station, TX 77848, USA
- Jonathan L Jacobs
- American Type Culture Collection, 10801 University Blvd, Manassas, VA 20110, USA
- Scott Geib
- Tropical Pest Genetics and Molecular Biology Research Unit, Daniel K. Inouye U.S. Pacific Basin Agricultural Research Center, United States Department of Agriculture, Agricultural Research Service, 64 Nowelo St, Hilo, HI 96720, USA
- Daniela Raciti
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Karen Yook
- Division of Biology and Biological Engineering 140-18, California Institute of Technology, Pasadena, CA 91125, USA
- Lincoln Stein
- Adaptive Oncology Program, Ontario Institute for Cancer Research, 661 University Avenue Suite 500, Toronto, ON M5G 0A3, Canada
- David C Molik
- Arthropod-borne Animal Diseases Research Unit, Center for Grain and Animal Health Research, United States Department of Agriculture, Agricultural Research Service, 1515 College Ave, Manhattan, KS 66502, USA

8
Gupta P, Geniza M, Elser J, Al-Bader N, Baschieri R, Phillips JL, Haq E, Preece J, Naithani S, Jaiswal P. Reference genome of the nutrition-rich orphan crop chia (Salvia hispanica) and its implications for future breeding. Front Plant Sci 2023; 14:1272966. [PMID: 38162307 PMCID: PMC10757625 DOI: 10.3389/fpls.2023.1272966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 10/23/2023] [Indexed: 01/03/2024]
Abstract
Chia (Salvia hispanica L.) is one of the most popular nutrition-rich foods and pseudocereal crops of the family Lamiaceae. Chia seeds are a rich source of proteins, polyunsaturated fatty acids (PUFAs), dietary fibers, and antioxidants. In this study, we present the assembly of the chia reference genome, which spans 303.6 Mb and encodes 48,090 annotated protein-coding genes. Our analysis revealed that ~42% of the chia genome harbors repetitive content, and identified ~3 million single nucleotide polymorphisms (SNPs) and 15,380 simple sequence repeat (SSR) marker sites. By investigating the chia transcriptome, we discovered that ~44% of the genes undergo alternative splicing with a higher frequency of intron retention events. Additionally, we identified chia genes associated with important nutrient content and quality traits, such as the biosynthesis of PUFAs and seed mucilage fiber (dietary fiber) polysaccharides. Notably, this is the first report of in-silico annotation of a plant genome for protein-derived small bioactive peptides (biopeptides) associated with improving human health. To facilitate further research and translational applications of this valuable orphan crop, we have developed the Salvia genomics database (SalviaGDB), accessible at https://salviagdb.org.
Affiliation(s)
- Parul Gupta
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Matthew Geniza
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, United States
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Noor Al-Bader
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, United States
- Rachel Baschieri
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Jeremy Levi Phillips
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Ebaad Haq
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States

9
Deng CH, Naithani S, Kumari S, Cobo-Simón I, Quezada-Rodríguez EH, Skrabisova M, Gladman N, Correll MJ, Sikiru AB, Afuwape OO, Marrano A, Rebollo I, Zhang W, Jung S. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database (Oxford) 2023; 2023:baad088. [PMID: 38079567 PMCID: PMC10712715 DOI: 10.1093/database/baad088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/18/2023]
Abstract
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend a need for (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals. Database URL: https://www.agbiodata.org.
Affiliation(s)
- Cecilia H Deng
- Molecular and Digital Breeding, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, 120 Mt Albert Road, Auckland 1025, New Zealand
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- Irene Cobo-Simón
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Institute of Forest Science (ICIFOR-INIA, CSIC), Madrid, Spain
- Elsa H Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, México
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Maria Skrabisova
- Department of Biochemistry, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Nick Gladman
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA
- Melanie J Correll
- Agricultural and Biological Engineering Department, University of Florida, 1741 Museum Rd, Gainesville, FL 32611, USA
- Annarita Marrano
- Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA 94560, USA
- Wentao Zhang
- National Research Council Canada, 110 Gymnasium Pl, Saskatoon, Saskatchewan S7N 0W9, Canada
- Sook Jung
- Department of Horticulture, Washington State University, 303c Plant Sciences Building, Pullman, WA 99164-6414, USA

10
Clarke JL, Cooper LD, Poelchau MF, Berardini TZ, Elser J, Farmer AD, Ficklin S, Kumari S, Laporte MA, Nelson RT, Sadohara R, Selby P, Thessen AE, Whitehead B, Sen TZ. Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium. Database (Oxford) 2023; 2023:baad076. [PMID: 37971715 PMCID: PMC10653126 DOI: 10.1093/database/baad076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Over the last couple of decades, there has been a rapid growth in the number and scope of agricultural genetics, genomics and breeding databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data-sharing practices by AgBioData databases are in a fairly healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study on what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards.
Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Database URL https://www.agbiodata.org/databases.
Affiliation(s)
- Jennifer L Clarke
- Department of Statistics and Department of Food Science and Technology, University of Nebraska–Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Laurel D Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Monica F Poelchau
- USDA, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Ave, Beltsville 20705, USA
- Tanya Z Berardini
- The Arabidopsis Information Resource and Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Andrew D Farmer
- National Center for Genome Resources, 2935 Rodeo Park Dr. E., Santa Fe, NM 87505, USA
- Stephen Ficklin
- Department of Horticulture, Washington State University, 249 Clark Hall, PO Box 646414, Pullman, WA 99164, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Marie-Angélique Laporte
- Digital Inclusion, Bioversity International, Parc Scientifique Agropolis II, 1990 Bd de la Lironde, Montpellier 34397, France
- Rex T Nelson
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, 716 Farmhouse Lane, Ames, IA 50011, USA
- Rie Sadohara
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
- Peter Selby
- School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, 215 Garden Avenue, Ithaca, NY 14850, USA
- Anne E Thessen
- Department of Biomedical Informatics, University of Colorado Anschutz, 1890 N. Revere Court, Mailstop F600, Aurora, CO 80045, USA
- Brandon Whitehead
- Data Science and Informatics, Manaaki Whenua – Landcare Research, Ltd., Riddet Road, Massey University, Palmerston North 4472, New Zealand
- Taner Z Sen
- USDA, Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany 94710, USA
- Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA 94720, USA

11
Naithani S, Deng CH, Sahu SK, Jaiswal P. Exploring Pan-Genomes: An Overview of Resources and Tools for Unraveling Structure, Function, and Evolution of Crop Genes and Genomes. Biomolecules 2023; 13:1403. [PMID: 37759803 PMCID: PMC10527062 DOI: 10.3390/biom13091403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/29/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023]
Abstract
The availability of multiple sequenced genomes from a single species made it possible to explore intra- and inter-specific genomic comparisons at higher resolution and build clade-specific pan-genomes of several crops. The pan-genomes of crops constructed from various cultivars, accessions, landraces, and wild ancestral species represent a compendium of genes and structural variations and allow researchers to search for the novel genes and alleles that were inadvertently lost in domesticated crops during the historical process of crop domestication or in the process of extensive plant breeding. Fortunately, many valuable genes and alleles associated with desirable traits like disease resistance, abiotic stress tolerance, plant architecture, and nutrition qualities exist in landraces, ancestral species, and crop wild relatives. The novel genes from the wild ancestors and landraces can be introduced back into high-yielding varieties of modern crops by implementing classical plant breeding, genomic selection, and transgenic/gene editing approaches. Thus, pan-genomics represents a great leap in plant research and offers new avenues for targeted breeding to mitigate the impact of global climate change. Here, we summarize the tools used for pan-genome assembly and annotations, web-portals hosting plant pan-genomes, etc. Furthermore, we highlight a few discoveries made in crops using the pan-genomic approach and the future potential of this emerging field of study.
Affiliation(s)
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Cecilia H. Deng
- Molecular & Digital Breeding Group, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, Private Bag 92169, Auckland 1142, New Zealand
- Sunil Kumar Sahu
- State Key Laboratory of Agricultural Genomics, Key Laboratory of Genomics, Ministry of Agriculture, BGI Research, Shenzhen 518083, China
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA

12
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
- Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
- Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA

13
Naithani S, Mohanty B, Elser J, D’Eustachio P, Jaiswal P. Biocuration of a Transcription Factors Network Involved in Submergence Tolerance during Seed Germination and Coleoptile Elongation in Rice (Oryza sativa). Plants (Basel) 2023; 12:2146. [PMID: 37299125 PMCID: PMC10255735 DOI: 10.3390/plants12112146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 05/19/2023] [Accepted: 05/23/2023] [Indexed: 06/12/2023]
Abstract
Modeling biological processes and genetic-regulatory networks using in silico approaches provides a valuable framework for understanding how genes and associated allelic and genotypic differences result in specific traits. Submergence tolerance is a significant agronomic trait in rice; however, the gene-gene interactions linked with this polygenic trait remain largely unknown. In this study, we constructed a network of 57 transcription factors involved in seed germination and coleoptile elongation under submergence. The gene-gene interactions were based on the co-expression profiles of genes and the presence of transcription factor binding sites in the promoter region of target genes. We also incorporated published experimental evidence, wherever available, to support gene-gene, gene-protein, and protein-protein interactions. The co-expression data were obtained by re-analyzing publicly available transcriptome data from rice. Notably, this network includes OSH1, OSH15, OSH71, Sub1B, ERFs, WRKYs, NACs, ZFP36, TCPs, etc., which play key regulatory roles in seed germination, coleoptile elongation and submergence response, and mediate gravitropic signaling by regulating OsLAZY1 and/or IL2. The network of transcription factors was manually biocurated and submitted to the Plant Reactome Knowledgebase to make it publicly accessible. We expect this work will facilitate the re-analysis/re-use of OMICs data and aid genomics research to accelerate crop improvement.
Affiliation(s)
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Bijayalaxmi Mohanty
- NUS Environmental Research Institute, National University of Singapore, Singapore 117411, Singapore
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Peter D’Eustachio
- Department of Biochemistry and Molecular Pharmacology, NYU Grossman School of Medicine, New York, NY 10016, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA

14
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. bioRxiv 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Meghan A. Balk
- National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
- Robyn Ball
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Anita R. Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Laura W. Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A. McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Nicole Vasilevsky
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA

15
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. Plant Physiol 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas, that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1, Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
16
Zhang H, Wafula EK, Eilers J, Harkess A, Ralph PE, Timilsena PR, dePamphilis CW, Waite JM, Honaas LA. Building a foundation for gene family analysis in Rosaceae genomes with a novel workflow: A case study in Pyrus architecture genes. FRONTIERS IN PLANT SCIENCE 2022; 13:975942. [PMID: 36452099 PMCID: PMC9702816 DOI: 10.3389/fpls.2022.975942] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/22/2022] [Accepted: 09/21/2022] [Indexed: 05/26/2023]
Abstract
The rapid development of sequencing technologies has led to a deeper understanding of plant genomes. However, direct experimental evidence connecting genes to important agronomic traits is still lacking in most non-model plants. For instance, the genetic mechanisms underlying plant architecture are poorly understood in pome fruit trees, creating a major hurdle in developing new cultivars with desirable architecture, such as dwarfing rootstocks in European pear (Pyrus communis). An efficient way to identify genetic factors for important traits in non-model organisms can be to transfer knowledge across genomes. However, major obstacles exist, including complex evolutionary histories and variable quality and content of publicly available plant genomes. As researchers aim to link genes to traits of interest, these challenges can impede the transfer of experimental evidence across plant species, notably in the curation of high-quality, high-confidence gene models in an evolutionary context. Here we present a workflow using a collection of bioinformatic tools for the curation of deeply conserved gene families of interest across plant genomes. To study gene families involved in tree architecture in European pear and other rosaceous species, we used our workflow, plus a draft genome assembly and high-quality annotation of a second P. communis cultivar, 'd'Anjou.' Our comparative gene family approach revealed significant issues with the most recent 'Bartlett' genome, primarily thousands of missing genes due to methodological bias. After correcting assembly errors on a global scale in the 'Bartlett' genome, we used our workflow for targeted improvement of our genes of interest in both P. communis genomes, thus laying the groundwork for future functional studies in pear tree architecture. Further, our global gene family classification of 15 genomes across 6 genera provides a valuable and previously unavailable resource for the Rosaceae research community.
With it, orthologs and other gene family members can be easily identified across any of the classified genomes. Importantly, our workflow can be easily adopted for any other plant genomes and gene families of interest.
Affiliation(s)
- Huiting Zhang
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Department of Horticulture, Washington State University, Pullman, WA, United States
- Eric K. Wafula
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jon Eilers
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Alex E. Harkess
- College of Agriculture, Auburn University, Auburn, AL, United States
- HudsonAlpha Institute for Biotechnology, Huntsville, AL, United States
- Paula E. Ralph
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Prakash Raj Timilsena
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Claude W. dePamphilis
- Department of Biology, The Pennsylvania State University, University Park, PA, United States
- Jessica M. Waite
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
- Loren A. Honaas
- Tree Fruit Research Laboratory, Agricultural Research Service (ARS), United States Department of Agriculture (USDA), Wenatchee, WA, United States
17
Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q. Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOLECULAR PLANT 2022; 15:1664-1695. [PMID: 36081348 DOI: 10.1016/j.molp.2022.09.001] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Received: 04/04/2022] [Revised: 08/20/2022] [Accepted: 09/02/2022] [Indexed: 05/12/2023]
Abstract
The first paradigm of plant breeding involves direct selection-based phenotypic observation, followed by predictive breeding using statistical models for quantitative traits constructed based on genetic experimental design and, more recently, by incorporation of molecular marker genotypes. However, plant performance or phenotype (P) is determined by the combined effects of genotype (G), envirotype (E), and genotype by environment interaction (GEI). Phenotypes can be predicted more precisely by training a model using data collected from multiple sources, including spatiotemporal omics (genomics, phenomics, and enviromics across time and space). Integration of 3D information profiles (G-P-E), each with multidimensionality, provides predictive breeding with both tremendous opportunities and great challenges. Here, we first review innovative technologies for predictive breeding. We then evaluate multidimensional information profiles that can be integrated with a predictive breeding strategy, particularly envirotypic data, which have largely been neglected in data collection and are nearly untouched in model construction. We propose a smart breeding scheme, integrated genomic-enviromic prediction (iGEP), as an extension of genomic prediction, using integrated multiomics information, big data technology, and artificial intelligence (mainly focused on machine and deep learning). We discuss how to implement iGEP, including spatiotemporal models, environmental indices, factorial and spatiotemporal structure of plant breeding data, and cross-species prediction. A strategy is then proposed for prediction-based crop redesign at both the macro (individual, population, and species) and micro (gene, metabolism, and network) scales. Finally, we provide perspectives on translating smart breeding into genetic gain through integrative breeding platforms and open-source breeding initiatives. 
We call for coordinated efforts in smart breeding through iGEP, institutional partnerships, and innovative technological support.
Affiliation(s)
- Yunbi Xu
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; CIMMYT-China Tropical Maize Research Center, School of Food Science and Engineering, Foshan University, Foshan, Guangdong 528231, China; Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Xingping Zhang
- Peking University Institute of Advanced Agricultural Sciences, Weifang, Shandong 261325, China
- Huihui Li
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China; National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, Hainan 572024, China
- Hongjian Zheng
- CIMMYT-China Specialty Maize Research Center, Shanghai Academy of Agricultural Sciences, Shanghai 201400, China
- Jianan Zhang
- MolBreeding Biotechnology Co., Ltd., Shijiazhuang, Hebei 050035, China
- Michael S Olsen
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Rajeev K Varshney
- State Agricultural Biotechnology Centre, Centre for Crop and Food Innovation, Food Futures Institute, Murdoch University, Murdoch, Australia
- Boddupalli M Prasanna
- CIMMYT (International Maize and Wheat Improvement Center), ICRAF Campus, United Nations Avenue, Nairobi, Kenya
- Qian Qian
- Institute of Crop Sciences, CIMMYT-China, Chinese Academy of Agricultural Sciences, Beijing 100081, China
18
Yao E, Blake VC, Cooper L, Wight CP, Michel S, Cagirici HB, Lazo GR, Birkett CL, Waring DJ, Jannink JL, Holmes I, Waters AJ, Eickholt DP, Sen TZ. GrainGenes: a data-rich repository for small grains genetics and genomics. Database (Oxford) 2022; 2022:6591224. [PMID: 35616118 PMCID: PMC9216595 DOI: 10.1093/database/baac034] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/08/2021] [Revised: 04/01/2022] [Accepted: 04/26/2022] [Indexed: 05/16/2023]
Abstract
As one of the US Department of Agriculture-Agricultural Research Service flagship databases, GrainGenes (https://wheat.pw.usda.gov) serves the data and community needs of globally distributed small grains researchers for the genetic improvement of the Triticeae family and Avena species that include wheat, barley, rye and oat. GrainGenes accomplishes its mission by continually enriching its cross-linked data content following the findable, accessible, interoperable and reusable principles, enhancing and maintaining an intuitive web interface, creating tools to enable easy data access and establishing data connections within and between GrainGenes and other biological databases to facilitate knowledge discovery. GrainGenes operates within the biological database community, collaborates with curators and genome sequencing groups and contributes to the AgBioData Consortium and the International Wheat Initiative through the Wheat Information System (WheatIS). Interactive and linked content is paramount for successful biological databases and GrainGenes now has 2917 manually curated gene records, including 289 genes and 254 alleles from the Wheat Gene Catalogue (WGC). There are >4.8 million gene models in 51 genome browser assemblies, 6273 quantitative trait loci and >1.4 million genetic loci on 4756 genetic and physical maps contained within 443 mapping sets, complete with standardized metadata. Most notably, 50 new genome browsers that include outputs from the Wheat and Barley PanGenome projects have been created. We provide an example of an expression quantitative trait loci track on the International Wheat Genome Sequencing Consortium Chinese Spring wheat browser to demonstrate how genome browser tracks can be adapted for different data types. To help users benefit more from its data, GrainGenes created four tutorials available on YouTube. 
GrainGenes is executing its vision of service by continuously responding to the needs of the global small grains community by creating a centralized, long-term, interconnected data repository. Database URL: https://wheat.pw.usda.gov.
Affiliation(s)
- Eric Yao
- United States Department of Agriculture—Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Department of Bioengineering, University of California, Stanley Hall, Berkeley, CA 94720-1762, USA
- Victoria C Blake
- United States Department of Agriculture—Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Department of Plant Sciences and Plant Pathology, Montana State University, 119 Plant Biosciences Building, Bozeman, MT 59717, USA
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 1500 SW Jefferson Way, Corvallis, OR 97331, USA
- Charlene P Wight
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Ave., Ottawa, ON K1A 0C6, Canada
- Steve Michel
- United States Department of Agriculture—Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- H Busra Cagirici
- United States Department of Agriculture—Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Gerard R Lazo
- United States Department of Agriculture—Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Clay L Birkett
- United States Department of Agriculture—Agricultural Research Service, Robert Holley Center, 538 Tower Rd., Ithaca, NY 14853, USA
- David J Waring
- Section of Plant Breeding and Genetics, Cornell University, Bradfield Hall, 306 Tower Rd, Ithaca, NY 14853, USA
- Jean-Luc Jannink
- United States Department of Agriculture—Agricultural Research Service, Robert Holley Center, 538 Tower Rd., Ithaca, NY 14853, USA
- Section of Plant Breeding and Genetics, Cornell University, Bradfield Hall, 306 Tower Rd, Ithaca, NY 14853, USA
- Ian Holmes
- Department of Bioengineering, University of California, Stanley Hall, Berkeley, CA 94720-1762, USA
- Amanda J Waters
- PepsiCo R&D, 1991 Upper Buford Circle, 210 Borlaug Hall, St. Paul, MN 55108, USA
- David P Eickholt
- PepsiCo R&D, 1991 Upper Buford Circle, 210 Borlaug Hall, St. Paul, MN 55108, USA
- Taner Z Sen
- *Corresponding author: Tel: +1 (510) 559-5982; Fax: +1 (510) 559-5963
19
Danilevicz MF, Gill M, Anderson R, Batley J, Bennamoun M, Bayer PE, Edwards D. Plant Genotype to Phenotype Prediction Using Machine Learning. Front Genet 2022; 13:822173. [PMID: 35664329 PMCID: PMC9159391 DOI: 10.3389/fgene.2022.822173] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 11/25/2021] [Accepted: 03/07/2022] [Indexed: 12/13/2022]
Abstract
Genomic prediction tools support crop breeding based on statistical methods, such as the genomic best linear unbiased prediction (GBLUP). However, these tools are not designed to capture non-linear relationships within multi-dimensional datasets, or to deal with high-dimensional datasets such as imagery collected by unmanned aerial vehicles. Machine learning (ML) algorithms have the potential to surpass the prediction accuracy of current tools used for genotype to phenotype prediction, due to their capacity to autonomously extract data features and represent their relationships at multiple levels of abstraction. This review addresses the challenges of applying statistical and machine learning methods for predicting phenotypic traits based on genetic markers, environment data, and imagery for crop breeding. We present the advantages and disadvantages of explainable model structures, discuss the potential of machine learning models for genotype to phenotype prediction in crop breeding, and the challenges, including the scarcity of high-quality datasets, inconsistent metadata annotation and the requirements of ML models.
Affiliation(s)
- Monica F. Danilevicz
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Robyn Anderson
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Mohammed Bennamoun
- School of Physics, Mathematics and Computing, University of Western Australia, Perth, WA, Australia
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- *Correspondence: David Edwards,
20
Tuggle CK, Clarke J, Dekkers JCM, Ertl D, Lawrence-Dill CJ, Lyons E, Murdoch BM, Scott NM, Schnable PS. The Agricultural Genome to Phenome Initiative (AG2PI): creating a shared vision across crop and livestock research communities. Genome Biol 2022; 23:3. [PMID: 34980221 PMCID: PMC8722016 DOI: 10.1186/s13059-021-02570-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Affiliation(s)
- David Ertl
- Iowa Corn Growers Association, Johnston, USA
21
Yu J, Jung S, Cheng CH, Lee T, Zheng P, Buble K, Crabb J, Humann J, Hough H, Jones D, Campbell JT, Udall J, Main D. CottonGen: The Community Database for Cotton Genomics, Genetics, and Breeding Research. PLANTS (BASEL, SWITZERLAND) 2021; 10:plants10122805. [PMID: 34961276 PMCID: PMC8705096 DOI: 10.3390/plants10122805] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/18/2021] [Revised: 12/11/2021] [Accepted: 12/12/2021] [Indexed: 05/12/2023]
Abstract
Over the last eight years, the volume of whole genome, gene expression, SNP genotyping, and phenotype data generated by the cotton research community has exponentially increased. The efficient utilization/re-utilization of these complex and large datasets for knowledge discovery, translation, and application in crop improvement requires them to be curated, integrated with other types of data, and made available for access and analysis through efficient online search tools. Initiated in 2012, CottonGen is an online community database providing access to integrated peer-reviewed cotton genomic, genetic, and breeding data, and analysis tools. Used by cotton researchers worldwide, and managed by experts with crop-specific knowledge, it continues to be the logical choice to integrate new data and provide necessary interfaces for information retrieval. The repository in CottonGen contains colleague, gene, genome, genotype, germplasm, map, marker, metabolite, phenotype, publication, QTL, species, transcriptome, and trait data curated by the CottonGen team. The number of data entries housed in CottonGen has increased dramatically; for example, since 2014 there has been an 18-fold increase in genes/mRNAs, a 23-fold increase in whole genomes, and a 372-fold increase in genotype data. New tools include a genetic map viewer, a genome browser, a synteny viewer, a metabolite pathways browser, sequence retrieval, BLAST, and a breeding information management system (BIMS), as well as various search pages for new data types. CottonGen serves as the home to the International Cotton Genome Initiative, managing its elections and serving as a communication and coordination hub for the community. With its extensive curation and integration of data and online tools, CottonGen will continue to facilitate utilization of its critical resources to empower research for cotton crop improvement.
Affiliation(s)
- Jing Yu
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Taein Lee
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Ping Zheng
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Katheryn Buble
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- James Crabb
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Jodi Humann
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Heidi Hough
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Don Jones
- Cotton Incorporated, Cary, NC 27513, USA
- J. Todd Campbell
- The Agricultural Research Service of U.S. Department of Agriculture, Florence, SC 29501, USA
- Josh Udall
- The Agricultural Research Service of U.S. Department of Agriculture, College Station, TX 77845, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Correspondence: ; Tel.: +1-509-335-2774
22
Andrade R, Urioste S, Rivera T, Schiek B, Nyakundi F, Vergara J, Mwanzia L, Loaiza K, Gonzalez C. Where Is My Crop? Data-Driven Initiatives to Support Integrated Multi-Stakeholder Agricultural Decisions. FRONTIERS IN SUSTAINABLE FOOD SYSTEMS 2021. [DOI: 10.3389/fsufs.2021.737528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Globally, there has been an explosion of data generation in agriculture. With such a deluge of data available, it has become essential to create solutions that organize, analyze, and visualize it to gain actionable insights, which can guide farmers, scientists, or policy makers to make better decisions that lead to transformative actions for agriculture. There is a plethora of digital innovations in agriculture that implement big data techniques to harness solutions from large amounts of data; however, there is also a significant gap in access to these innovations among stakeholders of the value chains, with smallholder farmers facing higher risks. Open data platforms have emerged as an important source of information for this group of producers but are still far from reaching their full potential. While the growing number of such initiatives has improved the availability and reach of data, it has also made the collection and processing of this information more difficult, widening the gap between those who can process and interpret this information and those who cannot. The Crop Observatories are presented in this article as an initiative that aims to harmonize large amounts of crop-specific data from various open access sources to build relevant indicators for decision making. Observatories are being developed for rice, cassava, beans, plantain and banana, and tropical forages, containing information on production, prices, policies, breeding, agronomy, and socioeconomic variables of interest. The Observatories are expected to become a lighthouse that attracts multiple stakeholders, helping them avoid “not seeing the forest for the trees,” advance research, and strengthen crop economic systems. The process of developing the Observatories, as well as the methods for data collection, analysis, and display, is described.
The main results obtained by the recently launched Rice Observatory (www.riceobservatory.org) and the soon-to-be-launched Cassava Observatory are presented, contextualizing their potential use and importance for multiple stakeholders of both crops. The article concludes with a list of lessons learned and next steps for the Observatories, which are also expected to guide the development of similar initiatives. Observatories, beyond presenting themselves as an alternative for improving data-driven decision making, can become platforms for collaboration on data issues and digital innovations within each sector.
23
Volk GM, Byrne PF, Coyne CJ, Flint-Garcia S, Reeves PA, Richards C. Integrating Genomic and Phenomic Approaches to Support Plant Genetic Resources Conservation and Use. PLANTS (BASEL, SWITZERLAND) 2021; 10:2260. [PMID: 34834625 PMCID: PMC8619436 DOI: 10.3390/plants10112260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2021] [Revised: 10/20/2021] [Accepted: 10/20/2021] [Indexed: 05/17/2023]
Abstract
Plant genebanks provide genetic resources for breeding and research programs worldwide. These programs benefit from having access to high-quality, standardized phenotypic and genotypic data. Technological advances have made it possible to collect phenomic and genomic data for genebank collections, which, with the appropriate analytical tools, can directly inform breeding programs. We discuss the importance of considering genebank accession homogeneity and heterogeneity in data collection and documentation. Citing specific examples, we describe how well-documented genomic and phenomic data have met or could meet the needs of plant genetic resource managers and users. We explore future opportunities that may emerge from improved documentation and data integration among plant genetic resource information systems.
Affiliation(s)
- Gayle M. Volk
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Patrick F. Byrne
- Department of Soil and Crop Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Clarice J. Coyne
- United States Department of Agriculture, Agricultural Research Service, Western Regional Plant Introduction Station, Pullman, WA 99164, USA
- Sherry Flint-Garcia
- Plant Genetics Research Unit, United States Department of Agriculture, Agricultural Research Service, Columbia, MO 65211, USA
- Patrick A. Reeves
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Chris Richards
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
24
Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021; 22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022]
Abstract
Online, open access databases for biological knowledge serve as central repositories for research communities to store, find and analyze integrated, multi-disciplinary datasets. With increasing volumes, complexity and the need to integrate genomic, transcriptomic, metabolomic, proteomic, phenomic and environmental data, community databases face tremendous challenges in ongoing maintenance, expansion and upgrades. A common infrastructure framework using community standards shared by many databases can reduce development burden, provide interoperability, ensure use of common standards and support long-term sustainability. Tripal is a mature, open source platform built to meet this need. With ongoing improvement since its first release in 2009, Tripal provides full functionality for searching, browsing, loading and curating numerous types of data and is a primary technology powering at least 31 publicly available databases spanning plants, animals and human data, primarily storing genomics, genetics and breeding data. Tripal software development is managed by a shared, inclusive governance structure including both project management and advisory teams. Here, we report on the most important and innovative aspects of Tripal after 11 years of development, including integration of diverse types of biological data, successful collaborative projects across member databases, and support for implementing FAIR principles.
Affiliation(s)
- Ethalinda Cannon
- USDA-ARS, Corn Insects and Crop Genetics Research Unit, Ames, IA, USA
- Kay Faaberg
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Emily Grau
- University of Connecticut, Storrs, CT, USA
- Sook Jung
- Washington State University, Pullman, WA, USA
- Kelly Lager
- USDA-ARS, National Animal Disease Center, Ames, IA, USA
- Dorrie Main
- Washington State University, Pullman, WA, USA
- Monica Poelchau
- USDA-ARS, National Agricultural Library, Beltsville, MD, USA
- Joe West
- University of Tennessee, Knoxville, TN, USA
25
Andrés-Hernández L, Halimi RA, Mauleon R, Mayes S, Baten A, King GJ. Challenges for FAIR-compliant description and comparison of crop phenotype data with standardized controlled vocabularies. Database (Oxford) 2021; 2021:baab028. [PMID: 33991093 PMCID: PMC8122365 DOI: 10.1093/database/baab028] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/27/2020] [Revised: 04/14/2021] [Accepted: 04/30/2021] [Indexed: 12/04/2022]
Abstract
Crop phenotypic data underpin many pre-breeding efforts to characterize variation within germplasm collections. Although there has been an increase in the global capacity for accumulating and comparing such data, a lack of consistency in the systematic description of metadata often limits integration and sharing. We therefore aimed to understand some of the challenges facing findable, accessible, interoperable and reusable (FAIR) curation and annotation of phenotypic data from minor and underutilized crops. We used bambara groundnut (Vigna subterranea) as an exemplar underutilized crop to assess the ability of the Crop Ontology system to facilitate curation of trait datasets, so that they are accessible for comparative analysis. This involved generating a controlled vocabulary Trait Dictionary of 134 terms. Systematic quantification of syntactic and semantic cohesiveness of the full set of 28 crop-specific COs identified inconsistencies between trait descriptor names, a relative lack of cross-referencing to other ontologies and a flat ontological structure for classifying traits. We also evaluated the Minimal Information About a Phenotyping Experiment and FAIR compliance of bambara trait datasets curated within the CropStoreDB schema. We discuss specifications for a more systematic and generic approach to trait controlled vocabularies, which would benefit from representation of terms that adhere to Open Biological and Biomedical Ontologies principles. In particular, we focus on the benefits of reuse of existing definitions within pre- and post-composed axioms from other domains in order to facilitate the curation and comparison of datasets from a wider range of crops. Database URL: https://www.cropstoredb.org/cs_bambara.html.
Affiliation(s)
- Liliana Andrés-Hernández
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Razlin Azman Halimi
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Ramil Mauleon
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
- Sean Mayes
- School of Biosciences, University of Nottingham, Sutton Bonington, Leicestershire, LE12 5RD, UK
- Abdul Baten
- Institute of Precision Medicine & Bioinformatics, Sydney Local Health District, Royal Prince Alfred Hospital, Missenden Road, Camperdown, NSW 2050, Australia
- Graham J King
- Southern Cross Plant Science, Southern Cross University, PO Box 157, Lismore, NSW 2480, Australia
26
Bayer PE, Edwards D. Machine learning in agriculture: from silos to marketplaces. PLANT BIOTECHNOLOGY JOURNAL 2021; 19:648-650. [PMID: 33289294 PMCID: PMC8051597 DOI: 10.1111/pbi.13521] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/30/2020] [Revised: 11/23/2020] [Accepted: 11/29/2020] [Indexed: 05/03/2023]
Affiliation(s)
- Philipp E. Bayer
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
27
Sen TZ, Caccamo M, Edwards D, Quesneville H. Building a successful international research community through data sharing: The case of the Wheat Information System (WheatIS). F1000Res 2021; 9:536. [PMID: 33763204 PMCID: PMC7953914 DOI: 10.12688/f1000research.23525.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 05/28/2020] [Indexed: 11/20/2022]
Abstract
The International Wheat Information System (WheatIS) Expert Working Group (EWG) was initiated in 2012 under the Wheat Initiative with a broad range of contributing organizations. The mission of the WheatIS EWG was to create an informational infrastructure, establish data standards, and build a single portal that allows search, retrieval, and display of globally distributed wheat data sets that are indexed in standard data formats at servers around the world. The web portal at WheatIS.org was released publicly in 2015, and by 2020, it expanded to 8 geographically-distributed nodes and around 20 organizations under its umbrella. In this paper, we present our experience, the challenges we faced, and the solutions we adopted in establishing an international research community to build an informational infrastructure. Our hope is that our experience with building wheatis.org will guide current and future research communities in addressing institutional and international challenges to create global tools and resources to help their respective scientific communities.
Affiliation(s)
- Taner Z Sen
- Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Mario Caccamo
- NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- David Edwards
- School of Biological Sciences and Institute of Agriculture, University of Western Australia, Perth, WA, Australia
- Hadi Quesneville
- Université Paris-Saclay, INRAE, URGI, Versailles, 78026, France; Université Paris-Saclay, INRAE, BioinfOmics, Plant bioinformatics facility, Versailles, 78026, France
28
Walters J, Light K, Robinson N. Using agricultural metadata: a novel investigation of trends in sowing date in on-farm research trials using the Online Farm Trials database. F1000Res 2020; 9:1305. [PMID: 34354820 PMCID: PMC8290206 DOI: 10.12688/f1000research.26903.2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 05/20/2021] [Indexed: 11/20/2022]
Abstract
Background: A growing ability to collect data, together with the development and adoption of the FAIR guiding principles, has increased the amount of data available in many disciplines. This has given rise to an urgent need for robust metadata. Within the Australian grains industry, data from thousands of on-farm research trials (Trial Projects) have been made available via the Online Farm Trials (OFT) website. OFT Trial Project metadata were developed as filters to refine front-end database searches, but could also be used as a dataset to investigate trends in metadata elements. Australian grains crops are being sown earlier, but whether on-farm research trials reflect this change is currently unknown. Methods: We investigated whether OFT Trial Project metadata could be used to detect trends in sowing dates of on-farm crop research trials across Australia, testing the hypothesis that research trials are being sown earlier in line with local farming practices. The investigation included 15 autumn-sown, winter crop species listed in the database, with trial records from 1993 to 2019. Results: Our analyses showed that (i) OFT Trial Project metadata can be used as a dataset to detect trends in sowing date; and (ii) cropping research trials are being sown earlier in Victoria and Western Australia, but no trend exists within the other states. Discussion/Conclusion: Our findings show that OFT Trial Project metadata can be used to detect trends in crop sowing date, suggesting that metadata could also be used to detect trends in other elements such as harvest date. Because OFT is a national database of research trials, further assessment of metadata may uncover important agronomic, cultural or economic trends within or across the Australian cropping regions. New information could then be used to lead practice change and increase productivity within the Australian grains industry.
Affiliation(s)
- Judi Walters
- Centre for eResearch and Digital Innovation, Federation University Australia, Mount Helen, Victoria, 3350, Australia
- Kate Light
- Centre for eResearch and Digital Innovation, Federation University Australia, Mount Helen, Victoria, 3350, Australia
- Nathan Robinson
- Centre for eResearch and Digital Innovation, Federation University Australia, Mount Helen, Victoria, 3350, Australia
29
Arnaud E, Laporte MA, Kim S, Aubert C, Leonelli S, Miro B, Cooper L, Jaiswal P, Kruseman G, Shrestha R, Buttigieg PL, Mungall CJ, Pietragalla J, Agbona A, Muliro J, Detras J, Hualla V, Rathore A, Das RR, Dieng I, Bauchet G, Menda N, Pommier C, Shaw F, Lyon D, Mwanzia L, Juarez H, Bonaiuti E, Chiputwa B, Obileye O, Auzoux S, Yeumo ED, Mueller LA, Silverstein K, Lafargue A, Antezana E, Devare M, King B. The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. PATTERNS (NEW YORK, N.Y.) 2020; 1:100105. [PMID: 33205138 PMCID: PMC7660444 DOI: 10.1016/j.patter.2020.100105] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/06/2020] [Revised: 05/28/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022]
Abstract
Heterogeneous and multidisciplinary data generated by research on sustainable global agriculture and agrifood systems requires quality data labeling or annotation in order to be interoperable. As recommended by the FAIR principles, data, labels, and metadata must use controlled vocabularies and ontologies that are popular in the knowledge domain and commonly used by the community. Despite the existence of robust ontologies in the Life Sciences, there is currently no comprehensive full set of ontologies recommended for data annotation across agricultural research disciplines. In this paper, we discuss the added value of the Ontologies Community of Practice (CoP) of the CGIAR Platform for Big Data in Agriculture for harnessing relevant expertise in ontology development and identifying innovative solutions that support quality data annotation. The Ontologies CoP stimulates knowledge sharing among stakeholders, such as researchers, data managers, domain experts, experts in ontology design, and platform development teams.
Affiliation(s)
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Soonho Kim
- Markets, Trade and Institutions Division (MTID), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Céline Aubert
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Sabina Leonelli
- Department of Sociology, Philosophy and Anthropology & Exeter Centre for the Study of the Life Sciences (Egenis), University of Exeter, Exeter, UK
- Berta Miro
- Agrifood Policy Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Gideon Kruseman
- Socio-Economics Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, Mexico
- Rosemary Shrestha
- Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, State of México, México
- Pier Luigi Buttigieg
- Helmholtz Metadata Collaboration, GEOMAR Helmholtz Centre for Ocean Research, Kiel, Germany
- Christopher J. Mungall
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Afolabi Agbona
- Cassava Breeding Program, International Institute of Tropical Agriculture (IITA), Ibadan, Nigeria
- Jeffrey Detras
- Bioinformatics Cluster, Strategic Innovation Platform, International Rice Research Institute (IRRI), Los Baños, Laguna, Philippines
- Vilma Hualla
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
- Abhishek Rathore
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Roma Rani Das
- Statistics, Bioinformatics & Data Management (SBDM) Theme, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, Telangana, India
- Ibnou Dieng
- Biometrics Unit, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
- Guillaume Bauchet
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Naama Menda
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Cyril Pommier
- BioinfOmics, Plant Bioinformatics Facility, Université Paris-Saclay, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
- Felix Shaw
- Digital Biology, Earlham Institute, Norwich, Norfolk, UK
- David Lyon
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Leroy Mwanzia
- Performance, Innovation and Strategic Analysis, International Center for Tropical Agriculture (CIAT), Regional Office for Africa, Nairobi, Kenya
- Henry Juarez
- Research Informatics Unit (RIU), International Potato Center (CIP), Lima, Peru
- Enrico Bonaiuti
- Monitoring, Evaluation and Learning Team, International Center for Agricultural Research in the Dry Areas (ICARDA), Beirut, Lebanon
- Brian Chiputwa
- Research Methods Group (RMG), World Agroforestry (ICRAF), Nairobi, Kenya
- Olatunbosun Obileye
- Data Management Section, International Institute of Tropical Agriculture (IITA), Ibadan, Oyo State, Nigeria
- Sandrine Auzoux
- UPR AIDA, The French Agricultural Research Centre for International Development (CIRAD), Sainte-Clotilde, Réunion, France
- Université de Montpellier, Montpellier, France
- Esther Dzalé Yeumo
- Unité Délégation à l’Information Scientifique et Technique - DIST, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Versailles, France
- Lukas A. Mueller
- Mueller Bioinformatics Laboratory, Boyce Thompson Institute for Plant Research, Ithaca, NY, USA
- Erick Antezana
- Bayer Crop Science SA-NV, Diegem, Belgium
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Medha Devare
- Environment and Production Technology Division (EPTD), International Food Policy Research Institute (IFPRI), Washington, DC, USA
- Brian King
- CGIAR Platform for Big Data in Agriculture, International Center for Tropical Agriculture (CIAT), Cali, Colombia
30
Nédellec C, Ibanescu L, Bossy R, Sourdille P. WTO, an ontology for wheat traits and phenotypes in scientific publications. Genomics Inform 2020; 18:e14. [PMID: 32634868 PMCID: PMC7362939 DOI: 10.5808/gi.2020.18.2.e14] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/27/2020] [Revised: 06/09/2020] [Accepted: 06/10/2020] [Indexed: 11/20/2022]
Abstract
Phenotyping is a major issue for wheat agriculture to meet the challenges of adaptation of wheat varieties to climate change and chemical input reduction in crop. The need to improve the reuse of observations and experimental data has led to the creation of reference ontologies to standardize descriptions of phenotypes and to facilitate their comparison. The scientific literature is largely under-exploited, although extremely rich in phenotype descriptions associated with cultivars and genetic information. In this paper we propose the Wheat Trait Ontology (WTO) that is suitable for the extraction and management of scientific information from scientific papers, and its combination with data from genomic and experimental databases. We describe the principles of WTO construction and show examples of WTO use for the extraction and management of phenotype descriptions obtained from scientific documents.
Affiliation(s)
- Claire Nédellec
- Paris-Saclay University, INRAE, MaIAGE, F-78350 Jouy-en-Josas, France
- Liliana Ibanescu
- Paris-Saclay University, INRAE, UMR MIA-Paris, AgroParisTech, F-75005, Paris, France
- Robert Bossy
- Paris-Saclay University, INRAE, MaIAGE, F-78350 Jouy-en-Josas, France
- Pierre Sourdille
- University Clermont-Auvergne, INRAE, UMR 1095 GDEC, F-63000 Clermont-Ferrand, France
31
Portwood JL, Woodhouse MR, Cannon EK, Gardiner JM, Harper LC, Schaeffer ML, Walsh JR, Sen TZ, Cho KT, Schott DA, Braun BL, Dietze M, Dunfee B, Elsik CG, Manchanda N, Coe E, Sachs M, Stinard P, Tolbert J, Zimmerman S, Andorf CM. MaizeGDB 2018: the maize multi-genome genetics and genomics database. Nucleic Acids Res 2020; 47:D1146-D1154. [PMID: 30407532 PMCID: PMC6323944 DOI: 10.1093/nar/gky1046] [Citation(s) in RCA: 161] [Impact Index Per Article: 40.3] [Received: 09/13/2018] [Accepted: 10/16/2018] [Indexed: 01/12/2023]
Abstract
Since its 2015 update, MaizeGDB, the Maize Genetics and Genomics database, has expanded to support the sequenced genomes of many maize inbred lines in addition to the B73 reference genome assembly. Curation and development efforts have targeted high quality datasets and tools to support maize trait analysis, germplasm analysis, genetic studies, and breeding. MaizeGDB hosts a wide range of data including recent support of new data types including genome metadata, RNA-seq, proteomics, synteny, and large-scale diversity. To improve access and visualization of data types several new tools have been implemented to: access large-scale maize diversity data (SNPversity), download and compare gene expression data (qTeller), visualize pedigree data (Pedigree Viewer), link genes with phenotype images (MaizeDIG), and enable flexible user-specified queries to the MaizeGDB database (MaizeMine). MaizeGDB also continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community. Here we report the changes MaizeGDB has made within the last three years to keep pace with recent software and research advances, as well as the pan-genomic landscape that cheaper and better sequencing technologies have made possible. MaizeGDB is accessible online at https://www.maizegdb.org.
Affiliation(s)
- John L Portwood
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Margaret R Woodhouse
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Ethalinda K Cannon
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Jack M Gardiner
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA
- Lisa C Harper
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Mary L Schaeffer
- Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
- Jesse R Walsh
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Taner Z Sen
- USDA-ARS Crop Improvement and Genetics Research Unit, Albany, CA 94710, USA; Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Kyoung Tak Cho
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- David A Schott
- Department of Computer Science, Iowa State University, Ames, IA 50011, USA
- Bremen L Braun
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
- Miranda Dietze
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Brittney Dunfee
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
- Christine G Elsik
- Division of Animal Sciences, University of Missouri, Columbia, MO 65211, USA; Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
- Nancy Manchanda
- Department of Ecology, Evolution and Organismal Biology, Iowa State University, Ames, IA 50011, USA
- Ed Coe
- Division of Plant Sciences, University of Missouri, Columbia, MO 65211, USA
- Marty Sachs
- USDA/ARS/MWA Soybean/Maize Germplasm, Pathology & Genetics Research Unit, Urbana, IL, 61801, USA
- Philip Stinard
- USDA/ARS/MWA Soybean/Maize Germplasm, Pathology & Genetics Research Unit, Urbana, IL, 61801, USA
- Josh Tolbert
- USDA/ARS/MWA Soybean/Maize Germplasm, Pathology & Genetics Research Unit, Urbana, IL, 61801, USA
- Shane Zimmerman
- USDA/ARS/MWA Soybean/Maize Germplasm, Pathology & Genetics Research Unit, Urbana, IL, 61801, USA
- Carson M Andorf
- USDA-ARS Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA
32
Peng H, Wang K, Chen Z, Cao Y, Gao Q, Li Y, Li X, Lu H, Du H, Lu M, Yang X, Liang C. MBKbase for rice: an integrated omics knowledgebase for molecular breeding in rice. Nucleic Acids Res 2020; 48:D1085-D1092. [PMID: 31624841 PMCID: PMC7145604 DOI: 10.1093/nar/gkz921] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 08/14/2019] [Revised: 10/04/2019] [Accepted: 10/08/2019] [Indexed: 11/25/2022]
Abstract
To date, large amounts of genomic and phenotypic data have been accumulated in the fields of crop genetics and genomic research, and the data are increasing very quickly. However, the bottleneck to using big data in breeding is integrating the data and developing tools for revealing the relationship between genotypes and phenotypes. Here, we report a rice sub-database of an integrated omics knowledgebase (MBKbase-rice, www.mbkbase.org/rice), which integrates rice germplasm information, multiple reference genomes with a united set of gene loci, population sequencing data, phenotypic data, known alleles and gene expression data. In addition to basic data search functions, MBKbase provides advanced web tools for genotype searches at the population level and for visually displaying the relationship between genotypes and phenotypes. Furthermore, the database also provides online tools for comparing two samples by their genotypes and finding target germplasms by genotype or phenotype information, as well as for analyzing the user submitted SNP or sequence data to find important alleles in the germplasm. A soybean sub-database is planned for release in 3 months and wheat and maize will be added in 1–2 years. The data and tools integrated in MBKbase will facilitate research in crop functional genomics and molecular breeding.
Affiliation(s)
- Hua Peng
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Kai Wang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Zhuo Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Yinghao Cao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Qiang Gao
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Yan Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Xiuxiu Li
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Hongwei Lu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Huilong Du
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Min Lu
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Xin Yang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China
- Chengzhi Liang
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
33
Turner-Hissong SD, Mabry ME, Beissinger TM, Ross-Ibarra J, Pires JC. Evolutionary insights into plant breeding. CURRENT OPINION IN PLANT BIOLOGY 2020; 54:93-100. [PMID: 32325397 DOI: 10.1016/j.pbi.2020.03.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/18/2019] [Revised: 01/20/2020] [Accepted: 03/04/2020] [Indexed: 06/11/2023]
Abstract
Crop domestication is a fascinating area of study, as shown by a multitude of recent reviews. Coupled with the increasing availability of genomic and phenomic resources in numerous crop species, insights from evolutionary biology will enable a deeper understanding of the genetic architecture and short-term evolution of complex traits, which can be used to inform selection strategies. Future advances in crop improvement will rely on the integration of population genetics with plant breeding methodology, and the development of community resources to support research in a variety of crop life histories and reproductive strategies. We highlight recent advances related to the role of selective sweeps and demographic history in shaping genetic architecture, how these breakthroughs can inform selection strategies, and the application of precision gene editing to leverage these connections.
Affiliation(s)
- Sarah D Turner-Hissong
- Center for Population Biology, University of California, Davis, CA, USA; Department of Evolution and Ecology, University of California, Davis, CA, USA
- Makenzie E Mabry
- Bond Life Science Center and Division of Biological Sciences, University of Missouri, Columbia, MO, USA
- Timothy M Beissinger
- Division of Plant Breeding Methodology, Department of Crop Science, Georg-August-Universität, Göttingen, Germany; Center for Integrated Breeding Research, Georg-August-Universität, Göttingen, Germany
- Jeffrey Ross-Ibarra
- Center for Population Biology, University of California, Davis, CA, USA; Department of Evolution and Ecology, University of California, Davis, CA, USA
- J Chris Pires
- Bond Life Science Center and Division of Biological Sciences, University of Missouri, Columbia, MO, USA
34
Naithani S, Gupta P, Preece J, D’Eustachio P, Elser JL, Garg P, Dikeman DA, Kiff J, Cook J, Olson A, Wei S, Tello-Ruiz MK, Mundo AF, Munoz-Pomer A, Mohammed S, Cheng T, Bolton E, Papatheodorou I, Stein L, Ware D, Jaiswal P. Plant Reactome: a knowledgebase and resource for comparative pathway analysis. Nucleic Acids Res 2020; 48:D1093-D1103. [PMID: 31680153 PMCID: PMC7145600 DOI: 10.1093/nar/gkz996] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 09/16/2019] [Revised: 10/09/2019] [Accepted: 10/14/2019] [Indexed: 12/29/2022]
Abstract
Plant Reactome (https://plantreactome.gramene.org) is an open-source, comparative plant pathway knowledgebase of the Gramene project. It uses Oryza sativa (rice) as a reference species for manual curation of pathways and extends pathway knowledge to another 82 plant species via gene-orthology projection using the Reactome data model and framework. It currently hosts 298 reference pathways, including metabolic and transport pathways, transcriptional networks, hormone signaling pathways, and plant developmental processes. In addition to browsing plant pathways, users can upload and analyze their omics data, such as the gene-expression data, and overlay curated or experimental gene-gene interaction data to extend pathway knowledge. The curation team actively engages researchers and students on gene and pathway curation by offering workshops and online tutorials. The Plant Reactome supports, implements and collaborates with the wider community to make data and tools related to genes, genomes, and pathways Findable, Accessible, Interoperable and Re-usable (FAIR).
Affiliation(s)
- Sushma Naithani
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Parul Gupta
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Justin Preece
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Justin L Elser
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Priyanka Garg
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Daemon A Dikeman
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Jason Kiff
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
- Justin Cook
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- Alfonso Munoz-Pomer
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Suhaib Mohammed
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Evan Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Irene Papatheodorou
- European Molecular Biology Laboratory - European Bioinformatics Institute, Hinxton, UK
- Lincoln Stein
- Ontario Institute for Cancer Research, Toronto, ON, Canada
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
- USDA-ARS, RW Holley Center for Agriculture & Health, Ithaca, NY, USA
- Pankaj Jaiswal
- Department of Botany & Plant Pathology, Oregon State University, Corvallis, OR, USA
| |
Collapse
|
35
|
Blake VC, Woodhouse MR, Lazo GR, Odell SG, Wight CP, Tinker NA, Wang Y, Gu YQ, Birkett CL, Jannink JL, Matthews DE, Hane DL, Michel SL, Yao E, Sen TZ. GrainGenes: centralized small grain resources and digital platform for geneticists and breeders. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5513438. [PMID: 31210272 DOI: 10.1093/database/baz065] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 11/07/2018] [Revised: 04/18/2019] [Accepted: 04/22/2019] [Indexed: 11/13/2022]
Abstract
GrainGenes (https://wheat.pw.usda.gov or https://graingenes.org) is an international centralized repository for curated, peer-reviewed datasets useful to researchers working on wheat, barley, rye and oat. GrainGenes manages genomic, genetic, germplasm and phenotypic datasets through a dynamically generated web interface that facilitates data discovery. Since 1992, GrainGenes has served geneticists and breeders in both the public and private sectors on six continents. Recently, several new datasets were curated into the database along with new tools for analysis. The GrainGenes homepage was enhanced by making it more visually intuitive and by adding links to commonly used pages. Several genome assemblies and genomic tracks are displayed through the genome browsers at GrainGenes, including the Triticum aestivum (bread wheat) cv. 'Chinese Spring' IWGSC RefSeq v1.0 genome assembly, the Aegilops tauschii (D genome progenitor) Aet v4.0 genome assembly, the Triticum turgidum ssp. dicoccoides (wild emmer wheat) cv. 'Zavitan' WEWSeq v.1.0 genome assembly, a T. aestivum (bread wheat) pangenome, the Hordeum vulgare (barley) cv. 'Morex' IBSC genome assembly, the Secale cereale (rye) select 'Lo7' assembly, a partial hexaploid Avena sativa (oat) assembly and the Triticum durum cv. 'Svevo' (durum wheat) RefSeq Release 1.0 assembly. New genetic maps and markers were added and can be displayed through CMAP. Quantitative trait loci, genetic maps and genes from the Wheat Gene Catalogue are indexed and linked through the Wheat Information System (WheatIS) portal. Training videos were created to help users query and reach the data they need. The GSP (Genome Specific Primers) and PIECE2 (Plant Intron Exon Comparison and Evolution) tools were implemented and are available for use. As more small grains reference sequences become available, GrainGenes will play an increasingly vital role in helping researchers improve crops.
Affiliation(s)
- Victoria C Blake
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Margaret R Woodhouse
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Gerard R Lazo
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Sarah G Odell
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
  - Department of Plant Sciences, University of California, Davis, CA, USA
- Charlene P Wight
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
- Nicholas A Tinker
  - Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, Canada
- Yi Wang
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Yong Q Gu
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Clay L Birkett
  - Robert Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, USA
- Jean-Luc Jannink
  - Robert Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, USA
  - Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
- Dave E Matthews
  - Robert Holley Center, United States Department of Agriculture-Agricultural Research Service, Ithaca, NY, USA
- David L Hane
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Steve L Michel
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
- Eric Yao
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
  - Department of Bioengineering, University of California, Berkeley, Berkeley, CA, USA
- Taner Z Sen
  - Western Regional Research Center, Crop Improvement and Genetics Research Unit, United States Department of Agriculture-Agricultural Research Service, Albany, CA, USA
  - Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, USA

36
Spoor S, Cheng CH, Sanderson LA, Condon B, Almsaeed A, Chen M, Bretaudeau A, Rasche H, Jung S, Main D, Bett K, Staton M, Wegrzyn JL, Feltus FA, Ficklin SP. Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5532788. [PMID: 31328773 PMCID: PMC6643302 DOI: 10.1093/database/baz077] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/08/2019] [Revised: 05/12/2019] [Accepted: 05/22/2019] [Indexed: 12/20/2022]
Abstract
Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote the FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3, which improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User’s Guide and Developer’s Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.
Affiliation(s)
- Shawna Spoor
  - Department of Horticulture, Washington State University, Pullman, WA, USA
- Chun-Huai Cheng
  - Department of Horticulture, Washington State University, Pullman, WA, USA
- Bradford Condon
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Abdullah Almsaeed
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Ming Chen
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Anthony Bretaudeau
  - INRA, UMR IGEPP, BIPAA/GenOuest, INRIA/Irisa - Campus de Beaulieu, Rennes Cedex, France
- Helena Rasche
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Sook Jung
  - Department of Horticulture, Washington State University, Pullman, WA, USA
- Dorrie Main
  - Department of Horticulture, Washington State University, Pullman, WA, USA
- Kirstin Bett
  - Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada
- Margaret Staton
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Jill L Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
  - Computational Biology Core, Institute for Systems Genomics, University of Connecticut, Storrs, CT, USA
- F Alex Feltus
  - Dept. of Genetics and Biochemistry, Clemson University, Clemson, USA
- Stephen P Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA, USA

37
Wegrzyn JL, Falk T, Grau E, Buehler S, Ramnath R, Herndon N. Cyberinfrastructure and resources to enable an integrative approach to studying forest trees. Evol Appl 2020; 13:228-241. [PMID: 31892954 PMCID: PMC6935593 DOI: 10.1111/eva.12860] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/12/2019] [Revised: 08/11/2019] [Accepted: 08/14/2019] [Indexed: 12/19/2022]
Abstract
Sequencing technologies and bioinformatic approaches are now available to resolve the challenges associated with complex and heterozygous genomes. Increased access to less expensive and more effective instrumentation will contribute to a wealth of high-quality plant genomes in the next few years. In the meantime, more than 370 tree species are associated with public projects in primary repositories that are interrogating expression profiles, identifying variants, or analyzing targeted capture without a high-quality reference genome. Genomic data from these projects generate sequences that represent intermediate assemblies for transcriptomes and genomes. These data contribute to forest tree biology, but the associated sequences remain trapped in supplemental files that are poorly integrated into plant community databases and comparative genomic platforms. Successful implementation of life science cyberinfrastructure is improving data standards, ontologies, analytic workflows, and integrated database platforms for both model and non-model plant species. For forest trees, whose large populations are long-lived, outcrossing, and genetically diverse, the phenotypic and environmental metrics associated with georeferenced populations are just as important as the genomic data sampled for each individual. To address questions related to forest health and productivity, cyberinfrastructure must keep pace with the magnitude of genomic and phenomic sampling of larger populations. This review examines the current landscape of cyberinfrastructure, with an emphasis on best practices and resources to align community data with the Findable, Accessible, Interoperable, and Reusable (FAIR) guidelines.
Affiliation(s)
- Jill L. Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
- Taylor Falk
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
- Emily Grau
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
- Sean Buehler
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
- Risharde Ramnath
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut
- Nic Herndon
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut

38
Sahruzaini NA, Rejab NA, Harikrishna JA, Khairul Ikram NK, Ismail I, Kugan HM, Cheng A. Pulse Crop Genetics for a Sustainable Future: Where We Are Now and Where We Should Be Heading. FRONTIERS IN PLANT SCIENCE 2020; 11:531. [PMID: 32431724 PMCID: PMC7212832 DOI: 10.3389/fpls.2020.00531] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/03/2020] [Accepted: 04/07/2020] [Indexed: 05/12/2023]
Abstract
The last decade has witnessed dramatic changes in global food consumption patterns, mainly because of population growth and economic development. Food substitutions for healthier eating, such as swapping regular servings of meat for protein-rich crops, are an emerging diet trend that may shape the future of food systems and the environment worldwide. To meet erratic consumer demand in a rapidly changing world where resources become increasingly scarce, due largely to anthropogenic activity, the need to develop crops that benefit both human health and the environment has become urgent. Legumes are often considered to be affordable plant-based sources of dietary proteins. Growing legumes provides significant benefits to cropping systems and the environment because of their natural ability to perform symbiotic nitrogen fixation, which enhances both soil fertility and water-use efficiency. In recent years, the focus in legume research has shifted from merely improving economically important species such as soybeans toward promising underutilized species whose genetic resources hold the potential to address global challenges such as food security and climate change. Pulse crops have gained in popularity as an affordable source of food or feed; in fact, the United Nations designated 2016 as the International Year of Pulses, proclaiming their critical role in enhancing global food security. Given that many studies have been conducted on numerous underutilized pulse crops across the world, we provide a systematic review of the related literature to identify gaps and opportunities in pulse crop genetics research. We then discuss plausible strategies for developing and using pulse crops to strengthen food and nutrition security in the face of climate and anthropogenic changes.
Affiliation(s)
- Nurul Amylia Sahruzaini
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Nur Ardiyana Rejab
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
  - Centre for Research in Biotechnology for Agriculture (CEBAR), University of Malaya, Kuala Lumpur, Malaysia
- Jennifer Ann Harikrishna
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
  - Centre for Research in Biotechnology for Agriculture (CEBAR), University of Malaya, Kuala Lumpur, Malaysia
- Nur Kusaira Khairul Ikram
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
  - Centre for Research in Biotechnology for Agriculture (CEBAR), University of Malaya, Kuala Lumpur, Malaysia
- Ismanizan Ismail
  - Institute of Systems Biology (INBIOSIS), Universiti Kebangsaan Malaysia, Bangi, Malaysia
- Hazel Marie Kugan
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia
- Acga Cheng (corresponding author)
  - Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia

39
Jung S, Lee T, Cheng CH, Buble K, Zheng P, Yu J, Humann J, Ficklin SP, Gasic K, Scott K, Frank M, Ru S, Hough H, Evans K, Peace C, Olmstead M, DeVetter LW, McFerson J, Coe M, Wegrzyn JL, Staton ME, Abbott AG, Main D. 15 years of GDR: New data and functionality in the Genome Database for Rosaceae. Nucleic Acids Res 2019; 47:D1137-D1145. [PMID: 30357347 PMCID: PMC6324069 DOI: 10.1093/nar/gky1000] [Citation(s) in RCA: 209] [Impact Index Per Article: 41.8] [Received: 09/01/2018] [Accepted: 10/09/2018] [Indexed: 12/13/2022]
Abstract
The Genome Database for Rosaceae (GDR, https://www.rosaceae.org) is an integrated web-based community database resource providing access to publicly available genomics, genetics and breeding data and data-mining tools to facilitate basic, translational and applied research in Rosaceae. The volume of data in GDR has increased greatly over the last 5 years. The GDR now houses multiple versions of whole genome assembly and annotation data from 14 species, made available by recent advances in sequencing technology. Annotated and searchable reference transcriptomes, RefTrans, combining peer-reviewed published RNA-Seq as well as EST datasets, are newly available for major crop species. Significantly more quantitative trait loci, genetic maps and markers are available in MapViewer, a new visualization tool that better integrates with other pages in GDR. Pathways can be accessed through the new GDR Cyc Pathways databases, and synteny among the newest genome assemblies from eight species can be viewed through the new synteny browser, SynView. Collated single-nucleotide polymorphism diversity data and phenotypic data from publicly available breeding datasets are integrated with other relevant data. Also, the new Breeding Information Management System allows breeders to upload, manage and analyze their private breeding data within the secure GDR server with an option to release data publicly.
Affiliation(s)
- Sook Jung
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Taein Lee
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Chun-Huai Cheng
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Katheryn Buble
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Ping Zheng
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jing Yu
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Jodi Humann
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Stephen P Ficklin
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Ksenija Gasic
  - Department of Plant and Environmental Sciences, Clemson University, Clemson, SC 29634-0310, USA
- Kristin Scott
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Morgan Frank
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Sushan Ru
  - Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN 55108, USA
- Heidi Hough
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Kate Evans
  - Department of Horticulture, Washington State University Tree Fruit Research and Extension Center, Wenatchee, WA 98801, USA
- Cameron Peace
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA
- Mercy Olmstead
  - Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA
- Lisa W DeVetter
  - Department of Horticulture, Washington State University, Northwestern Washington Research and Extension Center, Mount Vernon, WA 98273, USA
- James McFerson
  - Department of Horticulture, Washington State University Tree Fruit Research and Extension Center, Wenatchee, WA 98801, USA
- Michael Coe
  - Cedar Lake Research Group, LLC, Portland, OR 97293, USA
- Jill L Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Margaret E Staton
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA
- Albert G Abbott
  - Forest Health Research and Extension Center, University of Kentucky, Lexington, KY 40546-0091, USA
- Dorrie Main
  - Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA

40
Naithani S, Gupta P, Preece J, Garg P, Fraser V, Padgitt-Cobb LK, Martin M, Vining K, Jaiswal P. Involving community in genes and pathway curation. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2019; 2019:5289625. [PMID: 30649295 PMCID: PMC6334007 DOI: 10.1093/database/bay146] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/08/2018] [Accepted: 12/11/2018] [Indexed: 12/25/2022]
Abstract
Biocuration plays a crucial role in building databases and complex systems-level platforms required for processing, annotating and analyzing ‘Big Data’ in biology. However, biocuration efforts cannot keep pace with a dramatic increase in the production of omics data; this presents one of the bottlenecks in genomics. In two pathway curation jamborees, Plant Reactome curators tested strategies for introducing researchers to pathway curation tools, harnessing biologists’ expertise in curating plant pathways and developing a network of community biocurators. We summarize the strategy, workflow and outcomes of these exercises, and discuss the role of community biocuration in advancing databases and genomic resources.
Affiliation(s)
- Sushma Naithani
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Parul Gupta
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Justin Preece
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Priyanka Garg
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Valerie Fraser
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
  - Molecular and Cellular Biology Graduate Program, Oregon State University, Corvallis, OR, USA
- Matthew Martin
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Kelly Vining
  - Department of Horticulture, Oregon State University, Corvallis, OR, USA
- Pankaj Jaiswal
  - Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA

41
Pommier C, Michotey C, Cornut G, Roumet P, Duchêne E, Flores R, Lebreton A, Alaux M, Durand S, Kimmel E, Letellier T, Merceron G, Laine M, Guerche C, Loaec M, Steinbach D, Laporte MA, Arnaud E, Quesneville H, Adam-Blondon AF. Applying FAIR Principles to Plant Phenotypic Data Management in GnpIS. PLANT PHENOMICS (WASHINGTON, D.C.) 2019; 2019:1671403. [PMID: 33313522 PMCID: PMC7718628 DOI: 10.34133/2019/1671403] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 01/08/2019] [Accepted: 04/08/2019] [Indexed: 05/19/2023]
Abstract
GnpIS is a data repository for plant phenomics that stores whole field and greenhouse experimental data, including environment measures. It allows long-term access to datasets following the FAIR principles (Findable, Accessible, Interoperable, and Reusable) by using a flexible and original approach. It is based on a generic, ontology-driven data model and an innovative software architecture that uncouples data integration, storage, and querying. It takes advantage of international standards including the Crop Ontology, MIAPPE, and the Breeding API. GnpIS can handle data for a wide range of species and experiment types, including multiannual experimental networks of perennial plants or annual plant trials, with either raw data (i.e., direct measures) or computed traits. It also ensures integration and interoperability among phenotyping datasets and with genotyping data. This is achieved through careful curation and annotation of the key resources, conducted in close collaboration with the communities providing data. Our repository follows the Open Science data publication principles by ensuring citability of each dataset. Finally, GnpIS's compliance with international standards enables its interoperability with other data repositories, hence allowing data links between phenotype and other data types. GnpIS can therefore contribute to emerging international federations of information systems.
Affiliation(s)
- C. Pommier
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- C. Michotey
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- G. Cornut
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- P. Roumet
  - AGAP, Univ Montpellier, CIRAD, INRA, Montpellier SupAgro, Montpellier, France
- E. Duchêne
  - UMR SVQV, 28 rue de Herrlisheim, B.P. 20507, 68021 Colmar, France
- R. Flores
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- A. Lebreton
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- M. Alaux
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- S. Durand
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- E. Kimmel
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- T. Letellier
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- G. Merceron
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- M. Laine
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- C. Guerche
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- M. Loaec
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- D. Steinbach
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France
- M. A. Laporte
  - Bioversity International, parc Scientifique Agropolis II, 34397 Montpellier cedex 5, France
- E. Arnaud
  - Bioversity International, parc Scientifique Agropolis II, 34397 Montpellier cedex 5, France
- H. Quesneville
  - URGI, INRA, Université Paris-Saclay, 78026 Versailles, France

42
Venkatesan A, Tagny Ngompe G, Hassouni NE, Chentli I, Guignon V, Jonquet C, Ruiz M, Larmande P. Agronomic Linked Data (AgroLD): A knowledge-based system to enable integrative biology in agronomy. PLoS One 2018; 13:e0198270. [PMID: 30500839 PMCID: PMC6269127 DOI: 10.1371/journal.pone.0198270] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/14/2018] [Accepted: 09/03/2018] [Indexed: 12/22/2022]
Abstract
Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of omics data produced in plant science. This increase, in conjunction with the heterogeneity and variability of the data, presents a major challenge to adopting an integrative research approach. We are facing an urgent need to effectively integrate and assimilate complementary datasets to understand the biological system as a whole. The Semantic Web offers technologies for the integration of heterogeneous data and, through ontologies, their transformation into explicit knowledge. We have developed the Agronomic Linked Data (AgroLD, www.agrold.org), a knowledge-based system relying on Semantic Web technologies and exploiting standard domain ontologies, to integrate data about plant species of high interest to the plant science community, e.g., rice, wheat, and Arabidopsis. We present some integration results of the project, which initially focused on genomics, proteomics and phenomics. AgroLD is now an RDF (Resource Description Framework) knowledge base of 100M triples, created by annotating and integrating more than 50 datasets coming from 10 data sources (such as Gramene.org and TropGeneDB) with 10 ontologies (such as the Gene Ontology and Plant Trait Ontology). Our evaluation results show that users appreciate the multiple query modes, which support different use cases. AgroLD's objective is to offer a domain-specific knowledge platform to solve complex biological and agronomical questions related to the implication of genes/proteins in, for instance, plant disease resistance or high-yield traits. We expect the resolution of these questions to facilitate the formulation of new scientific hypotheses to be validated with a knowledge-oriented approach.
Affiliation(s)
- Aravind Venkatesan
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Gildas Tagny Ngompe
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Nordine El Hassouni
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - UMR AGAP, CIRAD, Montpellier, France
  - South Green Bioinformatics Platform, Montpellier, France
- Imene Chentli
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Valentin Guignon
  - South Green Bioinformatics Platform, Montpellier, France
  - Bioversity International, Montpellier, France
- Clement Jonquet
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
- Manuel Ruiz
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - UMR AGAP, CIRAD, Montpellier, France
  - South Green Bioinformatics Platform, Montpellier, France
  - AGAP, Univ. of Montpellier, CIRAD, INRA, INRIA, SupAgro, Montpellier, France
- Pierre Larmande
  - Institut de Biologie Computationnelle (IBC), Univ. of Montpellier, Montpellier, France
  - LIRMM, Univ. of Montpellier & CNRS, Montpellier, France
  - South Green Bioinformatics Platform, Montpellier, France
  - DIADE, IRD, Univ. of Montpellier, Montpellier, France