1. Xie C, Gao J, Chen J, Zhao X. PotatoG-DKB: a potato gene-disease knowledge base mined from biological literature. PeerJ 2024; 12:e18202. [PMID: 39372719 PMCID: PMC11456291 DOI: 10.7717/peerj.18202] [Received: 04/18/2024] [Accepted: 09/09/2024] [Indexed: 10/08/2024]
Abstract
Background: Potato is the fourth largest food crop in the world, but potato cultivation faces serious threats from various diseases and pests. Despite significant advances in research on potato disease resistance, these findings are scattered across numerous publications. For researchers, gathering relevant knowledge by reading and organizing a large body of literature is time-consuming and labor-intensive. Systematically extracting the relationships between potato genes and diseases from the literature and organizing them into a potato gene-disease knowledge base is therefore particularly important; however, no such knowledge base is currently available.
Methods: In this study, we constructed a Potato Gene-Disease Knowledge Base (PotatoG-DKB) using natural language processing techniques and large language models. Using PubMed as the data source, we obtained 2,906 article abstracts related to potato biology, extracted entities and the relationships between potato genes and related diseases, and stored them in a Neo4j database. Using web technologies, we also built the Potato Gene-Disease Knowledge Portal (PotatoG-DKP), an interactive visualization platform.
Results: PotatoG-DKB contains 5,206 nodes of 22 entity types (e.g., genes, diseases, and species) and 9,443 edges between entities (e.g., gene-disease and pathogen-disease relationships). PotatoG-DKP intuitively displays the associations extracted from the literature and is a useful aid for potato biologists and breeders seeking to understand potato pathogenesis and disease resistance. More details about PotatoG-DKP are available at https://www.potatogd.com.cn/.
Affiliation(s)
- Congjiao Xie
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, Inner Mongolia, China
- Jing Gao
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, Inner Mongolia, China
- Inner Mongolia Autonomous Region Government Service and Data Management Bureau, Hohhot, Inner Mongolia, China
- Junjie Chen
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application for Agriculture and Animal Husbandry, Hohhot, Inner Mongolia, China
- Xuyang Zhao
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
2. Amusat OO, Hegde H, Mungall CJ, Giannakou A, Byers NP, Gunter D, Fagnan K, Ramakrishnan L. Automated annotation of scientific texts for ML-based keyphrase extraction and validation. Database (Oxford) 2024; 2024:baae093. [PMID: 39331731 DOI: 10.1093/database/baae093] [Received: 03/27/2024] [Revised: 06/28/2024] [Accepted: 08/12/2024] [Indexed: 09/29/2024]
Abstract
Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lack the essential metadata researchers need to find, curate, and search them effectively. This lack of metadata poses a significant challenge to the use of these data sets. Machine learning (ML)-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific data sets with the metadata necessary for effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming and not always feasible, so automated text labeling techniques are needed to accelerate scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and the creation of gold-standard text mining data sets. In this paper, we present two novel automated text labeling approaches for validating ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques demonstrate the potential of two new ways to use existing information, available only for select documents within a corpus, to validate ML models that can then be used to describe the remaining documents in the corpus. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. We detail the application of these approaches to ML-generated metadata validation in environmental genomics research. Our results show that the proposed label assignment approaches can generate both generic and highly specific text labels for the unlabeled texts, with up to 44% of the labels matching those suggested by an ML keyword extraction algorithm.
Affiliation(s)
- Oluwamayowa O Amusat
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Harshad Hegde
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Anna Giannakou
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Neil P Byers
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Dan Gunter
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Kjiersten Fagnan
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
- Lavanya Ramakrishnan
- Scientific Data Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, United States
3. Deans AR, Nastasi LF, Davis C. GallOnt: An ontology for plant gall phenotypes. Biodivers Data J 2024; 12:e128585. [PMID: 39229384 PMCID: PMC11369494 DOI: 10.3897/bdj.12.e128585] [Received: 05/29/2024] [Accepted: 08/18/2024] [Indexed: 09/05/2024]
Abstract
Galls are novel plant structures that develop in response to select biotic stressors. These structures, extended phenotypes of the inducer, usually serve to protect and feed the inducer or its progeny. This life history strategy has evolved dozens of times, and tens of thousands of species (including many bacteria, fungi, nematodes, mites and insects) are capable of manipulating plants in this way. Gall phenotypes vary extraordinarily across species but are usually predictable for each species of inducer. We introduce here a new ontology, GallOnt, that facilitates consistent descriptions and the semantic representation of, and reasoning over, plant gall phenotype data. GallOnt was largely developed from ontologies in the Open Biological and Biomedical Ontology (OBO) Foundry and has the potential to connect plant gall phenotypes to knowledge derived from model plant systems, including genotype-phenotype and agricultural research. We also introduce the idea of a new gall data standard, Minimum Information for the Description of Galls (MIDG version 0.1), as a starting point for discussions regarding cecidology best practices.
Affiliation(s)
- Andrew R Deans
- Frost Entomological Museum, The Pennsylvania State University, University Park, United States of America
- Louis Frank Nastasi
- Frost Entomological Museum, The Pennsylvania State University, University Park, United States of America
- Charles Davis
- Frost Entomological Museum, The Pennsylvania State University, University Park, United States of America
4. Vandepoele K, Thierens S, Van Bel M. Application of orthology and network biology to infer gene functions in non-model plants. Physiol Plant 2024; 176:e14441. [PMID: 39019770 DOI: 10.1111/ppl.14441] [Received: 10/31/2023] [Revised: 02/12/2024] [Accepted: 02/13/2024] [Indexed: 07/19/2024]
Abstract
Approximately 60% of the genes and gene products in the model species Arabidopsis thaliana have been functionally characterized. In non-model plant species, the functional annotation of the gene space is largely based on homology, with the assumption that genes with shared common ancestry have conserved functions. However, the wide range of morphological, physiological, and ecological differences between plant species gives rise to many species- and clade-specific genes, for which this transfer of knowledge is not possible. Other complications, such as difficulties with genetic transformation, the absence of large-scale mutagenesis methods, and long generation times, further slow the characterization of genes in non-model species. Here, we discuss different resources that integrate plant gene function information and describe approaches, based on orthology or network biology, that support the functional annotation of gene products. While sequence-based tools to characterize the functional landscape in non-model species are maturing and becoming more readily available, easy-to-use network-based methods for inferring plant gene functions are less prevalent and have limited functionality.
Affiliation(s)
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- VIB Center for AI & Computational Biology, VIB, Ghent, Belgium
- Sander Thierens
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
- Michiel Van Bel
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, VIB, Ghent, Belgium
5. Nédellec C, Sauvion C, Bossy R, Borovikova M, Deléger L. TaeC: A manually annotated text dataset for trait and phenotype extraction and entity linking in wheat breeding literature. PLoS One 2024; 19:e0305475. [PMID: 38870159 PMCID: PMC11175518 DOI: 10.1371/journal.pone.0305475] [Received: 11/16/2023] [Accepted: 05/31/2024] [Indexed: 06/15/2024]
Abstract
Wheat varieties show a large diversity of traits and phenotypes. Linking them to genetic variability is essential for shorter and more efficient wheat breeding programs. A growing number of plant molecular information networks provide interlinked, interoperable data to support the discovery of gene-phenotype interactions. A large body of scientific literature and observational data obtained in the field and under controlled conditions documents wheat breeding experiments, and cross-referencing this complementary information is essential. Text from databases and scientific publications was identified early on as a relevant source of information. However, the wide variety of terms used to refer to traits and phenotype values makes it difficult to find and cross-reference the textual information; for example, simple dictionary lookup methods miss relevant terms. Corpora with manually annotated examples are thus needed to evaluate and train textual information extraction methods. While several corpora contain annotations of human and animal phenotypes, no corpus is available for plant traits. This hinders the evaluation of text mining-based crop knowledge graphs (e.g., AgroLD, KnetMiner, WheatIS-FAIDARE) and limits the ability to train machine learning methods and improve the quality of information. The Triticum aestivum trait Corpus (TaeC) is a new gold standard for traits and phenotypes of wheat. It consists of 528 PubMed references that are fully annotated by trait, phenotype, and species. We address the interoperability challenge of crossing sparse assay data and publications by using the Wheat Trait and Phenotype Ontology to normalize trait mentions and the National Center for Biotechnology Information (NCBI) species taxonomy to normalize species. The paper describes the construction of the corpus. A study of the performance of state-of-the-art language models trained on the corpus for both named entity recognition and entity linking shows that it is suitable for training and evaluation. This corpus is currently the most comprehensive manually annotated corpus for natural language processing studies on crop phenotype information from the literature.
Affiliation(s)
- Claire Nédellec
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Clara Sauvion
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Robert Bossy
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- Mariya Borovikova
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
- TETIS, Univ. Montpellier, AgroParisTech, CIRAD, CNRS, INRAE, Montpellier, France
- Louise Deléger
- Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
6. Gao Y, Zhou Q, Luo J, Xia C, Zhang Y, Yue Z. Crop-GPA: an integrated platform of crop gene-phenotype associations. NPJ Syst Biol Appl 2024; 10:15. [PMID: 38346982 PMCID: PMC10861494 DOI: 10.1038/s41540-024-00343-7] [Received: 08/11/2023] [Accepted: 01/22/2024] [Indexed: 02/15/2024]
Abstract
With the increasing availability of large-scale biological data for crop plants, there is an urgent demand for a versatile platform that fully mines and uses these data for modern molecular breeding. We present Crop-GPA (https://crop-gpa.aielab.net), a comprehensive, functional open-source platform for crop gene-phenotype association data. The current Crop-GPA provides well-curated information on genes, phenotypes, and their associations (GPAs) to researchers through an intuitive interface, dynamic graphical visualizations, and efficient online tools. Two computational tools, GPA-BERT and GPA-GCN, were specifically developed and integrated into Crop-GPA to automatically extract gene-phenotype associations from the crop biology literature and to predict unknown relations based on known associations. Through usage examples, we demonstrate how our platform enables the exploration of complex correlations between genes and phenotypes in crop plants. In summary, Crop-GPA serves as a valuable multi-functional resource that helps the crop research community gain deeper insights into the biological mechanisms of interest.
Affiliation(s)
- Yujia Gao
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Qian Zhou
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Jiaxin Luo
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Chuan Xia
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Youhua Zhang
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Zhenyu Yue
- School of Information and Artificial Intelligence, Anhui Beidou Precision Agriculture Information Engineering Research Center, Anhui Agricultural University, Hefei, Anhui, 230036, China
7. Cooper L, Elser J, Laporte MA, Arnaud E, Jaiswal P. Planteome 2024 Update: Reference Ontologies and Knowledgebase for Plant Biology. Nucleic Acids Res 2024; 52:D1548-D1555. [PMID: 38055832 PMCID: PMC10767901 DOI: 10.1093/nar/gkad1028] [Received: 09/22/2023] [Revised: 10/14/2023] [Accepted: 10/23/2023] [Indexed: 12/08/2023]
Abstract
The Planteome project (https://planteome.org/) provides a suite of reference and crop-specific ontologies and an integrated knowledgebase of plant genomics data. The plant genomics data in the Planteome were obtained through manual and automated curation and sourced from more than 40 partner databases and resources. Here, we report on updates to the Planteome reference ontologies, namely the Plant Ontology (PO), the Trait Ontology (TO), and the Plant Experimental Conditions Ontology (PECO), and on the integration of species- and crop-specific vocabularies from our partners, the Crop Ontology (CO), into the TO ontology graph. Currently, 11 CO vocabularies are integrated into the Planteome, with yam, sorghum, and potato added since 2018. In addition, the size of the annotation database has increased by 34%, and the number of bioentities (genes, proteins, etc.) from 125 plant taxa has increased by 72%. We developed new tools to facilitate user requests for, and improvements to, the CO vocabularies, and to allow fast searching and browsing of PO terms and definitions. These enhancements, together with future changes to automate the TO-CO mappings and knowledge discovery tools, ensure that the Planteome will continue to be a valuable resource for plant biology.
Affiliation(s)
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Elizabeth Arnaud
- Digital Inclusion, Bioversity International, 34397 Montpellier, France
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
8. Dong X, Zhao K, Wang Q, Wu X, Huang Y, Wu X, Zhang T, Dong Y, Gao Y, Chen P, Liu Y, Chen D, Wang S, Yang X, Yang J, Wang Y, Gao Z, Wu X, Bai Q, Li S, Hao G. PlantPAD: a platform for large-scale image phenomics analysis of disease in plant science. Nucleic Acids Res 2024; 52:D1556-D1568. [PMID: 37897364 PMCID: PMC10767946 DOI: 10.1093/nar/gkad917] [Received: 08/18/2023] [Revised: 09/21/2023] [Accepted: 10/13/2023] [Indexed: 10/30/2023]
Abstract
Plant diseases impose a huge burden: they can cause yield losses of up to 100% and thus reduce food security. Intelligent disease diagnosis based on plant phenomics is crucial for recovering most of this yield loss, and it usually requires sufficient image information. Hence, phenomics is being pursued as an independent discipline to enable the development of high-throughput phenotyping for plant disease. However, sharing large-scale image data is often challenging because different communities use incompatible formats and descriptions, which limits multidisciplinary research. To this end, we built the Plant Phenomics Analysis of Disease (PlantPAD) platform with large-scale information on disease. Our platform contains 421,314 images, 63 crops and 310 diseases. Compared with other databases, PlantPAD has extensive, well-annotated image data and in-depth disease information, and offers pre-trained deep-learning models for accurate plant disease diagnosis. PlantPAD supports various applications across multiple disciplines, including intelligent disease diagnosis, disease education, and efficient disease detection and control. Through three applications of PlantPAD, we demonstrate its easy-to-use and convenient functions. PlantPAD is mainly aimed at biologists, computer scientists, plant pathologists, farm managers and pesticide scientists, who can use it to explore multidisciplinary research against plant diseases. PlantPAD is freely available at http://plantpad.samlab.cn.
Affiliation(s)
- Xinyu Dong
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Kejun Zhao
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Qi Wang
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
- Text Computing & Cognitive Intelligence Engineering Research Center of National Education Ministry, Guizhou University, Guiyang 550025, Guizhou, China
- Xingcai Wu
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yuanqin Huang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Xue Wu
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Tianhan Zhang
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yawen Dong
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Yangyang Gao
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Panfeng Chen
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yingwei Liu
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Dongyu Chen
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Shuang Wang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Xiaoyan Yang
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Jing Yang
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Yong Wang
- Department of Plant Pathology, Agriculture College, Guizhou University, Guiyang 550025, Guizhou, China
- Zhenran Gao
- New Rural Development Research Institute, Guizhou University, Guiyang 550025, Guizhou, China
- Xian Wu
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Qingrong Bai
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
- Shaobo Li
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Gefei Hao
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China; Center for Research and Development of Fine Chemicals, Guizhou University, Guiyang 550025, China
9. Deng CH, Naithani S, Kumari S, Cobo-Simón I, Quezada-Rodríguez EH, Skrabisova M, Gladman N, Correll MJ, Sikiru AB, Afuwape OO, Marrano A, Rebollo I, Zhang W, Jung S. Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database (Oxford) 2023; 2023:baad088. [PMID: 38079567 PMCID: PMC10712715 DOI: 10.1093/database/baad088] [Received: 04/13/2023] [Revised: 10/17/2023] [Accepted: 11/28/2023] [Indexed: 12/18/2023]
Abstract
Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution, and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective use. We established the Genotype-Phenotype Working Group in November 2021 as part of the AgBioData Consortium (https://www.agbiodata.org) to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data, and to understand the needs and challenges of the plant genomic research community. In 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design, methods, sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data, as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools that connect genotype data with phenotype data to enhance knowledge synthesis and foster translational research. Although this paper covers only the data and resources relevant to the plant research community, we expect that researchers working on animals share similar issues and needs. Database URL: https://www.agbiodata.org.
Affiliation(s)
- Cecilia H Deng
- Molecular and Digital Breeding, New Cultivar Innovation, The New Zealand Institute for Plant and Food Research Limited, 120 Mt Albert Road, Auckland 1025, New Zealand
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- Irene Cobo-Simón
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT, USA
- Institute of Forest Science (ICIFOR-INIA, CSIC), Madrid, Spain
- Elsa H Quezada-Rodríguez
- Departamento de Producción Agrícola y Animal, Universidad Autónoma Metropolitana-Xochimilco, Ciudad de México, México
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, México
- Maria Skrabisova
- Department of Biochemistry, Faculty of Science, Palacky University, Olomouc, Czech Republic
- Nick Gladman
- Cold Spring Harbor Laboratory, 1 Bungtown Rd, Cold Spring Harbor, New York, NY 11724, USA
- U.S. Department of Agriculture-Agricultural Research Service, NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853, USA
- Melanie J Correll
- Agricultural and Biological Engineering Department, University of Florida, 1741 Museum Rd, Gainesville, FL 32611, USA
| | | | | | - Annarita Marrano
- Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA 94560, USA
| | | | - Wentao Zhang
- National Research Council Canada, 110 Gymnasium Pl, Saskatoon, Saskatchewan S7N 0W9, Canada
| | - Sook Jung
- Department of Horticulture, Washington State University, 303c Plant Sciences Building, Pullman, WA 99164-6414, USA
10
Dumschott K, Dörpholz H, Laporte MA, Brilhaus D, Schrader A, Usadel B, Neumann S, Arnaud E, Kranz A. Ontologies for increasing the FAIRness of plant research data. Front Plant Sci 2023; 14:1279694. [PMID: 38098789 PMCID: PMC10720748 DOI: 10.3389/fpls.2023.1279694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 11/15/2023] [Indexed: 12/17/2023]
Abstract
The importance of improving the FAIRness (findability, accessibility, interoperability, reusability) of research data is undeniable, especially given the large, complex datasets currently being produced by omics technologies. Facilitating the integration of a dataset with other types of data increases the likelihood of reuse and the potential for answering novel research questions. Ontologies are a useful tool for semantically tagging datasets, because adding relevant metadata clarifies how the data were produced and increases their interoperability. Ontologies provide concepts for a particular domain as well as the relationships between those concepts. By tagging data with ontology terms, data become both human- and machine-interpretable, allowing for increased reuse and interoperability. However, identifying ontologies relevant to a particular research domain or technology is challenging, especially within the diverse realm of fundamental plant research. In this review, we outline the ontologies most relevant to the fundamental plant sciences and how they can be used to annotate data from plant-specific experiments within metadata frameworks such as Investigation-Study-Assay (ISA). We also outline the repositories and platforms most useful for identifying applicable ontologies or finding ontology terms.
Affiliation(s)
- Kathryn Dumschott
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Hannah Dörpholz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Marie-Angélique Laporte
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Dominik Brilhaus
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Andrea Schrader
- Data Science and Management & Cluster of Excellence on Plant Sciences (CEPLAS), University of Cologne, Cologne, Germany
- Björn Usadel
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Institute for Biological Data Science & Cluster of Excellence on Plant Sciences (CEPLAS), Faculty of Mathematics and Life Sciences, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- Steffen Neumann
- Program Center MetaCom, Leibniz Institute of Plant Biochemistry, Halle, Germany
- German Centre for Integrative Biodiversity Research (iDiv), Halle-Jena-Leipzig, Germany
- Elizabeth Arnaud
- Digital Solutions Team, Digital Inclusion Lever, Bioversity International, Montpellier Office, Montpellier, France
- Angela Kranz
- Institute of Bio- and Geosciences (IBG-4: Bioinformatics) & Bioeconomy Science Center (BioSC), CEPLAS, Forschungszentrum Jülich, Jülich, Germany
11
Clarke JL, Cooper LD, Poelchau MF, Berardini TZ, Elser J, Farmer AD, Ficklin S, Kumari S, Laporte MA, Nelson RT, Sadohara R, Selby P, Thessen AE, Whitehead B, Sen TZ. Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the Agbiodata Consortium. Database (Oxford) 2023; 2023:baad076. [PMID: 37971715 PMCID: PMC10653126 DOI: 10.1093/database/baad076] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/17/2023] [Accepted: 10/17/2023] [Indexed: 11/19/2023]
Abstract
Over the last couple of decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources (https://www.agbiodata.org/databases) covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies respectively, conducted a Consortium-wide survey to assess the current status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. The results suggest two things. First, data-sharing practices by AgBioData databases are in a fairly healthy state, although it is not clear whether this holds for all metadata and data types across all databases. Second, ontology use has not substantially changed since a similar survey was conducted in 2017. Based on our evaluation of the survey results, we recommend (i) providing training for database personnel in specific data-sharing techniques, as well as in ontology use; (ii) further study of what metadata is shared, and how well it is shared among databases; (iii) promoting an understanding of data sharing and ontologies in the stakeholder community; (iv) improving data sharing and ontologies for specific phenotypic data types and formats; and (v) lowering specific barriers to data sharing and ontology use by identifying sustainability solutions and by identifying, promoting or developing data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use and programmatic data sharing. Database URL: https://www.agbiodata.org/databases.
Affiliation(s)
- Jennifer L Clarke
- Department of Statistics and Department of Food Science and Technology, University of Nebraska–Lincoln, 340 Hardin Hall North Wing, Lincoln, NE 68583, USA
- Laurel D Cooper
- Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Monica F Poelchau
- USDA, Agricultural Research Service, National Agricultural Library, 10301 Baltimore Ave, Beltsville, MD 20705, USA
- Tanya Z Berardini
- The Arabidopsis Information Resource and Phoenix Bioinformatics, 39899 Balentine Drive, Suite 200, Newark, CA, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, 2503 Cordley Hall, Corvallis, OR 97331, USA
- Andrew D Farmer
- National Center for Genome Resources, 2935 Rodeo Park Dr. E., Santa Fe, NM 87505, USA
- Stephen Ficklin
- Department of Horticulture, Washington State University, 249 Clark Hall, PO Box 646414, Pullman, WA 99164, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, NY 11724, USA
- Marie-Angélique Laporte
- Digital Inclusion, Bioversity International, Parc Scientifique Agropolis II, 1990 Bd de la Lironde, Montpellier 34397, France
- Rex T Nelson
- USDA, Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Iowa State University, 716 Farmhouse Lane, Ames, IA 50011, USA
- Rie Sadohara
- Department of Plant, Soil, and Microbial Sciences, Michigan State University, 1066 Bogue St, East Lansing, MI 48824, USA
- Peter Selby
- School of Integrative Plant Science, College of Agriculture and Life Sciences, Cornell University, 215 Garden Avenue, Ithaca, NY 14850, USA
- Anne E Thessen
- Department of Biomedical Informatics, University of Colorado Anschutz, 1890 N. Revere Court, Mailstop F600, Aurora, CO 80045, USA
- Brandon Whitehead
- Data Science and Informatics, Manaaki Whenua—Landcare Research, Ltd., Riddet Road, Massey University, Palmerston North 4472, New Zealand
- Taner Z Sen
- USDA, Agricultural Research Service, Crop Improvement Genetics Research Unit, Western Regional Research Center, 800 Buchanan St, Albany, CA 94710, USA
- Department of Bioengineering, University of California, 306 Stanley Hall, Berkeley, CA 94720, USA
12
Zhang Y, Zhu Q, Shao Y, Jiang Y, Ouyang Y, Zhang L, Zhang W. Inferring Historical Introgression with Deep Learning. Syst Biol 2023; 72:1013-1038. [PMID: 37257491 DOI: 10.1093/sysbio/syad033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 05/28/2023] [Accepted: 05/30/2023] [Indexed: 06/02/2023]
Abstract
Resolving phylogenetic relationships among taxa remains a challenge in the era of big data due to the presence of genetic admixture in a wide range of organisms. Rapidly developing sequencing technologies and statistical tests enable evolutionary relationships to be disentangled at a genome-wide level, yet many of these tests are computationally intensive and rely on phased genotypes, large sample sizes, restricted phylogenetic topologies, or hypothesis testing. To overcome these difficulties, we developed a deep learning-based approach, named ERICA, for inferring genome-wide evolutionary relationships and local introgressed regions from sequence data. ERICA accepts sequence alignments of both population genomic data and multiple genome assemblies, and efficiently identifies discordant genealogy patterns and exchanged regions across genomes when compared with other methods. We further tested ERICA using real population genomic data from Heliconius butterflies that have undergone adaptive radiation and frequent hybridization. Finally, we applied ERICA to characterize hybridization and introgression in wild and cultivated rice, revealing the important role of introgression in rice domestication and adaptation. Taken together, our findings demonstrate that ERICA provides an effective method for teasing apart evolutionary relationships using whole genome data, which can ultimately facilitate evolutionary studies on hybridization and introgression.
Affiliation(s)
- Yubo Zhang
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Qingjie Zhu
- Chinese Institute for Brain Research, Beijing 102206, China
- Yi Shao
- Chinese Institute for Brain Research, Beijing 102206, China
- Yanchen Jiang
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
- Yidan Ouyang
- National Key Laboratory of Crop Genetic Improvement and National Centre of Plant Gene Research (Wuhan), Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430070, China
- Li Zhang
- Chinese Institute for Brain Research, Beijing 102206, China
- Wei Zhang
- State Key Laboratory of Protein and Plant Gene Research, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Peking University, Beijing 100871, China
13
Stefancsik R, Balhoff JP, Balk MA, Ball RL, Bello SM, Caron AR, Chesler EJ, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA)-computational traits for the life sciences. Mamm Genome 2023; 34:364-378. [PMID: 37076585 PMCID: PMC10382347 DOI: 10.1007/s00335-023-09992-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 01/27/2023] [Accepted: 04/06/2023] [Indexed: 04/21/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focussed measurable trait data. The integration of trait and biological attribute information with an ever increasing body of chemical, environmental and biological data greatly facilitates computational analyses and it is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC, 27517, USA
- Meghan A Balk
- Natural History Museum, University of Oslo, Oslo, Norway
- Robyn L Ball
- The Jackson Laboratory, Bar Harbor, ME, 04609, USA
- Anita R Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Laura W Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Christopher J Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, EC1M 6BQ, UK
- Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO, 80045, USA
- Nicole Vasilevsky
- Data Collaboration Center, Critical Path Institute, Tucson, AZ, 85718, USA
14
Imbert B, Kreplak J, Flores RG, Aubert G, Burstin J, Tayeh N. Development of a knowledge graph framework to ease and empower translational approaches in plant research: a use-case on grain legumes. Front Artif Intell 2023; 6:1191122. [PMID: 37601035 PMCID: PMC10435283 DOI: 10.3389/frai.2023.1191122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Accepted: 07/10/2023] [Indexed: 08/22/2023]
Abstract
While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. Because gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex owing to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as the structural and functional pipelines, to create a framework for a NoSQL graph database (Neo4j) that integrates and queries heterogeneous data from multiple species. We call this framework the Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written in the Neo4j Cypher language and can, for instance, identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our knowledge graph provides an intuitive and powerful platform to support research and development programmes.
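Ortho_KB carries knowledge attached to a gene in one species across an orthology edge to a gene in another species. A few lines of Python can sketch that idea. This is a minimal in-memory illustration, not Ortho_KB's implementation or schema. The gene identifiers, species assignments and trait annotation below are hypothetical. In Ortho_KB itself, such a lookup is written as a Cypher query against Neo4j, whose label and relationship names are not given in this abstract.

```python
from collections import defaultdict

# Hypothetical toy data (illustrative identifiers, not taken from OrthoLegKB).
gene_species = {
    "Ps_g1": "Pisum sativum",
    "Mt_g1": "Medicago truncatula",
    "Ca_g1": "Cicer arietinum",
    "Ps_g2": "Pisum sativum",
    "Mt_g2": "Medicago truncatula",
}
orthologs = [("Ps_g1", "Mt_g1"), ("Mt_g1", "Ca_g1"), ("Ps_g2", "Mt_g2")]
# Genes already linked to a trait, e.g. via QTL or functional evidence (hypothetical).
trait_genes = {"flowering time": {"Mt_g1"}}


def build_orthology_index(pairs):
    """Structural step: an undirected adjacency list of orthology edges."""
    index = defaultdict(set)
    for a, b in pairs:
        index[a].add(b)
        index[b].add(a)
    return index


def candidates_for_trait(trait, target_species, index):
    """Functional step: follow one orthology hop from trait-annotated genes
    and keep the orthologs that belong to the target species."""
    hits = set()
    for gene in trait_genes.get(trait, set()):
        for ortholog in index[gene]:
            if gene_species[ortholog] == target_species:
                hits.add(ortholog)
    return sorted(hits)


index = build_orthology_index(orthologs)
print(candidates_for_trait("flowering time", "Pisum sativum", index))
```

Keeping index construction separate from the annotation lookup loosely mirrors the structural/functional split described in the abstract. The structural step only builds bridges between species; the functional step attaches biological information and queries across those bridges.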
Affiliation(s)
- Baptiste Imbert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Jonathan Kreplak
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Raphaël-Gauthier Flores
- Université Paris-Saclay, INRAE, URGI, Versailles, France
- Université Paris-Saclay, INRAE, BioinfOmics, Plant Bioinformatics Facility, Versailles, France
- Grégoire Aubert
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Judith Burstin
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
- Nadim Tayeh
- Agroécologie, INRAE, Institut Agro, Univ. Bourgogne, Univ. Bourgogne Franche-Comté, Dijon, France
15
Ruperao P, Rangan P, Shah T, Thakur V, Kalia S, Mayes S, Rathore A. The Progression in Developing Genomic Resources for Crop Improvement. Life (Basel) 2023; 13:1668. [PMID: 37629524 PMCID: PMC10455509 DOI: 10.3390/life13081668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 07/21/2023] [Accepted: 07/25/2023] [Indexed: 08/27/2023]
Abstract
Sequencing technologies have evolved rapidly over the past two decades, and new technologies are continually being developed and commercialized. Emerging sequencing technologies aim to generate more data from less input at lower cost. This has also translated into an increase in the number and types of genomics applications, together with enhanced computational capacity (both hardware and software). Alongside the evolving DNA sequencing landscape, bioinformatics research teams have also evolved to accommodate the increasingly demanding techniques used to combine and interpret data, leading many researchers to move from the lab to the computer. The rich history of DNA sequencing has paved the way for new insights and the development of new analysis methods. Understanding and learning from past technologies can help with the progress of future applications. This review focuses on the evolution of sequencing technologies, their significant enabling role in generating plant genome assemblies and downstream applications, and the parallel development of bioinformatics tools and skills that fill the gap in data analysis techniques.
Affiliation(s)
- Pradeep Ruperao
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Parimalan Rangan
- ICAR-National Bureau of Plant Genetic Resources, PUSA Campus, New Delhi 110012, India
- Trushar Shah
- International Institute of Tropical Agriculture (IITA), Nairobi 30709-00100, Kenya
- Vivek Thakur
- Department of Systems & Computational Biology, School of Life Sciences, University of Hyderabad, Hyderabad 500046, India
- Sanjay Kalia
- Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi 110003, India
- Sean Mayes
- Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad 502324, India
- Abhishek Rathore
- Excellence in Breeding, International Maize and Wheat Improvement Center (CIMMYT), Hyderabad 502324, India
16
Thessen AE, Cooper L, Swetnam TL, Hegde H, Reese J, Elser J, Jaiswal P. Using knowledge graphs to infer gene expression in plants. Front Artif Intell 2023; 6:1201002. [PMID: 37384147 PMCID: PMC10298150 DOI: 10.3389/frai.2023.1201002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Accepted: 05/23/2023] [Indexed: 06/30/2023]
Abstract
Introduction Climate change is already affecting ecosystems around the world and forcing us to adapt to meet societal needs. The speed with which climate change is progressing necessitates a massive scaling up of the number of species with understood genotype-environment-phenotype (G×E×P) dynamics in order to increase ecosystem and agriculture resilience. An important part of predicting phenotype is understanding the complex gene regulatory networks present in organisms. Previous work has demonstrated that knowledge about one species can be applied to another using ontologically-supported knowledge bases that exploit homologous structures and homologous genes. These types of structures that can apply knowledge about one species to another have the potential to enable the massive scaling up that is needed through in silico experimentation. Methods We developed one such structure, a knowledge graph (KG) using information from Planteome and the EMBL-EBI Expression Atlas that connects gene expression, molecular interactions, functions, and pathways to homology-based gene annotations. Our preliminary analysis uses data from gene expression studies in Arabidopsis thaliana and Populus trichocarpa plants exposed to drought conditions. Results A graph query identified 16 pairs of homologous genes in these two taxa, some of which show opposite patterns of gene expression in response to drought. As expected, analysis of the upstream cis-regulatory region of these genes revealed that homologs with similar expression behavior had conserved cis-regulatory regions and potential interaction with similar trans-elements, unlike homologs that changed their expression in opposite ways. Discussion This suggests that even though the homologous pairs share common ancestry and functional roles, predicting expression and phenotype through homology inference needs careful consideration of integrating cis and trans-regulatory components in the curated and inferred knowledge graph.
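The study sorts homologous gene pairs by whether their drought responses agree or run in opposite directions. A small Python sketch can illustrate that kind of comparison. It is not the authors' graph query. The gene identifiers and log2 fold-change values are invented for illustration. The default threshold of 1.0 in log2 units (a two-fold change) is an assumed cut-off for calling a gene up- or down-regulated.

```python
# Hypothetical log2 fold changes (drought vs. control); illustrative values only.
log2fc = {
    "At_g1": 2.1, "Pt_g1": 1.4,
    "At_g2": -1.8, "Pt_g2": 1.2,
    "At_g3": 0.1, "Pt_g3": -2.0,
}
# Hypothetical Arabidopsis-Populus homolog pairs, as a KG query might return them.
homolog_pairs = [("At_g1", "Pt_g1"), ("At_g2", "Pt_g2"), ("At_g3", "Pt_g3")]


def direction(lfc, threshold=1.0):
    """Return +1 (up), -1 (down) or 0 (below the assumed threshold)."""
    if lfc >= threshold:
        return 1
    if lfc <= -threshold:
        return -1
    return 0


def classify_pairs(pairs, lfc, threshold=1.0):
    """Label each homologous pair 'same', 'opposite' or 'unresolved'."""
    labels = {}
    for a, b in pairs:
        da, db = direction(lfc[a], threshold), direction(lfc[b], threshold)
        if da == 0 or db == 0:
            labels[(a, b)] = "unresolved"
        elif da == db:
            labels[(a, b)] = "same"
        else:
            labels[(a, b)] = "opposite"
    return labels


for pair, label in classify_pairs(homolog_pairs, log2fc).items():
    print(pair, label)
```

With these toy values, the first pair responds in the same direction and the second in opposite directions. The third is left unresolved because one of its genes falls below the threshold. In the study's framing, pairs labelled "opposite" are the ones whose upstream cis-regulatory regions merit closer comparison.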
Affiliation(s)
- Anne E. Thessen
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, CO, United States
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Tyson L. Swetnam
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
- Harshad Hegde
- Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
- Justin Reese
- Environmental Genomics and Systems Biology Division, Berkeley Lab (DOE), Berkeley, CA, United States
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, United States
17
Roberts M, Josephs EB. Weaker selection on genes with treatment-specific expression consistent with a limit on plasticity evolution in Arabidopsis thaliana. Genetics 2023; 224:iyad074. [PMID: 37094602 PMCID: PMC10484170 DOI: 10.1093/genetics/iyad074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 03/06/2023] [Accepted: 04/07/2023] [Indexed: 04/26/2023]
Abstract
Differential gene expression between environments often underlies phenotypic plasticity. However, environment-specific expression patterns are hypothesized to relax selection on genes, and thus limit plasticity evolution. We collated over 27 terabases of RNA-sequencing data on Arabidopsis thaliana from over 300 peer-reviewed studies and 200 treatment conditions to investigate this hypothesis. Consistent with relaxed selection, genes with more treatment-specific expression have higher levels of nucleotide diversity and divergence at nonsynonymous sites but lack stronger signals of positive selection. This result persisted even after controlling for expression level, gene length, GC content, the tissue specificity of expression, and technical variation between studies. Overall, our investigation supports the existence of a hypothesized trade-off between the environment specificity of a gene's expression and the strength of selection on said gene in A. thaliana. Future studies should leverage multiple genome-scale datasets to tease apart the contributions of many variables in limiting plasticity evolution.
Affiliation(s)
- Miles Roberts
- Genetics and Genome Sciences Program, Michigan State University, East Lansing, MI 48824, USA
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI 48824, USA
- Ecology, Evolution, and Behavior Program, Michigan State University, East Lansing, MI 48824, USA
18
Przybylska MS, Violle C, Vile D, Scheepens JF, Lacombe B, Le Roux X, Perrier L, Sales-Mabily L, Laumond M, Vinyeta M, Moulin P, Beurier G, Rouan L, Cornet D, Vasseur F. AraDiv: a dataset of functional traits and leaf hyperspectral reflectance of Arabidopsis thaliana. Sci Data 2023; 10:314. [PMID: 37225767 DOI: 10.1038/s41597-023-02189-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Accepted: 04/27/2023] [Indexed: 05/26/2023]
Abstract
Data from functional trait databases have been increasingly used to address questions related to plant diversity and trait-environment relationships. However, such databases provide intraspecific data that combine individual records obtained from distinct populations at different sites and, hence, under different environmental conditions. This prevents distinguishing sources of variation (e.g., genetic-based variation vs. phenotypic plasticity), a necessary condition for testing adaptive processes and other determinants of plant phenotypic diversity. Consequently, individual traits measured under common growing conditions and encompassing within-species variation across the occupied geographic range could enrich trait databases with valuable data for functional and evolutionary ecology. Here, we recorded 16 functional traits and leaf hyperspectral reflectance (NIRS) data for 721 widely distributed Arabidopsis thaliana natural accessions grown in a common garden experiment. These data records, together with meteorological variables obtained during the experiment, were assembled to create the AraDiv dataset. AraDiv is a comprehensive dataset of A. thaliana's intraspecific variability that can be explored to address questions at the interface of genetics and ecology.
Affiliation(s)
- Maria Stefania Przybylska
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- LEPSE, Univ Montpellier, INRAE, Institut Agro Montpellier, Montpellier, France
- Cyrille Violle
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Denis Vile
- LEPSE, Univ Montpellier, INRAE, Institut Agro Montpellier, Montpellier, France
- J F Scheepens
- Plant Evolutionary Ecology, Institute of Ecology, Evolution and Diversity, Faculty of Biological Sciences, Goethe University Frankfurt, Max-von-Laue-Str. 13, 60438, Frankfurt am Main, Germany
- Benoit Lacombe
- IPSIM, Univ Montpellier, CNRS, INRAE, Institut Agro Montpellier, Montpellier, France
- Xavier Le Roux
- Microbial Ecology Centre, UMR 1418 INRAE, UMR 5557 CNRS, INRAE, CNRS, University Lyon 1, University of Lyon, VetAgroSup, Villeurbanne, France
- Lisa Perrier
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Mariona Vinyeta
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Pierre Moulin
- CEFE, Univ Montpellier, CNRS, EPHE, IRD, Montpellier, France
- Gregory Beurier
- CIRAD, UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- Lauriane Rouan
- CIRAD, UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
- Denis Cornet
- CIRAD, UMR AGAP Institut, F-34398, Montpellier, France
- UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, F-34398, Montpellier, France
19
Karabulut E, Erkoç K, Acı M, Aydın M, Barriball S, Braley J, Cassetta E, Craine EB, Diaz-Garcia L, Hershberger J, Meyering B, Miller AJ, Rubin MJ, Tesdell O, Schlautman B, Şakiroğlu M. Sainfoin (Onobrychis spp.) crop ontology: supporting germplasm characterization and international research collaborations. Front Plant Sci 2023; 14:1177406. [PMID: 37255566 PMCID: PMC10225502 DOI: 10.3389/fpls.2023.1177406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 04/18/2023] [Indexed: 06/01/2023]
Abstract
Sainfoin (Onobrychis spp.) is a perennial forage legume that is also attracting attention as a perennial pulse with potential for human consumption. The dual use of sainfoin underpins diverse research and breeding programs focused on improving sainfoin lines for forage and pulses, which is driving the generation of complex datasets describing high dimensional phenotypes in the post-omics era. To ensure that multiple user groups, for example, breeders selecting for forage and those selecting for edible seed, can utilize these rich datasets, it is necessary to develop common ontologies and accessible ontology platforms. One such platform, Crop Ontology, was created in 2008 by the Consortium of International Agricultural Research Centers (CGIAR) to host crop-specific trait ontologies that support standardized plant breeding databases. In the present study, we describe the sainfoin crop ontology (CO). An in-depth literature review was performed to develop a comprehensive list of traits measured and reported in sainfoin. Because the same traits can be measured in different ways, ultimately, a set of 98 variables (variable = plant trait + method of measurement + scale of measurement) used to describe variation in sainfoin were identified. Variables were formatted and standardized based on guidelines provided here for inclusion in the sainfoin CO. The 98 variables contained a total of 82 traits from four trait classes of which 24 were agronomic, 31 were morphological, 19 were seed and forage quality related, and 8 were phenological. In addition to the developed variables, we have provided a roadmap for developing and submitting new traits to the sainfoin CO.
Affiliation(s)
- Ebrar Karabulut
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- Kübra Erkoç
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- Murat Acı
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
- The Land Institute, Salina, KS, United States
- Mahmut Aydın
- Department of Computer Engineering, Kafkas University, Kars, Türkiye
- Jackson Braley
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Luis Diaz-Garcia
- Department of Viticulture and Enology, University of California Davis, Davis, CA, United States
- Jenna Hershberger
- Plant and Environmental Sciences Department, Clemson University, Clemson, SC, United States
- Bo Meyering
- The Land Institute, Salina, KS, United States
- Allison J. Miller
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Department of Biology, Saint Louis University, St. Louis, MO, United States
- Matthew J. Rubin
- Donald Danforth Plant Science Center, St. Louis, MO, United States
- Omar Tesdell
- Department of Geography, Birzeit University, Birzeit, West Bank, Palestine
- Muhammet Şakiroğlu
- Bioengineering Department, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye
20
Depuydt T, De Rybel B, Vandepoele K. Charting plant gene functions in the multi-omics and single-cell era. TRENDS IN PLANT SCIENCE 2023; 28:283-296. [PMID: 36307271 DOI: 10.1016/j.tplants.2022.09.008] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 05/20/2022] [Revised: 09/09/2022] [Accepted: 09/30/2022] [Indexed: 06/16/2023]
Abstract
Despite the increased access to high-quality plant genome sequences, the set of genes with a known function remains far from complete. With the advent of novel bulk and single-cell omics profiling methods, we are entering a new era where advanced and highly integrative functional annotation strategies are being developed to elucidate the functions of all plant genes. Here, we review different multi-omics approaches to improve functional and regulatory gene characterization and highlight the power of machine learning and network biology to fully exploit the complementary information embedded in different omics layers. Finally, we discuss the potential of emerging single-cell methods and algorithms to further increase the resolution, allowing generation of functional insights about plant biology.
Affiliation(s)
- Thomas Depuydt
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; Vlaams Instituut voor Biotechnologie, Center for Plant Systems Biology, Ghent, Belgium
- Bert De Rybel
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; Vlaams Instituut voor Biotechnologie, Center for Plant Systems Biology, Ghent, Belgium
- Klaas Vandepoele
- Ghent University, Department of Plant Biotechnology and Bioinformatics, Ghent, Belgium; Vlaams Instituut voor Biotechnologie, Center for Plant Systems Biology, Ghent, Belgium; Ghent University, Bioinformatics Institute Ghent, Ghent, Belgium
21
Chan LE, Thessen AE, Duncan WD, Matentzoglu N, Schmitt C, Grondin CJ, Vasilevsky N, McMurry JA, Robinson PN, Mungall CJ, Haendel MA. The Environmental Conditions, Treatments, and Exposures Ontology (ECTO): connecting toxicology and exposure to human health and beyond. J Biomed Semantics 2023; 14:3. [PMID: 36823605 PMCID: PMC9951428 DOI: 10.1186/s13326-023-00283-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/15/2022] [Accepted: 02/03/2023] [Indexed: 02/25/2023]
Abstract
BACKGROUND Evaluating the impact of environmental exposures on organism health is a key goal of modern biomedicine and is critically important in an age of greater pollution and more chemicals in our environment. Environmental health utilizes many different research methods and generates a variety of data types. However, to date, no comprehensive database represents the full spectrum of environmental health data. Due to a lack of interoperability between databases, tools for integrating these resources are needed. In this manuscript we present the Environmental Conditions, Treatments, and Exposures Ontology (ECTO), a species-agnostic ontology focused on exposure events that occur as a result of natural and experimental processes, such as diet, work, or research activities. ECTO is intended for use in harmonizing environmental health data resources to support cross-study integration and inference for mechanism discovery. METHODS AND FINDINGS ECTO is an ontology designed for describing organismal exposures in contexts such as toxicological research, environmental variables, dietary features, and patient-reported data from surveys. ECTO utilizes the base model established within the Exposure Ontology (ExO). ECTO is developed using a combination of manual curation and Dead Simple OWL Design Patterns (DOSDP), contains over 2,700 environmental exposure terms, and incorporates chemical and environmental ontologies. ECTO is an Open Biological and Biomedical Ontology (OBO) Foundry ontology that is designed for interoperability, reuse, and axiomatization with other ontologies. ECTO terms have been utilized in axioms within the Mondo Disease Ontology to represent diseases caused or influenced by environmental factors, as well as for survey encoding for the Personalized Environment and Genes Study (PEGS).
CONCLUSIONS We constructed ECTO to meet OBO Foundry principles, thereby increasing translation opportunities between environmental health and other areas of biology. ECTO has a growing community of contributors consisting of toxicologists, public health epidemiologists, and health care providers, who provide the necessary expertise for areas previously identified as gaps.
Affiliation(s)
- Anne E Thessen
- Oregon State University, Corvallis, OR, 97331, USA
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Charles Schmitt
- National Institute of Environmental Health Sciences, Durham, NC, 27709, USA
- Nicole Vasilevsky
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Julie A McMurry
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
- Peter N Robinson
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, 06032, USA
- Melissa A Haendel
- Oregon State University, Corvallis, OR, 97331, USA
- University of Colorado Anschutz Medical Campus, Aurora, CO, 80045, USA
22
Chen Y, Guo Y, Guan P, Wang Y, Wang X, Wang Z, Qin Z, Ma S, Xin M, Hu Z, Yao Y, Ni Z, Sun Q, Guo W, Peng H. A wheat integrative regulatory network from large-scale complementary functional datasets enables trait-associated gene discovery for crop improvement. MOLECULAR PLANT 2023; 16:393-414. [PMID: 36575796 DOI: 10.1016/j.molp.2022.12.019] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/18/2022] [Revised: 11/28/2022] [Accepted: 12/18/2022] [Indexed: 06/17/2023]
Abstract
Gene regulation is central to all aspects of organism growth, and understanding it using large-scale functional datasets can provide a holistic view of biological processes controlling complex phenotypic traits in crops. However, the connection between massive functional datasets and trait-associated gene discovery for crop improvement is still lacking. In this study, we constructed a wheat integrative gene regulatory network (wGRN) by combining an updated genome annotation and diverse complementary functional datasets, including gene expression, sequence motif, transcription factor (TF) binding, chromatin accessibility, and evolutionarily conserved regulation. wGRN contains 7.2 million genome-wide interactions covering 5947 TFs and 127 439 target genes, which were further verified using known regulatory relationships, condition-specific expression, gene functional information, and experiments. We used wGRN to assign genome-wide genes to 3891 specific biological pathways and accurately prioritize candidate genes associated with complex phenotypic traits in genome-wide association studies. In addition, wGRN was used to enhance the interpretation of a spike temporal transcriptome dataset to construct high-resolution networks. We further unveiled novel regulators that enhance the power of spike phenotypic trait prediction using machine learning and contribute to the spike phenotypic differences among modern wheat accessions. Finally, we developed an interactive webserver, wGRN (http://wheat.cau.edu.cn/wGRN), for the community to explore gene regulation and discover trait-associated genes. Collectively, this community resource establishes the foundation for using large-scale functional datasets to guide trait-associated gene discovery for crop improvement.
Affiliation(s)
- Yongming Chen
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yiwen Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Panfeng Guan
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yongfa Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Xiaobo Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zihao Wang
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhen Qin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Shengwei Ma
- Hainan Yazhou Bay Seed Laboratory, Sanya, Hainan, China; State Key Laboratory of Plant Cell and Chromosome Engineering, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
- Mingming Xin
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhaorong Hu
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Yingyin Yao
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Zhongfu Ni
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Qixin Sun
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Weilong Guo
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Huiru Peng
- Frontiers Science Center for Molecular Design Breeding, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
23
Stefancsik R, Balhoff JP, Balk MA, Ball R, Bello SM, Caron AR, Chessler E, de Souza V, Gehrke S, Haendel M, Harris LW, Harris NL, Ibrahim A, Koehler S, Matentzoglu N, McMurry JA, Mungall CJ, Munoz-Torres MC, Putman T, Robinson P, Smedley D, Sollis E, Thessen AE, Vasilevsky N, Walton DO, Osumi-Sutherland D. The Ontology of Biological Attributes (OBA) - Computational Traits for the Life Sciences. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.26.525742. [PMID: 36747660 PMCID: PMC9900877 DOI: 10.1101/2023.01.26.525742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2023]
Abstract
Existing phenotype ontologies were originally developed to represent phenotypes that manifest as a character state in relation to a wild-type or other reference. However, these do not include the phenotypic trait or attribute categories required for the annotation of genome-wide association studies (GWAS), Quantitative Trait Loci (QTL) mappings or any population-focused measurable trait data. Moreover, variations in gene expression in response to environmental disturbances even without any genetic alterations can also be associated with particular biological attributes. The integration of trait and biological attribute information with an ever-increasing body of chemical, environmental and biological data greatly facilitates computational analyses and is also highly relevant to biomedical and clinical applications. The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. OBA has a modular design which provides several benefits for users and data integrators, including an automated and meaningful classification of trait terms computed on the basis of logical inferences drawn from domain-specific ontologies for cells, anatomical and other relevant entities. The logical axioms in OBA also provide a previously missing bridge that can computationally link Mendelian phenotypes with GWAS and quantitative traits. The term components in OBA provide semantic links and enable knowledge and data integration across specialised research community boundaries, thereby breaking silos.
Affiliation(s)
- Ray Stefancsik
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- James P. Balhoff
- Renaissance Computing Institute, University of North Carolina, Chapel Hill, NC 27517, USA
- Meghan A. Balk
- National Ecological Observatory Network, Battelle, Boulder, CO 80301, USA
- Robyn Ball
- The Jackson Laboratory, Bar Harbor, ME 04609, USA
- Anita R. Caron
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Vinicius de Souza
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah Gehrke
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Melissa Haendel
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Laura W. Harris
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Nomi L. Harris
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Arwa Ibrahim
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Julie A. McMurry
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Christopher J. Mungall
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tim Putman
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Damian Smedley
- William Harvey Research Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK
- Elliot Sollis
- European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Anne E Thessen
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
- Nicole Vasilevsky
- Anschutz Medical Campus, University of Colorado, Aurora, CO 80045, USA
24
Liu X, Tian D, Li C, Tang B, Wang Z, Zhang R, Pan Y, Wang Y, Zou D, Zhang Z, Song S. GWAS Atlas: an updated knowledgebase integrating more curated associations in plants and animals. Nucleic Acids Res 2023; 51:D969-D976. [PMID: 36263826 PMCID: PMC9825481 DOI: 10.1093/nar/gkac924] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 09/15/2022] [Revised: 10/02/2022] [Accepted: 10/19/2022] [Indexed: 01/30/2023]
Abstract
GWAS Atlas (https://ngdc.cncb.ac.cn/gwas/) is a manually curated resource of genome-wide genotype-to-phenotype associations for a wide range of species. Here, we present an updated implementation of GWAS Atlas by curating and incorporating more high-quality associations, with significant improvements and advances over the previous version. Specifically, the current release of GWAS Atlas incorporates a total of 278,109 curated genotype-to-phenotype associations for 1,444 different traits across 15 species (10 plants and 5 animals) from 830 publications and 3,432 studies. A collection of 6,084 lead SNPs of 439 traits and 486 experiment-validated causal variants of 157 traits are newly added. Moreover, 1,056 trait ontology terms are newly defined, resulting in 1,172 and 431 terms for the Plant Phenotype and Trait Ontology and the Animal Phenotype and Trait Ontology, respectively. Additionally, GWAS Atlas is equipped with four online analysis tools and a submission platform, allowing users to perform data analysis and data submission. Collectively, as a core resource in the National Genomics Data Center, GWAS Atlas provides valuable genotype-to-phenotype associations for a diversity of species and thus plays an important role in agronomic trait study and molecular breeding.
Affiliation(s)
- Xiaonan Liu
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Dongmei Tian
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Cuiping Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Bixia Tang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhonghuang Wang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Rongqin Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yitong Pan
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yi Wang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Dong Zou
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Sino-Danish College, University of Chinese Academy of Sciences, Beijing 100049, China
- University of Chinese Academy of Sciences, Beijing 100049, China
25
Fahlgren N, Kapoor M, Yordanova G, Papatheodorou I, Waese J, Cole B, Harrison P, Ware D, Tickle T, Paten B, Burdett T, Elsik CG, Tuggle CK, Provart NJ. Toward a data infrastructure for the Plant Cell Atlas. PLANT PHYSIOLOGY 2023; 191:35-46. [PMID: 36200899 PMCID: PMC9806565 DOI: 10.1093/plphys/kiac468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Accepted: 09/18/2022] [Indexed: 06/16/2023]
Abstract
We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas that currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.
Affiliation(s)
- Noah Fahlgren
- Donald Danforth Plant Science Center, Saint Louis, Missouri 63132, USA
- Muskan Kapoor
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Jamie Waese
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
- Benjamin Cole
- DOE-Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720, USA
- Peter Harrison
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Doreen Ware
- Cold Spring Harbor Laboratory, One Bungtown Road, Cold Spring Harbor, New York 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, New York 14853, USA
- Timothy Tickle
- Data Sciences Platform, The Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, Massachusetts 02142, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, Baskin School of Engineering, 1156 High Street, Santa Cruz, California 95064, USA
- Tony Burdett
- EMBL-EBI, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK
- Christine G Elsik
- Division of Animal Sciences/Division of Plant Science & Technology/Institute for Data Science & Informatics, University of Missouri, Columbia, Missouri 65211, USA
- Christopher K Tuggle
- Bioinformatics and Computational Biology Program, Department of Animal Science, Iowa State University, Ames, Iowa 50011, USA
- Nicholas J Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario M5S 3B2, Canada
26
Bove CB, Ingersoll MV, Davies SW. Help Me, Symbionts, You're My Only Hope: Approaches to Accelerate our Understanding of Coral Holobiont Interactions. Integr Comp Biol 2022; 62:1756-1769. [PMID: 36099871 DOI: 10.1093/icb/icac141] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/15/2022] [Revised: 08/24/2022] [Accepted: 09/05/2022] [Indexed: 01/05/2023]
Abstract
Tropical corals construct the three-dimensional framework for one of the most diverse ecosystems on the planet, providing habitat to a plethora of species across taxa. However, these ecosystem engineers are facing unprecedented challenges, such as increasing disease prevalence and marine heatwaves associated with anthropogenic global change. As a result, major declines in coral cover and health are being observed across the world's oceans, often due to the breakdown of coral-associated symbioses. Here, we review the interactions between the major symbiotic partners of the coral holobiont (the cnidarian host, algae in the family Symbiodiniaceae, and the microbiome) that influence trait variation, including the molecular mechanisms that underlie symbiosis and the resulting physiological benefits of different microbial partnerships. In doing so, we highlight the current framework for the formation and maintenance of cnidarian-Symbiodiniaceae symbiosis, and the role that immunity pathways play in this relationship. We emphasize that understanding these complex interactions is challenging when you consider the vast genetic variation of the cnidarian host and algal symbiont, as well as their highly diverse microbiome, which is also an important player in coral holobiont health. Given the complex interactions between and among symbiotic partners, we propose several research directions and approaches focused on symbiosis model systems and emerging technologies that will broaden our understanding of how these partner interactions may facilitate the prediction of coral holobiont phenotype, especially under rapid environmental change.
Affiliation(s)
- Colleen B Bove
- Department of Biology, Boston University, Boston, MA 02215, USA
- Sarah W Davies
- Department of Biology, Boston University, Boston, MA 02215, USA
27
Duan G, Wu G, Chen X, Tian D, Li Z, Sun Y, Du Z, Hao L, Song S, Gao Y, Xiao J, Zhang Z, Bao Y, Tang B, Zhao W. HGD: an integrated homologous gene database across multiple species. Nucleic Acids Res 2022; 51:D994-D1002. [PMID: 36318261 PMCID: PMC9825607 DOI: 10.1093/nar/gkac970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 09/28/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022]
Abstract
Homology is fundamental to inferring genes' evolutionary processes and their relationships through shared ancestry. Existing homologous gene resources vary in their inference methods, homologous relationships and identifiers, posing inevitable difficulties in choosing among them and in mapping homology results from one to another. Here, we present HGD (Homologous Gene Database, https://ngdc.cncb.ac.cn/hgd), a comprehensive homolog resource integrating multi-species, multi-resource and multi-omics data, which complements existing resources by providing a public, one-stop data service. Currently, HGD houses a total of 112 383 644 homologous pairs for 37 species, including 19 animals, 16 plants and 2 microorganisms. Meanwhile, HGD integrates various annotations from public resources, including 16 909 homologs with traits, 276 670 homologs with variants, 398 573 homologs with expression and 536 852 homologs with gene ontology (GO) annotations. HGD provides a wide range of omics gene function annotations to help users gain a deeper understanding of gene function.
Affiliation(s)
- Xiaoning Chen
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Dongmei Tian
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhaohua Li
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yanling Sun
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Zhenglin Du
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Lili Hao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- Shuhui Song
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yuan Gao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Jingfa Xiao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhang Zhang
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yiming Bao
- National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Bixia Tang
- Correspondence may also be addressed to Bixia Tang.
- Wenming Zhao
- To whom correspondence should be addressed. Tel: +86 1084097636; Fax: +86 1084097720.
28
Senger E, Osorio S, Olbricht K, Shaw P, Denoyes B, Davik J, Predieri S, Karhu S, Raubach S, Lippi N, Höfer M, Cockerton H, Pradal C, Kafkas E, Litthauer S, Amaya I, Usadel B, Mezzetti B. Towards smart and sustainable development of modern berry cultivars in Europe. The Plant Journal: for Cell and Molecular Biology 2022; 111:1238-1251. [PMID: 35751152 DOI: 10.1111/tpj.15876] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/04/2022] [Revised: 06/15/2022] [Accepted: 06/22/2022] [Indexed: 06/15/2023]
Abstract
Fresh berries are a popular and important component of the human diet. The demand for high-quality berries and sustainable production methods is increasing globally, challenging breeders to develop modern berry cultivars that fulfill all desired characteristics. Since 1994, research projects have characterized genetic resources, developed modern tools for high-throughput screening, and published data in publicly available repositories. However, the key findings of different disciplines are rarely linked together, and only a limited range of traits and genotypes has been investigated. The Horizon 2020 project BreedingValue will address these challenges by studying a broader panel of strawberry, raspberry and blueberry genotypes in detail, in order to recover genetic diversity whose loss has limited the aroma and flavor intensity of recent cultivars. We will combine metabolic analysis with sensory panel tests and surveys to identify the key components of taste, flavor and aroma in berries across Europe, leading to a high-resolution map of quality requirements for future berry cultivars. Traits linked to berry yields and the effect of environmental stress will be investigated using modern image analysis methods and modeling. We will also use genetic analysis to determine the genetic basis of complex traits for the development and optimization of modern breeding technologies, such as molecular marker arrays, genomic selection and genome-wide association studies. Finally, the results, raw data and metadata will be made publicly available on the open platform Germinate in order to meet FAIR data principles and provide the basis for sustainable research in the future.
Affiliation(s)
- Elisa Senger
- Institute of Bio- and Geosciences, IBG-4 Bioinformatics, BioSC, CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Sonia Osorio
- Departamento de Biología Molecular y Bioquímica, Instituto de Hortofruticultura Subtropical y Mediterránea 'La Mayora', Universidad de Málaga-Consejo Superior de Investigaciones Científicas, Campus de Teatinos, Málaga, Spain
- Paul Shaw
- Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Scotland, UK
- Béatrice Denoyes
- Université de Bordeaux, UMR BFP, INRAE, Villenave d'Ornon, France
- Jahn Davik
- Department of Molecular Plant Biology, Norwegian Institute of Bioeconomy Research (NIBIO), Ås, Norway
- Stefano Predieri
- Bio-Agrofood Department, Institute for Bioeconomy, IBE-CNR, Italian National Research Council, Bologna, Italy
- Saila Karhu
- Natural Resources Institute Finland (Luke), Turku, Finland
- Sebastian Raubach
- Department of Information and Computational Sciences, The James Hutton Institute, Invergowrie, Scotland, UK
- Nico Lippi
- Bio-Agrofood Department, Institute for Bioeconomy, IBE-CNR, Italian National Research Council, Bologna, Italy
- Monika Höfer
- Institute of Breeding Research on Fruit Crops, Federal Research Centre for Cultivated Plants (JKI), Dresden, Germany
- Helen Cockerton
- Genetics, Genomics and Breeding Department, NIAB, East Malling, UK
- Christophe Pradal
- CIRAD and UMR AGAP Institute, Montpellier, France
- INRIA and LIRMM, University Montpellier, CNRS, Montpellier, France
- Ebru Kafkas
- Department of Horticulture, Faculty of Agriculture, Çukurova University, Balcalı, Adana, Turkey
- Iraida Amaya
- Unidad Asociada de I+D+i IFAPA-CSIC Biotecnología y Mejora en Fresa, Málaga, Spain
- Laboratorio de Genómica y Biotecnología, Centro IFAPA de Málaga, Instituto Andaluz de Investigación y Formación Agraria y Pesquera, Málaga, Spain
- Björn Usadel
- Institute of Bio- and Geosciences, IBG-4 Bioinformatics, BioSC, CEPLAS, Forschungszentrum Jülich, Jülich, Germany
- Institute for Biological Data Science, Heinrich-Heine University Düsseldorf, Düsseldorf, Germany
- Bruno Mezzetti
- Department of Agricultural, Food and Environmental Sciences, Università Politecnica delle Marche, Ancona, Italy
29
Li X, Xu X, Chen M, Xu M, Wang W, Liu C, Yu L, Liu W, Yang W. The field phenotyping platform's next darling: Dicotyledons. Frontiers in Plant Science 2022; 13:935748. [PMID: 36092402 PMCID: PMC9449727 DOI: 10.3389/fpls.2022.935748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 07/21/2022] [Indexed: 06/15/2023]
Abstract
The genetic information and functional properties of plants have been further elucidated with the completion of whole-genome sequencing of numerous crop species and the rapid development of high-throughput phenotyping technologies, laying a suitable foundation for advanced precision agriculture and enhanced genetic gains. Field phenotyping of dicotyledonous crops has been identified as a key factor in collecting large-scale crop phenotypic data. Dicotyledonous plants account for four-fifths of all angiosperm species and play a critical role in agriculture. However, their morphology is complex, and the abundant phenotypic information they carry is critical for analyzing high-throughput phenotypic data in the field. This paper therefore focuses on the major advancements in ground-based, air-based, and space-based field phenotyping platforms over the last few decades and on research progress in high-throughput phenotyping of dicotyledonous field crops in terms of morphological, physiological and biochemical, biotic/abiotic stress, and yield indicators. Finally, future directions for field phenotyping of dicots are explored from the perspectives of identifying new unified phenotypic criteria, developing a high-performance infrastructure platform, creating a phenotypic big data knowledge map, and integrating phenotypic data with multiomic data.
Affiliation(s)
- Xiuni Li
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Xiangyao Xu
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Menggen Chen
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Mei Xu
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Wenyan Wang
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Chunyan Liu
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Liang Yu
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Weiguo Liu
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
- Wenyu Yang
- College of Agronomy, Sichuan Agricultural University, Chengdu, China
- Sichuan Engineering Research Center for Crop Strip Intercropping System, Chengdu, China
- Key Laboratory of Crop Ecophysiology and Farming System in Southwest, Ministry of Agriculture, Chengdu, China
30
Eizenga GC, Kim H, Jung JKH, Greenberg AJ, Edwards JD, Naredo MEB, Banaticla-Hilario MCN, Harrington SE, Shi Y, Kimball JA, Harper LA, McNally KL, McCouch SR. Phenotypic Variation and the Impact of Admixture in the Oryza rufipogon Species Complex (ORSC). Frontiers in Plant Science 2022; 13:787703. [PMID: 35769295 PMCID: PMC9235872 DOI: 10.3389/fpls.2022.787703] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/01/2021] [Accepted: 04/13/2022] [Indexed: 06/15/2023]
Abstract
Crop wild relatives represent valuable reservoirs of variation for breeding, but their populations are threatened in natural habitats, are sparsely represented in genebanks, and most are poorly characterized. The focus of this study is the Oryza rufipogon species complex (ORSC), the wild progenitor of Asian rice (Oryza sativa L.). The ORSC comprises perennial, annual and intermediate forms that were historically designated as O. rufipogon, O. nivara, and O. sativa f. spontanea (or Oryza spp., an annual form of mixed O. rufipogon/O. nivara and O. sativa ancestry), respectively, based on non-standardized morphological, geographical, and/or ecologically based species definitions and boundaries. Here, a collection of 240 diverse ORSC accessions, characterized by genotyping-by-sequencing (113,739 SNPs), was phenotyped for 44 traits associated with plant, panicle, and seed morphology in the screenhouse at the International Rice Research Institute, Philippines. These traits included heritable phenotypes often recorded as characterization data by genebanks. Over 100 of these ORSC accessions were also phenotyped in the greenhouse for 18 traits in Stuttgart, Arkansas, and for 16 traits in Ithaca, New York, United States. We implemented a Bayesian Gaussian mixture model to infer accession groups from a subset of these phenotypic data and identified three phenotype-based groups. We used concordance between the genotypic subpopulations and these phenotype-based groups to identify a suite of phenotypic traits that could reliably differentiate the ORSC populations, whether measured in tropical or temperate regions. The traits provide insight into plant morphology, life history (perenniality versus annuality) and mating habit (self- versus cross-pollinated), and are largely consistent with genebank species designations. One phenotypic group contains predominantly O. rufipogon accessions characterized as perennial and largely outcrossing, and another contains predominantly O. nivara accessions characterized as annual and largely inbreeding. From these groups, 42 "core" O. rufipogon and 25 "core" O. nivara accessions were identified for domestication studies. The third group, comprising 20% of our collection, has the most accessions identified as Oryza spp. (51.2%) and levels of O. sativa admixture accounting for more than 50% of the genome. This third group is potentially useful as a "pre-breeding" pool for breeders attempting to incorporate novel variation into elite breeding lines.
Affiliation(s)
- Georgia C. Eizenga
- Dale Bumpers National Rice Research Center, USDA-ARS, Stuttgart, AR, United States
- HyunJung Kim
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Janelle K. H. Jung
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jeremy D. Edwards
- Dale Bumpers National Rice Research Center, USDA-ARS, Stuttgart, AR, United States
- Sandra E. Harrington
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Yuxin Shi
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Jennifer A. Kimball
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Lisa A. Harper
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
- Susan R. McCouch
- Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, NY, United States
31
Rice SL, Lazarus E, Anderton C, Birnbaum K, Brophy J, Cole B, Dickel D, Ehrhardt D, Fahlgren N, Frank M, Haswell E, Huang SC, Leiboff S, Libault M, Otegui MS, Provart N, Uhrig RG, Rhee SY. First Plant Cell Atlas symposium report. Plant Direct 2022; 6:e406. [PMID: 35774620 PMCID: PMC9219010 DOI: 10.1002/pld3.406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 05/16/2022] [Accepted: 05/18/2022] [Indexed: 06/15/2023]
Abstract
The Plant Cell Atlas (PCA) community hosted a virtual symposium on December 9 and 10, 2021 on single-cell and spatial omics technologies. The conference gathered almost 500 academic, industry, and government leaders to identify the needs and directions of the PCA community and to explore how establishing a data synthesis center would address these needs and accelerate progress. This report details the presentations and discussions focused on the possibility of a data synthesis center for a PCA and the expected impacts of such a center on advancing science and technology globally. Community discussions focused on topics such as data analysis tools and annotation standards; computational expertise and cyber-infrastructure; modes of community organization and engagement; methods for ensuring a broad reach in the PCA community; recruitment, training, and nurturing of new talent; and the overall impact of the PCA initiative. These targeted discussions facilitated dialogue among the participants to gauge whether the PCA might serve as a vehicle for establishing a data synthesis center. The conversations also explored how online tools can be leveraged to help broaden the reach of the PCA (e.g., online contests, virtual networking, and social media stakeholder engagement) and decrease the costs of conducting research (e.g., virtual Research Experiences for Undergraduates (REU) opportunities). Major recommendations for the future of the PCA included establishing standards, creating dashboards for easy and intuitive access to data, and engaging with a broad community of stakeholders. The discussions also identified the following as essential to the PCA's success: identifying homologous cell-type markers and their biocuration, publishing datasets and computational pipelines, using online tools for communication (such as Slack), and providing user-friendly data visualization and data sharing.
In conclusion, the development of a data synthesis center will help the PCA community achieve these goals by providing a centralized repository for existing and new data, a platform for sharing tools, and new analytical approaches through collaborative, multidisciplinary efforts. A data synthesis center will help the PCA reach milestones, such as community-supported data evaluation metrics, and accelerate the plant research necessary for human and environmental health.
Affiliation(s)
- Selena L. Rice
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
- Elena Lazarus
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
- Christopher Anderton
- Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
- Kenneth Birnbaum
- Center for Genomics and Systems Biology, New York University, New York, New York, USA
- Jennifer Brophy
- Department of Bioengineering, Stanford University, Stanford, California, USA
- Benjamin Cole
- Lawrence Berkeley National Laboratory, Berkeley, California, USA
- David Ehrhardt
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
- Noah Fahlgren
- Donald Danforth Plant Science Center, St. Louis, Missouri, USA
- Margaret Frank
- Department of Plant Biology, Cornell University, Ithaca, New York, USA
- Elizabeth Haswell
- Department of Biology, Washington University in St. Louis, St. Louis, Missouri, USA
- Samuel Leiboff
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, Oregon, USA
- Marc Libault
- Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
- Marisa S. Otegui
- Department of Botany, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Nicholas Provart
- Department of Cell and Systems Biology/Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- R. Glen Uhrig
- Department of Science, University of Alberta, Edmonton, Alberta, Canada
- Seung Y. Rhee
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
32
Salim JA, Saraiva AM, Zermoglio PF, Agostini K, Wolowski M, Drucker DP, Soares FM, Bergamo PJ, Varassin IG, Freitas L, Maués MM, Rech AR, Veiga AK, Acosta AL, Araujo AC, Nogueira A, Blochtein B, Freitas BM, Albertini BC, Maia-Silva C, Nunes CEP, Pires CSS, dos Santos CF, Queiroz EP, Cartolano EA, de Oliveira FF, Amorim FW, Fontúrbel FE, da Silva GV, Consolaro H, Alves-dos-Santos I, Machado IC, Silva JS, Aleixo KP, Carvalheiro LG, Rocca MA, Pinheiro M, Hrncir M, Streher NS, Ferreira PA, de Albuquerque PMC, Maruyama PK, Borges RC, Giannini TC, Brito VLG. Data standardization of plant-pollinator interactions. Gigascience 2022; 11:giac043. [PMID: 35639882 PMCID: PMC9154084 DOI: 10.1093/gigascience/giac043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/23/2022] [Open access]
Abstract
BACKGROUND Animal pollination is an important ecosystem function and service, ensuring both the integrity of natural systems and human well-being. Although many knowledge shortfalls remain, some high-quality data sets on biological interactions are now available. The development and adoption of standards for biodiversity data and metadata have promoted great advances in biological data sharing and aggregation, supporting large-scale studies and science-based public policies. However, these standards currently cannot fully support the sharing of interaction data. RESULTS Here we present a vocabulary of terms and a data model for sharing plant-pollinator interaction data based on the Darwin Core standard. The vocabulary introduces 48 new terms targeting several aspects of plant-pollinator interactions and can be used to capture information from different approaches and scales. Additionally, we provide solutions for data serialization using RDF, XML, and DwC-Archives, and we recommend existing controlled vocabularies for some of the terms. Our contribution supports open access to standardized data on plant-pollinator interactions. CONCLUSIONS The adoption of the vocabulary would facilitate data sharing to support studies ranging from the spatial and temporal distribution of interactions to the taxonomic, phenological, functional, and phylogenetic aspects of plant-pollinator interactions. We expect it to help fill data and knowledge gaps, thus further enabling scientific research on the ecology and evolution of plant-pollinator communities, biodiversity conservation, ecosystem services, and the development of public policies. The proposed data model is flexible and can be adapted for sharing other types of interaction data by developing discipline-specific vocabularies of terms.
Affiliation(s)
- José A Salim
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Antonio M Saraiva
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Paula F Zermoglio
- Departamento de Ecología, Genética y Evolución, Instituto IEGEBA (CONICET-UBA), Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- Kayna Agostini
- Departamento de Ciências da Natureza, Matemática e Educação, Universidade Federal de São Carlos, Rodovia Anhanguera km 174, Araras, São Paulo, Caixa Postal 153, CEP 13600-970, Brazil
- Marina Wolowski
- Instituto de Ciências da Natureza, Universidade Federal de Alfenas, Rua Gabriel Monteiro da Silva 700, Alfenas, Minas Gerais, 37130-001, Brazil
- Debora P Drucker
- Embrapa Agricultura Digital, Empresa Brasileira de Pesquisa Agropecuária (Embrapa), Campinas, SP, Brazil
- Filipi M Soares
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Pedro J Bergamo
- Jardim Botânico do Rio de Janeiro, R. Pacheco Leão 915, Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- Isabela G Varassin
- Departamento de Botânica, Universidade Federal do Paraná, Curitiba, Paraná, Brazil
- Leandro Freitas
- Jardim Botânico do Rio de Janeiro, R. Pacheco Leão 915, Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
- Márcia M Maués
- Laboratório de Entomologia, Embrapa Amazônia Oriental, Trav. Dr. Enéas Pinheiro, s/n°, Bairro do Marco, Belém, Pará, 66095-903, Brazil
- Andre R Rech
- Faculdade Interdisciplinar de Humanidades, Centro Multiusuário de Pesquisa em Ciência Florestal (MULTIFLOR), Universidade Federal dos Vales do Jequitinhonha e Mucuri, Diamantina, Minas Gerais, 39100-000, Brazil
- Allan K Veiga
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Andre L Acosta
- Instituto Tecnológico Vale, Rua Boaventura da Silva, 955, 66055-900, Belém, Pará, Brazil
- Andréa C Araujo
- Instituto de Biociências, Universidade Federal de Mato Grosso do Sul, Campo Grande, Mato Grosso do Sul, Brazil
- Anselmo Nogueira
- Laboratório de Interações Plant-Animal (LIPA), Centro de Ciências Naturais e Humanas (CCNH), Universidade Federal do ABC, Alameda da Universidade, s/nº, Anchieta, São Bernardo do Campo, São Paulo, Brazil
- Betina Blochtein
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, 90619-900, Brazil
- Breno M Freitas
- Departamento de Zootecnia, Campus Universitário do Pici, Universidade Federal do Ceará, Centro de Ciências Agrárias, Fortaleza, CE, Brazil
- Bruno C Albertini
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Camila Maia-Silva
- Departamento de Biociências, Universidade Federal Rural do Semi-Árido, Av. Francisco Mota, n° 572, Presidente Costa e Silva, Mossoró, RN, 59625-900, Brazil
- Carlos E P Nunes
- Department of Biological and Environmental Sciences, Cottrell Building, University of Stirling, Stirling FK9 4LA, Scotland, United Kingdom
- Carmen S S Pires
- Embrapa Recursos Genéticos e Biotecnologia, Brasília, Distrito Federal, Brazil
- Charles F dos Santos
- Escola de Ciências da Saúde e da Vida, Pontifícia Universidade Católica do Rio Grande do Sul, Porto Alegre, RS, 90619-900, Brazil
- Elisa P Queiroz
- Departamento de Ecologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Etienne A Cartolano
- Escola Politécnica, Universidade de São Paulo, São Paulo, SP, 05508-010, Brazil
- Favízia F de Oliveira
- Laboratório de Bionomia, Biogeografia e Sistemática de Insetos (BIOSIS), Instituto de Biologia (IBIO), Universidade Federal da Bahia, 40170-115 Salvador, Bahia, Brazil
- Felipe W Amorim
- Laboratório de Ecologia da Polinização e Interações (LEPI), Programa de Pós-graduação em Botânica, Programa de Pós-graduação em Zoologia, Instituto de Biociências, Universidade Estadual Paulista, Botucatu, SP, Brazil
- Francisco E Fontúrbel
- Instituto de Biología, Facultad de Ciencias, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
- Gleycon V da Silva
- Programa de Pós-Graduação em Ecologia, INPA-V8, Instituto Nacional de Pesquisas da Amazônia, Av. André Araújo 2936, Petrópolis, 69067-375, Manaus, AM, Brazil
- Hélder Consolaro
- Instituto de Biotecnologia, Universidade Federal de Catalão, Catalão, Goiás, Brazil
- Isabel Alves-dos-Santos
- Departamento de Ecologia, Instituto de Biociências, Universidade de São Paulo, São Paulo, Brazil
- Isabel C Machado
- Programa de Pós-Graduação em Biologia Vegetal, Departamento de Botânica, Universidade Federal de Pernambuco, Recife, PE 50670-901, Brazil
- Juliana S Silva
- Instituto Federal de Educação, Ciência e Tecnologia de Mato Grosso, Avenida Sen. Filinto Müller, 953, CEP 78043-400, Cuiabá, MT, Brazil
- Kátia P Aleixo
- Associação Brasileira de Estudos das Abelhas (A.B.E.L.H.A.), São Paulo, SP, 04535-001, Brazil
- Luísa G Carvalheiro
- Departamento de Ecologia, Universidade Federal de Goiás, Campus Samambaia, Goiânia, Brazil
- Centre for Ecology, Evolution and Environmental Changes (cE3c), University of Lisboa, Lisbon, Portugal
- Márcia A Rocca
- Departamento de Ecologia, Centro de Ciências Biológicas e da Saúde, Universidade Federal de Sergipe, Avenida Marechal Rondon s/n, São Cristóvão, Sergipe, 49100-000, Brazil
- Mardiore Pinheiro
- Universidade Federal da Fronteira Sul, R. Major Antônio Cardoso 590, Cerro Largo, Rio Grande do Sul, 97900-000, Brazil
- Michael Hrncir
- Departamento de Fisiologia, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 321, Travessa 14, São Paulo, São Paulo, 05508-900, Brazil
- Nathália S Streher
- Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, United States of America
- Patricia A Ferreira
- Environmental Sciences Department, Federal University of São Carlos, São Paulo, Brazil
- Pietro K Maruyama
- Centro de Síntese Ecológica e Conservação, Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Rafael C Borges
- Instituto Tecnológico Vale, Rua Boaventura da Silva, 955, 66055-900, Belém, Pará, Brazil
- Tereza C Giannini
- Instituto Tecnológico Vale, Rua Boaventura da Silva, 955, 66055-900, Belém, Pará, Brazil
- Vinícius L G Brito
- Instituto de Biologia, Universidade Federal de Uberlândia, Rua Ceará sn, Uberlândia, Minas Gerais, 38.405-302, Brazil
33
Yao E, Blake VC, Cooper L, Wight CP, Michel S, Cagirici HB, Lazo GR, Birkett CL, Waring DJ, Jannink JL, Holmes I, Waters AJ, Eickholt DP, Sen TZ. GrainGenes: a data-rich repository for small grains genetics and genomics. Database (Oxford) 2022; 2022:6591224. [PMID: 35616118 PMCID: PMC9216595 DOI: 10.1093/database/baac034] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 12/08/2021] [Revised: 04/01/2022] [Accepted: 04/26/2022] [Indexed: 05/16/2023]
Abstract
As one of the US Department of Agriculture-Agricultural Research Service flagship databases, GrainGenes (https://wheat.pw.usda.gov) serves the data and community needs of globally distributed small grains researchers for the genetic improvement of the Triticeae family and Avena species that include wheat, barley, rye and oat. GrainGenes accomplishes its mission by continually enriching its cross-linked data content following the findable, accessible, interoperable and reusable principles, enhancing and maintaining an intuitive web interface, creating tools to enable easy data access and establishing data connections within and between GrainGenes and other biological databases to facilitate knowledge discovery. GrainGenes operates within the biological database community, collaborates with curators and genome sequencing groups and contributes to the AgBioData Consortium and the International Wheat Initiative through the Wheat Information System (WheatIS). Interactive and linked content is paramount for successful biological databases and GrainGenes now has 2917 manually curated gene records, including 289 genes and 254 alleles from the Wheat Gene Catalogue (WGC). There are >4.8 million gene models in 51 genome browser assemblies, 6273 quantitative trait loci and >1.4 million genetic loci on 4756 genetic and physical maps contained within 443 mapping sets, complete with standardized metadata. Most notably, 50 new genome browsers that include outputs from the Wheat and Barley PanGenome projects have been created. We provide an example of an expression quantitative trait loci track on the International Wheat Genome Sequencing Consortium Chinese Spring wheat browser to demonstrate how genome browser tracks can be adapted for different data types. To help users benefit more from its data, GrainGenes created four tutorials available on YouTube. 
GrainGenes is executing its vision of service by continuously responding to the needs of the global small grains community by creating a centralized, long-term, interconnected data repository. Database URL: https://wheat.pw.usda.gov.
Affiliation(s)
- Eric Yao
- United States Department of Agriculture, Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Department of Bioengineering, University of California, Stanley Hall, Berkeley, CA 94720-1762, USA
- Victoria C Blake
- United States Department of Agriculture, Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Department of Plant Sciences and Plant Pathology, Montana State University, 119 Plant Biosciences Building, Bozeman, MT 59717, USA
- Laurel Cooper
- Department of Botany and Plant Pathology, Oregon State University, 1500 SW Jefferson Way, Corvallis, OR 97331, USA
- Charlene P Wight
- Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, 960 Carling Ave., Ottawa, ON K1A 0C6, Canada
- Steve Michel
- United States Department of Agriculture, Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- H Busra Cagirici
- United States Department of Agriculture, Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Gerard R Lazo
- United States Department of Agriculture, Agricultural Research Service, Western Regional Research Center, Crop Improvement and Genetics Research Unit, 800 Buchanan St., Albany, CA 94710, USA
- Clay L Birkett
- United States Department of Agriculture, Agricultural Research Service, Robert Holley Center, 538 Tower Rd., Ithaca, NY 14853, USA
- David J Waring
- Section of Plant Breeding and Genetics, Cornell University, Bradfield Hall, 306 Tower Rd, Ithaca, NY 14853, USA
- Jean-Luc Jannink
- United States Department of Agriculture, Agricultural Research Service, Robert Holley Center, 538 Tower Rd., Ithaca, NY 14853, USA
- Section of Plant Breeding and Genetics, Cornell University, Bradfield Hall, 306 Tower Rd, Ithaca, NY 14853, USA
- Ian Holmes
- Department of Bioengineering, University of California, Stanley Hall, Berkeley, CA 94720-1762, USA
- Amanda J Waters
- PepsiCo R&D, 1991 Upper Buford Circle, 210 Borlaug Hall, St. Paul, MN 55108, USA
- David P Eickholt
- PepsiCo R&D, 1991 Upper Buford Circle, 210 Borlaug Hall, St. Paul, MN 55108, USA
- Taner Z Sen
- Corresponding author: Tel: +1 (510) 559-5982; Fax: +1 (510) 559-5963
34
Sheng M, Ma X, Wang J, Xue T, Li Z, Cao Y, Yu X, Zhang X, Wang Y, Xu W, Su Z. KNOX II transcription factor HOS59 functions in regulating rice grain size. Plant J 2022; 110:863-880. [PMID: 35167131 DOI: 10.1111/tpj.15709] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/07/2021] [Revised: 01/30/2022] [Accepted: 02/10/2022] [Indexed: 06/14/2023]
Abstract
Plant Knotted1-like homeobox (KNOX) genes encode homeodomain-containing transcription factors. In rice (Oryza sativa L.), little is known about the downstream target genes of KNOX Class II subfamily proteins. Here we generated chromatin immunoprecipitation (ChIP)-sequencing datasets for HOS59, a member of the rice KNOX Class II subfamily, and characterized the genome-wide binding sites of HOS59. We conducted trait ontology (TO) analysis of 9705 identified downstream target genes, and found that multiple TO terms are related to plant structure morphology and stress traits. ChIP-quantitative PCR (qPCR) was conducted to validate some key target genes. Meanwhile, our IP-MS datasets showed that HOS59 was closely associated with BELL family proteins, some grain size regulators (OsSPL13, OsSPL16, OsSPL18, SLG, etc.), and some epigenetic modification factors such as OsAGO4α and OsAGO4β, proteins involved in small interfering RNA-mediated gene silencing. Furthermore, we employed CRISPR/Cas9 editing and transgenic approaches to generate hos59 mutants and overexpression lines, respectively. Compared with wild-type plants, the hos59 mutants have longer grains and increased glume cell length, a loose plant architecture, and drooping leaves, while the overexpression lines showed smaller grain size, erect leaves, and lower plant height. The qRT-PCR results showed that mutation of the HOS59 gene led to upregulation of some grain size-related genes such as OsSPL13, OsSPL18, and PGL2. In summary, our results indicate that HOS59 may be a repressor of the downstream target genes, negatively regulating glume cell length, rice grain size, plant architecture, etc. The identified downstream target genes and possible interaction proteins of HOS59 improve our understanding of the KNOX regulatory networks.
Affiliation(s)
- Minghao Sheng
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xuelian Ma
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Jiyao Wang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Tianxi Xue
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Zhongqiu Li
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Yaxin Cao
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xinyue Yu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Xinyi Zhang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Yonghong Wang
- State Key Laboratory of Plant Genomics and National Center for Plant Gene Research (Beijing), Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, 100101, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing, 100193, China
35
Kang S, Kim KT, Choi J, Kim H, Cheong K, Bandara A, Lee YH. Genomics and Informatics, Conjoined Tools Vital for Understanding and Protecting Plant Health. Phytopathology 2022; 112:981-995. [PMID: 34889667 DOI: 10.1094/phyto-10-21-0418-rvw] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Genomics' impact on crop production continuously expands. The number of sequenced plant and microbial species and strains representing diverse populations of individual species rapidly increases thanks to the advent of next-generation sequencing technologies. Their genomic blueprints revealed candidate genes involved in various functions and processes crucial for crop health and helped in understanding how the sequenced organisms have evolved at the genome level. Functional genomics quickly translates these blueprints into a detailed mechanistic understanding of how such functions and processes work and are regulated; this understanding guides and empowers efforts to protect crops from diverse biotic and abiotic threats. Metagenome analyses help identify candidate microbes crucial for crop health and uncover how microbial communities associated with crop production respond to environmental conditions and cultural practices, presenting opportunities to enhance crop health by judiciously configuring microbial communities. Efficient conversion of disparate types of massive genomics data into actionable knowledge requires a robust informatics infrastructure supporting data preservation, analysis, and sharing. This review starts with an overview of how genomics came about and has quickly transformed life science. We illuminate how genomics and informatics can be applied to investigate various crop health-related problems using selected studies. We end the review by noting why community empowerment via crowdsourcing is crucial to harnessing genomics to protect global food and nutrition security without continuously expanding the environmental footprint of crop production.
Affiliation(s)
- Seogchan Kang
- Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, PA 16802, USA
- Ki-Tae Kim
- Department of Agricultural Life Science, Sunchon National University, Suncheon 57922, Korea
- Jaeyoung Choi
- Korea Institute of Science and Technology Gangneung Institute of Natural Products, Gangneung 25451, Korea
- Hyun Kim
- Department of Agricultural Biotechnology, Seoul National University, Seoul 08826, Korea
- Kyeongchae Cheong
- Plant Immunity Research Center, Seoul National University, Seoul 08826, Korea
- Ananda Bandara
- Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, PA 16802, USA
- Yong-Hwan Lee
- Department of Agricultural Biotechnology, Seoul National University, Seoul 08826, Korea
- Plant Immunity Research Center, Seoul National University, Seoul 08826, Korea
36
Petereit J, Marsh JI, Bayer PE, Danilevicz MF, Thomas WJW, Batley J, Edwards D. Genetic and Genomic Resources for Soybean Breeding Research. Plants (Basel) 2022; 11:1181. [PMID: 35567182 PMCID: PMC9101001 DOI: 10.3390/plants11091181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/25/2022] [Revised: 04/21/2022] [Accepted: 04/22/2022] [Indexed: 11/17/2022]
Abstract
Soybean (Glycine max) is a legume species of significant economic and nutritional value. The yield of soybean continues to increase with the breeding of improved varieties, and this is likely to continue with the application of advanced genetic and genomic approaches for breeding. Genome technologies continue to advance rapidly, with an increasing number of high-quality genome assemblies becoming available. With accumulating data from marker arrays and whole-genome resequencing, studying variations between individuals and populations is becoming increasingly accessible. Furthermore, the recent development of soybean pangenomes has highlighted the significant structural variation between individuals, together with knowledge of what has been selected for or lost during domestication and breeding, information that can be applied for the breeding of improved cultivars. Because of this, resources such as genome assemblies, SNP datasets, pangenomes and associated databases are becoming increasingly important for research underlying soybean crop improvement.
Affiliation(s)
- Jacob I. Marsh
- School of Biological Sciences, The University of Western Australia, Perth, WA 6009, Australia
- David Edwards
- School of Biological Sciences, The University of Western Australia, Perth, WA 6009, Australia
37
Eid R, Landès C, Pernet A, Benoît E, Santagostini P, Ghaziri AE, Bourbeillon J. DIVIS: a semantic DIstance to improve the VISualisation of heterogeneous phenotypic datasets. BioData Min 2022; 15:10. [PMID: 35379292 PMCID: PMC8981856 DOI: 10.1186/s13040-022-00293-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2021] [Accepted: 02/27/2022] [Indexed: 11/24/2022]
Abstract
Background Thanks to the wider spread of high-throughput experimental techniques, biologists are accumulating large numbers of datasets, which often mix quantitative and qualitative variables and are not always complete, in particular when they concern phenotypic traits. To gain a first insight into these datasets and reduce the size of the data matrices, scientists often rely on multivariate analysis techniques. However, such approaches are not always practicable, in particular with mixed datasets. Moreover, displaying large numbers of individuals leads to cluttered visualisations that are difficult to interpret. Results We introduced a new methodology to overcome these limitations. Its main feature is a new semantic distance tailored to both quantitative and qualitative variables, which allows a realistic representation of the relationships between individuals (phenotypic descriptions in our case). This semantic distance is based on ontologies engineered to represent real-life knowledge about the underlying variables. For easier handling by biologists, we incorporated it into a complete tool, from raw data file to visualisation. After the distance calculation, the tool (i) groups similar individuals, (ii) represents each group by emblematic individuals we call archetypes and (iii) builds sparse visualisations based on these archetypes. Our approach was implemented as a Python pipeline and applied to a rosebush dataset including passport and phenotypic data. Conclusions The introduction of our new semantic distance and of the archetype concept allowed us to build a comprehensive representation of an incomplete dataset characterised by a large proportion of qualitative data. The methodology described here could have wider use beyond information characterizing organisms or species and beyond plant science. Indeed, the same approach could be applied to any mixed dataset.
Supplementary Information The online version contains supplementary material available at (10.1186/s13040-022-00293-y).
Affiliation(s)
- Rayan Eid
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Claudine Landès
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Alix Pernet
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
- Julie Bourbeillon
- Institut Agro, Univ Angers, INRAE, IRHS, SFR QuaSaV, Angers, 49000, France
38
Gupta P, Naithani S, Preece J, Kim S, Cheng T, D'Eustachio P, Elser J, Bolton EE, Jaiswal P. Plant Reactome and PubChem: The Plant Pathway and (Bio)Chemical Entity Knowledgebases. Methods Mol Biol 2022; 2443:511-525. [PMID: 35037224 DOI: 10.1007/978-1-0716-2067-0_27] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Plant Reactome (https://plantreactome.gramene.org) and PubChem (https://pubchem.ncbi.nlm.nih.gov) are two reference data portals and resources for curated plant pathways, small molecules, metabolites, gene products, and macromolecular interactions. The Plant Reactome knowledgebase, a conceptual plant pathway network, is built through biocuration by integrating (bio)chemical entities, gene products, and macromolecular interactions. It provides manually curated pathways for the reference species Oryza sativa (rice) and gene orthology-based projections that extend pathway knowledge to 106 plant species. Currently, it hosts 320 reference pathways for plant metabolism, hormone signaling, transport, genetic regulation, plant organ development and differentiation, and biotic and abiotic stress responses. In addition to the pathway browsing and search functions, the Plant Reactome provides analysis tools for pathway comparison between reference and projected species, pathway enrichment in gene expression data, and overlay of gene-gene interaction data on pathways. PubChem, a popular reference database of (bio)chemical entities, provides information on small molecules and other types of chemical entities, such as siRNAs, miRNAs, lipids, carbohydrates, and chemically modified nucleotides. The data in PubChem are collected from hundreds of data sources, including Plant Reactome. This chapter provides a brief overview of the Plant Reactome and the PubChem knowledgebases, their association with other public resources providing accessory information, and how users can readily access the contents.
Affiliation(s)
- Parul Gupta
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Sunghwan Kim
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
- Evan E Bolton
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA
39
Larmande P, Tagny Ngompe G, Venkatesan A, Ruiz M. AgroLD: A Knowledge Graph Database for Plant Functional Genomics. Methods Mol Biol 2022; 2443:527-540. [PMID: 35037225 DOI: 10.1007/978-1-0716-2067-0_28] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Recent advances in high-throughput technologies have resulted in a tremendous increase in the amount of data in the agronomic domain. There is an urgent need to effectively integrate complementary information to understand the biological system in its entirety. We have developed AgroLD, a knowledge graph that exploits Semantic Web technology and relevant standard domain ontologies to integrate information on plant species, thereby facilitating the formulation of new scientific hypotheses. This chapter outlines some integration results of the project, which initially focused on genomics, proteomics and phenomics.
Affiliation(s)
- Pierre Larmande
- DIADE, IRD, CIRAD, Univ. Montpellier, Montpellier, France
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- Gildas Tagny Ngompe
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- AGAP, CIRAD, INRAE, Univ. Montpellier, av Agropolis, Montpellier, France
- Manuel Ruiz
- French Institute of Bioinformatics (IFB)-South Green Bioinformatics Platform, Bioversity, CIRAD, INRAE, IRD, Montpellier, France
- AGAP, CIRAD, INRAE, Univ. Montpellier, av Agropolis, Montpellier, France
40
Yu J, Jung S, Cheng CH, Lee T, Zheng P, Buble K, Crabb J, Humann J, Hough H, Jones D, Campbell JT, Udall J, Main D. CottonGen: The Community Database for Cotton Genomics, Genetics, and Breeding Research. Plants (Basel) 2021; 10:plants10122805. [PMID: 34961276 PMCID: PMC8705096 DOI: 10.3390/plants10122805] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 11/18/2021] [Revised: 12/11/2021] [Accepted: 12/12/2021] [Indexed: 05/12/2023]
Abstract
Over the last eight years, the volume of whole genome, gene expression, SNP genotyping, and phenotype data generated by the cotton research community has exponentially increased. The efficient utilization/re-utilization of these complex and large datasets for knowledge discovery, translation, and application in crop improvement requires them to be curated, integrated with other types of data, and made available for access and analysis through efficient online search tools. Initiated in 2012, CottonGen is an online community database providing access to integrated peer-reviewed cotton genomic, genetic, and breeding data, and analysis tools. Used by cotton researchers worldwide, and managed by experts with crop-specific knowledge, it continues to be the logical choice to integrate new data and provide necessary interfaces for information retrieval. The repository in CottonGen contains colleague, gene, genome, genotype, germplasm, map, marker, metabolite, phenotype, publication, QTL, species, transcriptome, and trait data curated by the CottonGen team. The number of data entries housed in CottonGen has increased dramatically; for example, since 2014 there has been an 18-fold increase in genes/mRNAs, a 23-fold increase in whole genomes, and a 372-fold increase in genotype data. New tools include a genetic map viewer, a genome browser, a synteny viewer, a metabolite pathways browser, sequence retrieval, BLAST, and a breeding information management system (BIMS), as well as various search pages for new data types. CottonGen serves as the home to the International Cotton Genome Initiative, managing its elections and serving as a communication and coordination hub for the community. With its extensive curation and integration of data and online tools, CottonGen will continue to facilitate utilization of its critical resources to empower research for cotton crop improvement.
Affiliation(s)
- Jing Yu
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Sook Jung
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Chun-Huai Cheng
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Taein Lee
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Ping Zheng
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Katheryn Buble
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- James Crabb
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Jodi Humann
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Heidi Hough
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Don Jones
- Cotton Incorporated, Cary, NC 27513, USA
- J. Todd Campbell
- The Agricultural Research Service of the U.S. Department of Agriculture, Florence, SC 29501, USA
- Josh Udall
- The Agricultural Research Service of the U.S. Department of Agriculture, College Station, TX 77845, USA
- Dorrie Main
- Department of Horticulture, Washington State University, Pullman, WA 99164, USA
- Correspondence: Tel.: +1-509-335-2774
41
Depuydt T, Vandepoele K. Multi-omics network-based functional annotation of unknown Arabidopsis genes. Plant J 2021; 108:1193-1212. [PMID: 34562334 DOI: 10.1111/tpj.15507] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 06/17/2021] [Accepted: 09/20/2021] [Indexed: 06/13/2023]
Abstract
Unraveling gene function is pivotal to understanding the signaling cascades that control plant development and stress responses. As experimental profiling is costly and labor intensive, there is a clear need for high-confidence computational annotation. In contrast to detailed gene-specific functional information, transcriptomics data are widely available for both model and crop species. Here, we describe a novel automated function prediction method, which leverages complementary information from multiple expression datasets by analyzing study-specific gene co-expression networks. First, we benchmarked the prediction performance on recently characterized Arabidopsis thaliana genes, and showed that our method outperforms state-of-the-art expression-based approaches. Next, we predicted biological process annotations for known (n = 15 790) and unknown (n = 11 865) genes in A. thaliana and validated our predictions using experimental protein-DNA and protein-protein interaction data (covering >220 000 interactions in total), obtaining a set of high-confidence functional annotations. Our method assigned at least one validated annotation to 5054 (42.6%) unknown genes, and at least one novel validated function to 3408 (53.0%) genes with computational annotations only. These omics-supported functional annotations shed light on a variety of developmental processes and molecular responses, such as flower and root development, defense responses to fungi and bacteria, and phytohormone signaling, and help fill the information gap on biological process annotations in Arabidopsis. An in-depth analysis of two context-specific networks, modeling seed development and response to water deprivation, shows how previously uncharacterized genes function within the respective networks. Moreover, our automated function prediction approach can be applied in future studies to facilitate gene discovery for crop improvement.
Affiliation(s)
- Thomas Depuydt
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Klaas Vandepoele
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Center for Plant Systems Biology, Vlaams Instituut voor Biotechnologie, Ghent, Belgium
- Bioinformatics Institute Ghent, Ghent University, Ghent, Belgium
42
Volk GM, Byrne PF, Coyne CJ, Flint-Garcia S, Reeves PA, Richards C. Integrating Genomic and Phenomic Approaches to Support Plant Genetic Resources Conservation and Use. Plants (Basel) 2021; 10:2260. [PMID: 34834625 PMCID: PMC8619436 DOI: 10.3390/plants10112260] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/01/2021] [Revised: 10/20/2021] [Accepted: 10/20/2021] [Indexed: 05/17/2023]
Abstract
Plant genebanks provide genetic resources for breeding and research programs worldwide. These programs benefit from having access to high-quality, standardized phenotypic and genotypic data. Technological advances have made it possible to collect phenomic and genomic data for genebank collections, which, with the appropriate analytical tools, can directly inform breeding programs. We discuss the importance of considering genebank accession homogeneity and heterogeneity in data collection and documentation. Citing specific examples, we describe how well-documented genomic and phenomic data have met or could meet the needs of plant genetic resource managers and users. We explore future opportunities that may emerge from improved documentation and data integration among plant genetic resource information systems.
Affiliation(s)
- Gayle M. Volk
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Patrick F. Byrne
- Department of Soil and Crop Sciences, Colorado State University, Fort Collins, CO 80523, USA
- Clarice J. Coyne
- United States Department of Agriculture, Agricultural Research Service, Western Regional Plant Introduction Station, Pullman, WA 99164, USA
- Sherry Flint-Garcia
- Plant Genetics Research Unit, United States Department of Agriculture, Agricultural Research Service, Columbia, MO 65211, USA
- Patrick A. Reeves
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
- Chris Richards
- United States Department of Agriculture, Agricultural Research Service, National Laboratory for Genetic Resources Preservation, Fort Collins, CO 80521, USA
| |
Collapse
|
43
Ma X, Yan H, Yang J, Liu Y, Li Z, Sheng M, Cao Y, Yu X, Yi X, Xu W, Su Z. PlantGSAD: a comprehensive gene set annotation database for plant species. Nucleic Acids Res 2021; 50:D1456-D1467. [PMID: 34534340 PMCID: PMC8728169 DOI: 10.1093/nar/gkab794] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 07/15/2021] [Revised: 08/26/2021] [Accepted: 09/01/2021] [Indexed: 12/17/2022]
Abstract
With the accumulation of massive data sets from high-throughput experiments and the rapid emergence of new types of omics data, gene sets have become more diverse and essential for the refinement of gene annotation at multidimensional levels. Accordingly, we collected and defined 236 007 gene sets across different categories for 44 plant species in the Plant Gene Set Annotation Database (PlantGSAD). These gene sets were divided into nine main categories covering many functional subcategories, such as trait ontology, co-expression modules, chromatin states, and liquid-liquid phase separation. The annotations from the collected gene sets covered all of the genes in the Brassicaceae species Arabidopsis and Poaceae species Oryza sativa. Several GSEA tools are implemented in PlantGSAD to improve the efficiency of the analysis, including custom SEA for a flexible strategy based on customized annotations, SEACOMPARE for the cross-comparison of SEA results, and integrated visualization features for ontological analysis that intuitively reflects their parent-child relationships. In summary, PlantGSAD provides numerous gene sets for multiple plant species and highly efficient analysis tools. We believe that PlantGSAD will become a multifunctional analysis platform that can be used to predict and elucidate the functions and mechanisms of genes of interest. PlantGSAD is publicly available at http://systemsbiology.cau.edu.cn/PlantGSEAv2/.
Affiliation(s)
- Xuelian Ma
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Hengyu Yan
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jiaotong Yang
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yue Liu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhongqiu Li
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Minghao Sheng
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Yaxin Cao
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xinyue Yu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Xin Yi
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Wenying Xu
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Zhen Su
- State Key Laboratory of Plant Physiology and Biochemistry, College of Biological Sciences, China Agricultural University, Beijing 100193, China
44
Zogopoulos VL, Saxami G, Malatras A, Angelopoulou A, Jen CH, Duddy WJ, Daras G, Hatzopoulos P, Westhead DR, Michalopoulos I. Arabidopsis Coexpression Tool: a tool for gene coexpression analysis in Arabidopsis thaliana. iScience 2021; 24:102848. [PMID: 34381973 PMCID: PMC8334378 DOI: 10.1016/j.isci.2021.102848] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/04/2021] [Revised: 06/23/2021] [Accepted: 07/08/2021] [Indexed: 02/08/2023]
Abstract
Gene coexpression analysis refers to the discovery of sets of genes which exhibit similar expression patterns across multiple transcriptomic data sets, such as microarray experiment data of public repositories. Arabidopsis Coexpression Tool (ACT), a gene coexpression analysis web tool for Arabidopsis thaliana, identifies genes which are correlated to a driver gene. Primary microarray data from ATH1 Affymetrix platform were processed with Single-Channel Array Normalization algorithm and combined to produce a coexpression tree which contains ∼21,000 A. thaliana genes. ACT was developed to present subclades of coexpressed genes, as well as to perform gene set enrichment analysis, being unique in revealing enriched transcription factors targeting coexpressed genes. ACT offers a simple and user-friendly interface producing working hypotheses which can be experimentally verified for the discovery of gene partnership, pathway membership, and transcriptional regulation. ACT analyses have been successful in identifying not only genes with coordinated ubiquitous expressions but also genes with tissue-specific expressions.
Affiliation(s)
- Vasileios L. Zogopoulos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Georgia Saxami
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
- Apostolos Malatras
- Center for Research in Myology, Sorbonne Université, Paris 75013, France
- Antonia Angelopoulou
- Department of Biotechnology, Agricultural University of Athens, Athens 11855, Greece
- Chih-Hung Jen
- Cold Spring Biotech Corp, Da Hu Science Park, New Taipei City, Taiwan
- William J. Duddy
- Center for Research in Myology, Sorbonne Université, Paris 75013, France
- Northern Ireland Centre for Stratified Medicine, Altnagelvin Hospital Campus, Ulster University, Londonderry BT52 1SJ, UK
- Gerasimos Daras
- Department of Biotechnology, Agricultural University of Athens, Athens 11855, Greece
- David R. Westhead
- School of Molecular and Cellular Biology, Faculty of Biological Sciences, University of Leeds, Leeds LS2 9JT, UK
- Ioannis Michalopoulos
- Centre of Systems Biology, Biomedical Research Foundation, Academy of Athens, Athens 11527, Greece
45
Ramírez-Andreotta MD, Walls R, Youens-Clark K, Blumberg K, Isaacs KE, Kaufmann D, Maier RM. Alleviating Environmental Health Disparities Through Community Science and Data Integration. FRONTIERS IN SUSTAINABLE FOOD SYSTEMS 2021; 5. [PMID: 35664667 PMCID: PMC9165534 DOI: 10.3389/fsufs.2021.620470] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/15/2022]
Abstract
Environmental contamination is a fundamental determinant of health and well-being, and when the environment is compromised, vulnerabilities are generated. The complex challenges associated with environmental health and food security are influenced by current and emerging political, social, economic, and environmental contexts. To solve these “wicked” dilemmas, disparate public health surveillance efforts are conducted by local, state, and federal agencies. More recently, citizen/community science (CS) monitoring efforts are providing site-specific data. One of the biggest challenges in using these government datasets, let alone incorporating CS data, for a holistic assessment of environmental exposure is data management and interoperability. To facilitate a more holistic perspective and approach to solution generation, we have developed a method to provide a common data model that will allow environmental health researchers working at different scales and research domains to exchange data and ask new questions. We anticipate that this method will help to address environmental health disparities, which are unjust and avoidable, while ensuring CS datasets are ethically integrated to achieve environmental justice. Specifically, we used a transdisciplinary research framework to develop a methodology to integrate CS data with existing governmental environmental monitoring and social attribute data (vulnerability and resilience variables) that span across 10 different federal and state agencies. A key challenge in integrating such different datasets is the lack of widely adopted ontologies for vulnerability and resiliency factors. In addition to following the best practice of submitting new term requests to existing ontologies to fill gaps, we have also created an application ontology, the Superfund Research Project Data Interface Ontology (SRPDIO).
Affiliation(s)
- Mónica D. Ramírez-Andreotta
- Department of Environmental Science, University of Arizona, Tucson, AZ, United States
- Mel and Enid Zuckerman College of Public Health’s Division of Community, Environment and Policy, University of Arizona, Tucson, AZ, United States
- Correspondence: Mónica D. Ramírez-Andreotta
- Ramona Walls
- BIO5 Institute, University of Arizona, Tucson, AZ, United States
- Ken Youens-Clark
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Kai Blumberg
- Department of Biosystems Engineering, University of Arizona, Tucson, AZ, United States
- Katherine E. Isaacs
- Department of Computer Science, University of Arizona, Tucson, AZ, United States
- Dorsey Kaufmann
- Department of Environmental Science, University of Arizona, Tucson, AZ, United States
- Raina M. Maier
- Department of Environmental Science, University of Arizona, Tucson, AZ, United States
46
Marsh JI, Hu H, Gill M, Batley J, Edwards D. Crop breeding for a changing climate: integrating phenomics and genomics with bioinformatics. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1677-1690. [PMID: 33852055 DOI: 10.1007/s00122-021-03820-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/22/2020] [Accepted: 03/18/2021] [Indexed: 05/05/2023]
Abstract
Safeguarding crop yields in a changing climate requires bioinformatics advances in harnessing data from vast phenomics and genomics datasets to translate research findings into climate smart crops in the field. Climate change and an additional 3 billion mouths to feed by 2050 raise serious concerns over global food security. Crop breeding and land management strategies will need to evolve to maximize the utilization of finite resources in coming years. High-throughput phenotyping and genomics technologies are providing researchers with the information required to guide and inform the breeding of climate smart crops adapted to the environment. Bioinformatics has a fundamental role to play in integrating and exploiting this fast accumulating wealth of data, through association studies to detect genomic targets underlying key adaptive climate-resilient traits. These data provide tools for breeders to tailor crops to their environment and can be introduced using advanced selection or genome editing methods. To effectively translate research into the field, genomic and phenomic information will need to be integrated into comprehensive clade-specific databases and platforms alongside accessible tools that can be used by breeders to inform the selection of climate adaptive traits. Here we discuss the role of bioinformatics in extracting, analysing, integrating and managing genomic and phenomic data to improve climate resilience in crops, including current, emerging and potential approaches, applications and bottlenecks in the research and breeding pipeline.
Affiliation(s)
- Jacob I Marsh
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Haifei Hu
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Mitchell Gill
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- Jacqueline Batley
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
- David Edwards
- School of Biological Sciences and Institute of Agriculture, The University of Western Australia, Perth, 6009, Australia
47
Sokoloff DD, Remizowa MV. The use of plant ontologies in comparative and evolutionary studies should be flexible. AMERICAN JOURNAL OF BOTANY 2021; 108:909-911. [PMID: 34157126 DOI: 10.1002/ajb2.1692] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/06/2021] [Accepted: 03/10/2021] [Indexed: 06/13/2023]
Affiliation(s)
- Dmitry D Sokoloff
- Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, 119234, Russia
- Margarita V Remizowa
- Faculty of Biology, M.V. Lomonosov Moscow State University, Moscow, 119234, Russia
48
Smith DT, Potgieter AB, Chapman SC. Scaling up high-throughput phenotyping for abiotic stress selection in the field. TAG. THEORETICAL AND APPLIED GENETICS. THEORETISCHE UND ANGEWANDTE GENETIK 2021; 134:1845-1866. [PMID: 34076731 DOI: 10.1007/s00122-021-03864-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 03/02/2021] [Accepted: 05/13/2021] [Indexed: 05/18/2023]
Abstract
High-throughput phenotyping (HTP) is in its infancy for deployment in large-scale breeding programmes. With the ability to measure correlated traits associated with physiological ideotypes, in-field phenotyping methods are available for screening of abiotic stress responses. As cropping environments become more hostile and unpredictable due to the effects of climate change, the need to characterise variability across spatial and temporal scales will become increasingly important. The sensor technologies that have enabled HTP from macroscopic through to satellite sensors may also be utilised here to complement spatial characterisation using envirotyping, which can improve estimations of genotypic performance across environments by better accounting for variation at the plot, trial and inter-trial levels. Climate change is leading to increased variation at all physical and temporal scales in the cropping environment. Maintaining yield stability under circumstances with greater levels of abiotic stress while capitalising upon yield potential in good years, requires approaches to plant breeding that target the physiological limitations to crop performance in specific environments. This requires dynamic modelling of conditions within target populations of environments, GxExM predictions, clustering of environments so breeding trajectories can be defined, and the development of screens that enable selection for genetic gain to occur. High-throughput phenotyping (HTP), combined with related technologies used for envirotyping, can help to address these challenges. Non-destructive analysis of the morphological, biochemical and physiological qualities of plant canopies using HTP has great potential to complement whole-genome selection, which is becoming increasingly common in breeding programmes. 
A range of novel analytic techniques, such as machine learning and deep learning, combined with a widening range of sensors, allow rapid assessment of large breeding populations that are repeatable and objective. Secondary traits underlying radiation use efficiency and water use efficiency can be screened with HTP for selection at the early stages of a breeding programme. HTP and envirotyping technologies can also characterise spatial variability at trial and within-plot levels, which can be used to correct for spatial variations that confound measurements of genotypic values. This review explores HTP for abiotic stress selection through a physiological trait lens and additionally investigates the use of envirotyping and EC to characterise spatial variability at all physical scales in METs.
Affiliation(s)
- Daniel T Smith
- The University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
- Andries B Potgieter
- Centre for Crop Science, Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane, QLD, 4072, Australia
- Scott C Chapman
- The University of Queensland, St Lucia, Brisbane, QLD, 4072, Australia
49
Tello-Ruiz MK, Naithani S, Gupta P, Olson A, Wei S, Preece J, Jiao Y, Wang B, Chougule K, Garg P, Elser J, Kumari S, Kumar V, Contreras-Moreira B, Naamati G, George N, Cook J, Bolser D, D'Eustachio P, Stein LD, Gupta A, Xu W, Regala J, Papatheodorou I, Kersey PJ, Flicek P, Taylor C, Jaiswal P, Ware D. Gramene 2021: harnessing the power of comparative genomics and pathways for plant research. Nucleic Acids Res 2021; 49:D1452-D1463. [PMID: 33170273 DOI: 10.1093/nar/gkaa979] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 09/15/2020] [Accepted: 10/09/2020] [Indexed: 01/27/2023]
Abstract
Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes-over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.
Affiliation(s)
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Parul Gupta
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Yinping Jiao
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Priyanka Garg
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Vivek Kumar
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Bruno Contreras-Moreira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Justin Cook
- Informatics and Bio-computing Program, Ontario Institute of Cancer Research, Toronto M5G 1L7, Canada
- Daniel Bolser
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Current affiliation: Geromics Inc., Cambridge CB1 3NF, UK
- Peter D'Eustachio
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY 10016, USA
- Lincoln D Stein
- Adaptive Oncology Program, Ontario Institute for Cancer Research, Toronto M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Amit Gupta
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758, USA
- Weijia Xu
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758, USA
- Jennifer Regala
- American Society of Plant Biologists, Rockville, MD 20855-2768, USA
- Current affiliation: American Urological Association, Linthicum, MD 21090, USA
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Paul J Kersey
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Current affiliation: Royal Botanic Gardens, Kew Richmond, Surrey TW9 3AE, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Crispin Taylor
- American Society of Plant Biologists, Rockville, MD 20855-2768, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
50
Tello-Ruiz MK, Naithani S, Gupta P, Olson A, Wei S, Preece J, Jiao Y, Wang B, Chougule K, Garg P, Elser J, Kumari S, Kumar V, Contreras-Moreira B, Naamati G, George N, Cook J, Bolser D, D'Eustachio P, Stein LD, Gupta A, Xu W, Regala J, Papatheodorou I, Kersey PJ, Flicek P, Taylor C, Jaiswal P, Ware D. Gramene 2021: harnessing the power of comparative genomics and pathways for plant research. Nucleic Acids Res 2021; 49:D1452-D1463. [PMID: 33170273 DOI: 10.1093/nar/gkaa979] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/15/2020] [Accepted: 10/09/2020] [Indexed: 05/20/2023]
Affiliation(s)
- Sushma Naithani
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Parul Gupta
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Andrew Olson
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Sharon Wei
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Justin Preece
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Yinping Jiao
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Bo Wang
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Priyanka Garg
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Justin Elser
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Sunita Kumari
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Vivek Kumar
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- Bruno Contreras-Moreira
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Guy Naamati
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Nancy George
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Justin Cook
- Informatics and Bio-computing Program, Ontario Institute of Cancer Research, Toronto M5G 1L7, Canada
- Daniel Bolser
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Current affiliation: Geromics Inc., Cambridge CB1 3NF, UK
- Peter D'Eustachio
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY 10016, USA
- Lincoln D Stein
- Adaptive Oncology Program, Ontario Institute for Cancer Research, Toronto M5G 0A3, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Amit Gupta
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758, USA
- Weijia Xu
- Texas Advanced Computing Center, University of Texas at Austin, Austin, TX 78758, USA
- Jennifer Regala
- American Society of Plant Biologists, Rockville, MD 20855-2768, USA
- Current affiliation: American Urological Association, Linthicum, MD 21090, USA
- Irene Papatheodorou
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Paul J Kersey
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Current affiliation: Royal Botanic Gardens, Kew Richmond, Surrey TW9 3AE, UK
- Paul Flicek
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Crispin Taylor
- American Society of Plant Biologists, Rockville, MD 20855-2768, USA
- Pankaj Jaiswal
- Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR 97331, USA
- Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- USDA ARS NAA Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA