1. Wang S, Wei S, Deng Y, Wu S, Peng H, Qing Y, Zhai X, Zhou S, Li J, Li H, Feng Y, Yi Y, Li R, Zhang H, Wang Y, Zhang R, Ning L, Yao Y, Fei Z, Zheng Y. HortGenome Search Engine, a universal genomic search engine for horticultural crops. HORTICULTURE RESEARCH 2024; 11:uhae100. [PMID: 38863996 PMCID: PMC11165154 DOI: 10.1093/hr/uhae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 03/27/2024] [Indexed: 06/13/2024]
Abstract
Horticultural crops comprising fruit, vegetable, ornamental, beverage, medicinal and aromatic plants play essential roles in food security and human health, as well as landscaping. With the advances of sequencing technologies, genomes for hundreds of horticultural crops have been deciphered in recent years, providing a basis for understanding gene functions and regulatory networks and for the improvement of horticultural crops. However, these valuable genomic data are scattered in warehouses with various complex searching and displaying strategies, which increases learning and usage costs and makes comparative and functional genomic analyses across different horticultural crops very challenging. To this end, we have developed a lightweight universal search engine, HortGenome Search Engine (HSE; http://hort.moilab.net), which allows for the querying of genes, functional annotations, protein domains, homologs, and other gene-related functional information of more than 500 horticultural crops. In addition, four commonly used tools, including 'BLAST', 'Batch Query', 'Enrichment analysis', and 'Synteny Viewer' have been developed for efficient mining and analysis of these genomic data.
Affiliation(s)
- Sen Wang
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shangxiao Wei
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yuling Deng
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shaoyuan Wu
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Haixu Peng
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- You Qing
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Xuyang Zhai
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shijie Zhou
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Jinrong Li
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Hua Li
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yijian Feng
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yating Yi
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Rui Li
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Hui Zhang
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yiding Wang
  - College of Intelligent Science and Engineering, Beijing University of Agriculture, Beijing 102206, China
- Renlong Zhang
  - College of Intelligent Science and Engineering, Beijing University of Agriculture, Beijing 102206, China
- Lu Ning
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
  - Library, Beijing University of Agriculture, Beijing 102206, China
- Yuncong Yao
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Zhangjun Fei
  - Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
  - USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Yi Zheng
  - Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
  - Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
2. Chen J, Scholz U, Zhou R, Lange M. LAILAPS-QSM: A RESTful API and JAVA library for semantic query suggestions. PLoS Comput Biol 2018; 14:e1006058. [PMID: 29529024 PMCID: PMC5871001 DOI: 10.1371/journal.pcbi.1006058] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 09/22/2017] [Revised: 03/27/2018] [Accepted: 02/23/2018] [Indexed: 11/19/2022]
Abstract
In order to access and filter content of life-science databases, full text search is a widely applied query interface. But its high flexibility and intuitiveness are paid for with potentially imprecise and incomplete query results. To reduce this drawback, query assistance systems suggest those combinations of keywords with the highest potential to match most of the relevant data records. Widespread approaches are syntactic query corrections that avoid misspelling and support expansion of words by suffixes and prefixes. Synonym expansion approaches apply thesauri, ontologies, and query logs. All need laborious curation and maintenance. Furthermore, access to query logs is in general restricted. Approaches that infer related queries from the user's query profile, such as research field, geographic location, co-authorship, or affiliation, require user registration and public accessibility of that profile, which conflicts with privacy concerns. To overcome these drawbacks, we implemented LAILAPS-QSM, a machine learning approach that reconstructs possible linguistic contexts of a given keyword query. The context is inferred from the text records that are stored in the databases that are going to be queried or, for a general-purpose query suggestion, extracted from PubMed abstracts and UniProt data. The supplied tool suite enables the pre-processing of these text records and the further computation of customized distributed word vectors. The latter are used to suggest alternative keyword queries. An evaluation of the query suggestion quality was done for plant science use cases. Locally present experts enable a cost-efficient quality assessment in the categories trait, biological entity, taxonomy, affiliation, and metabolic function, which has been performed using ontology term similarities. The mean information content similarity of LAILAPS-QSM for 15 representative queries is 0.70, whereas 34% have a score above 0.80. In comparison, the information content similarity for query suggestions made by human experts is 0.90. The software is available either as a tool set to build and train dedicated query suggestion services or as an already trained general-purpose RESTful web service. The service uses open interfaces to be seamlessly embeddable into database frontends. The JAVA implementation uses highly optimized data structures and streamlined code to provide fast and scalable response for web service calls. The source code of LAILAPS-QSM is available under GNU General Public License version 2 in Bitbucket GIT repository: https://bitbucket.org/ipk_bit_team/bioescorte-suggestion.
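The abstract's idea of using distributed word vectors to suggest alternative keyword queries can be illustrated with a minimal sketch. This is not the LAILAPS-QSM implementation or its API: the three-dimensional vectors, the `TOY_VECTORS` table, and the `suggest` function are hypothetical, and cosine similarity is assumed here as the ranking measure.

```python
import math

# Hypothetical toy word vectors. A real service would learn these from a
# text corpus (the abstract mentions PubMed abstracts and UniProt data).
TOY_VECTORS = {
    "drought": [0.9, 0.1, 0.0],
    "water": [0.8, 0.2, 0.1],
    "stress": [0.7, 0.3, 0.0],
    "barley": [0.0, 0.1, 0.9],
    "wheat": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def suggest(keyword, vectors, top_n=2):
    """Return up to top_n vocabulary words whose vectors are closest to keyword."""
    if keyword not in vectors:
        return []  # no vector available, so no suggestion can be made
    query = vectors[keyword]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != keyword
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


if __name__ == "__main__":
    print(suggest("drought", TOY_VECTORS))
```

With these toy vectors, a query for "drought" ranks "water" and "stress" ahead of the cereal names, because their vectors point in nearly the same direction. Keywords outside the vocabulary return an empty list rather than a guess.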
Affiliation(s)
- Jinbo Chen
  - Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland OT Gatersleben, Germany
- Uwe Scholz
  - Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland OT Gatersleben, Germany
- Ruonan Zhou
  - Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland OT Gatersleben, Germany
- Matthias Lange
  - Research Group Bioinformatics and Information Technology, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland OT Gatersleben, Germany
3. Schmutzer T, Bolger ME, Rudd S, Chen J, Gundlach H, Arend D, Oppermann M, Weise S, Lange M, Spannagl M, Usadel B, Mayer KFX, Scholz U. Bioinformatics in the plant genomic and phenomic domain: The German contribution to resources, services and perspectives. J Biotechnol 2017; 261:37-45. [PMID: 28698099 DOI: 10.1016/j.jbiotec.2017.07.006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 02/14/2017] [Revised: 06/30/2017] [Accepted: 07/04/2017] [Indexed: 10/19/2022]
Abstract
Plant genetic resources are a substantial opportunity for plant breeding, preservation and maintenance of biological diversity. As part of the German Network for Bioinformatics Infrastructure (de.NBI) the German Crop BioGreenformatics Network (GCBN) focuses mainly on crop plants and provides both data and software infrastructure which are tailored to the needs of the plant research community. Our mission and key objectives include: (1) provision of transparent access to germplasm seeds, (2) the delivery of improved workflows for plant gene annotation, and (3) implementation of bioinformatics services that link genotypes and phenotypes. This review introduces the GCBN's spectrum of web-services and integrated data resources that address common research problems in the plant genomics community.
Affiliation(s)
- Thomas Schmutzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Marie E Bolger
  - Forschungszentrum Jülich (FZJ), Institute of Bio- and Geosciences (IBG-2) Plant Sciences, Wilhelm-Johnen-Straße, 52425 Jülich, Germany
- Stephen Rudd
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Jinbo Chen
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Heidrun Gundlach
  - Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Daniel Arend
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Markus Oppermann
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Stephan Weise
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Matthias Lange
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
- Manuel Spannagl
  - Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
- Björn Usadel
  - Forschungszentrum Jülich (FZJ), Institute of Bio- and Geosciences (IBG-2) Plant Sciences, Wilhelm-Johnen-Straße, 52425 Jülich, Germany
- Klaus F X Mayer
  - Helmholtz Zentrum München (HMGU), Plant Genome and Systems Biology (PGSB), Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
  - School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354 Freising, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstraße 3, 06466 Seeland, Germany
4. Bauer E, Schmutzer T, Barilar I, Mascher M, Gundlach H, Martis MM, Twardziok SO, Hackauf B, Gordillo A, Wilde P, Schmidt M, Korzun V, Mayer KFX, Schmid K, Schön CC, Scholz U. Towards a whole-genome sequence for rye (Secale cereale L.). THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2017; 89:853-869. [PMID: 27888547 DOI: 10.1111/tpj.13436] [Citation(s) in RCA: 120] [Impact Index Per Article: 17.1] [Received: 08/03/2016] [Revised: 11/08/2016] [Accepted: 11/21/2016] [Indexed: 05/18/2023]
Abstract
We report on a whole-genome draft sequence of rye (Secale cereale L.). Rye is a diploid Triticeae species closely related to wheat and barley, and an important crop for food and feed in Central and Eastern Europe. Through whole-genome shotgun sequencing of the 7.9-Gbp genome of the winter rye inbred line Lo7 we obtained a de novo assembly represented by 1.29 million scaffolds covering a total length of 2.8 Gbp. Our reference sequence represents nearly the entire low-copy portion of the rye genome. This genome assembly was used to predict 27 784 rye gene models based on homology to sequenced grass genomes. Through resequencing of 10 rye inbred lines and one accession of the wild relative S. vavilovii, we discovered more than 90 million single nucleotide variants and short insertions/deletions in the rye genome. From these variants, we developed the high-density Rye600k genotyping array with 600 843 markers, which enabled anchoring the sequence contigs along a high-density genetic map and establishing a synteny-based virtual gene order. Genotyping data were used to characterize the diversity of rye breeding pools and genetic resources, and to obtain a genome-wide map of selection signals differentiating the divergent gene pools. This rye whole-genome sequence closes a gap in Triticeae genome research, and will be highly valuable for comparative genomics, functional studies and genome-based breeding in rye.
Affiliation(s)
- Eva Bauer
  - Technical University of Munich, Plant Breeding, Liesel-Beckmann-Str. 2, 85354, Freising, Germany
- Thomas Schmutzer
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
- Ivan Barilar
  - Universität Hohenheim, Crop Biodiversity and Breeding Informatics, Fruwirthstr. 21, 70599, Stuttgart, Germany
- Martin Mascher
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
- Heidrun Gundlach
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Mihaela M Martis
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Sven O Twardziok
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Bernd Hackauf
  - Julius Kühn-Institute, Institute for Breeding Research on Agricultural Crops, Rudolf-Schick-Platz 3a, 18190, Sanitz, Germany
- Andres Gordillo
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Peer Wilde
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Malthe Schmidt
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Viktor Korzun
  - KWS LOCHOW GMBH, Ferdinand-von-Lochow-Str. 5, 29303, Bergen, Germany
- Klaus F X Mayer
  - Helmholtz Zentrum München, Plant Genome and Systems Biology, Ingolstädter Landstraße 1, 85764, Neuherberg, Germany
- Karl Schmid
  - Universität Hohenheim, Crop Biodiversity and Breeding Informatics, Fruwirthstr. 21, 70599, Stuttgart, Germany
- Chris-Carolin Schön
  - Technical University of Munich, Plant Breeding, Liesel-Beckmann-Str. 2, 85354, Freising, Germany
- Uwe Scholz
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Corrensstr. 3, 06466, Stadt Seeland, Germany
5. Spannagl M, Alaux M, Lange M, Bolser DM, Bader KC, Letellier T, Kimmel E, Flores R, Pommier C, Kerhornou A, Walts B, Nussbaumer T, Grabmuller C, Chen J, Colmsee C, Beier S, Mascher M, Schmutzer T, Arend D, Thanki A, Ramirez-Gonzalez R, Ayling M, Ayling S, Caccamo M, Mayer KFX, Scholz U, Steinbach D, Quesneville H, Kersey PJ. transPLANT Resources for Triticeae Genomic Data. THE PLANT GENOME 2016; 9. [PMID: 27898761 DOI: 10.3835/plantgenome2015.06.0038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
The genome sequences of many important Triticeae species, including bread wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), remained uncharacterized for a long time because of their high repeat content, large sizes, and polyploidy. As a result of improvements in sequencing technologies and novel analysis strategies, several of these have recently been deciphered. These efforts have generated new insights into Triticeae biology and genome organization and have important implications for downstream usage by breeders, experimental biologists, and comparative genomicists. transPLANT is an EU-funded project aimed at constructing hardware, software, and data infrastructure for genome-scale research in the life sciences. Since the Triticeae data are intrinsically complex, heterogeneous, and distributed, the transPLANT consortium has undertaken efforts to develop common data formats and tools that enable the exchange and integration of data from distributed resources. Here we present an overview of the individual Triticeae genome resources hosted by transPLANT partners, introduce the objectives of transPLANT, and outline common developments and interfaces supporting integrated data access.