1
Wang S, Wei S, Deng Y, Wu S, Peng H, Qing Y, Zhai X, Zhou S, Li J, Li H, Feng Y, Yi Y, Li R, Zhang H, Wang Y, Zhang R, Ning L, Yao Y, Fei Z, Zheng Y. HortGenome Search Engine, a universal genomic search engine for horticultural crops. Hortic Res 2024; 11:uhae100. [PMID: 38863996 PMCID: PMC11165154 DOI: 10.1093/hr/uhae100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 03/27/2024] [Indexed: 06/13/2024]
Abstract
Horticultural crops, comprising fruit, vegetable, ornamental, beverage, medicinal and aromatic plants, play essential roles in food security, human health and landscaping. With advances in sequencing technologies, the genomes of hundreds of horticultural crops have been deciphered in recent years, providing a basis for understanding gene functions and regulatory networks and for improving horticultural crops. However, these valuable genomic data are scattered across databases with diverse and complex search and display strategies, which raises learning and usage costs and makes comparative and functional genomic analyses across horticultural crops very challenging. To address this, we have developed a lightweight universal search engine, HortGenome Search Engine (HSE; http://hort.moilab.net), which allows users to query genes, functional annotations, protein domains, homologs and other gene-related functional information for more than 500 horticultural crops. In addition, four commonly used tools, 'BLAST', 'Batch Query', 'Enrichment analysis' and 'Synteny Viewer', have been developed for efficient mining and analysis of these genomic data.
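The HSE abstract lists 'Enrichment analysis' among its tools but does not state which statistic it uses. A common choice for gene-set enrichment is a one-sided hypergeometric test, which asks how likely it is to see at least k annotated genes in a query list of n genes drawn from a background of N genes, K of which carry the annotation. The minimal sketch below assumes that test; the function name and the gene counts are hypothetical and illustrate the idea, not HSE's implementation.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all annotated genes of one genome)
    K: background genes carrying the term (e.g. a GO term or protein domain)
    n: genes in the user's query list
    k: query genes carrying the term
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    total = comb(N, n)
    upper = min(n, K)
    # Sum the probabilities of drawing k or more annotated genes by chance.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, 200 carry the term,
    # and 10 of a 100-gene query list carry it (about 1 expected by chance).
    p = hypergeom_enrichment_p(N=20_000, K=200, n=100, k=10)
    print(f"p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids underflow in intermediate terms; in practice, p-values from many terms would also be corrected for multiple testing.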
Affiliation(s)
- Sen Wang
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shangxiao Wei
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yuling Deng
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shaoyuan Wu
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Haixu Peng
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- You Qing
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Xuyang Zhai
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Shijie Zhou
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Jinrong Li
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Hua Li
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yijian Feng
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yating Yi
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Rui Li
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Hui Zhang
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Yiding Wang
- College of Intelligent Science and Engineering, Beijing University of Agriculture, Beijing 102206, China
- Renlong Zhang
- College of Intelligent Science and Engineering, Beijing University of Agriculture, Beijing 102206, China
- Lu Ning
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
- Library, Beijing University of Agriculture, Beijing 102206, China
- Yuncong Yao
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Zhangjun Fei
- Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
- USDA-ARS, Robert W. Holley Center for Agriculture and Health, Ithaca, NY 14853, USA
- Yi Zheng
- Beijing Key Laboratory for Agricultural Application and New Technique, College of Plant Science and Technology, Beijing University of Agriculture, Beijing 102206, China
- Bioinformatics Center, Beijing University of Agriculture, Beijing 102206, China
2
Williams NP, Rodrigues CHM, Truong J, Ascher DB, Holien JK. DockNet: high-throughput protein-protein interface contact prediction. Bioinformatics 2022; 39:6885444. [PMID: 36484688 PMCID: PMC9825772 DOI: 10.1093/bioinformatics/btac797] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 12/08/2022] [Indexed: 12/13/2022]
Abstract
MOTIVATION Over 300 000 protein-protein interaction (PPI) pairs have been identified in the human proteome and targeting these is fast becoming the next frontier in drug design. Predicting PPI sites, however, is a challenging task that traditionally requires computationally expensive and time-consuming docking simulations. A major weakness of modern protein docking algorithms is the inability to account for protein flexibility, which ultimately leads to relatively poor results. RESULTS Here, we propose DockNet, an efficient Siamese graph-based neural network method which predicts contact residues between two interacting proteins. Unlike other methods that only utilize a protein's surface or treat the protein structure as a rigid body, DockNet incorporates the entire protein structure and places no limits on protein flexibility during an interaction. Predictions are modeled at the residue level, based on a diverse set of input node features including residue type, surface accessibility, residue depth, secondary structure, pharmacophore and torsional angles. DockNet is comparable to current state-of-the-art methods, achieving an area under the curve (AUC) value of up to 0.84 on an independent test set (DB5), can be applied to a variety of different protein structures and can be utilized in situations where accurate unbound protein structures cannot be obtained. AVAILABILITY AND IMPLEMENTATION DockNet is available at https://github.com/npwilliams09/docknet and an easy-to-use webserver at https://biosig.lab.uq.edu.au/docknet. All other data underlying this article are available in the article and in its online supplementary material. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
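DockNet reports performance as an area under the ROC curve (AUC). For a residue-level contact predictor, ROC AUC equals the probability that a randomly chosen true contact residue receives a higher score than a randomly chosen non-contact residue, with ties counting one half. The minimal sketch below computes AUC in exactly that pairwise (Mann-Whitney) form; the scores and labels are hypothetical, and this is not DockNet's evaluation code.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney pairwise formulation.

    scores: predicted contact scores, one per residue (higher = more likely contact)
    labels: 1 for a true contact residue, 0 otherwise
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # contact residue correctly ranked above non-contact
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical per-residue scores for six residues of one protein.
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(roc_auc(scores, labels))
```

The double loop costs O(P·N) comparisons, which is fine for illustration; a rank-based version that sorts the scores once scales better to whole test sets.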
Affiliation(s)
- Jia Truong
- STEM College, RMIT University, Melbourne, VIC, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
3
Rodrigues CHM, Ascher DB. CSM-Potential: mapping protein interactions and biological ligands in 3D space using geometric deep learning. Nucleic Acids Res 2022; 50:W204-W209. [PMID: 35609999 PMCID: PMC9252741 DOI: 10.1093/nar/gkac381] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/24/2022] [Revised: 04/19/2022] [Accepted: 05/05/2022] [Indexed: 11/13/2022]
Abstract
Recent advances in protein structural modelling have enabled the accurate prediction of the holo 3D structures of almost any protein; however, protein function is intrinsically linked to the interactions a protein makes. While a number of computational approaches have been proposed to explore potential biological interactions, they have been limited to specific interaction types and have not been readily accessible to non-experts or for use in bioinformatics pipelines. Here we present CSM-Potential, a geometric deep learning approach that identifies regions of a protein surface likely to mediate protein-protein and protein-ligand interactions, providing a link between 3D structure and biological function. Our method has shown robust performance, outperforming existing methods on both predictive tasks. On independent blind tests, CSM-Potential achieved ROC AUC values of up to 0.81 for identifying potential protein-protein binding sites and up to 0.96 accuracy for biological ligand classification. Our method is freely available as a user-friendly web server and API at http://biosig.unimelb.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
4
DeVoe E, Oliver GR, Zenka R, Blackburn PR, Cousin MA, Boczek NJ, Kocher JPA, Urrutia R, Klee EW, Zimmermann MT. P2T2: Protein Panoramic annoTation Tool for the interpretation of protein coding genetic variants. JAMIA Open 2021; 4:ooab065. [PMID: 34377961 PMCID: PMC8346652 DOI: 10.1093/jamiaopen/ooab065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/16/2021] [Revised: 07/06/2021] [Accepted: 07/17/2021] [Indexed: 11/29/2022]
Abstract
MOTIVATION Genomic data are prevalent, leading to frequent encounters with uninterpreted variants or mutations with unknown mechanisms of effect. To understand these mechanisms, researchers must manually aggregate data from multiple sources and across related proteins, mentally translating effects between the genome and the proteome. MATERIALS AND METHODS P2T2 presents diverse data and annotation types in a unified protein-centric view, facilitating the interpretation of coding variants and hypothesis generation. Information at the primary sequence, domain, motif and structural levels is presented and also organized into the first Paralog Annotation Analysis across the human proteome. RESULTS Our tool assists efforts to interpret genomic variation by aggregating diverse, relevant and proteome-wide information into a unified, interactive web-based interface. Additionally, we provide a REST API that enables automated data queries and the repurposing of data for other studies. CONCLUSION The unified protein-centric interface presented in P2T2 will help researchers interpret novel variants identified through next-generation sequencing. Code and a server link are available at github.com/GenomicInterpretation/p2t2.
Affiliation(s)
- Elias DeVoe
- Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
- Gavin R Oliver
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Roman Zenka
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Patrick R Blackburn
- Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
- Center for Individualized Medicine, Mayo Clinic, Jacksonville, Florida, USA
- Margot A Cousin
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Nicole J Boczek
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Jean-Pierre A Kocher
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Raul Urrutia
- Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
- Department of Surgery, Medical College of Wisconsin, Milwaukee, Wisconsin, 53226, USA
- Eric W Klee
- Department of Health Science Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Center for Individualized Medicine, Mayo Clinic, Rochester, Minnesota, USA
- Michael T Zimmermann
- Clinical and Translational Sciences Institute, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
- Genomic Sciences and Precision Medicine Center, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Department of Biochemistry, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
5
NemChR-DB: a database of parasitic nematode chemosensory G-protein coupled receptors. Int J Parasitol 2020; 51:333-337. [PMID: 33275943 DOI: 10.1016/j.ijpara.2020.09.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 08/21/2020] [Revised: 09/23/2020] [Accepted: 09/29/2020] [Indexed: 12/12/2022]
Abstract
Nematode chemosensory G-protein coupled receptors (NemChRs) have expanded within nematodes, where they play important roles in foraging and host-seeking behaviour. In various parasitic nematodes, NemChRs are most highly expressed during free-living stages, when chemosensory signalling is required for host detection and nematode activation. This positions NemChRs at the transition from infective to parasitic stages, making them important regulators to study in terms of host seeking and host specificity. To facilitate the analysis of NemChRs, here we describe an integrative database of nematode chemoreceptors called NemChR-DB. This database enables users to study diverse parasitic nematode chemoreceptors, functionally explore sequence entries through structural and literature-based annotations, and perform cross-species comparisons.
6
O’Halloran DM. phylo-node: A molecular phylogenetic toolkit using Node.js. PLoS One 2017; 12:e0175480. [PMID: 28410421 PMCID: PMC5391935 DOI: 10.1371/journal.pone.0175480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/09/2016] [Accepted: 03/27/2017] [Indexed: 12/05/2022]
Abstract
Background Node.js is an open-source, cross-platform environment that runs JavaScript for back-end, server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no such toolkit has been available in Node.js for conducting comprehensive molecular phylogenetic analysis. Results To address this gap, I developed phylo-node, a stable and scalable Node.js toolkit that allows the user to perform diverse molecular and phylogenetic tasks. phylo-node can execute, and process the outputs of, a suite of software options that provide tools for read processing and genome alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary modeling and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server-dependent applications, provides simple integration and interoperation with other Node modules and languages using Node inheritance patterns, and includes a customized piping module to support the production of diverse pipelines. Conclusions phylo-node is open-source and freely available to all users without sign-up or login requirements. All source code and user guidelines are openly available at the GitHub repository: https://github.com/dohalloran/phylo-node.
Affiliation(s)
- Damien M. O’Halloran
- Department of Biological Sciences, The George Washington University, Washington, DC, United States of America
- Institute for Neuroscience, The George Washington University, Washington, DC, United States of America
7
Pignatelli M. TnT: a set of libraries for visualizing trees and track-based annotations for the web. Bioinformatics 2016; 32:2524-5. [PMID: 27153646 PMCID: PMC4978938 DOI: 10.1093/bioinformatics/btw210] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/28/2016] [Accepted: 04/12/2016] [Indexed: 11/15/2022]
Abstract
There is an increasing need for rich and dynamic biological data visualizations in bioinformatic web applications. New web technology standards, such as SVG and Canvas, are now supported by most modern web browsers, allowing powerful visualizations of biological data to flourish. Exploring different ways to visualize genomic data nevertheless remains challenging because of the lack of flexible tools for developing them. Here, I present a set of libraries for creating powerful tree- and track-based visualizations for the web. Their modularity and rich API facilitate the development of many different visualizations, ranging from simple species trees to complex visualizations with per-node data annotations, or even simple genome browsers. AVAILABILITY AND IMPLEMENTATION The TnT libraries are written in JavaScript, licensed under the Apache 2.0 license and hosted at https://github.com/tntvis. CONTACT mp@ebi.ac.uk.
Affiliation(s)
- Miguel Pignatelli
- Centre for Therapeutic Target Validation and European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
8
Sedova M, Jaroszewski L, Godzik A. Protael: protein data visualization library for the web. Bioinformatics 2015; 32:602-4. [PMID: 26515826 DOI: 10.1093/bioinformatics/btv605] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/13/2015] [Accepted: 10/15/2015] [Indexed: 11/13/2022]
Abstract
Protael is a JavaScript library for creating interactive visualizations of biological sequences and various associated data. It allows users to generate high-quality vector graphics (SVG) and integrate them into web pages. AVAILABILITY AND IMPLEMENTATION Protael distribution, documentation and examples are freely available at http://protael.org; source code is hosted at https://github.com/sanshu/protaeljs.
Affiliation(s)
- Mayya Sedova
- Bioinformatics and Systems Biology Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92307, USA and Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, IL 60611, USA
- Lukasz Jaroszewski
- Bioinformatics and Systems Biology Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92307, USA and Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, IL 60611, USA
- Adam Godzik
- Bioinformatics and Systems Biology Program, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92307, USA and Center for Structural Genomics of Infectious Diseases (CSGID), Chicago, IL 60611, USA
9
Jaschob D, Davis TN, Riffle M. Mason: a JavaScript web site widget for visualizing and comparing annotated features in nucleotide or protein sequences. BMC Res Notes 2015; 8:70. [PMID: 25884379 PMCID: PMC4354989 DOI: 10.1186/s13104-015-1009-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/20/2014] [Accepted: 02/10/2015] [Indexed: 12/29/2022]
Abstract
BACKGROUND Sequence feature annotations (e.g., protein domain boundaries, binding sites and secondary structure predictions) are an essential part of biological research. Annotations are widely used by scientists during research and experimental design, and are frequently the result of biological studies. A generalized and simple means of disseminating and visualizing these data via the web would be of value to the research community. FINDINGS Mason is a web site widget designed to visualize and compare annotated features of one or more nucleotide or protein sequences. Annotated features may be of virtually any type, ranging from transcription factor binding sites or exons and introns in DNA to secondary structure or domain boundaries in proteins. Mason is simple to use and easy to integrate into web sites. It has a highly dynamic and configurable interface that supports multiple sets of annotations per sequence, overlapping regions, interface customization and user-driven events (e.g., clicks and tooltip text). It is written purely in JavaScript and SVG, requiring no third-party plugins or browser customization. CONCLUSIONS Mason is a solution for disseminating sequence annotation data on the web. It is highly flexible, customizable and simple to use, and is designed to be easily integrated into web sites. Mason is open source and freely available at https://github.com/yeastrc/mason.
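The Mason abstract mentions support for overlapping annotated regions but does not describe how they are laid out. One common way to draw overlapping features on a sequence track is greedy row packing: sort features by start position and place each on the first row whose last feature ends before it begins, opening a new row otherwise. The Python sketch below assumes that approach; the `Feature` type, `pack_into_rows` function and feature names are hypothetical, and Mason itself is written in JavaScript and may do this differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def pack_into_rows(features: List[Feature]) -> List[List[Feature]]:
    """Assign features to display rows so that no two features in a row overlap."""
    rows: List[List[Feature]] = []
    row_ends: List[int] = []  # end coordinate of the last feature placed in each row
    for feat in sorted(features, key=lambda ft: (ft.start, ft.end)):
        for i, last_end in enumerate(row_ends):
            if feat.start > last_end:  # fits after this row's last feature
                rows[i].append(feat)
                row_ends[i] = feat.end
                break
        else:
            rows.append([feat])  # no existing row fits: open a new one
            row_ends.append(feat.end)
    return rows


if __name__ == "__main__":
    # Hypothetical protein annotations with overlapping regions.
    feats = [
        Feature("domain_A", 10, 120),
        Feature("helix_1", 15, 40),
        Feature("binding_site", 100, 110),
        Feature("domain_B", 130, 250),
    ]
    for i, row in enumerate(pack_into_rows(feats)):
        print(i, [f.name for f in row])
```

Here `domain_A` and `domain_B` share the first row, while `helix_1` and `binding_site`, which both overlap `domain_A`, are drawn together on a second row.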
Affiliation(s)
- Daniel Jaschob
- Department of Biochemistry, University of Washington, UW Box 357350, 1705 NE Pacific St, Seattle, WA, 98195-7350, USA.
- Trisha N Davis
- Department of Biochemistry, University of Washington, UW Box 357350, 1705 NE Pacific St, Seattle, WA, 98195-7350, USA.
- Michael Riffle
- Department of Biochemistry, University of Washington, UW Box 357350, 1705 NE Pacific St, Seattle, WA, 98195-7350, USA.
- Department of Genome Sciences, University of Washington, UW Box 357350, 1705 NE Pacific St, Seattle, WA, 98195-7350, USA.
10
Abstract
Data-driven research has gained momentum in the life sciences. Visualisation of these data is essential for the quick generation of hypotheses and their translation into useful knowledge. BioJS is a newly proposed standard for JavaScript-based components that visualise biological data. BioJS is an open-source community project that to date provides 39 different components contributed by a global community. Here, we present the BioJS F1000Research collection series. A total of 12 components and a project status article are published together. This collection is not intended to be an all-encompassing, comprehensive source of BioJS articles but an initial set; future submissions from BioJS contributors are welcome.
Affiliation(s)
- Manuel Corpas
- The Genome Analysis Centre, Norwich Research Park, Norwich, NR4 7UH, UK