1
De Coninck T, Gippert GP, Henrissat B, Desmet T, Van Damme EJM. Investigating diversity and similarity between CBM13 modules and ricin-B lectin domains using sequence similarity networks. BMC Genomics 2024; 25:643. [PMID: 38937673 PMCID: PMC11212257 DOI: 10.1186/s12864-024-10554-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 06/24/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND The CBM13 family comprises carbohydrate-binding modules that occur mainly in enzymes and in several ricin-B lectins. The ricin-B lectin domain resembles the CBM13 module to a large extent. Historically, ricin-B lectins and CBM13 proteins were considered completely distinct, despite their structural and functional similarities.
RESULTS In this data mining study, we investigate structural and functional similarities of these intertwined protein groups. Because of the high structural and functional similarities, and differences in nomenclature usage in several databases, confusion can arise. First, we demonstrate how public protein databases use different nomenclature systems to describe CBM13 modules and putative ricin-B lectin domains. We suggest the introduction of a novel CBM13 domain identifier, as well as the extension of CAZy cross-references in UniProt to guard the distinction between CAZy and non-CAZy entries in public databases. Since similar problems may occur with other lectin families and CBM families, we suggest the introduction of novel CBM InterPro domain identifiers to all existing CBM families. Second, we investigated phylogenetic, nomenclatural and structural similarities between putative ricin-B lectin domains and CBM13 modules, making use of sequence similarity networks. We concluded that the ricin-B/CBM13 superfamily may be larger than initially thought and that several putative ricin-B lectin domains may display CAZyme functionalities, although biochemical proof remains to be delivered.
CONCLUSIONS Ricin-B lectin domains and CBM13 modules are associated groups of proteins whose database semantics are currently biased towards ricin-B lectins. Revision of the CAZy cross-reference in UniProt and introduction of a dedicated CBM13 domain identifier in InterPro may resolve this issue. In addition, our analyses show that several proteins with putative ricin-B lectin domains show very strong structural similarity to CBM13 modules. Therefore ricin-B lectin domains and CBM13 modules could be considered distant members of a larger ricin-B/CBM13 superfamily.
Affiliation(s)
- Tibo De Coninck
  - Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
  - Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Garry P Gippert
  - Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Bernard Henrissat
  - Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Tom Desmet
  - Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Els J M Van Damme
  - Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
2
Stanford BC, Clake DJ, Morris MR, Rogers SM. The power and limitations of gene expression pathway analyses toward predicting population response to environmental stressors. Evol Appl 2020; 13:1166-1182. [PMID: 32684953 PMCID: PMC7359838 DOI: 10.1111/eva.12935] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/20/2019] [Revised: 02/03/2020] [Accepted: 02/05/2020] [Indexed: 12/16/2022]
Abstract
Rapid environmental changes impact the global distribution and abundance of species, highlighting the urgency to understand and predict how populations will respond. The analysis of differentially expressed genes has elucidated areas of the genome involved in adaptive divergence to past and present environmental change. Such studies, however, have been hampered by large numbers of differentially expressed genes and limited knowledge of how these genes work in conjunction with each other. Recent methods (broadly termed "pathway analyses") have emerged that aim to group genes that behave in a coordinated fashion in response to a factor of interest. These methods aid in functional annotation and in uncovering biological pathways, thereby collapsing complex datasets into more manageable units and providing a more nuanced understanding of both the organism-level effects of modified gene expression and the targets of adaptive divergence. Here, we reanalyze a dataset that investigated temperature-induced changes in gene expression in marine-adapted and freshwater-adapted threespine stickleback (Gasterosteus aculeatus), using Weighted Gene Co-expression Network Analysis (WGCNA) with PANTHER Gene Ontology (GO)-Slim overrepresentation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. Six modules exhibited a conserved response and six a divergent response between marine and freshwater stickleback when acclimated to 7°C or 22°C. One divergent module showed a freshwater-specific response to temperature, and the remaining divergent modules showed differences in the height of reaction norms. PPARAa, a transcription factor that regulates fatty acid metabolism and has been implicated in adaptive divergence, was located in a module that had higher expression at 7°C and in freshwater stickleback. This updated methodology revealed patterns that were not found in the original publication. Although such methods hold promise toward predicting population response to environmental stressors, many limitations remain, particularly with regard to module expression representation, database resources, and cross-database integration.
Affiliation(s)
- Danielle J. Clake
  - Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
- Sean M. Rogers
  - Department of Biological Sciences, University of Calgary, Calgary, AB, Canada
  - Bamfield Marine Sciences Centre, Bamfield, BC, Canada
3
Weirick T, Militello G, Ponomareva Y, John D, Döring C, Dimmeler S, Uchida S. Logic programming to infer complex RNA expression patterns from RNA-seq data. Brief Bioinform 2019; 19:199-209. [PMID: 28011754 DOI: 10.1093/bib/bbw117] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/11/2016] [Indexed: 12/15/2022]
Abstract
To meet the increasing demand in the field, numerous long noncoding RNA (lncRNA) databases are available. Although many lncRNAs are expressed in a cell type-specific and/or time-dependent manner, most lncRNA databases fall short of providing such profiles. We developed a strategy using logic programming to handle the complex organization of organs, their tissues and cell types as well as gender and developmental time points. To showcase this strategy, we introduce 'RenalDB' (http://renaldb.uni-frankfurt.de), a database providing expression profiles of RNAs in major organs focusing on kidney tissues and cells. RenalDB uses logic programming to describe complex anatomy, sample metadata and logical relationships defining expression, enrichment or specificity. We validated the content of RenalDB with biological experiments and functionally characterized two long intergenic noncoding RNAs: LOC440173 is important for cell growth or cell survival, whereas PAXIP1-AS1 is a regulator of cell death. We anticipate RenalDB will be used as a first step toward functional studies of lncRNAs in the kidney.
Affiliation(s)
- Tyler Weirick
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
  - Cardiovascular Innovation Institute, University of Louisville, 302 E Muhammad Ali Blvd, Louisville, KY, U.S.A
- Giuseppe Militello
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
  - Cardiovascular Innovation Institute, University of Louisville, 302 E Muhammad Ali Blvd, Louisville, KY, U.S.A
- Yuliya Ponomareva
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
- David John
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
- Claudia Döring
  - Dr. Senckenberg Institute of Pathology, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
- Stefanie Dimmeler
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
- Shizuka Uchida
  - Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt, Theodor-Stern-Kai 7, Frankfurt am Main, Germany
  - German Center for Cardiovascular Research, Partner side Rhein-Main, Frankfurt am Main, Germany
  - Cardiovascular Innovation Institute, University of Louisville, 302 E Muhammad Ali Blvd, Louisville, KY, U.S.A
4
Abstract
SIGNIFICANCE The concepts of junk DNA and transcriptional noise are long gone as the existence of noncoding RNAs (ncRNAs) has been tested extensively in recent years. Given that the epigenetic status of cells affects many biological processes, how ncRNAs mechanistically contribute to these processes is of great interest.
RECENT ADVANCES Recent studies show that various ncRNAs interact with epigenetic and/or transcription factors to modulate the epigenetic status of cells directly and/or indirectly. There exists growing interest in the field of cardiovascular research to understand the roles of ncRNAs. Due to the large number of ncRNAs in the mammalian genome, only a handful of ncRNAs have been functionally elucidated, which makes it difficult to understand how ncRNAs interact with protein-coding genes and their encoded proteins.
CRITICAL ISSUES Although the canonical function of microRNAs (miRNAs) to inhibit the translation of protein-coding genes is well established, the number of functionally annotated long noncoding RNAs (lncRNAs) is still small, which is especially true in the heart.
FUTURE DIRECTIONS Future studies must connect the epigenetic controls of various cellular phenomena by incorporating both miRNAs and lncRNAs.
Antioxid. Redox Signal. 29, 832-845.
Affiliation(s)
- Shizuka Uchida
  - Cardiovascular Innovation Institute, University of Louisville, Louisville, Kentucky
- Roberto Bolli
  - Cardiovascular Innovation Institute, University of Louisville, Louisville, Kentucky
  - Institute of Molecular Cardiology, University of Louisville, Louisville, Kentucky
5
Weirick T, Militello G, Uchida S. Long Non-coding RNAs in Endothelial Biology. Front Physiol 2018; 9:522. [PMID: 29867565 PMCID: PMC5960726 DOI: 10.3389/fphys.2018.00522] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 03/01/2018] [Accepted: 04/24/2018] [Indexed: 01/08/2023]
Abstract
In recent years, the role of RNA has expanded to the extent that protein-coding RNAs are now the minority, with a variety of non-coding RNAs (ncRNAs) comprising the majority of RNAs in higher organisms. A major contributor to this shift in understanding is RNA sequencing (RNA-seq), which provides a largely unconstrained method for monitoring the status of RNA from whole organisms down to a single cell. This observational power presents both challenges and new opportunities, which require specialized bioinformatics tools to extract knowledge from the data and the ability to reuse data for multiple studies. In this review, we summarize the current status of long non-coding RNA (lncRNA) research in endothelial biology. We then cover computational methods for identifying, annotating, and characterizing lncRNAs in the heart, especially in endothelial cells.
Affiliation(s)
- Tyler Weirick
  - Cardiovascular Innovation Institute, University of Louisville, Louisville, KY, United States
- Giuseppe Militello
  - Cardiovascular Innovation Institute, University of Louisville, Louisville, KY, United States
- Shizuka Uchida
  - Cardiovascular Innovation Institute, University of Louisville, Louisville, KY, United States
6
ANGIOGENES: knowledge database for protein-coding and noncoding RNA genes in endothelial cells. Sci Rep 2016; 6:32475. [PMID: 27582018 PMCID: PMC5007478 DOI: 10.1038/srep32475] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 03/23/2016] [Accepted: 08/08/2016] [Indexed: 12/29/2022]
Abstract
Increasing evidence indicates that the presence of long noncoding RNAs (lncRNAs) is specific to various cell types. Although lncRNAs are speculated to be more numerous than protein-coding genes, the annotations of lncRNAs remain primitive due to the lack of well-structured schemes for their identification and description. Here, we introduce a new knowledge database “ANGIOGENES” (http://angiogenes.uni-frankfurt.de) to allow for in silico screening of protein-coding genes and lncRNAs expressed in various types of endothelial cells, which are present in all tissues. Using the latest annotations of protein-coding genes and lncRNAs, publicly available RNA-seq data were analyzed to identify transcripts that are expressed in endothelial cells of human, mouse and zebrafish. The analyzed data were incorporated into ANGIOGENES to provide a one-stop shop for transcriptomics data to facilitate further biological validation. ANGIOGENES is an intuitive and easy-to-use database that allows in silico screening of expressed, enriched and/or specific endothelial transcripts under various conditions. We anticipate that ANGIOGENES will serve as a starting point for functional studies to elucidate the roles of protein-coding genes and lncRNAs in angiogenesis.