1
De Coninck T, Gippert GP, Henrissat B, Desmet T, Van Damme EJM. Investigating diversity and similarity between CBM13 modules and ricin-B lectin domains using sequence similarity networks. BMC Genomics 2024; 25:643. [PMID: 38937673 PMCID: PMC11212257 DOI: 10.1186/s12864-024-10554-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 06/24/2024] [Indexed: 06/29/2024] Open
Abstract
BACKGROUND: The CBM13 family comprises carbohydrate-binding modules that occur mainly in enzymes and in several ricin-B lectins. The ricin-B lectin domain resembles the CBM13 module to a large extent. Historically, ricin-B lectins and CBM13 proteins were considered completely distinct, despite their structural and functional similarities.
RESULTS: In this data mining study, we investigate structural and functional similarities of these intertwined protein groups. Because of the high structural and functional similarities, and differences in nomenclature usage in several databases, confusion can arise. First, we demonstrate how public protein databases use different nomenclature systems to describe CBM13 modules and putative ricin-B lectin domains. We suggest the introduction of a novel CBM13 domain identifier, as well as the extension of CAZy cross-references in UniProt to guard the distinction between CAZy and non-CAZy entries in public databases. Since similar problems may occur with other lectin families and CBM families, we suggest the introduction of novel CBM InterPro domain identifiers to all existing CBM families. Second, we investigated phylogenetic, nomenclatural and structural similarities between putative ricin-B lectin domains and CBM13 modules, making use of sequence similarity networks. We concluded that the ricin-B/CBM13 superfamily may be larger than initially thought and that several putative ricin-B lectin domains may display CAZyme functionalities, although biochemical proof remains to be delivered.
CONCLUSIONS: Ricin-B lectin domains and CBM13 modules are associated groups of proteins whose database semantics are currently biased towards ricin-B lectins. Revision of the CAZy cross-reference in UniProt and introduction of a dedicated CBM13 domain identifier in InterPro may resolve this issue. In addition, our analyses show that several proteins with putative ricin-B lectin domains show very strong structural similarity to CBM13 modules. Therefore, ricin-B lectin domains and CBM13 modules could be considered distant members of a larger ricin-B/CBM13 superfamily.
Affiliation(s)
- Tibo De Coninck
- Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
- Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Garry P Gippert
- Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Bernard Henrissat
- Section for Protein Chemistry and Enzyme Technology, Department of Biotechnology & Biomedicine, Technical University of Denmark, Søltofts Plads 224, Kgs. Lyngby, 2800, Denmark
- Tom Desmet
- Centre for Synthetic Biology, Department of Biotechnology, Ghent University, Coupure Links 653, Ghent, 9000, Belgium
- Els J M Van Damme
- Laboratory of Biochemistry and Glycobiology, Department of Biotechnology, Ghent University, Proeftuinstraat 86, Ghent, 9000, Belgium
2
Smith DR. Revisiting published genomes with fresh eyes and new data: Revising old sequencing data can yield unexpected insights and identify errors. EMBO Rep 2019; 20:e49482. [PMID: 31680386 DOI: 10.15252/embr.201949482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022] Open
Abstract
Old data are like yesterday's leftovers: sapped of novelty and excitement. But revisiting old sequence data with a fresh mind and new techniques can yield new and unexpected results.
Affiliation(s)
- David R Smith
- Department of Biology, University of Western Ontario, London, ON, Canada
3
The Nothoaspis amazoniensis Complete Mitogenome: A Comparative and Phylogenetic Analysis. Vet Sci 2018; 5:vetsci5020037. [PMID: 29584648 PMCID: PMC6024882 DOI: 10.3390/vetsci5020037] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/05/2018] [Revised: 03/22/2018] [Accepted: 03/23/2018] [Indexed: 12/03/2022] Open
Abstract
The molecular biology era, together with morphology, molecular phylogenetics, bioinformatics, and high-throughput sequencing technologies, improved the taxonomic identification of Argasidae family members, especially when considering specimens at different development stages, which remains a great difficulty for acarologists. These tools could provide important data and insights on the history and evolutionary relationships of argasids. To better understand these relationships, we sequenced and assembled the first complete mitochondrial genome of Nothoaspis amazoniensis. We used phylogenomics to identify the evolutionary history of this species of tick, comparing the data obtained with 26 complete mitochondrial sequences available in biological databases. The results demonstrated the absence of genetic rearrangements, high similarity and identity, and a close organizational link between the mitogenomes of N. amazoniensis and other argasids analyzed. In addition, the mitogenome had a monophyletic cladistic taxonomic arrangement, encompassed by representatives of the Afrotropical and Neotropical regions, with specific parasitism in bats, which may be indicative of an evolutionary process of cospeciation between vectors and the host.
4
Acuña-Amador L, Primot A, Cadieu E, Roulet A, Barloy-Hubler F. Genomic repeats, misassembly and reannotation: a case study with long-read resequencing of Porphyromonas gingivalis reference strains. BMC Genomics 2018; 19:54. [PMID: 29338683 PMCID: PMC5771137 DOI: 10.1186/s12864-017-4429-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/26/2017] [Accepted: 12/29/2017] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Without knowledge of their genomic sequences, it is impossible to make functional models of the bacteria that make up human and animal microbiota. Unfortunately, the vast majority of publicly available genomes are only working drafts, an incompleteness that causes numerous problems and constitutes a major obstacle to genotypic and phenotypic interpretation. In this work, we began with an example from the class Bacteroidia in the phylum Bacteroidetes, which is preponderant among human orodigestive microbiota. We successfully identify the genetic loci responsible for assembly breaks and misassemblies and demonstrate the importance and usefulness of long-read sequencing and curated reannotation.
RESULTS: We showed that the fragmentation in Bacteroidia draft genomes assembled from massively parallel sequencing linearly correlates with genomic repeats of the same or greater size than the reads. We also demonstrated that some of these repeats, especially the long ones, correspond to misassembled loci in three reference Porphyromonas gingivalis genomes marked as circularized (thus complete or finished). We prove that even at modest coverage (30X), long-read resequencing together with PCR contiguity verification (rrn operons and an integrative and conjugative element or ICE) can be used to identify and correct the wrongly combined or assembled regions. Finally, although time-consuming and labor-intensive, consistent manual biocuration of three P. gingivalis strains allowed us to compare and correct the existing genomic annotations, resulting in a more accurate interpretation of the genomic differences among these strains.
CONCLUSIONS: In this study, we demonstrate the usefulness and importance of long-read sequencing in verifying published genomes (even when complete) and generating assemblies for new bacterial strains/species with high genomic plasticity. We also show that when combined with biological validation processes and diligent biocurated annotation, this strategy helps reduce the propagation of errors in shared databases, thus limiting false conclusions based on incomplete or misleading information.
Affiliation(s)
- Luis Acuña-Amador
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Laboratorio de Investigación en Bacteriología Anaerobia, Centro de Investigación en Enfermedades Tropicales, Facultad de Microbiología, Universidad de Costa Rica, San José, Costa Rica
- Aline Primot
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Edouard Cadieu
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
- Alain Roulet
- GenoToul Genome & Transcriptome (GeT-PlaGe), INRA, US1426, Castanet-Tolosan, France
- Frédérique Barloy-Hubler
- Institut de Génétique et Développement de Rennes, CNRS, UMR6290, Université de Rennes 1, Rennes, France
5
Abstract
A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, reiterative process that requires the assimilation of high-quality data and combines different approaches and data types. While some software packages incorporating multiple steps of genome assembly are commercially available, they may not be flexible enough to be routinely applied to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate, currently available tools for each step, it is possible to customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow from processing of raw reads, to initial assembly generation, through optimization, chromosome-scale scaffolding and annotation, outlining input and output data as well as examples and alternative software used for each step. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow could be an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet specific needs of a project.
Affiliation(s)
- Kyle Fletcher
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA
- Richard Michelmore
- The Genome Center, Genome and Biomedical Sciences Facility, University of California, Davis, CA, USA
6
Schultz JH, Adema CM. Comparative immunogenomics of molluscs. Dev Comp Immunol 2017; 75:3-15. [PMID: 28322934 PMCID: PMC5494275 DOI: 10.1016/j.dci.2017.03.013] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 02/08/2017] [Revised: 03/10/2017] [Accepted: 03/15/2017] [Indexed: 05/22/2023]
Abstract
Comparative immunology, studying both vertebrates and invertebrates, provided the earliest descriptions of phagocytosis as a general immune mechanism. However, the large scale of animal diversity challenges all-inclusive investigations and the field of immunology has developed by mostly emphasizing study of a few vertebrate species. In addressing the lack of comprehensive understanding of animal immunity, especially that of invertebrates, comparative immunology helps toward management of invertebrates that are food sources, agricultural pests, pathogens, or transmit diseases, and helps interpret the evolution of animal immunity. Initial studies showed that the Mollusca (second largest animal phylum), and invertebrates in general, possess innate defenses but lack the lymphocytic immune system that characterizes vertebrate immunology. Recognizing the reality of both common and taxon-specific immune features, and applying up-to-date cell and molecular research capabilities, in-depth studies of a select number of bivalve and gastropod species continue to reveal novel aspects of molluscan immunity. The genomics era heralded a new stage of comparative immunology; large-scale efforts yielded an initial set of full molluscan genome sequences that is available for analyses of full complements of immune genes and regulatory sequences. Next-generation sequencing (NGS), due to lower cost and effort required, allows individual researchers to generate large sequence datasets for growing numbers of molluscs. RNAseq provides expression profiles that enable discovery of immune genes and genome sequences reveal distribution and diversity of immune factors across molluscan phylogeny. Although computational de novo sequence assembly will benefit from continued development and automated annotation may require some experimental validation, NGS is a powerful tool for comparative immunology, especially increasing coverage of the extensive molluscan diversity. To date, immunogenomics revealed new levels of complexity of molluscan defense by indicating sequence heterogeneity in individual snails and bivalves, and members of expanded immune gene families are expressed differentially to generate pathogen-specific defense responses.
Affiliation(s)
- Jonathan H Schultz
- Center for Evolutionary and Theoretical Immunology, Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Coen M Adema
- Center for Evolutionary and Theoretical Immunology, Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
7
Denver DR, Ragsdale EJ, Thomas WK, Zasada IA. Introduction to Nematode Genome and Transcriptome Announcements in the Journal of Nematology. J Nematol 2017. [DOI: 10.21307/jofnem-2017-053] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022] Open
8
Sanitá Lima M, Woods LC, Cartwright MW, Smith DR. The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes. Mol Ecol Resour 2016; 16:1279-1286. [PMID: 27482846 DOI: 10.1111/1755-0998.12585] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 03/03/2016] [Revised: 06/23/2016] [Accepted: 06/23/2016] [Indexed: 02/04/2023]
Abstract
Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods.
Affiliation(s)
- Matheus Sanitá Lima
- Department of Biology, University of Western Ontario, London, Ontario, Canada, N6A 5B7
- Laura C Woods
- Department of Biology, University of Western Ontario, London, Ontario, Canada, N6A 5B7
- Matthew W Cartwright
- Department of Biology, University of Western Ontario, London, Ontario, Canada, N6A 5B7
- David Roy Smith
- Department of Biology, University of Western Ontario, London, Ontario, Canada, N6A 5B7