1
Silver LW, Edwards RJ, Neaves L, Manning AD, Hogg CJ, Banks S. A reference genome for the eastern bettong (Bettongia gaimardi). F1000Res 2025; 13:1544. [PMID: 39816984 PMCID: PMC11733418 DOI: 10.12688/f1000research.157851.1] [Accepted: 01/23/2025] [Indexed: 01/18/2025] Open
Abstract
The eastern or Tasmanian bettong (Bettongia gaimardi) is one of four extant bettong species and is listed as 'Near Threatened' by the IUCN. We sequenced short-read data on the 10x system to generate a reference genome 3.46 Gb in size, with a contig N50 of 87.36 Kb and a scaffold N50 of 2.93 Mb. Additionally, we used GeMoMa to provide an accompanying annotation for the reference genome. The generation of a reference genome for the eastern bettong provides a vital resource for the conservation of the species.
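The contig and scaffold N50 values quoted in this abstract summarise assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not from the bettong assembly).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

The same function applies to contigs and scaffolds alike; only the input list of lengths changes.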
Affiliation(s)
- Luke W Silver
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, Camperdown, NSW, 2006, Australia
  - The University of Sydney School of Life and Environmental Sciences, Camperdown, New South Wales, 2006, Australia
- Richard J Edwards
  - Minderoo OceanOmics Centre at UWA, The University of Western Australia Oceans Institute, Crawley, Western Australia, 6009, Australia
  - Evolution and Ecology Research Centre, University of New South Wales School of Biotechnology and Biomolecular Sciences, Kensington, New South Wales, 2033, Australia
- Linda Neaves
  - Australian National University Fenner School of Environment and Society, Acton, Australian Capital Territory, 2601, Australia
- Adrian D Manning
  - Australian National University Fenner School of Environment and Society, Acton, Australian Capital Territory, 2601, Australia
- Carolyn J Hogg
  - Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, Camperdown, NSW, 2006, Australia
  - The University of Sydney School of Life and Environmental Sciences, Camperdown, New South Wales, 2006, Australia
- Sam Banks
  - Charles Darwin University Research Institute for the Environment and Livelihoods, Casuarina, Northern Territory, 0909, Australia
2
Yu Z, Somasundaram S, Yan M. Rumen protozoa and viruses: New insights into their diversity and potential roles through omics lenses-A review. J Dairy Sci 2025:S0022-0302(25)00010-4. [PMID: 39824489 DOI: 10.3168/jds.2024-25780] [Received: 09/26/2024] [Accepted: 12/13/2024] [Indexed: 01/20/2025]
Abstract
The rumen microbiome is essential for breaking down indigestible plant material, supplying ruminants with most of their metabolizable energy and protein. While research has primarily focused on bacteria and archaea, protozoa and viruses (phages) have only gained attention in recent years. Protozoa contribute to feed digestion and fermentation, but as predators, they regulate microbial populations by lysing large quantities of microbial cells (the primary protein source for ruminants) and influence the amount of microbial protein reaching the small intestines, along with other mechanisms of interactions. While rumen viruses (or phages) are abundant and diverse, they remain the least understood component of the rumen ecosystem. They can profoundly affect the rumen microbiome by directly lysing their hosts and reprogramming host metabolism through multiple mechanisms, including gene transfer and alteration of central carbon metabolism. Recent advances in omics technologies have deepened our understanding of these viruses, revealing their complex roles in rumen function. This review integrates current knowledge and recent discoveries from omics studies, highlighting the transformative impact of omics-based approaches. It also identifies critical knowledge gaps and outlines future research directions, including selective inhibition of rumen protozoa, development of phages as potential intervention tools to manage specific undesirable rumen microbes, and the causal impacts of rumen viruses on microbial dynamics and animal productivity.
Affiliation(s)
- Zhongtang Yu
  - Department of Animal Sciences, Center of Microbiome Science, The Ohio State University, Columbus, OH 43210
- Sripoorna Somasundaram
  - Department of Animal Sciences, Center of Microbiome Science, The Ohio State University, Columbus, OH 43210
- Ming Yan
  - Department of Animal Sciences, Center of Microbiome Science, The Ohio State University, Columbus, OH 43210
3
Xian W, Bezrukov I, Bao Z, Vorbrugg S, Gautam A, Weigel D. TIPPo: A User-Friendly Tool for De Novo Assembly of Organellar Genomes with High-Fidelity Data. Mol Biol Evol 2025; 42:msae247. [PMID: 39800935 PMCID: PMC11725521 DOI: 10.1093/molbev/msae247] [Received: 07/26/2024] [Revised: 11/15/2024] [Accepted: 11/18/2024] [Indexed: 01/16/2025] Open
Abstract
Plant cells have two major organelles with their own genomes: chloroplasts and mitochondria. While chloroplast genomes tend to be structurally conserved, the mitochondrial genomes of plants, which are much larger than those of animals, are characterized by complex structural variation. We introduce TIPPo, a user-friendly, reference-free assembly tool that uses PacBio high-fidelity long-read data and that does not rely on genomes from related species or nuclear genome information for the assembly of organellar genomes. TIPPo employs a deep learning model for initial read classification and leverages k-mer counting for further refinement, significantly reducing the impact of nuclear insertions of organellar DNA on the assembly process. We used TIPPo to completely assemble a set of 54 complete chloroplast genomes. No other tool was able to completely assemble this set. TIPPo is comparable with PMAT in assembling mitochondrial genomes from most species but does achieve even higher completeness for several species. We also used the assembled organelle genomes to identify instances of nuclear plastid DNA (NUPTs) and nuclear mitochondrial DNA (NUMTs) insertions. The cumulative length of NUPTs/NUMTs positively correlates with the size of the nuclear genome, suggesting that insertions occur stochastically. NUPTs/NUMTs show predominantly C:G to T:A changes, with the mutated cytosines typically found in CG and CHG contexts, suggesting that degradation of NUPT and NUMT sequences is driven by the known elevated mutation rate of methylated cytosines. Small interfering RNA loci are enriched in NUPTs and NUMTs, consistent with the RdDM pathway mediating DNA methylation in these sequences.
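The abstract names k-mer counting as a refinement step but does not describe how TIPPo implements it. As a generic illustration of the underlying operation only, the Python sketch below counts overlapping k-mers in a sequence. The function name `count_kmers`, the choice of k, and the toy read are assumptions for this example and do not reflect TIPPo's code or parameters.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in a sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Toy read (hypothetical); real inputs would be long reads.
counts = count_kmers("ACGTACGT", 4)
for kmer, count in counts.most_common():
    print(kmer, count)
```

Per-k-mer counts of this kind can serve as an abundance signal when comparing reads; how such a signal is thresholded or combined with a classifier is tool-specific and is not shown here.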
Affiliation(s)
- Wenfei Xian
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Ilja Bezrukov
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Zhigui Bao
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Sebastian Vorbrugg
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Anupam Gautam
  - Algorithms in Bioinformatics, Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
  - International Max Planck Research School “From Molecules to Organisms”, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
- Detlef Weigel
  - Department of Molecular Biology, Max Planck Institute for Biology Tübingen, 72076 Tübingen, Germany
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, 72076 Tübingen, Germany
4
Záhonová K, Kaur H, Furgason CC, Smirnova AV, Dunfield PF, Dacks JB. Comparative Analysis of Protist Communities in Oilsands Tailings Using Amplicon Sequencing and Metagenomics. Environ Microbiol 2025; 27:e70029. [PMID: 39797470 PMCID: PMC11724239 DOI: 10.1111/1462-2920.70029] [Received: 05/17/2024] [Revised: 11/05/2024] [Accepted: 11/29/2024] [Indexed: 01/13/2025]
Abstract
The Canadian province of Alberta contains substantial oilsands reservoirs, consisting of bitumen, clay and sand. Extracting oil involves separating bitumen from inorganic particles using hot water and chemical diluents, resulting in liquid tailings waste with ecotoxicologically significant compounds. Ongoing efforts aim to reclaim tailings-affected areas, with protist colonisation serving as one assessment method of reclamation progress. Oilsands-associated protist communities have mainly been evaluated using amplicon sequencing of the 18S rRNA V4 region; however, this barcode may overlook important protist groups. This study examined how community assessment methods between the V4 and V9 regions differ in representing protist diversity across four oilsands-associated environments. The V9 barcode identified more operational taxonomical units (OTUs) for Discoba, Metamonada and Amoebozoa compared with the V4. A comparative shotgun metagenomics approach revealed few eukaryotic contigs but did recover a complete Paramicrosporidia mitochondrial genome, only the second publicly available from microsporidians. Both V4 and V9 markers were informative for assessing community diversity in oilsands-associated environments and are most effective when combined for a comprehensive taxonomic estimate, particularly in anoxic environments.
Affiliation(s)
- Kristína Záhonová
  - Division of Infectious Diseases, Department of Medicine, and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
  - Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
  - Department of Parasitology, Faculty of Science, Charles University, Vestec, Czech Republic
  - Life Science Research Centre, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Harpreet Kaur
  - Division of Infectious Diseases, Department of Medicine, and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
- Angela V. Smirnova
  - Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Peter F. Dunfield
  - Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada
- Joel B. Dacks
  - Division of Infectious Diseases, Department of Medicine, and Department of Biological Sciences, University of Alberta, Edmonton, Alberta, Canada
  - Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
5
Wei G. Insights into gut fungi in pigs: A comprehensive review. J Anim Physiol Anim Nutr (Berl) 2025; 109:96-112. [PMID: 39154229 DOI: 10.1111/jpn.14036] [Received: 10/10/2023] [Revised: 06/17/2024] [Accepted: 08/04/2024] [Indexed: 08/19/2024]
Abstract
Fungi in the gut microbiota of mammals play a crucial role in host physiological regulation, including intestinal homeostasis and host immune regulation. However, our understanding of gut fungi in mammals remains limited, especially in economically valuable animals, such as pigs. Therefore, this review first describes the classification and characterisation of fungi, provides insights into the methods used to study gut fungi, and summarises the recent progress on pig gut fungi. Additionally, it discusses the challenges in the study of pig gut fungi and highlights potential perspectives. The aim of this review is to serve as a valuable reference for advancing our knowledge of gut fungi in animals.
Affiliation(s)
- Guanyue Wei
  - National Key Laboratory of Pig Genetic Improvement and Germplasm Innovation, Jiangxi Agricultural University, Nanchang, China
6
Maciszewski K, Wilga G, Jagielski T, Bakuła Z, Gawor J, Gromadka R, Karnkowska A. Reduced plastid genomes of colorless facultative pathogens Prototheca (Chlorophyta) are retained for membrane transport genes. BMC Biol 2024; 22:294. [PMID: 39696433 DOI: 10.1186/s12915-024-02089-4] [Received: 02/07/2024] [Accepted: 12/03/2024] [Indexed: 12/20/2024] Open
Abstract
BACKGROUND: Plastids are usually involved in photosynthesis, but the secondary loss of this function is a widespread phenomenon in various lineages of algae and plants. In addition to the loss of genes associated with photosynthesis, the plastid genomes of colorless algae are frequently reduced further. To understand the pathways of reductive evolution associated with the loss of photosynthesis, it is necessary to study a number of closely related strains. Prototheca, a chlorophyte genus of facultative pathogens, provides an excellent opportunity to study this process with its well-sampled array of diverse colorless strains.
RESULTS: We have sequenced the plastid genomes of 13 Prototheca strains and reconstructed a comprehensive phylogeny that reveals evolutionary patterns within the genus and among its closest relatives. Our phylogenomic analysis revealed three independent losses of photosynthesis among the Prototheca strains and varied protein-coding gene content in their ptDNA. Despite this diversity, all Prototheca strains retain the same key plastid functions. These include processes related to gene expression, as well as crucial roles in fatty acid and cysteine biosynthesis, and membrane transport.
CONCLUSIONS: The retention of vestigial genomes in colorless plastids is typically associated with the biosynthesis of secondary metabolites. In contrast, the remarkable conservation of plastid membrane transport system components in the nonphotosynthetic genera Prototheca and Helicosporidium provides an additional constraint against the loss of ptDNA in this lineage. Furthermore, these genes can potentially serve as targets for therapeutic intervention, indicating their importance beyond the evolutionary context.
Affiliation(s)
- Kacper Maciszewski
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
  - Institute of Parasitology, Biology Centre, Czech Academy of Sciences, České Budějovice, Czech Republic
- Gabriela Wilga
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Tomasz Jagielski
  - Department of Medical Microbiology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Zofia Bakuła
  - Department of Medical Microbiology, Institute of Microbiology, Faculty of Biology, University of Warsaw, Warsaw, Poland
- Jan Gawor
  - DNA Sequencing and Synthesis Facility, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Robert Gromadka
  - DNA Sequencing and Synthesis Facility, Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Anna Karnkowska
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
7
McGowan J, Richards TA, Hall N, Swarbreck D. Multiple independent genetic code reassignments of the UAG stop codon in phyllopharyngean ciliates. PLoS Genet 2024; 20:e1011512. [PMID: 39689125 DOI: 10.1371/journal.pgen.1011512] [Received: 08/20/2024] [Revised: 12/31/2024] [Accepted: 11/25/2024] [Indexed: 12/19/2024] Open
Abstract
The translation of nucleotide sequences into amino acid sequences, governed by the genetic code, is one of the most conserved features of molecular biology. The standard genetic code, which uses 61 sense codons to encode one of the 20 standard amino acids and 3 stop codons (UAA, UAG, and UGA) to terminate translation, is used by most extant organisms. The protistan phylum Ciliophora (the 'ciliates') are the most prominent exception to this norm, exhibiting the greatest diversity of nuclear genetic code variants and evidence of repeated changes in the code. In this study, we report the discovery of multiple independent genetic code changes within the Phyllopharyngea class of ciliates. By mining publicly available ciliate genome datasets, we discovered that three ciliate species from the TARA Oceans eukaryotic metagenome dataset use the UAG codon to putatively encode leucine. We identified novel suppressor tRNA genes in two of these genomes which are predicted to decode the reassigned UAG codon to leucine. Phylogenomics analysis revealed that these three uncultivated taxa form a monophyletic lineage within the Phyllopharyngea class. Expanding our analysis by reassembling published phyllopharyngean genome datasets led to the discovery that the UAG codon had also been reassigned to putatively code for glutamine in Hartmannula sinica and Trochilia petrani. Phylogenomics analysis suggests that this occurred via two independent genetic code change events. These data demonstrate that the reassigned UAG codons have widespread usage as sense codons within the phyllopharyngean ciliates. Furthermore, we show that the function of UAA is firmly fixed as the preferred stop codon. These findings shed light on the evolvability of the genetic code in understudied microbial eukaryotes.
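To make the effect of a stop-codon reassignment concrete, the Python sketch below translates the same toy sequence under the standard genetic code and under two variants in which UAG is read as leucine or as glutamine, the two reassignments reported in this abstract. The codon table is the standard code (NCBI translation table 1). The function name `translate`, the `reassignments` parameter, and the toy sequence are illustrative assumptions and are not the authors' pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _to_dna(text):
    """Upper-case a sequence and map RNA 'U' to DNA 'T'."""
    return text.upper().replace("U", "T")


def translate(sequence, reassignments=None):
    """Translate in frame 1, stopping at the first stop codon.

    reassignments maps codons to amino acids that override the standard code.
    """
    code = dict(STANDARD_CODE)
    for codon, amino_acid in (reassignments or {}).items():
        code[_to_dna(codon)] = amino_acid
    seq = _to_dna(sequence)
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = code.get(seq[i:i + 3], "X")  # X marks ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


toy = "AUGUAGGCUUAA"  # hypothetical sequence: AUG UAG GCU UAA
print(translate(toy))                # standard code: stops at UAG
print(translate(toy, {"UAG": "L"}))  # UAG read as leucine
print(translate(toy, {"UAG": "Q"}))  # UAG read as glutamine
```

Under the variant codes, translation reads through UAG and terminates only at UAA, which mirrors the abstract's observation that UAA remains the preferred stop codon.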
Affiliation(s)
- Jamie McGowan
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Neil Hall
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
  - School of Biological Sciences, University of East Anglia, Norwich, United Kingdom
- David Swarbreck
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
8
Tobias PA, Downs J, Epaina P, Singh G, Park RF, Edwards RJ, Brugman E, Zulkifli A, Muhammad J, Purwantara A, Guest DI. Parental assigned chromosomes for cultivated cacao provides insights into genetic architecture underlying resistance to vascular streak dieback. The Plant Genome 2024; 17:e20524. [PMID: 39406693 PMCID: PMC11628906 DOI: 10.1002/tpg2.20524] [Received: 06/16/2024] [Revised: 08/25/2024] [Accepted: 09/20/2024] [Indexed: 12/11/2024]
Abstract
Diseases of Theobroma cacao L. (Malvaceae) disrupt cocoa bean supply and economically impact growers. Vascular streak dieback (VSD), caused by Ceratobasidium theobromae, is a new encounter disease of cacao currently contained to southeast Asia and Melanesia. Resistance to VSD has been tested with large progeny trials in Sulawesi, Indonesia, and in Papua New Guinea with the identification of informative quantitative trait loci (QTLs). Using a VSD susceptible progeny tree (clone 26), derived from a resistant and susceptible parental cross, we assembled the genome to chromosome-level and discriminated alleles inherited from either resistant or susceptible parents. The parentally phased genomes were annotated for all predicted genes and then specifically for resistance genes of the nucleotide-binding site leucine-rich repeat class (NLR). On investigation, we determined the presence of NLR clusters and other potential disease response gene candidates in proximity to informative QTLs. We identified structural variants within NLRs inherited from parentals. We present the first diploid, fully scaffolded, and parentally phased genome resource for T. cacao L. and provide insights into the genetics underlying resistance and susceptibility to VSD.
Affiliation(s)
- Peri A. Tobias
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Jacob Downs
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Peter Epaina
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
  - Cocoa Board of Papua New Guinea, Kokopo, Papua New Guinea
- Gurpreet Singh
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Robert F. Park
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
- Richard J. Edwards
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Perth, Western Australia, Australia
- Eirene Brugman
  - Cocoa Research Center Faculty of Agriculture, Universitas Hasanuddin, Makassar, Indonesia
- Junaid Muhammad
  - Cocoa Research Center Faculty of Agriculture, Universitas Hasanuddin, Makassar, Indonesia
- David I. Guest
  - School of Life and Environmental Sciences, The University of Sydney, Camperdown, New South Wales, Australia
9
Fields AT, Conway KW, Dolan EP, Swift DG, Monroe AA, Hollenbeck CM, Bean PT, Anderson JD, Portnoy DS. Complete mitochondrial genomes of Notropis oxyrhynchus and Notropis buccula (Cypriniformes: Leuciscidae). Mitochondrial DNA B Resour 2024; 9:1569-1574. [PMID: 39568716 PMCID: PMC11578407 DOI: 10.1080/23802359.2024.2429632] [Received: 08/07/2024] [Accepted: 11/08/2024] [Indexed: 11/22/2024] Open
Abstract
The Leuciscidae (minnows, shiners and relatives) is a diverse family of freshwater fishes with many species endangered due to anthropogenic stressors. Notropis oxyrhynchus and Notropis buccula are two shiners found only in the upper Brazos River basin in Texas, USA, and listed as endangered due to contracted habitat. The complete mitochondrial genome was sequenced for two vouchered specimens of each species: N. oxyrhynchus has a total mitogenome length of 16,711 bp and N. buccula a total mitogenome length of 16,685-16,686 bp, and both include 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. Phylogenetic analysis supports previous hypotheses regarding placement of these species.
Affiliation(s)
- A T Fields
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- K W Conway
  - Department of Ecology and Conservation Biology and Biodiversity Research and Teaching Collections, Texas A&M University, College Station, TX, USA
- E P Dolan
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- D G Swift
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- A A Monroe
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- C M Hollenbeck
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
- P T Bean
  - Heart of the Hills Fisheries Science Center, Texas Parks and Wildlife Department, Inland Fisheries, TX, USA
- J D Anderson
  - Perry R. Bass Marine Fisheries Research Station and Hatchery, Texas Parks and Wildlife Department, Coastal Fisheries, Palacios, TX, USA
- D S Portnoy
  - Marine Genomics Laboratory, Department of Life Sciences, Texas A&M University-Corpus Christi, Corpus Christi, TX, USA
10
Krinos AI, Mars Brisbin M, Hu SK, Cohen NR, Rynearson TA, Follows MJ, Schulz F, Alexander H. Missing microbial eukaryotes and misleading meta-omic conclusions. Nat Commun 2024; 15:9873. [PMID: 39543100 PMCID: PMC11564645 DOI: 10.1038/s41467-024-52212-w] [Received: 07/06/2024] [Accepted: 08/23/2024] [Indexed: 11/17/2024] Open
Abstract
Meta-omics is commonly used for large-scale analyses of microbial eukaryotes, including species or taxonomic group distribution mapping, gene catalog construction, and inference on the functional roles and activities of microbial eukaryotes in situ. Here, we explore the potential pitfalls of common approaches to taxonomic annotation of protistan meta-omic datasets. We re-analyze three environmental datasets at three levels of taxonomic hierarchy in order to illustrate the crucial importance of database completeness and curation in enabling accurate environmental interpretation. We show that taxonomic membership of sequence clusters estimates community composition more accurately than returning exact sequence labels, and overlap between clusters can address database shortcomings. Clustering approaches can be applied to diverse environments while continuing to exploit the wealth of annotation data collated in databases, and selecting and evaluating these databases is a critical part of correctly annotating protistan taxonomy in environmental datasets. We argue that ongoing curation of genetic resources is crucial in accurately annotating protists in in situ meta-omic datasets. Moreover, we propose that precise taxonomic annotation of meta-omic data is a clustering problem rather than a feasible alignment problem.
Affiliation(s)
- Arianna I Krinos
  - MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, Cambridge, MA, USA
  - Department of Earth, Atmospheric, and Planetary Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Margaret Mars Brisbin
  - Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - Department of Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
  - College of Marine Science, University of South Florida, St. Petersburg, FL, USA
- Sarah K Hu
  - Department of Oceanography, Texas A&M University, College Station, TX, USA
- Natalie R Cohen
  - Skidaway Institute of Oceanography, University of Georgia, Savannah, GA, USA
- Tatiana A Rynearson
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA
- Michael J Follows
  - Department of Earth, Atmospheric, and Planetary Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Frederik Schulz
  - Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Harriet Alexander
  - Department of Biology, Woods Hole Oceanographic Institution, Woods Hole, MA, USA
11
Chen SH, Jones A, Lu-Irving P, Yap JYS, van der Merwe M, Bragg JG, Edwards RJ. Chromosome-Level Genome Assembly of the Australian Rainforest Tree Rhodamnia argentea (Malletwood). Genome Biol Evol 2024; 16:evae238. [PMID: 39487819 PMCID: PMC11604068 DOI: 10.1093/gbe/evae238] [Received: 05/20/2024] [Revised: 09/23/2024] [Accepted: 10/04/2024] [Indexed: 11/04/2024] Open
Abstract
Myrtaceae are a large family of woody plants, including hundreds that are currently under threat from the global spread of a fungal pathogen, Austropuccinia psidii (G. Winter) Beenken, which causes myrtle rust. A reference genome for the Australian native rainforest tree Rhodamnia argentea Benth. (malletwood) was assembled from Oxford Nanopore Technologies long-reads, 10x Genomics Chromium linked-reads, and Hi-C data (N50 = 32.3 Mb and BUSCO completeness 98.0%) with 99.0% of the 347 Mb assembly anchored to 11 chromosomes (2n = 22). The R. argentea genome will inform conservation efforts for Myrtaceae species threatened by myrtle rust, against which it shows variable resistance. We observed contamination in the sequencing data, and further investigation revealed an arthropod source. This study emphasizes the importance of checking sequencing data for contamination, especially when working with nonmodel organisms. It also enhances our understanding of a tree that faces conservation challenges, contributing to broader biodiversity initiatives.
Affiliation(s)
- Stephanie H Chen
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW 2052, Australia
  - Research Centre for Ecosystem Resilience, Botanic Gardens of Sydney, Sydney, NSW 2000, Australia
  - Centre for Australian National Biodiversity Research (a joint venture between Parks Australia and CSIRO), Canberra, ACT 2601, Australia
- Ashley Jones
  - Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
- Patricia Lu-Irving
  - Research Centre for Ecosystem Resilience, Botanic Gardens of Sydney, Sydney, NSW 2000, Australia
- Jia-Yee S Yap
  - Research Centre for Ecosystem Resilience, Botanic Gardens of Sydney, Sydney, NSW 2000, Australia
- Marlien van der Merwe
  - Research Centre for Ecosystem Resilience, Botanic Gardens of Sydney, Sydney, NSW 2000, Australia
- Jason G Bragg
  - Research Centre for Ecosystem Resilience, Botanic Gardens of Sydney, Sydney, NSW 2000, Australia
- Richard J Edwards
  - School of Biotechnology and Biomolecular Sciences, University of New South Wales, Kensington, NSW 2052, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, University of Western Australia, Perth, WA 6009, Australia
12
Pu L, Shamir R. 4CAC: 4-class classifier of metagenome contigs using machine learning and assembly graphs. Nucleic Acids Res 2024; 52:e94. [PMID: 39287139 PMCID: PMC11514454 DOI: 10.1093/nar/gkae799] [Received: 12/21/2023] [Revised: 07/13/2024] [Accepted: 09/02/2024] [Indexed: 09/19/2024] Open
Abstract
Microbial communities usually harbor a mix of bacteria, archaea, plasmids, viruses and microeukaryotes. Within these communities, viruses, plasmids, and microeukaryotes coexist in relatively low abundance, yet they engage in intricate interactions with bacteria. Moreover, viruses and plasmids, as mobile genetic elements, play important roles in horizontal gene transfer and the development of antibiotic resistance within microbial populations. However, due to the difficulty of identifying viruses, plasmids, and microeukaryotes in microbial communities, our understanding of these minor classes lags behind that of bacteria and archaea. Recently, several classifiers have been developed to separate one or more minor classes from bacteria and archaea in metagenome assemblies. However, these classifiers often overlook the issue of class imbalance, leading to low precision in identifying the minor classes. Here, we developed a classifier called 4CAC that is able to identify viruses, plasmids, microeukaryotes, and prokaryotes simultaneously from metagenome assemblies. 4CAC generates an initial four-way classification using several sequence length-adjusted XGBoost models and further improves the classification using the assembly graph. Evaluation on simulated and real metagenome datasets demonstrates that 4CAC substantially outperforms existing classifiers and combinations thereof on short reads. On long reads, it also shows an advantage unless the abundance of the minor classes is very low. 4CAC runs 1-2 orders of magnitude faster than the other classifiers. The 4CAC software is available at https://github.com/Shamir-Lab/4CAC.
Affiliation(s)
- Lianrong Pu
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- School of Computer Science and Technology, Shandong University, Qingdao, China
- Ron Shamir
- The Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
|
13
|
Silva JM, Almeida JR. Enhancing metagenomic classification with compression-based features. Artif Intell Med 2024; 156:102948. [PMID: 39173422 DOI: 10.1016/j.artmed.2024.102948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2023] [Revised: 06/12/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Metagenomics is a rapidly expanding field that uses next-generation sequencing technology to analyze the genetic makeup of environmental samples. However, accurately identifying the organisms in a metagenomic sample can be complex, and traditional reference-based methods may not be effective enough in some instances. In this study, we present a novel approach for metagenomic identification, using data compressors as a feature for taxonomic classification. By evaluating a comprehensive set of compressors, including both general-purpose and genomic-specific, we demonstrate the effectiveness of this method in accurately identifying organisms in metagenomic samples. The results indicate that using features from multiple compressors can help identify taxonomy. This method achieved an overall accuracy of 95% on an imbalanced dataset that included classes with limited samples. The study also showed that the correlation between compression and classification is insignificant, highlighting the need for a multi-faceted approach to metagenomic identification. This approach offers a significant advancement in the field of metagenomics, providing a reference-less method for taxonomic identification that is both effective and efficient while revealing insights into the statistical and algorithmic nature of genomic data. The code to validate this study is publicly available at https://github.com/ieeta-pt/xgTaxonomy.
|
14
|
Karačić S, Suarez C, Hagelia P, Persson F, Modin O, Martins PD, Wilén BM. Microbial acidification by N, S, Fe and Mn oxidation as a key mechanism for deterioration of subsea tunnel sprayed concrete. Sci Rep 2024; 14:22742. [PMID: 39349736 PMCID: PMC11442690 DOI: 10.1038/s41598-024-73911-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Accepted: 09/23/2024] [Indexed: 10/04/2024] Open
Abstract
The deterioration of fibre-reinforced sprayed concrete was studied in the Oslofjord subsea tunnel (Norway). At sites where intrusion of saline groundwater resulted in biofilm growth, the concrete exhibited significant deterioration and steel fibre corrosion. Using amplicon sequencing and shotgun metagenomics, we identified the microbial taxa and surveyed potential microbial mechanisms of concrete degradation at two sites over five years. The concrete beneath the biofilm was investigated with polarised light microscopy, scanning electron microscopy and X-ray diffraction. The oxic environment in the tunnel favoured aerobic oxidation processes in nitrogen, sulfur and metal biogeochemical cycling, as evidenced by large abundances of metagenome-assembled genomes (MAGs) with potential for oxidation of nitrogen, sulfur, manganese and iron, by observed mild acidification of the concrete, and by the presence of manganese and iron oxides. These results suggest that autotrophic microbial populations involved in the cycling of several elements contributed to the corrosion of steel fibres and to the acidification that caused concrete deterioration.
Affiliation(s)
- Sabina Karačić
- Department of Architecture and Civil Engineering, Chalmers University of Technology, Göteborg, 41296, Sweden
- Institute of Medical Microbiology, Immunology and Parasitology, Medical Faculty, Rheinische Friedrich-Wilhelms Universität, 53127, Bonn, Germany
- Carolina Suarez
- Division of Water Resources Engineering, Faculty of Engineering LTH, Lund University, Lund, 221 00, Sweden
- Sweden Water Research AB, Lund, 222 35, Sweden
- Per Hagelia
- Construction Division, The Norwegian Public Roads Administration, Oslo, 0030, Norway
- Müller-Sars Biological Station, Ørje, NO-1871, Norway
- Frank Persson
- Department of Architecture and Civil Engineering, Chalmers University of Technology, Göteborg, 41296, Sweden
- Oskar Modin
- Department of Architecture and Civil Engineering, Chalmers University of Technology, Göteborg, 41296, Sweden
- Paula Dalcin Martins
- Department of Ecosystem and Landscape Dynamics, University of Amsterdam, Amsterdam, 1090 GE, Netherlands
- Microbial Ecology Cluster, GELIFES, University of Groningen, Groningen, 9747 AG, Netherlands
- Britt-Marie Wilén
- Department of Architecture and Civil Engineering, Chalmers University of Technology, Göteborg, 41296, Sweden
|
15
|
Wang W, Song W, Majzoub ME, Feng X, Xu B, Tao J, Zhu Y, Li Z, Qian PY, Webster NS, Thomas T, Fan L. Decoupling of strain- and intrastrain-level interactions of microbiomes in a sponge holobiont. Nat Commun 2024; 15:8205. [PMID: 39294150 PMCID: PMC11410982 DOI: 10.1038/s41467-024-52464-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 09/07/2024] [Indexed: 09/20/2024] Open
Abstract
Holobionts are highly organized assemblages of eukaryotic hosts, cellular microbial symbionts, and viruses, whose interactions and evolution involve complex biological processes. It is largely unknown which specific determinants drive similarity or individuality in genetic diversity between holobionts. Here, we combine short- and long-read sequencing and DNA-proximity-linkage technologies to investigate intraspecific diversity of the microbiomes, including host-resolved viruses, in individuals of a model marine sponge. We find strong impacts of the sponge host and the cellular hosts of viruses on strain-level organization of the holobiont, whereas substantial overlap in nucleotide diversity between holobionts suggests frequent exchanges of microbial cells and viruses at intrastrain level in the local sponge population. Immune-evasive arms races likely restricted virus-host co-evolution at the intrastrain level, generated holobiont-specific genome variations, and linked virus-host genetics through recombination. Our work shows that a decoupling of strain- and intrastrain-level interactions is a key factor in the genetic diversification of holobionts.
Affiliation(s)
- Wenxiu Wang
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Weizhi Song
- Center for Marine Science and Innovation, University of New South Wales, Sydney, New South Wales, Australia
- School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Marwan E Majzoub
- Center for Marine Science and Innovation, University of New South Wales, Sydney, New South Wales, Australia
- School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Xiaoyuan Feng
- Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, Guangdong, China
- Bu Xu
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Jianchang Tao
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Yuanqing Zhu
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
- Zhiyong Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Minhang, Shanghai, China
- Pei-Yuan Qian
- Department of Ocean Science, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
- Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, Guangdong, China
- Nicole S Webster
- The Australian Antarctic Division, Kingston, Tasmania, Australia
- Australian Centre for Ecogenomics, University of Queensland, Brisbane, Queensland, Australia
- Australian Institute of Marine Science, Townsville, Queensland, Australia
- Torsten Thomas
- Center for Marine Science and Innovation, University of New South Wales, Sydney, New South Wales, Australia
- School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, New South Wales, Australia
- Lu Fan
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
|
16
|
Espinoza JL, Phillips A, Prentice MB, Tan GS, Kamath PL, Lloyd KG, Dupont CL. Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing. Nucleic Acids Res 2024; 52:e63. [PMID: 38909293 DOI: 10.1093/nar/gkae528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Revised: 05/21/2024] [Accepted: 06/10/2024] [Indexed: 06/24/2024] Open
Abstract
The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA's versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible software suite that bridges the gap between genomics and biotechnological solutions.
Affiliation(s)
- Josh L Espinoza
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Allan Phillips
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Melanie B Prentice
- School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
- Gene S Tan
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Pauline L Kamath
- School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
- Maine Center for Genetics in the Environment, University of Maine, Orono, ME 04469, USA
- Karen G Lloyd
- Microbiology Department, University of Tennessee, Knoxville, TN 37917, USA
- Chris L Dupont
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
|
17
|
Burlakoti RR, Sapkota S, Lubberts M, Sharifi M. First Report and Genome Resource of Monilinia vaccinii-corymbosi, causal agent of Mummy Berry Disease of Black Huckleberry (Vaccinium membranaceum). J Genomics 2024; 12:71-74. [PMID: 39135665 PMCID: PMC11317209 DOI: 10.7150/jgen.97432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 07/26/2024] [Indexed: 08/15/2024] Open
Abstract
Monilinia vaccinii-corymbosi (phylum Ascomycota, family Sclerotiniaceae) causes the fruit disease 'mummy berry' on berry crops and is responsible for losses in fruit yield and quality. We report mummy berry disease of black huckleberry (Vaccinium membranaceum) for the first time in British Columbia, Canada. We sequenced and assembled the genome of M. vaccinii-corymbosi from infected huckleberry fruits. The resulting genome was 33.8 Mbp in size and consisted of 2,437 scaffolds with an N50 of 33,816 bp. To the best of our knowledge, this is the first resource announcement of a whole-genome sequence of the mummy berry pathogen (M. vaccinii-corymbosi) infecting black huckleberry. The genome resource will be valuable for future studies to understand the genomic structure of the pathogen and the mechanisms associated with black huckleberry-M. vaccinii-corymbosi interactions.
Affiliation(s)
- Rishi R. Burlakoti
- Science and Technology Branch, Agassiz Research and Development Centre, Agriculture and Agri-Food Canada, Agassiz, BC, Canada
- Sanjib Sapkota
- Science and Technology Branch, Agassiz Research and Development Centre, Agriculture and Agri-Food Canada, Agassiz, BC, Canada
- Mark Lubberts
- Science and Technology Branch, Summerland Research and Development Centre, Agriculture and Agri-Food Canada, Summerland, BC, Canada
- Mehdi Sharifi
- Science and Technology Branch, Summerland Research and Development Centre, Agriculture and Agri-Food Canada, Summerland, BC, Canada
|
18
|
Matos A, Vilas-Arrondo N, Gomes-dos-Santos A, Veríssimo A, Román-Marcote E, Baldó F, Moreno-Aguilar J, Pérez M, Lopes-Lima M, Froufe E, Castro LFC. The complete mitogenome of the Atlantic longnose chimaera Rhinochimaera atlantica (Holt & Byrne, 1909). Mitochondrial DNA B Resour 2024; 9:886-891. [PMID: 39027115 PMCID: PMC11257016 DOI: 10.1080/23802359.2024.2378127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 07/04/2024] [Indexed: 07/20/2024] Open
Abstract
Holocephali is a subclass of chondrichthyans with ample geographic distribution in marine ecosystems. Holocephalan species are organized into three families: Callorhinchidae, Chimaeridae, and Rhinochimaeridae. Despite their critical ecological and evolutionary importance, genomic information from holocephalans is still scarce, particularly from rhinochimaerids. The present study provides the first complete mitogenome of the Atlantic longnose chimaera Rhinochimaera atlantica (Holt & Byrne, 1909). The whole mitogenome was sequenced from an R. atlantica specimen, collected on the Porcupine Bank (NE Atlantic), by Illumina high-throughput sequencing. The R. atlantica mitogenome has 17,852 nucleotides with 13 protein-coding genes, 22 transfer RNA genes, and two ribosomal RNA genes. Nine of these genes are on the complementary strand. This mitogenome has a GC content of 41.5% and an AT content of 58.5%. The phylogenetic reconstruction provided here, using all the available complete and partial Holocephali mitogenomes, places R. atlantica in the Rhinochimaeridae family, as expected. This genomic resource will be useful in the genomic characterization of this species.
Affiliation(s)
- Ana Matos
- CIIMAR/CIMAR – Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- Nair Vilas-Arrondo
- Programa de Doctorado “Ciencias marinas, Tecnología y Gestión” (Do*MAR), Universidad de Vigo, Vigo, Spain
- Centro Oceanográfico de Vigo (COV), Instituto Español de Oceanografía (IEO), CSIC, Vigo, Spain
- André Gomes-dos-Santos
- CIIMAR/CIMAR – Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- Ana Veríssimo
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Esther Román-Marcote
- Centro Oceanográfico de Vigo (COV), Instituto Español de Oceanografía (IEO), CSIC, Vigo, Spain
- Francisco Baldó
- Centro Oceanográfico de Cádiz (COCAD), Instituto Español de Oceanografía (IEO), CSIC, Cádiz, Spain
- Jaime Moreno-Aguilar
- Tecnologías y Servicios Agrarios, S.A. (TRAGSATEC), C/ Orient, Ciutadella, Spain
- Montse Pérez
- Centro Oceanográfico de Vigo (COV), Instituto Español de Oceanografía (IEO), CSIC, Vigo, Spain
- Manuel Lopes-Lima
- CIBIO, Centro de Investigação em Biodiversidade e Recursos Genéticos, InBIO Laboratório Associado, Universidade do Porto, Vairão, Portugal
- BIOPOLIS Program in Genomics, Biodiversity and Land Planning, CIBIO, Vairão, Portugal
- Elsa Froufe
- CIIMAR/CIMAR – Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- L. Filipe C. Castro
- CIIMAR/CIMAR – Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Matosinhos, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
|
19
|
Hou S, Tang T, Cheng S, Liu Y, Xia T, Chen T, Fuhrman J, Sun F. DeepMicroClass sorts metagenomic contigs into prokaryotes, eukaryotes and viruses. NAR Genom Bioinform 2024; 6:lqae044. [PMID: 38711860 PMCID: PMC11071121 DOI: 10.1093/nargab/lqae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 03/18/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024] Open
Abstract
Sequence classification facilitates a fundamental understanding of the structure of microbial communities. Binary metagenomic sequence classifiers are insufficient because environmental metagenomes are typically derived from multiple sequence sources. Here we introduce a deep-learning-based sequence classifier, DeepMicroClass, that classifies metagenomic contigs into five sequence classes, i.e. viruses infecting prokaryotic or eukaryotic hosts, eukaryotic or prokaryotic chromosomes, and prokaryotic plasmids. DeepMicroClass achieved high performance for all sequence classes at various tested sequence lengths ranging from 500 bp to 100 kbp. By benchmarking on a synthetic dataset with variable sequence class composition, we showed that DeepMicroClass obtained better performance for eukaryotic, plasmid and viral contig classification than other state-of-the-art predictors. DeepMicroClass achieved comparable performance on viral sequence classification with geNomad and VirSorter2 when benchmarked on the CAMI II marine dataset. Using a coastal daily time-series metagenomic dataset as a case study, we showed that microbial eukaryotes and prokaryotic viruses are integral to microbial communities. By analyzing monthly metagenomes collected at HOT and BATS, we found relatively higher viral read proportions in the subsurface layer in late summer, consistent with the seasonal viral infection patterns prevalent in these areas. We expect DeepMicroClass will promote metagenomic studies of under-appreciated sequence types.
Affiliation(s)
- Shengwei Hou
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Tianqi Tang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Siliangyu Cheng
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Yuanhao Liu
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Tian Xia
- Department of Ocean Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China
- Ting Chen
- Department of Computer Science and Technology, Institute of Artificial Intelligence & BNRist, Tsinghua University, Beijing 100084, China
- Jed A Fuhrman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Fengzhu Sun
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
20
|
Halstead-Nussloch G, Signorini SG, Giulio M, Crocetta F, Munari M, Della Torre C, Weber AAT. The genome of the rayed Mediterranean limpet Patella caerulea (Linnaeus, 1758). Genome Biol Evol 2024; 16:evae070. [PMID: 38546725 PMCID: PMC11003540 DOI: 10.1093/gbe/evae070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/23/2024] [Indexed: 04/11/2024] Open
Abstract
Patella caerulea (Linnaeus, 1758) is a mollusc limpet species of the class Gastropoda. Endemic to the Mediterranean Sea, it is considered a keystone species due to its primary role in structuring and regulating the ecological balance of tidal and subtidal habitats. It is currently being used as a bioindicator to assess the environmental quality of coastal marine waters and as a model species to understand adaptation to ocean acidification. Here, we provide a high-quality reference genome assembly and annotation for P. caerulea. We generated ∼30 Gb of Pacific Biosciences high-fidelity data from a single individual and provide a final 749.8 Mb assembly containing 62 contigs, including the mitochondrial genome (14,938 bp). With an N50 of 48.8 Mb and 98% of the assembly contained in the 18 largest contigs, this assembly is near chromosome-scale. Benchmarking Universal Single-Copy Orthologs scores were high (Mollusca, 87.8% complete; Metazoa, 97.2% complete) and similar to metrics observed for other chromosome-level Patella genomes, highlighting a possible bias in the Mollusca database for Patellids. We generated transcriptomic Illumina data from a second individual collected at the same locality and used it together with protein evidence to annotate the genome. A total of 23,938 protein-coding gene models were found. By comparing this annotation with other published Patella annotations, we found that the distribution and median values of exon and gene lengths were comparable with those of other Patella species despite different annotation approaches. The present high-quality P. caerulea reference genome, available on GenBank (BioProject: PRJNA1045377; assembly: GCA_036850965.1), is an important resource for future ecological and evolutionary studies.
Affiliation(s)
- Silvia Giorgia Signorini
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
- Department of Biosciences, University of Milan, Milan, Italy
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- Marco Giulio
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
- Fabio Crocetta
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- National Biodiversity Future Center (NBFC), Palermo, Italy
- Marco Munari
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- Department of Biology, Stazione Idrobiologica ‘Umberto d’Ancona’, University of Padova, Chioggia, Italy
- Camilla Della Torre
- Department of Biosciences, University of Milan, Milan, Italy
- Department of Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy
- Alexandra Anh-Thu Weber
- Department of Aquatic Ecology, Swiss Federal Institute of Aquatic Science and Technology (Eawag), Dübendorf, Switzerland
|
21
|
Hollender M, Sałek M, Karlicki M, Karnkowska A. Single-cell genomics revealed Candidatus Grellia alia sp. nov. as an endosymbiont of Eutreptiella sp. (Euglenophyceae). Protist 2024; 175:126018. [PMID: 38325049 DOI: 10.1016/j.protis.2024.126018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/22/2024] [Accepted: 01/26/2024] [Indexed: 02/09/2024]
Abstract
Though endosymbioses between protists and prokaryotes are widespread, certain host lineages have received disproportionate attention, which may indicate either a predisposition to such interactions or limited studies on certain protist groups due to a lack of cultures. The euglenids represent one such understudied group, in spite of microscopic observations showing intracellular bacteria in some strains. Here, we perform a comprehensive molecular analysis of a previously identified endosymbiont in Eutreptiella sp. CCMP3347 using a single-cell approach and bulk culture sequencing. The genome reconstruction of this endosymbiont allowed the description of a new endosymbiont, Candidatus Grellia alia sp. nov., from the family Midichloriaceae. Comparative genomics revealed a remarkably complete conjugative type IV secretion system present in three copies on the plasmid sequences of the studied endosymbiont, a feature missing in the closely related Grellia incantans. This study addresses the challenge of limited host cultures with endosymbionts by showing that the genomes of endosymbionts reconstructed from single host cells have completeness and contiguity that match or exceed those of genomes from bulk cultures. This paves the way for further studies of endosymbionts in euglenids and other protist groups. The research also provides the opportunity to study the diversity of endosymbionts in natural populations.
Affiliation(s)
- Metody Hollender
- Institute of Evolutionary Biology, Biological and Chemical Research Centre, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
- Marta Sałek
- Institute of Evolutionary Biology, Biological and Chemical Research Centre, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
- Michał Karlicki
- Institute of Evolutionary Biology, Biological and Chemical Research Centre, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
- Anna Karnkowska
- Institute of Evolutionary Biology, Biological and Chemical Research Centre, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
|
22
|
Zhong J, Osborn T, Del Rosario Hernández T, Kyrysyuk O, Tully BJ, Anderson RE. Increasing transposase abundance with ocean depth correlates with a particle-associated lifestyle. mSystems 2024; 9:e0006724. [PMID: 38380923 PMCID: PMC10949469 DOI: 10.1128/msystems.00067-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 01/25/2024] [Indexed: 02/22/2024] Open
Abstract
Transposases are mobile genetic elements that move within and between genomes, promoting genomic plasticity in microorganisms. In marine microbial communities, the abundance of transposases increases with depth, but the reasons behind this trend remain unclear. Our analysis of metagenomes from the Tara Oceans and Malaspina Expeditions suggests that a particle-associated lifestyle is the main covariate for the high occurrence of transposases in the deep ocean, and this trend holds true for individual genomes as well as in a community-wide sense. We observed a strong and depth-independent correlation between transposase abundance and the presence of biofilm-associated genes, as well as the prevalence of secretory enzymes. This suggests that mobile genetic elements readily propagate among microbial communities within crowded biofilms. Furthermore, we show that particle association positively correlates with larger genome size, which is in turn associated with higher transposase abundance. Cassette sequences associated with transposons are enriched with genes related to defense mechanisms, which are more highly expressed in the deep sea. Thus, while transposons spread at the expense of their microbial hosts, they also introduce novel genes and potentially benefit the hosts in helping to compete for limited resources. Overall, our results suggest a new understanding of deep ocean particles as highways for gene sharing among defensively oriented microbial genomes.
IMPORTANCE: Genes can move within and between microbial genomes via mobile genetic elements, which include transposases and transposons. In the oceans, there is a puzzling increase in transposase abundance in microbial genomes as depth increases. To gain insight into this trend, we conducted an extensive analysis of marine microbial metagenomes and metatranscriptomes. We found a significant correlation between transposase abundance and a particle-associated lifestyle among marine microbes at both the metagenome and genome-resolved levels. We also observed a link between transposase abundance and genes related to defense mechanisms. These results suggest that as microbes become densely packed into crowded particles, mobile genes are more likely to spread and carry genetic material that provides a competitive advantage in crowded habitats. This may enable deep sea microbes to effectively compete in such environments.
Affiliation(s)
- Juntao Zhong
- Carleton College, Northfield, Minnesota, USA
- Department of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Troy Osborn
- Carleton College, Northfield, Minnesota, USA
- Thais Del Rosario Hernández
- Carleton College, Northfield, Minnesota, USA
- Department of Molecular Biology, Cell Biology and Biochemistry, Brown University, Providence, Rhode Island, USA
- Oleksandr Kyrysyuk
- Carleton College, Northfield, Minnesota, USA
- Yale School of Medicine, Yale University, New Haven, Connecticut, USA
- Benjamin J. Tully
- Marine & Environmental Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
|
23
|
Suarez C, Rosenqvist T, Dimitrova I, Sedlacek CJ, Modin O, Paul CJ, Hermansson M, Persson F. Biofilm colonization and succession in a full-scale partial nitritation-anammox moving bed biofilm reactor. Microbiome 2024; 12:51. [PMID: 38475926 DOI: 10.1186/s40168-024-01762-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 01/09/2024] [Indexed: 03/14/2024]
Abstract
BACKGROUND Partial nitritation-anammox (PNA) is a biological nitrogen removal process commonly used in wastewater treatment plants for the treatment of warm and nitrogen-rich sludge liquor from anaerobic digestion, often referred to as sidestream wastewater. In these systems, biofilms are frequently used to retain biomass with aerobic ammonia-oxidizing bacteria (AOB) and anammox bacteria, which together convert ammonium to nitrogen gas. Little is known about how these biofilm communities develop, and whether knowledge about the assembly of biofilms in natural communities can be applied to PNA biofilms. RESULTS We followed the start-up of a full-scale PNA moving bed biofilm reactor for 175 days using shotgun metagenomics. Environmental filtering likely restricted initial biofilm colonization, resulting in low phylogenetic diversity, with the initial microbial community comprised mainly of Proteobacteria. Facilitative priority effects allowed further biofilm colonization, with the growth of initial aerobic colonizers promoting the arrival and growth of anaerobic taxa like methanogens and anammox bacteria. Among the early colonizers were known 'oligotrophic' ammonia oxidizers including comammox Nitrospira and Nitrosomonas cluster 6a AOB. Increasing the nitrogen load in the bioreactor allowed colonization by 'copiotrophic' Nitrosomonas cluster 7 AOB and resulted in the exclusion of the initial ammonia- and nitrite oxidizers. CONCLUSIONS We show that complex dynamic processes occur in PNA microbial communities before a stable bioreactor process is achieved. The results of this study not only contribute to our knowledge about biofilm assembly and PNA bioreactor start-up but could also help guide strategies for the successful implementation of PNA bioreactors.
Affiliation(s)
- Carolina Suarez
- Division of Water Resources Engineering, Faculty of Engineering LTH, Lund University, Lund, Sweden.
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden.
| | - Tage Rosenqvist
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
| | | | - Christopher J Sedlacek
- Division of Microbial Ecology, Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria
| | - Oskar Modin
- Division of Water Environment Technology, Department of Architecture and Civil Engineering, Chalmers University of Technology, Gothenburg, Sweden
| | - Catherine J Paul
- Division of Water Resources Engineering, Faculty of Engineering LTH, Lund University, Lund, Sweden
- Division of Applied Microbiology, Department of Chemistry, Lund University, Lund, Sweden
| | - Malte Hermansson
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
| | - Frank Persson
- Division of Water Environment Technology, Department of Architecture and Civil Engineering, Chalmers University of Technology, Gothenburg, Sweden
|
24
|
Espinoza JL, Phillips A, Prentice MB, Tan GS, Kamath PL, Lloyd KG, Dupont CL. Unveiling the Microbial Realm with VEBA 2.0: A modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic, and viral multi-omics from either short- or long-read sequencing. bioRxiv 2024:2024.03.08.583560. [PMID: 38559265 PMCID: PMC10979853 DOI: 10.1101/2024.03.08.583560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
The microbiome is a complex community of microorganisms, encompassing prokaryotic (bacterial and archaeal), eukaryotic, and viral entities. This microbial ensemble plays a pivotal role in influencing the health and productivity of diverse ecosystems while shaping the web of life. However, many software suites developed to study microbiomes analyze only the prokaryotic community and provide limited to no support for viruses and microeukaryotes. Previously, we introduced the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite to address this critical gap in microbiome research by extending genome-resolved analysis beyond prokaryotes to encompass the understudied realms of eukaryotes and viruses. Here we present VEBA 2.0 with key updates including a comprehensive clustered microeukaryotic protein database, rapid genome/protein-level clustering, bioprospecting, non-coding/organelle gene modeling, genome-resolved taxonomic/pathway profiling, long-read support, and containerization. We demonstrate VEBA's versatile application through the analysis of diverse case studies including marine water, Siberian permafrost, and white-tailed deer lung tissues with the latter showcasing how to identify integrated viruses. VEBA represents a crucial advancement in microbiome research, offering a powerful and accessible platform that bridges the gap between genomics and biotechnological solutions.
Affiliation(s)
- Josh L. Espinoza
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
| | - Allan Phillips
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
| | | | - Gene S. Tan
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
| | - Pauline L. Kamath
- School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
| | - Karen G. Lloyd
- Microbiology Department, University of Tennessee, Knoxville, TN 37917, USA
| | - Chris L. Dupont
- Department of Environment and Sustainability, J. Craig Venter Institute, La Jolla, CA 92037, USA
- Department of Genomic Medicine and Infectious Diseases, J. Craig Venter Institute, La Jolla, CA 92037, USA
|
25
|
Sami A, El-Metwally S, Rashad MZ. MAC-ErrorReads: machine learning-assisted classifier for filtering erroneous NGS reads. BMC Bioinformatics 2024; 25:61. [PMID: 38321434 PMCID: PMC10848413 DOI: 10.1186/s12859-024-05681-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/29/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND The rapid advancement of next-generation sequencing (NGS) machines in terms of speed and affordability has led to the generation of a massive amount of biological data at the expense of data quality as errors become more prevalent. This introduces the need to utilize different approaches to detect and filter errors, and data quality assurance is moved from the hardware space to the software preprocessing stages. RESULTS We introduce MAC-ErrorReads, a novel Machine learning-Assisted Classifier designed for filtering Erroneous NGS Reads. MAC-ErrorReads transforms the erroneous NGS read filtration process into a robust binary classification task, employing five supervised machine learning algorithms. These models are trained on features extracted through the computation of Term Frequency-Inverse Document Frequency (TF-IDF) values from various datasets such as E. coli, GAGE S. aureus, H. Chr14, Arabidopsis thaliana Chr1 and Metriaclima zebra. Notably, Naive Bayes demonstrated robust performance across various datasets, displaying high accuracy, precision, recall, F1-score, MCC, and ROC values. The MAC-ErrorReads NB model accurately classified S. aureus reads, surpassing most error correction tools with a 38.69% alignment rate. For H. Chr14, tools like Lighter, Karect, CARE, Pollux, and MAC-ErrorReads showed rates above 99%. BFC and RECKONER exceeded 98%, while Fiona had 95.78%. For the Arabidopsis thaliana Chr1, Pollux, Karect, RECKONER, and MAC-ErrorReads demonstrated good alignment rates of 92.62%, 91.80%, 91.78%, and 90.87%, respectively. For the Metriaclima zebra, Pollux achieved a high alignment rate of 91.23%, despite having the lowest number of mapped reads. MAC-ErrorReads, Karect, and RECKONER demonstrated good alignment rates of 83.76%, 83.71%, and 83.67%, respectively, while also producing reasonable numbers of mapped reads to the reference genome.
CONCLUSIONS This study demonstrates that machine learning approaches for filtering NGS reads effectively identify and retain the most accurate reads, significantly enhancing assembly quality and genomic coverage. The integration of genomics and artificial intelligence through machine learning algorithms holds promise for enhancing NGS data quality, advancing downstream data analysis accuracy, and opening new opportunities in genetics, genomics, and personalized medicine research.
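The TF-IDF feature extraction mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each read is treated as a "document" and its overlapping k-mers as "terms"; the k-mer tokenization, the value of `k`, the unsmoothed IDF formula, and the function names `kmers` and `tfidf` are illustrative assumptions, not details taken from the MAC-ErrorReads paper.

```python
import math
from collections import Counter


def kmers(read, k=3):
    """Split a read into overlapping k-mers (assumed to act as the 'terms')."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def tfidf(reads, k=3):
    """Return one {k-mer: TF-IDF weight} dict per read.

    TF  = count of the k-mer in the read / number of k-mers in the read
    IDF = log(number of reads / number of reads containing the k-mer)
    This is the plain, unsmoothed textbook formulation (an assumption here).
    """
    docs = [Counter(kmers(read, k)) for read in reads]
    n_docs = len(docs)
    # Document frequency: in how many reads does each k-mer occur at least once?
    doc_freq = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in doc.items()
        })
    return weights


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "ACGTTT", "GGGACG"]
features = tfidf(reads, k=3)
```

In this toy input, a k-mer shared by every read (`ACG`) gets weight zero, while a k-mer unique to one read (`GTA`) gets the highest weight in that read. Vectors of this kind could then be passed to a classifier such as Naive Bayes.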
Affiliation(s)
- Amira Sami
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
| | - Sara El-Metwally
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt.
- Biomedical Informatics Department, Faculty of Computer Science and Engineering, New Mansoura University, Gamasa, 35712, Egypt.
| | - M Z Rashad
- Department of Computer Science, Faculty of Computers and Information, Mansoura University, P.O. Box: 35516, Mansoura, Egypt
|
26
|
Sánchez P, Coutinho FH, Sebastián M, Pernice MC, Rodríguez-Martínez R, Salazar G, Cornejo-Castillo FM, Pesant S, López-Alforja X, López-García EM, Agustí S, Gojobori T, Logares R, Sala MM, Vaqué D, Massana R, Duarte CM, Acinas SG, Gasol JM. Marine picoplankton metagenomes and MAGs from eleven vertical profiles obtained by the Malaspina Expedition. Sci Data 2024; 11:154. [PMID: 38302528 PMCID: PMC10834958 DOI: 10.1038/s41597-024-02974-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Accepted: 01/16/2024] [Indexed: 02/03/2024] Open
Abstract
The Ocean microbiome has a crucial role in Earth's biogeochemical cycles. During the last decade, global cruises such as Tara Oceans and the Malaspina Expedition have expanded our understanding of the diversity and genetic repertoire of marine microbes. Nevertheless, there are still knowledge gaps regarding their diversity patterns throughout depth gradients ranging from the surface to the deep ocean. Here we present a dataset of 76 microbial metagenomes (MProfile) of the picoplankton size fraction (0.2-3.0 µm) collected in 11 vertical profiles covering contrasting ocean regions sampled during the Malaspina Expedition circumnavigation (7 depths, from surface to 4,000 m deep). The MProfile dataset produced 1.66 Tbp of raw DNA sequences from which we derived: 17.4 million genes clustered at 95% sequence similarity (M-GeneDB-VP), 2,672 metagenome-assembled genomes (MAGs) of Archaea and Bacteria (Malaspina-VP-MAGs), and over 100,000 viral genomic sequences. This dataset will be a valuable resource for exploring the functional and taxonomic connectivity between the photic and bathypelagic tropical and sub-tropical ocean, while increasing our general knowledge of the Ocean microbiome.
Affiliation(s)
- Pablo Sánchez
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain.
| | - Felipe H Coutinho
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Marta Sebastián
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Massimo C Pernice
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Raquel Rodríguez-Martínez
- Departamento de Biotecnología, Facultad de Ciencias del Mar y Recursos Biológicos, Universidad de Antofagasta, Antofagasta, Chile
- Laboratorio de Complejidad Microbiana y Ecología Funcional, Instituto Antofagasta, Universidad de Antofagasta, Antofagasta, Chile
- Centre for Biotechnology & Bioengineering (CeBiB), Santiago, Chile
| | - Guillem Salazar
- Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | | | - Stéphane Pesant
- EMBL's European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
| | - Xabier López-Alforja
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Ester María López-García
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
- Centre National de la Recherche Scientifique (CNRS), UMR5254, IPREM, Pau, France
| | - Susana Agustí
- King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC) and Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
| | - Takashi Gojobori
- King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC) and Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
| | - Ramiro Logares
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Maria Montserrat Sala
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Dolors Vaqué
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Ramon Massana
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain
| | - Carlos M Duarte
- King Abdullah University of Science and Technology (KAUST), Red Sea Research Center (RSRC) and Computational Bioscience Research Center (CBRC), Thuwal, Saudi Arabia
| | - Silvia G Acinas
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain.
| | - Josep M Gasol
- Institut de Ciències del Mar, CSIC, Passeig Marítim de la Barceloneta 37-49, 08003, Barcelona, Spain.
|
27
|
Cerk K, Ugalde‐Salas P, Nedjad CG, Lecomte M, Muller C, Sherman DJ, Hildebrand F, Labarthe S, Frioux C. Community-scale models of microbiomes: Articulating metabolic modelling and metagenome sequencing. Microb Biotechnol 2024; 17:e14396. [PMID: 38243750 PMCID: PMC10832553 DOI: 10.1111/1751-7915.14396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 01/21/2024] Open
Abstract
Building models is essential for understanding the functions and dynamics of microbial communities. Metabolic models built on genome-scale metabolic network reconstructions (GENREs) are especially relevant as a means to decipher the complex interactions occurring among species. Model reconstruction increasingly relies on metagenomics, which permits direct characterisation of naturally occurring communities that may contain organisms that cannot be isolated or cultured. In this review, we provide an overview of the field of metabolic modelling and its increasing reliance on and synergy with metagenomics and bioinformatics. We survey the means of assigning functions and reconstructing metabolic networks from (meta-)genomes, and present the variety and mathematical fundamentals of metabolic models that foster the understanding of microbial dynamics. We emphasise the characterisation of interactions and the scaling of model construction to large communities, two important bottlenecks in the applicability of these models. We give an overview of the current state of the art in metagenome sequencing and bioinformatics analysis, focusing on the reconstruction of genomes in microbial communities. Metagenomics benefits tremendously from third-generation sequencing, and we discuss the opportunities of long-read sequencing, strain-level characterisation and eukaryotic metagenomics. We aim at providing algorithmic and mathematical support, together with tool and application resources, that permit bridging the gap between metagenomics and metabolic modelling.
Affiliation(s)
- Klara Cerk
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
| | | | - Chabname Ghassemi Nedjad
- Inria, University of Bordeaux, INRAE, Talence, France
- University of Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
| | - Maxime Lecomte
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE STLO, University of Rennes, Rennes, France
| | | | | | - Falk Hildebrand
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
| | - Simon Labarthe
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE, University of Bordeaux, BIOGECO, UMR 1202, Cestas, France
|
28
|
Minch B, Chakraborty M, Purkis S, Rodrigue M, Moniruzzaman M. Active prokaryotic and eukaryotic viral ecology across spatial scale in a deep-sea brine pool. ISME Communications 2024; 4:ycae084. [PMID: 39021441 PMCID: PMC11252502 DOI: 10.1093/ismeco/ycae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 05/03/2024] [Accepted: 06/12/2024] [Indexed: 07/20/2024]
Abstract
Deep-sea brine pools represent rare, extreme environments, providing unique insight into the limits of life on Earth, and by analogy, the plausibility of life beyond it. A distinguishing feature of many brine pools is the presence of thick microbial mats that develop at the brine-seawater interface. While these bacterial and archaeal communities have received moderate attention, viruses and their host interactions in these environments remain underexplored. To bridge this knowledge gap, we leveraged metagenomic and metatranscriptomic data from three distinct zones within the NEOM brine pool system (Gulf of Aqaba) to reveal the active viral ecology around the pools. We report a remarkable diversity and activity of viruses infecting microbial hosts in this environment, including giant viruses, RNA viruses, jumbo phages, and Polinton-like viruses. Many of these form distinct clades, suggesting the presence of untapped viral diversity in this ecosystem. Brine pool viral communities exhibit zone-specific differences in infection strategy, with lysogeny dominating the bacterial mat further away from the pool's center. We linked viruses to metabolically important prokaryotes, including an association between a jumbo phage and a key manganese-oxidizing and arsenic-metabolizing bacterium. These foundational results illuminate the role of viruses in modulating brine pool microbial communities and biogeochemistry through revealing novel viral diversity, host associations, and spatial heterogeneity in viral dynamics.
Affiliation(s)
- Benjamin Minch
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, FL 33149, United States
| | - Morgan Chakraborty
- Department of Marine Geosciences, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, FL 33149, United States
| | - Sam Purkis
- Department of Marine Geosciences, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, FL 33149, United States
| | | | - Mohammad Moniruzzaman
- Department of Marine Biology and Ecology, Rosenstiel School of Marine, Atmospheric, and Earth Science, University of Miami, Miami, FL 33149, United States
|
29
|
Green DH, Rad-Menéndez C, Campbell C, Kilias ES. The genome sequence of Pycnococcus provasolii (CCAP190/2) (Guillard, 1991). Wellcome Open Res 2023; 8:520. [PMID: 38808318 PMCID: PMC11130579 DOI: 10.12688/wellcomeopenres.20345.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/02/2023] [Indexed: 05/30/2024] Open
Abstract
We present a genome assembly from cultured Pycnococcus provasolii (a marine green alga; Chlorophyta; None; Pseudoscourfieldiales; Pycnococcaceae). The genome sequence is 32.2 megabases in span. Most of the assembly (99.67%) is scaffolded into 44 chromosomal pseudomolecules. The mitochondrial and plastid genomes have also been assembled and are 24.3 kilobases and 80.2 kilobases in length, respectively.
Affiliation(s)
- David H. Green
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
| | - Cecilia Rad-Menéndez
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
| | - Christine Campbell
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
| | | | | | | | | | - Darwin Tree of Life Barcoding collective
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
- Department of Biology, University of Oxford, Oxford, England, UK
| | | | | | - Tree of Life Core Informatics collective
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
- Department of Biology, University of Oxford, Oxford, England, UK
|
30
|
Green DH, Rad-Menéndez C. The genome sequence of the Heterolobosean amoeboflagellate, Tetramitus jugosus CCAP 1588/3C. Wellcome Open Res 2023; 8:513. [PMID: 38774489 PMCID: PMC11106597 DOI: 10.12688/wellcomeopenres.20189.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/13/2023] [Indexed: 05/24/2024] Open
Abstract
We present a genome assembly from cultivated Tetramitus jugosus (Heterolobosea; Schizopyrenida; Vahlkampfiidae). The genome sequence is 26.3 megabases in span. Most of the assembly (99.3%) is scaffolded into 52 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 49.46 kilobases in length.
Affiliation(s)
- David H. Green
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
| | - Cecilia Rad-Menéndez
- Culture Collection of Algae and Protozoa, The Scottish Association for Marine Science, Oban, Scotland, UK
|
31
|
McGowan J, Kilias ES, Alacid E, Lipscombe J, Jenkins BH, Gharbi K, Kaithakottil GG, Macaulay IC, McTaggart S, Warring SD, Richards TA, Hall N, Swarbreck D. Identification of a non-canonical ciliate nuclear genetic code where UAA and UAG code for different amino acids. PLoS Genet 2023; 19:e1010913. [PMID: 37796765 PMCID: PMC10553269 DOI: 10.1371/journal.pgen.1010913] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/04/2023] [Accepted: 08/10/2023] [Indexed: 10/07/2023] Open
Abstract
The genetic code is one of the most highly conserved features across life. Only a few lineages have deviated from the "universal" genetic code. Amongst the few variants of the genetic code reported to date, the codons UAA and UAG virtually always have the same translation, suggesting that their evolution is coupled. Here, we report the genome and transcriptome sequencing of a novel uncultured ciliate, belonging to the Oligohymenophorea class, where the translation of the UAA and UAG stop codons has changed to specify different amino acids. Genomic and transcriptomic analyses revealed that UAA has been reassigned to encode lysine, while UAG has been reassigned to encode glutamic acid. We identified multiple suppressor tRNA genes with anticodons complementary to the reassigned codons. We show that the retained UGA stop codon is enriched in the 3'UTR immediately downstream of the coding region of genes, suggesting that there is functional drive to maintain tandem stop codons. Using a phylogenomics approach, we reconstructed the ciliate phylogeny and mapped genetic code changes, highlighting the remarkable number of independent genetic code changes within the Ciliophora group of protists. To our knowledge, this is the first report of a genetic code variant where UAA and UAG encode different amino acids.
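The codon reassignment described in the abstract above can be made concrete with a short sketch. It builds the standard genetic code table, then derives a variant in which UAA encodes lysine (K) and UAG encodes glutamic acid (E), leaving UGA as the only stop codon. The names `STANDARD`, `VARIANT`, and `translate`, and the example mRNA, are hypothetical and chosen for illustration; this is not the authors' analysis code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with codons enumerated in
# UCAG order (UUU, UUC, UUA, UUG, UCU, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Variant code as reported in the abstract: UAA -> lysine, UAG -> glutamic acid.
# UGA is left unchanged and remains a stop codon.
VARIANT = {**STANDARD, "UAA": "K", "UAG": "E"}


def translate(rna, table):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented example sequence: AUG UAA GCU UAG UGA
mrna = "AUGUAAGCUUAGUGA"
standard_protein = translate(mrna, STANDARD)
variant_protein = translate(mrna, VARIANT)
```

Under the standard table, translation of this toy sequence stops at the first UAA. Under the variant table, it reads through UAA and UAG and terminates only at UGA.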
Affiliation(s)
- Jamie McGowan
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | | | - Elisabet Alacid
- Department of Biology, University of Oxford, Oxford, United Kingdom
| | - James Lipscombe
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | | | - Karim Gharbi
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | | | - Iain C. Macaulay
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Seanna McTaggart
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | - Sally D. Warring
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
| | | | - Neil Hall
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- School of Biological Sciences, University of East Anglia, Norwich, United Kingdom
| | - David Swarbreck
- Earlham Institute, Norwich Research Park, Norwich, United Kingdom
|
32
|
Sapkota S, Burlakoti RR, Lubberts M, Lamour K. Genome resources and whole genome resequencing of Phytophthora rubi isolates from red raspberry. Frontiers in Plant Science 2023; 14:1161864. [PMID: 37457337 PMCID: PMC10339809 DOI: 10.3389/fpls.2023.1161864] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 05/15/2023] [Indexed: 07/18/2023]
Abstract
Phytophthora rubi is a primary causal agent of Phytophthora root rot and wilting of raspberry (Rubus idaeus L.) worldwide. The disease is a major concern for raspberry growers in Canada and the USA. To date, no information is available on the genomic diversity of the P. rubi population from raspberry in Canada. Using a PCR-free library prep with dual-indexing for an Illumina HiSeq X running a 2x150 bp configuration, we generated whole genome sequence data of P. rubi isolates (n = 25) recovered during 2018 to 2020 from nine fields, four locations and four cultivars of raspberry growing areas of British Columbia, Canada. The assembled genome of 24 isolates of P. rubi averaged 8,541 scaffolds, 309× coverage, and 65,960,000 bp. We exploited single nucleotide polymorphisms (SNPs) obtained from whole genome sequence data to analyze the genome structure and genetic diversity of the P. rubi isolates. Low heterozygosity among 72% of the pathogen isolates and the standardized index of association revealed that those isolates were clonal. Principal component analysis, discriminant analysis of principal components, and a phylogenetic tree revealed that P. rubi isolates clustered with specific raspberry cultivars. This study provides novel resources and insight into genome structure, genetic diversity, and reproductive biology of P. rubi isolated from red raspberry. The availability of the P. rubi genomes also provides valuable resources for future comparative genomic and evolutionary studies of oomycete pathogens.
Affiliation(s)
- Sanjib Sapkota
- Agassiz Research and Development Centre, Agriculture and Agri-Food Canada, Agassiz, BC, Canada
| | - Rishi R. Burlakoti
- Agassiz Research and Development Centre, Agriculture and Agri-Food Canada, Agassiz, BC, Canada
| | - Mark Lubberts
- Summerland Research and Development Centre, Agriculture and Agri-Food Canada, Summerland, BC, Canada
| | - Kurt Lamour
- Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, United States
|
33
|
Zhao L, Walkowiak S, Fernando WGD. Artificial Intelligence: A Promising Tool in Exploring the Phytomicrobiome in Managing Disease and Promoting Plant Health. Plants (Basel) 2023; 12:1852. [PMID: 37176910 PMCID: PMC10180744 DOI: 10.3390/plants12091852] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 04/25/2023] [Accepted: 04/27/2023] [Indexed: 05/15/2023]
Abstract
There is increasing interest in harnessing the microbiome to improve cropping systems. With the availability of high-throughput and low-cost sequencing technologies, gathering microbiome data is becoming more routine. However, the analysis of microbiome data is challenged by the size and complexity of the data, and the incomplete nature of many microbiome databases. Further, to be valuable, microbiome data often need to be analyzed in conjunction with other complex data that impact crop health and disease management, such as plant genotype and environmental factors. Artificial intelligence (AI), boosted through deep learning (DL), has achieved significant breakthroughs and is a powerful tool for managing large complex datasets such as the interplay between the microbiome, crop plants, and their environment. In this review, we aim to provide readers with a brief introduction to AI techniques, and we introduce how AI has been applied to areas of microbiome sequencing taxonomy, the functional annotation for microbiome sequences, associating the microbiome community with host traits, designing synthetic communities, genomic selection, field phenotyping, and disease forecasting. At the end of this review, we propose further efforts that are required to fully exploit the power of AI in studying phytomicrobiomes.
Affiliation(s)
- Liang Zhao
- Department of Plant Science, University of Manitoba, Winnipeg, MB R3T 2N2, Canada
|
34
|
Gabrielli M, Dai Z, Delafont V, Timmers PHA, van der Wielen PWJJ, Antonelli M, Pinto AJ. Identifying Eukaryotes and Factors Influencing Their Biogeography in Drinking Water Metagenomes. Environmental Science & Technology 2023; 57:3645-3660. [PMID: 36827617 PMCID: PMC9996835 DOI: 10.1021/acs.est.2c09010] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/29/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
The biogeography of eukaryotes in drinking water systems is poorly understood relative to that of prokaryotes or viruses, limiting the understanding of their role and management. A challenge with studying complex eukaryotic communities is that metagenomic analysis workflows are currently not as mature as those that focus on prokaryotes or viruses. In this study, we benchmarked different strategies to recover eukaryotic sequences and genomes from metagenomic data and applied the best-performing workflow to explore the factors affecting the relative abundance and diversity of eukaryotic communities in drinking water distribution systems (DWDSs). We developed an ensemble approach exploiting k-mer- and reference-based strategies to improve eukaryotic sequence identification and identified MetaBAT2 as the best-performing binning approach for their clustering. Applying this workflow to the DWDS metagenomes showed that eukaryotic sequences typically constituted small proportions (i.e., <1%) of the overall metagenomic data with higher relative abundances in surface water-fed or chlorinated systems with high residuals. The α and β diversities of eukaryotes were correlated with those of prokaryotic and viral communities, highlighting the common role of environmental/management factors. Finally, a co-occurrence analysis highlighted clusters of eukaryotes whose members' presence and abundance in DWDSs were affected by disinfection strategies, climate conditions, and source water types.
Affiliation(s)
- Marco Gabrielli
- Dipartimento di Ingegneria Civile e Ambientale - Sezione Ambientale, Politecnico di Milano, Milan 20133, Italy
| | - Zihan Dai
- Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
| | - Vincent Delafont
- Laboratoire Ecologie et Biologie des Interactions (EBI), Equipe Microorganismes, Hôtes, Environnements, Université de Poitiers, Poitiers 86073, France
| | - Peer H. A. Timmers
- KWR Watercycle Research Institute, 3433 PE Nieuwegein, The Netherlands
- Department of Microbiology, Radboud University, Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands
| | - Paul W. J. J. van der Wielen
- KWR Watercycle Research Institute, 3433 PE Nieuwegein, The Netherlands
- Laboratory of Microbiology, Wageningen University, 6700 HB Wageningen, The Netherlands
| | - Manuela Antonelli
- Dipartimento di Ingegneria Civile e Ambientale - Sezione Ambientale, Politecnico di Milano, Milan 20133, Italy
| | - Ameet J. Pinto
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
| |
|
35
|
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level: From metagenomic sequence reads to annotated protein clusters. Frontiers in Bioinformatics 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] Open
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Evangelos Karatzas
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - David Paez-Espino
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Nefeli K. Venetsianou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Robert D. Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
| | - Evangelos Pafilis
- Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
| | - Nikos C. Kyrpides
- Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
| | - Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
- Hellenic Army Academy, Vari, Greece
| |
|
36
|
Salazar VW, Shaban B, Quiroga MDM, Turnbull R, Tescari E, Rossetto Marcelino V, Verbruggen H, Lê Cao KA. Metaphor: A workflow for streamlined assembly and binning of metagenomes. Gigascience 2022; 12:giad055. [PMID: 37522759 PMCID: PMC10388702 DOI: 10.1093/gigascience/giad055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/20/2023] [Revised: 06/05/2023] [Accepted: 07/04/2023] [Indexed: 08/01/2023] Open
Abstract
Recent advances in bioinformatics and high-throughput sequencing have enabled the large-scale recovery of genomes from metagenomes. This has the potential to bring important insights, as researchers can bypass cultivation and analyze genomes sourced directly from environmental samples. There are, however, technical challenges associated with this process, most notably the complexity of the computational workflows required to process metagenomic data, which include dozens of bioinformatics software tools, each with its own set of customizable parameters that affect the final output of the workflow. At the core of these workflows are the processes of assembly (combining the short input reads into longer, contiguous fragments, or contigs) and binning (clustering these contigs into individual genome bins). The limitations of assembly and binning algorithms also pose different challenges depending on the strategy selected to execute them. Both processes can be run for each sample separately or by pooling multiple samples to leverage information from a combination of samples. Here we present Metaphor, a fully automated workflow for genome-resolved metagenomics (GRM). Metaphor differs from existing GRM workflows by offering flexible approaches for the assembly and binning of the input data and by combining multiple binning algorithms with a bin refinement step to achieve high-quality genome bins. Moreover, Metaphor generates reports to evaluate the performance of the workflow. We showcase the functionality of Metaphor on different synthetic datasets and the impact of available assembly and binning strategies on the final results.
Affiliation(s)
- Vinícius W Salazar
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Parkville, VIC 3052, Victoria, Australia
| | - Babak Shaban
- Melbourne Data Analytics Platform (MDAP), University of Melbourne, Carlton, VIC 3053, Victoria, Australia
| | - Maria del Mar Quiroga
- Melbourne Data Analytics Platform (MDAP), University of Melbourne, Carlton, VIC 3053, Victoria, Australia
| | - Robert Turnbull
- Melbourne Data Analytics Platform (MDAP), University of Melbourne, Carlton, VIC 3053, Victoria, Australia
| | - Edoardo Tescari
- Melbourne Data Analytics Platform (MDAP), University of Melbourne, Carlton, VIC 3053, Victoria, Australia
| | - Vanessa Rossetto Marcelino
- Department of Molecular and Translational Sciences, Monash University, Clayton, VIC 3168, Victoria, Australia
- Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, VIC 3168, Victoria, Australia
- School of BioSciences, University of Melbourne, Parkville, VIC 3052, Victoria, Australia
- Department of Microbiology and Immunology, The University of Melbourne at the Peter Doherty Institute for Infection and Immunity, Parkville, VIC 3052, Victoria, Australia
| | - Heroen Verbruggen
- School of BioSciences, University of Melbourne, Parkville, VIC 3052, Victoria, Australia
| | - Kim-Anh Lê Cao
- Melbourne Integrative Genomics, School of Mathematics & Statistics, University of Melbourne, Parkville, VIC 3052, Victoria, Australia
| |
|
37
|
Breusing C, Osborn KJ, Girguis PR, Reese AT. Composition and metabolic potential of microbiomes associated with mesopelagic animals from Monterey Canyon. ISME Communications 2022; 2:117. [PMID: 37938735 PMCID: PMC9723714 DOI: 10.1038/s43705-022-00195-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2022] [Revised: 10/24/2022] [Accepted: 10/31/2022] [Indexed: 11/09/2023]
Abstract
There is growing recognition that microbiomes play substantial roles in animal eco-physiology and evolution. To date, microbiome research has largely focused on terrestrial animals, with far fewer studies on aquatic organisms, especially pelagic marine species. Pelagic animals are critical for nutrient cycling, yet are also subject to nutrient limitation and might thus rely strongly on microbiome digestive functions to meet their nutritional requirements. To better understand the composition and metabolic potential of midwater host-associated microbiomes, we applied amplicon and shotgun metagenomic sequencing to eleven mesopelagic animal species. Our analyses reveal that mesopelagic animal microbiomes are typically composed of bacterial taxa from the phyla Proteobacteria, Firmicutes, Bacteroidota and, in some cases, Campylobacterota. Overall, compositional and functional microbiome variation appeared to be primarily governed by host taxon and depth and, to a lesser extent, trophic level and diel vertical migratory behavior, though the impact of host specificity seemed to differ between migrating and non-migrating species. Vertical migrators generally showed lower intra-specific microbiome diversity (i.e., higher host specificity) than their non-migrating counterparts. These patterns were not linked to host phylogeny but may reflect differences in feeding behaviors, microbial transmission mode, environmental adaptations and other ecological traits among groups. The results presented here further our understanding of the factors shaping mesopelagic animal microbiomes and also provide some novel, genetically informed insights into their diets.
Affiliation(s)
- Corinna Breusing
- Graduate School of Oceanography, University of Rhode Island, Narragansett, RI, USA
| | - Karen J Osborn
- Smithsonian National Museum of Natural History, Washington, DC, USA
- Monterey Bay Aquarium Research Institute, Moss Landing, CA, USA
| | - Peter R Girguis
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
| | - Aspen T Reese
- Division of Biological Sciences, University of California San Diego, San Diego, CA, USA
- Center for Microbiome Innovation, University of California San Diego, San Diego, CA, USA
| |
|
38
|
Espinoza JL, Dupont CL. VEBA: a modular end-to-end suite for in silico recovery, clustering, and analysis of prokaryotic, microeukaryotic, and viral genomes from metagenomes. BMC Bioinformatics 2022; 23:419. [PMID: 36224545 PMCID: PMC9554839 DOI: 10.1186/s12859-022-04973-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/08/2022] [Accepted: 09/27/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: With the advent of metagenomics, the importance of microorganisms and how their interactions are relevant to ecosystem resilience, sustainability, and human health has become evident. Cataloging and preserving biodiversity is paramount not only for the Earth's natural systems but also for discovering solutions to challenges that we face as a growing civilization. Metagenomics pertains to the in silico study of all microorganisms within an ecological community in situ; however, many software suites recover only prokaryotes and have limited to no support for viruses and eukaryotes. RESULTS: In this study, we introduce the Viral Eukaryotic Bacterial Archaeal (VEBA) open-source software suite developed to recover genomes from all domains. To our knowledge, VEBA is the first end-to-end metagenomics suite that can directly recover, quality assess, and classify prokaryotic, eukaryotic, and viral genomes from metagenomes. VEBA implements a novel iterative binning procedure and hybrid sample-specific/multi-sample framework that yields more genomes than any existing methodology alone. VEBA includes a consensus microeukaryotic database containing proteins from existing databases to optimize microeukaryotic gene modeling and taxonomic classification. VEBA also provides a unique clustering-based dereplication strategy allowing for sample-specific genomes and genes to be directly compared across non-overlapping biological samples. Finally, VEBA is the only pipeline that automates the detection of candidate phyla radiation bacteria and implements the appropriate genome quality assessments. VEBA's capabilities are demonstrated by reanalyzing 3 existing public datasets, which recovered a total of 948 MAGs (458 prokaryotic, 8 eukaryotic, and 482 viral), including several uncharacterized organisms and organisms with no public genome representatives.
CONCLUSIONS: The VEBA software suite allows for the in silico recovery of microorganisms from all domains of life by integrating cutting-edge algorithms in novel ways. VEBA fully integrates both end-to-end and task-specific metagenomic analysis in a modular architecture that minimizes dependencies and maximizes productivity. The contributions of VEBA to the metagenomics community include seamless end-to-end metagenomics analysis as well as the flexibility to perform specific analytical tasks. VEBA allows for the automation of several metagenomics steps and shows that new information can be recovered from existing datasets.
Affiliation(s)
- Josh L. Espinoza
- Department of Environment and Sustainability, J. Craig Venter Institute, 4120 Capricorn Ln, La Jolla, CA 92037 USA
- Department of Human Biology and Genomic Medicine, J. Craig Venter Institute, La Jolla, CA 92037 USA
| | - Chris L. Dupont
- Department of Environment and Sustainability, J. Craig Venter Institute, 4120 Capricorn Ln, La Jolla, CA 92037 USA
- Department of Human Biology and Genomic Medicine, J. Craig Venter Institute, La Jolla, CA 92037 USA
| |
|
39
|
Vollmers J, Wiegand S, Lenk F, Kaster AK. How clear is our current view on microbial dark matter? (Re-)assessing public MAG & SAG datasets with MDMcleaner. Nucleic Acids Res 2022; 50:e76. [PMID: 35536293 PMCID: PMC9303271 DOI: 10.1093/nar/gkac294] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/31/2022] [Revised: 04/11/2022] [Accepted: 04/13/2022] [Indexed: 11/12/2022] Open
Abstract
As of today, the majority of environmental microorganisms remain uncultured and are therefore referred to as 'microbial dark matter' (MDM). Hence, genomic insights into these organisms are limited to cultivation-independent approaches such as single-cell genomics and metagenomics. However, without access to cultured representatives for verifying correct taxon assignments, MDM genomes may lead to misleading conclusions based on misclassified or contaminant contigs, thereby obfuscating our view of the uncultured microbial majority. Moreover, gradual database contamination by past genome submissions can propagate errors that affect present as well as future comparative genome analyses. Consequently, strict contamination detection and filtering need to be applied, especially in the case of uncultured MDM genomes. Current genome reporting standards, however, emphasize completeness over purity, and the de facto gold standard genome assessment tool, CheckM, discriminates against uncultured taxa and fragmented genomes. To tackle these issues, we present a novel contig classification, screening, and filtering workflow and a corresponding open-source Python implementation called MDMcleaner, which was tested and compared to other tools on mock and real datasets. MDMcleaner revealed substantial contamination overlooked by current screening approaches and sensitively detected misattributed contigs in both novel genomes and the underlying reference databases, thereby greatly improving our view of 'microbial dark matter'.
Affiliation(s)
- John Vollmers
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Sandra Wiegand
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Florian Lenk
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| | - Anne-Kristin Kaster
- Institute for Biological Interfaces 5 (Institut für Biologische Grenzflächen IBG 5), Karlsruhe Institute of Technology (KIT) 76344, Eggenstein-Leopoldshafen, Germany
| |
|
40
|
Pronk LJU, Medema MH. Whokaryote: distinguishing eukaryotic and prokaryotic contigs in metagenomes based on gene structure. Microb Genom 2022; 8. [PMID: 35503723 PMCID: PMC9465069 DOI: 10.1099/mgen.0.000823] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 11/18/2022] Open
Abstract
Metagenomics has become a prominent technology to study the functional potential of all organisms in a microbial community. Most studies focus on the bacterial content of these communities, while ignoring eukaryotic microbes. Indeed, many metagenomics analysis pipelines silently assume that all contigs in a metagenome are prokaryotic, likely resulting in less accurate annotation of eukaryotes in metagenomes. Early detection of eukaryotic contigs allows for eukaryote-specific gene prediction and functional annotation. Here, we developed a classifier that distinguishes eukaryotic from prokaryotic contigs based on foundational differences between these taxa in terms of gene structure. We first developed Whokaryote, a random forest classifier that uses intergenic distance, gene density and gene length as the most important features. We show that, with an estimated recall, precision and accuracy of 94, 96 and 95 %, respectively, this classifier with features grounded in biology can perform almost as well as the classifiers EukRep and Tiara, which use k-mer frequencies as features. By retraining our classifier with Tiara predictions as an additional feature, the weaknesses of both types of classifiers are compensated; the result is Whokaryote+Tiara, an enhanced classifier that outperforms all individual classifiers, with an F1 score of 0.99 for both eukaryotes and prokaryotes, while still being fast. In a reanalysis of metagenome data from a disease-suppressive plant endospheric microbial community, we show how using Whokaryote+Tiara to select contigs for eukaryotic gene prediction facilitates the discovery of several biosynthetic gene clusters that were missed in the original study. Whokaryote (+Tiara) is wrapped in an easily installable package and is freely available from https://github.com/LottePronk/whokaryote.
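The gene-structure features named in this abstract (intergenic distance, gene density and gene length) can be illustrated with a minimal Python sketch. It is not Whokaryote's implementation: the function name `gene_structure_features`, the coordinate convention (1-based, inclusive) and the exact feature definitions (for example, density as genes per kilobase) are assumptions made here for illustration, since the abstract does not specify them.

```python
from statistics import mean


def gene_structure_features(contig_length, genes):
    """Summarise gene structure on one contig.

    Hypothetical helper, not part of Whokaryote.
    contig_length: contig length in bp.
    genes: list of (start, end) tuples, 1-based inclusive coordinates.
    """
    genes = sorted(genes)
    # Length of each predicted gene in bp.
    lengths = [end - start + 1 for start, end in genes]
    # Gap between each pair of consecutive genes; overlaps count as 0.
    gaps = [
        max(0, next_start - prev_end - 1)
        for (_, prev_end), (next_start, _) in zip(genes, genes[1:])
    ]
    return {
        # Assumed definition: number of genes per 1000 bp of contig.
        "gene_density": len(genes) / contig_length * 1000,
        "mean_gene_length": mean(lengths) if lengths else 0.0,
        "mean_intergenic_distance": mean(gaps) if gaps else 0.0,
    }


# Toy contig of 2000 bp with three predicted genes.
features = gene_structure_features(2000, [(1, 100), (201, 300), (401, 1000)])
print(features)
```

Feature vectors of this kind, one per contig, are the sort of input a random forest classifier could be trained on; prokaryotic contigs would generally be expected to show higher gene density and shorter intergenic distances than eukaryotic ones.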
Affiliation(s)
- Lotte J U Pronk
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| | - Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
| |
|
41
|
Silva JM, Pratas D, Caetano T, Matos S. Feature-Based Classification of Archaeal Sequences Using Compression-Based Methods. Pattern Recognition and Image Analysis 2022. [DOI: 10.1007/978-3-031-04881-4_25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|