1. Valentin-Alvarado LE, Appler KE, De Anda V, Schoelmerich MC, West-Roberts J, Kivenson V, Crits-Christoph A, Ly L, Sachdeva R, Greening C, Savage DF, Baker BJ, Banfield JF. Asgard archaea modulate potential methanogenesis substrates in wetland soil. Nat Commun 2024; 15:6384. [PMID: 39085194 PMCID: PMC11291895 DOI: 10.1038/s41467-024-49872-z] [Received: 05/02/2024] [Accepted: 06/20/2024] [Indexed: 08/02/2024] Open
Abstract
The roles of Asgard archaea in eukaryogenesis and marine biogeochemical cycles are well studied, yet their contributions in soil ecosystems remain unknown. Of particular interest are Asgard archaeal contributions to methane cycling in wetland soils. To investigate this, we reconstructed two complete genomes for soil-associated Atabeyarchaeia, a new Asgard lineage, and a complete genome of Freyarchaeia, and predicted their metabolism in situ. Metatranscriptomics reveals expression of genes for [NiFe]-hydrogenases, pyruvate oxidation and carbon fixation via the Wood-Ljungdahl pathway. Also expressed are genes encoding enzymes for amino acid metabolism, anaerobic aldehyde oxidation, hydrogen peroxide detoxification and carbohydrate breakdown to acetate and formate. Overall, soil-associated Asgard archaea are predicted to include non-methanogenic acetogens, highlighting their potential role in carbon cycling in terrestrial environments.
Affiliation(s)
- Luis E Valentin-Alvarado
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
- Kathryn E Appler
  - Department of Marine Science, University of Texas at Austin; Marine Science Institute, Port Aransas, TX, USA
- Valerie De Anda
  - Department of Marine Science, University of Texas at Austin; Marine Science Institute, Port Aransas, TX, USA
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA
- Marie C Schoelmerich
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
  - Department of Environmental Systems Sciences; ETH Zürich, Zürich, Switzerland
- Jacob West-Roberts
  - Environmental Science, Policy and Management, University of California, Berkeley, CA, USA
- Veronika Kivenson
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
- Alexander Crits-Christoph
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA, USA
  - Cultivarium, Watertown, MA, USA
- Lynn Ly
  - Oxford Nanopore Technologies Inc, New York, NY, USA
- Rohan Sachdeva
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
- Chris Greening
  - Department of Microbiology, Biomedicine Discovery Institute; Monash University, Clayton, VIC, Australia
  - Securing Antarctica's Environmental Future, Monash University, Clayton, VIC, Australia
- David F Savage
  - Innovative Genomics Institute, University of California, Berkeley, California, USA
  - Howard Hughes Medical Institute, University of California, Berkeley, California, USA
  - Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, USA
- Brett J Baker
  - Department of Marine Science, University of Texas at Austin; Marine Science Institute, Port Aransas, TX, USA.
  - Department of Integrative Biology, University of Texas at Austin, Austin, TX, USA.
- Jillian F Banfield
  - Innovative Genomics Institute, University of California, Berkeley, California, USA.
  - Environmental Science, Policy and Management, University of California, Berkeley, CA, USA.
  - Department of Microbiology, Biomedicine Discovery Institute; Monash University, Clayton, VIC, Australia.
  - Earth and Planetary Science, University of California, Berkeley, CA, USA.
2. Liu C, Wu P, Wu X, Zhao X, Chen F, Cheng X, Zhu H, Wang O, Xu M. AsmMix: an efficient haplotype-resolved hybrid de novo genome assembling pipeline. Front Genet 2024; 15:1421565. [PMID: 39130747 PMCID: PMC11310137 DOI: 10.3389/fgene.2024.1421565] [Received: 04/22/2024] [Accepted: 07/05/2024] [Indexed: 08/13/2024] Open
Abstract
Accurate haplotyping facilitates distinguishing allele-specific expression, identifying cis-regulatory elements, and characterizing genomic variations, which enables more precise investigations into the relationship between genotype and phenotype. Recent advances in third-generation single-molecule long read and synthetic co-barcoded read sequencing techniques have harnessed long-range information to simplify the assembly graph and improve the assembled genomic sequence. However, it remains methodologically challenging to reconstruct the complete haplotypes due to high sequencing error rates of long reads and limited capturing efficiency of co-barcoded reads. Here we present a pipeline, AsmMix, for generating both contiguous and accurate diploid genomes. It first assembles co-barcoded reads to generate accurate haplotype-resolved assemblies that may contain many gaps, while the long-read assembly is contiguous but susceptible to errors. Then the two assembly sets are integrated into haplotype-resolved assemblies with reduced misassemblies. Through extensive evaluation on multiple synthetic datasets, AsmMix consistently demonstrates high precision and recall rates for haplotyping across diverse sequencing platforms, coverage depths, read lengths, and read accuracies, significantly outperforming other existing tools in the field. Furthermore, we validate the effectiveness of our pipeline using a human whole genome dataset (HG002), and produce highly contiguous, accurate, and haplotype-resolved assemblies. These assemblies are evaluated using the GIAB benchmarks, confirming the accuracy of variant calling. Our results demonstrate that AsmMix offers a straightforward yet highly efficient approach that effectively leverages both long reads and co-barcoded reads for haplotype-resolved assembly.
Affiliation(s)
- Chao Liu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Pei Wu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Xue Wu
  - BGI Research, Shenzhen, China
- Hongmei Zhu
  - BGI, Tianjin, China
  - BGI Research, Shenzhen, China
- Ou Wang
  - BGI Research, Shenzhen, China
- Mengyang Xu
  - BGI Research, Shenzhen, China
  - BGI Research, Qingdao, China
3. Luan T, Commichaux S, Hoffmann M, Jayeola V, Jang JH, Pop M, Rand H, Luo Y. Benchmarking short and long read polishing tools for nanopore assemblies: achieving near-perfect genomes for outbreak isolates. BMC Genomics 2024; 25:679. [PMID: 38978005 PMCID: PMC11232133 DOI: 10.1186/s12864-024-10582-x] [Received: 02/29/2024] [Accepted: 07/01/2024] [Indexed: 07/10/2024] Open
Abstract
BACKGROUND Oxford Nanopore provides high throughput sequencing platforms able to reconstruct complete bacterial genomes with 99.95% accuracy. However, even small levels of error can obscure the phylogenetic relationships between closely related isolates. Polishing tools have been developed to correct these errors, but it is uncertain if they obtain the accuracy needed for the high-resolution source tracking of foodborne illness outbreaks. RESULTS We tested 132 combinations of assembly and short- and long-read polishing tools to assess their accuracy for reconstructing the genome sequences of 15 highly similar Salmonella enterica serovar Newport isolates from a 2020 onion outbreak. While long-read polishing alone improved accuracy, near-perfect accuracy (99.9999% accuracy or ~5 nucleotide errors across the 4.8 Mbp genome, excluding low confidence regions) was only obtained by pipelines that combined both long- and short-read polishing tools. Notably, medaka was a more accurate and efficient long-read polisher than Racon. Among short-read polishers, NextPolish showed the highest accuracy, but Pilon, Polypolish, and POLCA performed similarly. Among the 5 best performing pipelines, polishing with medaka followed by NextPolish was the most common combination. Importantly, the order of polishing tools mattered, i.e., using less accurate tools after more accurate ones introduced errors. Indels in homopolymers and repetitive regions, where the short reads could not be uniquely mapped, remained the most challenging errors to correct. CONCLUSIONS Short reads are still needed to correct errors in nanopore sequenced assemblies to obtain the accuracy required for source tracking investigations. Our granular assessment of the performance of the polishing pipelines allowed us to suggest best practices for tool users and areas for improvement for tool developers.
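The accuracy figures in this abstract map to error counts by simple arithmetic, which the sketch below works through. It is an illustration only: the function name `expected_errors` is hypothetical, and applying the 99.95% figure to the same 4.8 Mbp genome is an assumption made for comparison, not a result reported by the authors.

```python
def expected_errors(accuracy_percent: float, genome_length_bp: int) -> float:
    """Expected number of erroneous bases at a given per-base accuracy."""
    error_fraction = 1.0 - accuracy_percent / 100.0
    return error_fraction * genome_length_bp


# 4.8 Mbp genome size, as stated in the abstract.
GENOME_BP = 4_800_000

# 99.9999% is the abstract's "near-perfect" threshold; 99.95% is the
# abstract's headline nanopore accuracy, applied here to the same genome
# size purely for comparison (an assumption of this sketch).
for label, accuracy in [("99.95%", 99.95), ("99.9999%", 99.9999)]:
    errors = expected_errors(accuracy, GENOME_BP)
    print(f"{label} accuracy over {GENOME_BP:,} bp: about {errors:,.1f} errors")
```

At 99.9999% this gives roughly 4.8 errors, consistent with the "~5 nucleotide errors" quoted above; at 99.95% the same genome would carry on the order of 2,400 errors, which shows why polishing matters for distinguishing closely related isolates.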
Affiliation(s)
- Tu Luan
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Seth Commichaux
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, Laurel, MD, 20708, USA.
- Maria Hoffmann
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Victor Jayeola
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Jae Hee Jang
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Mihai Pop
  - Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
- Hugh Rand
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
- Yan Luo
  - Center for Food Safety and Applied Nutrition, Food and Drug Administration, College Park, MD, 20740, USA
4. Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 DOI: 10.1038/s41592-024-02262-1] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
  - Department of Computer Science, Rice University, Houston, TX, USA.
5. Kim J, Steinegger M. Metabuli: sensitive and specific metagenomic classification via joint analysis of amino acid and DNA. Nat Methods 2024; 21:971-973. [PMID: 38769467 DOI: 10.1038/s41592-024-02273-y] [Received: 06/14/2023] [Accepted: 04/11/2024] [Indexed: 05/22/2024]
Abstract
Metagenomic taxonomic classifiers analyze either DNA or amino acid (AA) sequences. Metabuli (https://metabuli.steineggerlab.com), however, jointly analyzes both DNA and AA to leverage AA conservation for sensitive homology detection and DNA mutations for specific differentiation of closely related taxa. In the Critical Assessment of Metagenome Interpretation 2 plant-associated dataset, Metabuli covered 99% and 98% of classifications of state-of-the-art DNA- and AA-based classifiers, respectively.
Affiliation(s)
- Jaebeom Kim
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Martin Steinegger
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
  - Institute of Molecular Biology and Genetics, Seoul National University, Seoul, Republic of Korea.
  - Artificial Intelligence Institute, Seoul National University, Seoul, Republic of Korea.
6. Liu-Wei W, van der Toorn W, Bohn P, Hölzer M, Smyth RP, von Kleist M. Sequencing accuracy and systematic errors of nanopore direct RNA sequencing. BMC Genomics 2024; 25:528. [PMID: 38807060 PMCID: PMC11134706 DOI: 10.1186/s12864-024-10440-w] [Received: 03/14/2024] [Accepted: 05/21/2024] [Indexed: 05/30/2024] Open
Abstract
BACKGROUND Direct RNA sequencing (dRNA-seq) on the Oxford Nanopore Technologies (ONT) platforms can produce reads covering up to full-length gene transcripts, while containing decipherable information about RNA base modifications and poly-A tail lengths. Although many published studies have been expanding the potential of dRNA-seq, its sequencing accuracy and error patterns remain understudied. RESULTS We present the first comprehensive evaluation of sequencing accuracy and characterisation of systematic errors in dRNA-seq data from diverse organisms and synthetic in vitro transcribed RNAs. We found that for sequencing kits SQK-RNA001 and SQK-RNA002, the median read accuracy ranged from 87% to 92% across species, and deletions significantly outnumbered mismatches and insertions. Due to their high abundance in the transcriptome, heteropolymers and short homopolymers were the major contributors to the overall sequencing errors. We also observed systematic biases across all species at the levels of single nucleotides and motifs. In general, cytosine/uracil-rich regions were more likely to be erroneous than guanines and adenines. By examining raw signal data, we identified the underlying signal-level features potentially associated with the error patterns and their dependency on sequence contexts. While read quality scores can be used to approximate error rates at base and read levels, failure to detect DNA adapters may be a source of errors and data loss. By comparing distinct basecallers, we reason that some sequencing errors are attributable to signal insufficiency rather than algorithmic (basecalling) artefacts. Lastly, we generated dRNA-seq data using the latest SQK-RNA004 sequencing kit released at the end of 2023 and found that although the overall read accuracy increased, the systematic errors remain largely identical compared to the previous kits. 
CONCLUSIONS As the first systematic investigation of dRNA-seq errors, this study offers a comprehensive overview of reproducible error patterns across diverse datasets, identifies potential signal-level insufficiency, and lays the foundation for error correction methods.
Affiliation(s)
- Wang Liu-Wei
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany.
  - International Max-Planck Research School 'Biology and Computation', Max-Planck Institute for Molecular Genetics, Berlin, Germany.
  - Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany.
- Wiep van der Toorn
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany
  - Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany
- Patrick Bohn
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
- Martin Hölzer
  - Genome Competence Center (MF1), Robert Koch Institute, Berlin, Germany
- Redmond P Smyth
  - Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
  - Faculty of Medicine, University of Würzburg, Würzburg, Germany
- Max von Kleist
  - Systems Medicine of Infectious Disease (P5), Robert Koch Institute, Berlin, Germany.
  - Department of Mathematics and Computer Science, Freie Universität, Berlin, Germany.
7. de Jong TV, Pan Y, Rastas P, Munro D, Tutaj M, Akil H, Benner C, Chen D, Chitre AS, Chow W, Colonna V, Dalgard CL, Demos WM, Doris PA, Garrison E, Geurts AM, Gunturkun HM, Guryev V, Hourlier T, Howe K, Huang J, Kalbfleisch T, Kim P, Li L, Mahaffey S, Martin FJ, Mohammadi P, Ozel AB, Polesskaya O, Pravenec M, Prins P, Sebat J, Smith JR, Solberg Woods LC, Tabakoff B, Tracey A, Uliano-Silva M, Villani F, Wang H, Sharp BM, Telese F, Jiang Z, Saba L, Wang X, Murphy TD, Palmer AA, Kwitek AE, Dwinell MR, Williams RW, Li JZ, Chen H. A revamped rat reference genome improves the discovery of genetic diversity in laboratory rats. Cell Genomics 2024; 4:100527. [PMID: 38537634 PMCID: PMC11019364 DOI: 10.1016/j.xgen.2024.100527] [Received: 10/02/2023] [Revised: 12/26/2023] [Accepted: 02/29/2024] [Indexed: 04/09/2024]
Abstract
The seventh iteration of the reference genome assembly for Rattus norvegicus, mRatBN7.2, corrects numerous misplaced segments, reduces base-level errors approximately 9-fold, and increases contiguity 290-fold compared with its predecessor. Gene annotations are now more complete, improving the mapping precision of genomic, transcriptomic, and proteomics datasets. We jointly analyzed 163 short-read whole-genome sequencing datasets representing 120 laboratory rat strains and substrains using mRatBN7.2. We defined ~20.0 million sequence variations, of which 18,700 are predicted to potentially impact the function of 6,677 genes. We also generated a new rat genetic map from 1,893 heterogeneous stock rats and annotated transcription start sites and alternative polyadenylation sites. The mRatBN7.2 assembly, along with the extensive analysis of genomic variations among rat strains, enhances our understanding of the rat genome, providing researchers with an expanded resource for studies involving rats.
Affiliation(s)
- Tristan V de Jong
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Yanchao Pan
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Pasi Rastas
  - Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Daniel Munro
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Department of Integrative Structural and Computational Biology, Scripps Research, San Diego, CA, USA
- Monika Tutaj
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Huda Akil
  - Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI, USA
- Chris Benner
  - Department of Medicine, University of California San Diego, San Diego, CA, USA
- Denghui Chen
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Apurva S Chitre
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- William Chow
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Vincenza Colonna
  - Institute of Genetics and Biophysics, National Research Council, Naples, Italy; Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Clifton L Dalgard
  - Department of Anatomy, Physiology & Genetics, The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, USA
- Wendy M Demos
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Peter A Doris
  - The Brown Foundation Institute of Molecular Medicine, Center for Human Genetics, University of Texas Health Science Center, Houston, TX, USA
- Erik Garrison
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Aron M Geurts
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Hakan M Gunturkun
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Victor Guryev
  - Genome Structure and Ageing, University of Groningen, UMC, Groningen, the Netherlands
- Thibaut Hourlier
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Kerstin Howe
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Jun Huang
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA
- Ted Kalbfleisch
  - Gluck Equine Research Center, Department of Veterinary Science, University of Kentucky, Louisville, KY, USA
- Panjun Kim
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Ling Li
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Spencer Mahaffey
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Fergal J Martin
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus in Hinxton, Cambridgeshire, UK
- Pejman Mohammadi
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA; Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Ayse Bilge Ozel
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Oksana Polesskaya
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Michal Pravenec
  - Institute of Physiology, Czech Academy of Sciences, Prague, Czechia
- Pjotr Prins
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Jennifer R Smith
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Leah C Solberg Woods
  - Department of Internal Medicine, Section on Molecular Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Boris Tabakoff
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Alan Tracey
  - Tree of Life, Wellcome Sanger Institute, Cambridge, UK
- Flavia Villani
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Hongyang Wang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Burt M Sharp
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Francesca Telese
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
- Zhihua Jiang
  - Department of Animal Sciences, Washington State University, Pullman, WA, USA
- Laura Saba
  - Department of Pharmaceutical Sciences, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Xusheng Wang
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA; Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Abraham A Palmer
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA
- Anne E Kwitek
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Melinda R Dwinell
  - Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, USA; Rat Genome Database, Medical College of Wisconsin, Milwaukee, WI, USA
- Robert W Williams
  - Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA
- Jun Z Li
  - Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
- Hao Chen
  - Department of Pharmacology, Addiction Science, and Toxicology, University of Tennessee Health Science Center, Memphis, TN, USA.
8. Cook R, Telatin A, Hsieh SY, Newberry F, Tariq MA, Baker DJ, Carding SR, Adriaenssens EM. Nanopore and Illumina sequencing reveal different viral populations from human gut samples. Microb Genom 2024; 10. [PMID: 38683195 DOI: 10.1099/mgen.0.001236] [Indexed: 05/01/2024] Open
Abstract
The advent of viral metagenomics, or viromics, has improved our knowledge and understanding of global viral diversity. High-throughput sequencing technologies enable explorations of the ecological roles, contributions to host metabolism, and the influence of viruses in various environments, including the human intestinal microbiome. However, bacterial metagenomic studies frequently have the advantage. The adoption of advanced technologies like long-read sequencing has the potential to be transformative in refining viromics and metagenomics. Here, we examined the effectiveness of long-read and hybrid sequencing by comparing Illumina short-read and Oxford Nanopore Technology (ONT) long-read sequencing technologies and different assembly strategies on recovering viral genomes from human faecal samples. Our findings showed that if a single sequencing technology is to be chosen for virome analysis, Illumina is preferable due to its superior ability to recover fully resolved viral genomes and minimise erroneous genomes. While ONT assemblies were effective in recovering viral diversity, the challenges related to input requirements and the necessity for amplification made it less ideal as a standalone solution. However, using a combined, hybrid approach enabled a more authentic representation of viral diversity to be obtained within samples.
Affiliation(s)
- Ryan Cook
  - Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Fiona Newberry
  - Department of Biosciences, Nottingham Trent University, Nottingham, NG11 8NS, UK
- Mohammad A Tariq
  - Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
- Dave J Baker
  - Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
- Simon R Carding
  - Quadram Institute Bioscience, Norwich, NR4 7UQ, UK
  - Norwich Medical School, University of East Anglia, Norwich, NR4 7TJ, UK
9. Menzel P. Snakemake workflows for long-read bacterial genome assembly and evaluation. GigaByte 2024; 2024:gigabyte116. [PMID: 38591001 PMCID: PMC11000499 DOI: 10.46471/gigabyte.116] [Received: 08/11/2023] [Accepted: 03/22/2024] [Indexed: 04/10/2024] Open
Abstract
With the advancement of long-read sequencing technologies and their increasing use for bacterial genomics, several methods for generating genome assemblies from error-prone long reads have been developed. These are complemented by various tools for assembly polishing using either long reads, short reads, or reference genomes. End users are therefore left with a plethora of possible combinations of programs for obtaining a final trusted assembly. Hence, there is also a need to measure the completeness and accuracy of such assemblies, for which, again, several evaluation methods implemented in various programs are available. In order to automatically run multiple genome assembly and evaluation programs at once, I developed two workflows for the workflow management system Snakemake, which provide end users with an easy-to-run solution for testing various genome assemblies from their sequencing data. Both workflows use the conda packaging system, so there is no need for manual installation of each program. Availability & Implementation: The workflows are available as open source software under the MIT license at github.com/pmenzel/ont-assembly-snake and github.com/pmenzel/score-assemblies.
Affiliation(s)
- Peter Menzel
- Labor Berlin - Charité Vivantes GmbH, Sylter Str. 2, 13353, Berlin, Germany
10. Ángeles-Argáiz RE, Aguirre-Beltrán LFL, Hernández-Oaxaca D, Quintero-Corrales C, Trujillo-Roldán MA, Castillo-Ramírez S, Garibay-Orijel R. Assembly collapsing versus heterozygosity oversizing: detection of homokaryotic and heterokaryotic Laccaria trichodermophora strains by hybrid genome assembly. Microb Genom 2024; 10:001218. [PMID: 38529901 PMCID: PMC10995626 DOI: 10.1099/mgen.0.001218] [Received: 10/05/2023] [Accepted: 03/01/2024] [Indexed: 03/27/2024] Open
Abstract
Genome assembly and annotation using short paired reads is challenging for eukaryotic organisms due to their large genome size, variable ploidy and large number of repetitive elements. The use of single-molecule long reads improves assembly quality (completeness and contiguity), but haplotype duplications still pose assembly challenges. To address the effect of read length on genome assembly quality, gene prediction and annotation, we compared genome assemblers and sequencing technologies with four strains of the ectomycorrhizal fungus Laccaria trichodermophora. By analysing the predicted repertoire of carbohydrate enzymes, we investigated the effects of assembly quality on functional inferences. Libraries were generated using three different sequencing platforms (Illumina Next-Seq, Mi-Seq and PacBio Sequel), and genomes were assembled using single and hybrid assemblies/libraries. Long reads or hybrid assembly resolved the collapsing of repeated regions, but the nuclear heterozygous versions remained unresolved. In dikaryotic fungi, each cell includes two nuclei, and the nuclei differ not only in allelic gene versions but also in gene composition and synteny. These heterokaryotic cells produce fragmentation and size overestimation of the genome assembly of each nucleus. Hybrid assembly revealed a wider functional diversity of genomes. Here, several predicted oxidizing activities on glycosyl residues of oligosaccharides and several chitooligosaccharide acetylase activities would have passed unnoticed in short-read assemblies. Also, the size and fragmentation of the genome assembly, in combination with heterozygosity analysis, allowed us to distinguish homokaryotic and heterokaryotic strains isolated from L. trichodermophora fruit bodies.
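The size and fragmentation signal mentioned in the abstract is commonly summarised with contig count, total assembly length and N50. The sketch below is a minimal Python illustration under that assumption; the abstract does not state which metrics the authors computed, and the function names `n50` and `summarize` as well as the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly size.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


def summarize(contig_lengths):
    """Collect simple size and fragmentation statistics for one assembly."""
    return {
        "contigs": len(contig_lengths),
        "total_bp": sum(contig_lengths),
        "n50": n50(contig_lengths),
    }


if __name__ == "__main__":
    # Made-up contig lengths (bp), for illustration only.
    compact = [100, 80, 60, 40, 20]
    fragmented = [30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30]
    print(summarize(compact))
    print(summarize(fragmented))
```

In this toy comparison, a larger total size combined with more contigs and a lower N50 is the kind of pattern the abstract associates with heterokaryotic strains; real thresholds would have to come from the data.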
Affiliation(s)
- Rodolfo Enrique Ángeles-Argáiz
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Red de Manejo Biotecnológico de Recursos, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91612, Mexico
| | - Luis Fernando Lozano Aguirre-Beltrán
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
| | - Diana Hernández-Oaxaca
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
- Red de Biodiversidad y Sistemática, Instituto de Ecología A. C. Carretera antigua a Coatepec 351, Col. El Haya, Xalapa, Veracruz, México, C.P. 91073, Mexico
| | - Christian Quintero-Corrales
- Posgrado en Ciencias Biológicas, Universidad Nacional Autónoma de México, Circuito de los Posgrados s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
| | - Mauricio A. Trujillo-Roldán
- Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
- Centro de Nanociencias y Nanotecnología, Universidad Nacional Autónoma de México, Km 107 carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860, Mexico
| | - Santiago Castillo-Ramírez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos, México, C.P. 62210, Mexico
| | - Roberto Garibay-Orijel
- Instituto de Biología, Universidad Nacional Autónoma de México, Tercer Circuito s/n, Ciudad Universitaria, Delegación Coyoacán, Ciudad de México, México, C.P. 04510, Mexico
|
11
|
Li O, Hackney JA, Choy DF, Chang D, Nersesian R, Staton TL, Cai F, Toghi Eshghi S. A targeted amplicon next-generation sequencing assay for tryptase genotyping to support personalized therapy in mast cell-related disorders. PLoS One 2024; 19:e0291947. [PMID: 38335181 PMCID: PMC10857577 DOI: 10.1371/journal.pone.0291947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 09/09/2023] [Indexed: 02/12/2024] Open
Abstract
Tryptase, the most abundant mast cell granule protein, is elevated in severe asthma patients independent of type 2 inflammation status. Higher active β tryptase allele counts are associated with higher levels of peripheral tryptase and lower clinical benefit from anti-IgE therapies. Tryptase is a therapeutic target of interest in severe asthma and chronic spontaneous urticaria. Active and inactive allele counts may enable stratification to assess response to therapies in asthmatic patient subpopulations. Tryptase gene loci TPSAB1 and TPSB2 have high levels of sequence identity, which makes genotyping a challenging task. Here, we report a targeted next-generation sequencing (NGS) assay and downstream bioinformatics analysis for determining polymorphisms at tryptase TPSAB1 and TPSB2 loci. Machine learning modeling using multiple polymorphisms in the tryptase loci was used to improve the accuracy of genotyping calls. The assay was tested and qualified on DNA extracted from whole blood of healthy donors and asthma patients, achieving accuracy of 96%, 96% and 94% for estimation of inactive α and βIIIFS tryptase alleles and α duplication on TPSAB1, respectively. The reported NGS assay is a cost-effective method that is more efficient than Sanger sequencing and provides coverage to evaluate known as well as unreported tryptase polymorphisms.
Affiliation(s)
- Olga Li
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Jason A. Hackney
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - David F. Choy
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Diana Chang
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Rhea Nersesian
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Tracy L. Staton
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Fang Cai
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
| | - Shadi Toghi Eshghi
- Genentech Research and Early Development, Genentech, Inc, South San Francisco, CA, United States of America
|
12
|
Silva-Pereira TT, Soler-Camargo NC, Guimarães AMS. Diversification of gene content in the Mycobacterium tuberculosis complex is determined by phylogenetic and ecological signatures. Microbiol Spectr 2024; 12:e0228923. [PMID: 38230932 PMCID: PMC10871547 DOI: 10.1128/spectrum.02289-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 12/19/2023] [Indexed: 01/18/2024] Open
Abstract
We analyzed the pan-genome and gene content modulation of the most diverse genome data set of the Mycobacterium tuberculosis complex (MTBC) gathered to date. The closed pan-genome of the MTBC was characterized by reduced accessory and strain-specific genomes, compatible with its clonal nature. However, significantly fewer gene families were shared between MTBC genomes as their phylogenetic distance increased. This effect was only observed in inter-species comparisons, not within-species, which suggests that species-specific ecological characteristics are associated with changes in gene content. Gene loss, resulting from genomic deletions and pseudogenization, was found to drive the variation in gene content. This gene erosion differed among MTBC species and lineages, even within M. tuberculosis, where L2 showed more gene loss than L4. We also show that phylogenetic proximity is not always a good proxy for gene content relatedness in the MTBC, as the gene repertoire of Mycobacterium africanum L6 deviated from its expected phylogenetic niche conservatism. Gene disruptions of virulence factors, represented by pseudogene annotations, are mostly not conserved, being poor predictors of MTBC ecotypes. Each MTBC ecotype carries its own accessory genome, likely influenced by distinct selective pressures such as host and geography. It is important to investigate how gene loss confers new adaptive traits on MTBC strains; the detected heterogeneous gene loss poses a significant challenge in elucidating genetic factors responsible for the diverse phenotypes observed in the MTBC. By detailing specific gene losses, our study serves as a resource for researchers studying the MTBC phenotypes and their immune evasion strategies.
IMPORTANCE: In this study, we analyzed the gene content of different ecotypes of the Mycobacterium tuberculosis complex (MTBC), the pathogens of tuberculosis. We found that changes in their gene content are associated with their ecological features, such as host preference. Gene loss was identified as the primary driver of these changes, which can vary even among different strains of the same ecotype. Our study also revealed that the gene content relatedness of these bacteria does not always mirror their evolutionary relationships. In addition, some virulence genes can be variably lost among strains of the same MTBC ecotype, likely helping them to evade the immune system. Overall, our study highlights the importance of understanding how gene loss can lead to new adaptations in these bacteria and how different selective pressures may influence their genetic makeup.
Affiliation(s)
- Taiana Tainá Silva-Pereira
- Laboratory of Applied Research in Mycobacteria, Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
| | - Naila Cristina Soler-Camargo
- Laboratory of Applied Research in Mycobacteria, Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Department of Preventive Veterinary Medicine and Animal Health, School of Veterinary Medicine and Animal Sciences, University of São Paulo, São Paulo, Brazil
| | - Ana Marcia Sá Guimarães
- Laboratory of Applied Research in Mycobacteria, Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
|
13
|
Cook R, Brown N, Rihtman B, Michniewski S, Redgwell T, Clokie M, Stekel DJ, Chen Y, Scanlan DJ, Hobman JL, Nelson A, Jones MA, Smith D, Millard A. The long and short of it: benchmarking viromics using Illumina, Nanopore and PacBio sequencing technologies. Microb Genom 2024; 10:001198. [PMID: 38376377 PMCID: PMC10926689 DOI: 10.1099/mgen.0.001198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2023] [Accepted: 01/25/2024] [Indexed: 02/21/2024] Open
Abstract
Viral metagenomics has fuelled a rapid change in our understanding of global viral diversity and ecology. Long-read sequencing and hybrid assembly approaches that combine long- and short-read technologies are now being widely implemented in bacterial genomics and metagenomics. However, the use of long-read sequencing to investigate viral communities is still in its infancy. While Nanopore and PacBio technologies have been applied to viral metagenomics, it is not known to what extent different technologies will impact the reconstruction of the viral community. Thus, we constructed a mock bacteriophage community of previously sequenced phage genomes, sequenced it using Illumina, Nanopore and PacBio sequencing technologies, and tested a number of different assembly approaches. When using a single sequencing technology, Illumina assemblies were the best at recovering phage genomes. Nanopore- and PacBio-only assemblies performed poorly in comparison to Illumina in both genome recovery and error rates, which both varied with the assembler used. The best Nanopore assembly had errors that manifested as SNPs and INDELs at frequencies 41 and 157 % higher than found in Illumina-only assemblies, respectively, while the best PacBio assemblies had SNPs and INDELs at frequencies 12 and 78 % higher than found in Illumina-only assemblies, respectively. Despite high read coverage, long-read-only assemblies recovered a maximum of one complete genome from any assembly, unless reads were down-sampled prior to assembly. Overall the best approach was assembly by a combination of Illumina and Nanopore reads, which reduced error rates to levels comparable with short-read-only assemblies. When using a single technology, Illumina only was the best approach. The differences in genome recovery and error rates between technology and assembler had downstream impacts on gene prediction, viral prediction, and subsequent estimates of diversity within a sample. These findings will provide a starting point for others in the choice of reads and assembly algorithms for the analysis of viromes.
Affiliation(s)
- Ryan Cook
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
| | - Nathan Brown
- Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
| | - Branko Rihtman
- School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
| | - Slawomir Michniewski
- Warwick Medical School, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
| | - Tamsin Redgwell
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820, Gentofte, Denmark
| | - Martha Clokie
- Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
| | - Dov J. Stekel
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
- Department of Mathematics and Applied Mathematics, University of Johannesburg, Rossmore 2029, South Africa
| | - Yin Chen
- School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
| | - David J. Scanlan
- School of Life Sciences, University of Warwick, Gibbet Hill Road, Coventry, CV4 7AL, UK
| | - Jon L. Hobman
- School of Biosciences, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
| | - Andrew Nelson
- Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
| | - Michael A. Jones
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, College Road, Loughborough, Leicestershire, LE12 5RD, UK
| | - Darren Smith
- Faculty of Health and Life Sciences, University of Northumbria, Newcastle upon Tyne, NE1 8ST, UK
| | - Andrew Millard
- Centre for Phage Research, Dept Genetics and Genome Biology, University of Leicester, University Road, Leicester, Leicestershire, LE1 7RH, UK
|
14
|
Cerk K, Ugalde‐Salas P, Nedjad CG, Lecomte M, Muller C, Sherman DJ, Hildebrand F, Labarthe S, Frioux C. Community-scale models of microbiomes: Articulating metabolic modelling and metagenome sequencing. Microb Biotechnol 2024; 17:e14396. [PMID: 38243750 PMCID: PMC10832553 DOI: 10.1111/1751-7915.14396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 01/21/2024] Open
Abstract
Building models is essential for understanding the functions and dynamics of microbial communities. Metabolic models built on genome-scale metabolic network reconstructions (GENREs) are especially relevant as a means to decipher the complex interactions occurring among species. Model reconstruction increasingly relies on metagenomics, which permits direct characterisation of naturally occurring communities that may contain organisms that cannot be isolated or cultured. In this review, we provide an overview of the field of metabolic modelling and its increasing reliance on and synergy with metagenomics and bioinformatics. We survey the means of assigning functions and reconstructing metabolic networks from (meta-)genomes, and present the variety and mathematical fundamentals of metabolic models that foster the understanding of microbial dynamics. We emphasise the characterisation of interactions and the scaling of model construction to large communities, two important bottlenecks in the applicability of these models. We give an overview of the current state of the art in metagenome sequencing and bioinformatics analysis, focusing on the reconstruction of genomes in microbial communities. Metagenomics benefits tremendously from third-generation sequencing, and we discuss the opportunities of long-read sequencing, strain-level characterisation and eukaryotic metagenomics. We aim at providing algorithmic and mathematical support, together with tool and application resources, that permit bridging the gap between metagenomics and metabolic modelling.
Affiliation(s)
- Klara Cerk
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
| | | | - Chabname Ghassemi Nedjad
- Inria, University of Bordeaux, INRAE, Talence, France
- University of Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
| | - Maxime Lecomte
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE STLO, University of Rennes, Rennes, France
| | | | | | - Falk Hildebrand
- Quadram Institute Bioscience, Norwich, UK
- Earlham Institute, Norwich, UK
| | - Simon Labarthe
- Inria, University of Bordeaux, INRAE, Talence, France
- INRAE, University of Bordeaux, BIOGECO, UMR 1202, Cestas, France
|
15
|
Cui FJ, Fu X, Sun L, Zan XY, Meng LJ, Sun WJ. Recent insights into glucans biosynthesis and engineering strategies in edible fungi. Crit Rev Biotechnol 2023:1-18. [PMID: 38105513 DOI: 10.1080/07388551.2023.2289341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2022] [Accepted: 04/21/2023] [Indexed: 12/19/2023]
Abstract
Fungal α/β-glucans are important in cellular functions, including cell wall structure, host-pathogen interactions and energy storage, and are widely applied in high-profile fields, including food, nutrition, and pharmaceuticals. Fungal species and their growth/developmental stages result in a diversity of glucan contents, structures and bioactivities. Substantial progress has been made in elucidating the fine structures and functions, and in revealing the potential molecular synthesis pathway, of fungal α/β-glucans. Herein, we review the current knowledge about the biosynthetic machineries, including precursor UDP-glucose synthesis, initiation, elongation/termination and remodeling of α/β-glucan chains, and molecular regulation to maximally produce glucans in edible fungi. This review provides future perspectives for biosynthesizing the targeted glucans and revealing the catalytic mechanism of enzymes associated with glucan synthesis, including UDP-glucose pyrophosphate phosphorylases (UGP), glucan synthases, and glucanosyltransferases in edible fungi.
Affiliation(s)
- Feng-Jie Cui
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
- Jiangxi Provincial Engineering and Technology Center for Food Additives Bio-production, Dexing, P. R. China
| | - Xin Fu
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
| | - Lei Sun
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
| | - Xin-Yi Zan
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
| | - Li-Juan Meng
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
| | - Wen-Jing Sun
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang, P. R. China
- Jiangxi Provincial Engineering and Technology Center for Food Additives Bio-production, Dexing, P. R. China
|
16
|
Nestor BJ, Bayer PE, Fernandez CGT, Edwards D, Finnegan PM. Approaches to increase the validity of gene family identification using manual homology search tools. Genetica 2023; 151:325-338. [PMID: 37817002 PMCID: PMC10692271 DOI: 10.1007/s10709-023-00196-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 10/01/2023] [Indexed: 10/12/2023]
Abstract
Identifying homologs is an important process in the analysis of genetic patterns underlying traits and evolutionary relationships among species. Analysis of gene families is often used to form and support hypotheses on genetic patterns such as gene presence, absence, or functional divergence which underlie traits examined in functional studies. These analyses often require precise identification of all members in a targeted gene family. Manual pipelines where homology search and orthology assignment tools are used separately are the most common approach for identifying small gene families where accurate identification of all members is important. The ability to curate sequences between steps in manual pipelines allows for simple and precise identification of all possible gene family members. However, the validity of such manual pipeline analyses is often decreased by inappropriate approaches to homology searches including too relaxed or stringent statistical thresholds, inappropriate query sequences, homology classification based on sequence similarity alone, and low-quality proteome or genome sequences. In this article, we propose several approaches to mitigate these issues and allow for precise identification of gene family members and support for hypotheses linking genetic patterns to functional traits.
Affiliation(s)
- Benjamin J Nestor
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Philipp E Bayer
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Cassandria G Tay Fernandez
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - David Edwards
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
| | - Patrick M Finnegan
- School of Biological Sciences, University of Western Australia, Perth, WA, 6009, Australia
- Centre for Applied Bioinformatics, University of Western Australia, Perth, WA, 6009, Australia
|
17
|
Koo H, Lee GW, Ko SR, Go S, Kwon SY, Kim YM, Shin AY. Two long read-based genome assembly and annotation of polyploidy woody plants, Hibiscus syriacus L. using PacBio and Nanopore platforms. Sci Data 2023; 10:713. [PMID: 37853021 PMCID: PMC10584963 DOI: 10.1038/s41597-023-02631-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023] Open
Abstract
Improvements in long-read DNA sequencing and related techniques have facilitated the generation of complex eukaryotic genome assemblies. Despite these advances, the quality of constructed plant reference genomes remains relatively poor due to the large size of genomes, high content of repetitive sequences, and wide variety of ploidy. Here, we performed de novo sequencing and assembly of the genome of a highly polyploid plant, Hibiscus syriacus, a flowering plant species of the Malvaceae family, using the Oxford Nanopore Technologies and Pacific Biosciences Sequel sequencing platforms. We investigated an efficient combination of a high-quality, high-molecular-weight DNA isolation procedure and a suitable assembler to achieve optimal results using long-read sequencing data. We found that abundant ultra-long reads allow for large and complex polyploid plant genome assemblies with great recovery of repetitive sequences and error correction, even with relatively low-depth Nanopore sequencing data and polishing, compared to previous studies. Collectively, our combination provides cost-effective methods to improve genome continuity and quality compared to the previously reported reference genome by accessing highly repetitive regions. The application of this combination may enable genetic research and breeding of polyploid crops, thus leading to improvements in crop production.
Affiliation(s)
- Hyunjin Koo
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
| | - Gir-Won Lee
- SML Genetree Co. Ltd., Seoul, 05855, Republic of Korea
| | - Seo-Rin Ko
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Biosystems and Bioengineering Program, University of Science and Technology, Daejeon, 34113, Korea
| | - Sangjin Go
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Biosystems and Bioengineering Program, University of Science and Technology, Daejeon, 34113, Korea
| | - Suk-Yoon Kwon
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Biosystems and Bioengineering Program, University of Science and Technology, Daejeon, 34113, Korea
| | - Yong-Min Kim
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
- Digital Biotech Innovation Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
| | - Ah-Young Shin
- Plant Systems Engineering Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon, 34141, Republic of Korea
- Department of Bioinformatics, KRIBB School of Bioscience, Korea University of Science and Technology (UST), Daejeon, 34141, Republic of Korea
|
18
|
Li K, Xu P, Wang J, Yi X, Jiao Y. Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement. Nat Commun 2023; 14:6556. [PMID: 37848433 PMCID: PMC10582259 DOI: 10.1038/s41467-023-42336-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 10/05/2023] [Indexed: 10/19/2023] Open
Abstract
Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
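As a rough illustration of how clipped alignments can point at candidate assembly errors, the Python sketch below counts soft- or hard-clipped read ends per reference position from SAM-style POS and CIGAR values. It is a toy under stated assumptions, not CRAQ's implementation: the function names, the `min_clipped` threshold and the example alignments are hypothetical, and it ignores the heterozygosity handling that the abstract describes.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def clip_breakpoints(pos, cigar):
    """Return 1-based reference positions at which a read is clipped.

    pos   -- 1-based leftmost mapped position (SAM POS field)
    cigar -- CIGAR string of the alignment
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    breakpoints = []
    # A leading clip means the alignment breaks off at its first mapped base.
    if ops and ops[0][1] in "SH":
        breakpoints.append(pos)
    # A trailing clip means it breaks off at its last mapped base.
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    if len(ops) > 1 and ops[-1][1] in "SH":
        breakpoints.append(pos + ref_len - 1)
    return breakpoints


def candidate_errors(alignments, min_clipped=3):
    """Positions where at least `min_clipped` reads are clipped (hypothetical cutoff)."""
    counts = Counter()
    for pos, cigar in alignments:
        counts.update(clip_breakpoints(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_clipped)


if __name__ == "__main__":
    # Made-up (POS, CIGAR) pairs for reads mapped to a single contig.
    reads = [(100, "50M20S"), (120, "30M15S"), (140, "10M5S"),
             (150, "12S40M"), (10, "60M")]
    print(candidate_errors(reads))
```

A real analysis would read alignments from a BAM file, normalise counts by local coverage, and separate true misassemblies from heterozygous sites, which this sketch does not attempt.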
Affiliation(s)
- Kunpeng Li
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Peng Xu
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jinpeng Wang
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xin Yi
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
- China National Botanical Garden, Beijing, China
| | - Yuannian Jiao
- State Key Laboratory of Plant Diversity and Specialty Crops, Institute of Botany, the Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- China National Botanical Garden, Beijing, China
|
19
|
Mochizuki T, Sakamoto M, Tanizawa Y, Nakayama T, Tanifuji G, Kamikawa R, Nakamura Y. A practical assembly guideline for genomes with various levels of heterozygosity. Brief Bioinform 2023; 24:bbad337. [PMID: 37798248 PMCID: PMC10555665 DOI: 10.1093/bib/bbad337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/06/2023] [Accepted: 09/03/2023] [Indexed: 10/07/2023] Open
Abstract
Although current long-read sequencing technologies produce long reads that facilitate assembly for genome reconstruction, they have high sequencing error rates. While various assemblers with different perspectives have been developed, no systematic evaluation of assemblers with long reads for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN) and proposed a concrete guideline for the construction of haplotype representation according to the degree of heterozygosity, followed by polishing and purging haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.
Affiliation(s)
| | - Mika Sakamoto
- Genome Informatics Laboratory, National Institute of Genetics
| | | | - Takuro Nakayama
- Division of Life Sciences Center for Computational Sciences, University of Tsukuba, Japan
| | - Goro Tanifuji
- Department of Zoology, National Museum of Nature and Science
| | | | | |
|
20
|
Wang J, Veldsman WP, Fang X, Huang Y, Xie X, Lyu A, Zhang L. Benchmarking multi-platform sequencing technologies for human genome assembly. Brief Bioinform 2023; 24:bbad300. [PMID: 37594299 DOI: 10.1093/bib/bbad300] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/25/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023]
Abstract
Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies, which include contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, although several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performances in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long-reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long-reads, but they may require further polishing to improve on quality. We recommend using short-reads rather than long-reads themselves to improve the base accuracy of contigs from Oxford Nanopore long-reads. Hi-C is the best choice for chromosome-level scaffolding because it can capture the longest-range DNA connectedness compared to 10× linked-reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly. For human diploid genome assembly, hifiasm is the best tool when using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
Affiliation(s)
- Jingjing Wang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | - Werner Pieter Veldsman
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | | | | | | | - Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
| | - Lu Zhang
- Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
| |
|
21
|
Yu R, Abdullah SMU, Sun Y. HMMPolish: a coding region polishing tool for TGS-sequenced RNA viruses. Brief Bioinform 2023; 24:bbad264. [PMID: 37478372 PMCID: PMC10516367 DOI: 10.1093/bib/bbad264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 06/05/2023] [Accepted: 06/29/2023] [Indexed: 07/23/2023]
Abstract
Access to accurate viral genomes is important to downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than that of next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite promising results of available polishing tools, there is still room to improve error correction performance and so achieve more accurate genome assembly. The errors, particularly those in coding regions, can hamper analyses such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile Hidden Markov Models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets that covered four clinically important viruses, including HIV-1, influenza-A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platforms (PacBio or Nanopore). The benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.
Affiliation(s)
- Runzhou Yu
- Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
| | | | - Yanni Sun
- Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China
| |
|
22
|
Baker JL. Illuminating the oral microbiome and its host interactions: recent advancements in omics and bioinformatics technologies in the context of oral microbiome research. FEMS Microbiol Rev 2023; 47:fuad051. [PMID: 37667515 PMCID: PMC10503653 DOI: 10.1093/femsre/fuad051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 08/02/2023] [Accepted: 09/01/2023] [Indexed: 09/06/2023]
Abstract
The oral microbiota has an enormous impact on human health, with oral dysbiosis now linked to many oral and systemic diseases. Recent advancements in sequencing, mass spectrometry, bioinformatics, computational biology, and machine learning are revolutionizing oral microbiome research, enabling analysis at an unprecedented scale and level of resolution using omics approaches. This review provides a comprehensive perspective on the current state-of-the-art tools available to perform genomics, metagenomics, phylogenomics, pangenomics, transcriptomics, proteomics, metabolomics, lipidomics, and multi-omics analysis on (all) microbiomes, and then provides examples of how the techniques have been applied to research of the oral microbiome, specifically. Key findings of these studies and remaining challenges for the field are highlighted. Although the methods discussed here are placed in the context of their contributions to oral microbiome research specifically, they are pertinent to the study of any microbiome, and the intended audience of this review includes researchers who would simply like to get an introduction to microbial omics and/or an update on the latest omics methods. Continued research of the oral microbiota using omics approaches is crucial and will lead to dramatic improvements in human health, longevity, and quality of life.
Affiliation(s)
- Jonathon L Baker
- Department of Oral Rehabilitation & Biosciences, School of Dentistry, Oregon Health & Science University, 3181 Sam Jackson Park Road, Portland, OR 97202, United States
- Genomic Medicine Group, J. Craig Venter Institute, La Jolla, CA 92037, United States
- Department of Pediatrics, UC San Diego School of Medicine, La Jolla, CA 92093, United States
| |
|
23
|
Arredondo-Alonso S, Gladstone R, Pöntinen A, Gama J, Schürch A, Lanza V, Johnsen P, Samuelsen Ø, Tonkin-Hill G, Corander J. Mge-cluster: a reference-free approach for typing bacterial plasmids. NAR Genom Bioinform 2023; 5:lqad066. [PMID: 37435357 PMCID: PMC10331934 DOI: 10.1093/nargab/lqad066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 06/08/2023] [Accepted: 06/26/2023] [Indexed: 07/13/2023]
Abstract
Extrachromosomal elements of bacterial cells such as plasmids are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only become accessible recently with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in their scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences that are compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling a consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.
Affiliation(s)
| | | | - Anna K Pöntinen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
| | - João A Gama
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
| | - Anita C Schürch
- Department of Medical Microbiology, UMC Utrecht, Utrecht, The Netherlands
| | - Val F Lanza
- CIBERINFEC, Madrid, Spain
- Bioinformatics Unit, University Hospital Ramón y Cajal, IRYCIS, Madrid, Spain
| | - Pål Jarle Johnsen
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
| | - Ørjan Samuelsen
- Norwegian National Advisory Unit on Detection of Antimicrobial Resistance, Department of Microbiology and Infection Control, University Hospital of North Norway, Tromsø, Norway
- Department of Pharmacy, Faculty of Health Sciences, UiT The Arctic University of Norway, Tromsø, Norway
| | - Gerry Tonkin-Hill
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
| | - Jukka Corander
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Parasites and Microbes, Wellcome Sanger Institute, Cambridge, UK
- Department of Mathematics and Statistics, Helsinki Institute of Information Technology (HIIT), FI-00014 University of Helsinki, Helsinki, Finland
| |
|
24
|
Zhang Z, Li C, Li Q, Su X, Li J, Zhu L, Lin XJ, Shen J. Structure prediction of novel isoforms from uveal melanoma by AlphaFold. Sci Data 2023; 10:513. [PMID: 37542084 PMCID: PMC10403560 DOI: 10.1038/s41597-023-02429-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 07/28/2023] [Indexed: 08/06/2023]
Abstract
Alternative splicing is an important mechanism that enhances protein functional diversity. To date, our understanding of alternative splicing variants has been based on mRNA transcript data, but due to the difficulty in predicting protein structures, protein tertiary structures have been largely unexplored. However, with the release of AlphaFold, which predicts three-dimensional models of proteins, this challenge is rapidly being overcome. Here, we present a dataset of 315 predicted structures of abnormal isoforms in 18 uveal melanoma patients based on second- and third-generation transcriptome-sequencing data. This information comprises a high-quality set of structural data on recurrent aberrant isoforms that can be used in multiple types of studies, from those aimed at revealing potential therapeutic targets to those aimed at recognizing cancer neoantigens at the atomic level.
Affiliation(s)
- Zhe Zhang
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China.
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Chen Li
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Qian Li
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Xiaoming Su
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Jiayi Li
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences, School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Lili Zhu
- Songjiang Research Institute and Songjiang Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 201600, China
| | - Xinhua James Lin
- High Performance Computing Center, Shanghai Jiao Tong University, Shanghai, 200240, China.
| | - Jianfeng Shen
- Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, 200025, China.
- Institute of Translational Medicine, National Facility for Translational Medicine, Shanghai Jiao Tong University, Shanghai, 200240, China.
| |
|
25
|
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023]
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
| | - Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
| | - Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
| | | | - Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
| | - Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
| | - Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
| |
|
26
|
Vuruputoor VS, Monyak D, Fetter KC, Webster C, Bhattarai A, Shrestha B, Zaman S, Bennett J, McEvoy SL, Caballero M, Wegrzyn JL. Welcome to the big leaves: Best practices for improving genome annotation in non-model plant genomes. Appl Plant Sci 2023; 11:e11533. [PMID: 37601314 PMCID: PMC10439824 DOI: 10.1002/aps3.11533] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/04/2022] [Revised: 02/04/2023] [Accepted: 02/10/2023] [Indexed: 08/22/2023]
Abstract
Premise: Robust standards to evaluate quality and completeness are lacking in eukaryotic structural genome annotation, as genome annotation software is developed using model organisms and typically lacks benchmarking to comprehensively evaluate the quality and accuracy of the final predictions. The annotation of plant genomes is particularly challenging due to their large sizes, abundant transposable elements, and variable ploidies. This study investigates the impact of genome quality, complexity, sequence read input, and method on protein-coding gene predictions.
Methods: The impact of repeat masking, long-read and short-read inputs, and de novo and genome-guided protein evidence was examined in the context of the popular BRAKER and MAKER workflows for five plant genomes. The annotations were benchmarked for structural traits and sequence similarity.
Results: Benchmarks that reflect gene structures, reciprocal similarity search alignments, and mono-exonic/multi-exonic gene counts provide a more complete view of annotation accuracy. Transcripts derived from RNA-read alignments alone are not sufficient for genome annotation. Gene prediction workflows that combine evidence-based and ab initio approaches are recommended, and a combination of short and long reads can improve genome annotation. Adding protein evidence from de novo assemblies, genome-guided transcriptome assemblies, or full-length proteins from OrthoDB generates more putative false positives as implemented in the current workflows. Post-processing with functional and structural filters is highly recommended.
Discussion: While the annotation of non-model plant genomes remains complex, this study provides recommendations for inputs and methodological approaches. We discuss a set of best practices to generate an optimal plant genome annotation and present a more robust set of metrics to evaluate the resulting predictions.
Affiliation(s)
- Vidya S. Vuruputoor
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Daniel Monyak
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Karl C. Fetter
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Cynthia Webster
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Akriti Bhattarai
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Bikash Shrestha
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Sumaira Zaman
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Jeremy Bennett
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Susan L. McEvoy
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Madison Caballero
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| | - Jill L. Wegrzyn
- Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, Connecticut 06269, USA
| |
|
27
|
Velasco-Amo MP, Arias-Giraldo LF, Román-Écija M, Fuente LDL, Marco-Noales E, Moralejo E, Navas-Cortés JA, Landa BB. Complete Circularized Genome Resources of Seven Strains of Xylella fastidiosa subsp. fastidiosa Using Hybrid Assembly Reveals Unknown Plasmids. Phytopathology 2023; 113:1128-1132. [PMID: 36441872 DOI: 10.1094/phyto-10-22-0396-a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Xylella fastidiosa is a vascular plant pathogenic bacterium native to the Americas that is causing significant epidemics and economic losses in olive and almond in Europe, where it is a quarantine pathogen. Since its first detection in 2013 in Italy, mandatory surveys across Europe have also revealed the presence of the bacterium in France, Spain, and Portugal. Combining Oxford Nanopore Technologies and Illumina sequencing data, we assembled high-quality complete genomes of seven X. fastidiosa subsp. fastidiosa strains isolated from different plants in Spain, the United States, and Mexico. Comparative genomic analyses discovered differences in plasmid content among strains, including plasmids that had been overlooked previously when using the Illumina sequencing platform alone. Interestingly, in strain CFBP8073, intercepted in France from plants imported from Mexico, three plasmids were identified, including two (plasmids pXF-P1.CFBP8073 and pXF-P2.CFBP8073) not previously described in X. fastidiosa and one (pXF5823.CFBP8073) almost identical to a plasmid described in an X. fastidiosa strain from citrus. Plasmids found in the Spanish strains here were similar to those described previously in other strains from the same subspecies and ST1 isolated in the Balearic Islands and the United States. The genome resources from this work will assist in further studies on the role of plasmids in the epidemiology, ecology, and evolution of this plant pathogen.
Affiliation(s)
- María Pilar Velasco-Amo
- Instituto de Agricultura Sostenible, Consejo Superior de Investigaciones Científicas (CSIC), Córdoba, Spain
| | - Luis F Arias-Giraldo
- Instituto de Agricultura Sostenible, Consejo Superior de Investigaciones Científicas (CSIC), Córdoba, Spain
| | - Miguel Román-Écija
- Instituto de Agricultura Sostenible, Consejo Superior de Investigaciones Científicas (CSIC), Córdoba, Spain
| | - Leonardo De La Fuente
- Department of Entomology and Plant Pathology, Auburn University, Auburn, AL 36849, U.S.A
| | - Ester Marco-Noales
- Centro de Protección Vegetal y Biotecnología, Instituto Valenciano de Investigaciones Agrarias (IVIA), Moncada, Spain
| | - Eduardo Moralejo
- Tragsa, Empresa de Transformación Agraria, Delegación de Baleares, 07005 Palma, Spain
| | - Juan A Navas-Cortés
- Instituto de Agricultura Sostenible, Consejo Superior de Investigaciones Científicas (CSIC), Córdoba, Spain
| | - Blanca B Landa
- Instituto de Agricultura Sostenible, Consejo Superior de Investigaciones Científicas (CSIC), Córdoba, Spain
| |
|
28
|
Wagner GE, Dabernig-Heinz J, Lipp M, Cabal A, Simantzik J, Kohl M, Scheiber M, Lichtenegger S, Ehricht R, Leitner E, Ruppitsch W, Steinmetz I. Real-Time Nanopore Q20+ Sequencing Enables Extremely Fast and Accurate Core Genome MLST Typing and Democratizes Access to High-Resolution Bacterial Pathogen Surveillance. J Clin Microbiol 2023; 61:e0163122. [PMID: 36988494 PMCID: PMC10117118 DOI: 10.1128/jcm.01631-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2022] [Accepted: 02/17/2023] [Indexed: 03/30/2023]
Abstract
Next-generation whole-genome sequencing is essential for high-resolution surveillance of bacterial pathogens, for example, during outbreak investigations or for source tracking and escape variant analysis. However, current global sequencing and bioinformatic bottlenecks and a long time to result with standard technologies demand new approaches. In this study, we investigated whether novel nanopore Q20+ long-read chemistry enables standardized and easily accessible high-resolution typing combined with core genome multilocus sequence typing (cgMLST). We set high requirements for discriminatory power by using the slowly evolving bacterium Bordetella pertussis as a model pathogen. Our results show that the increased raw read accuracy enables the description of epidemiological scenarios and phylogenetic linkages at the level of gold-standard short reads. The same was true for our variant analysis of vaccine antigens, resistance genes, and virulence factors, demonstrating that nanopore sequencing is a legitimate competitor in the area of next-generation sequencing (NGS)-based high-resolution bacterial typing. Furthermore, we evaluated the parameters for the fastest possible analysis of the data. By combining the optimized processing pipeline with real-time basecalling, we established a workflow that allows for highly accurate and extremely fast high-resolution typing of bacterial pathogens while sequencing is still in progress. Along with advantages such as low costs and portability, the approach suggested here might democratize modern bacterial typing, enabling more efficient infection control globally.
Affiliation(s)
- Gabriel E. Wagner
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Johanna Dabernig-Heinz
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Michaela Lipp
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Adriana Cabal
- Austrian Agency for Health and Food Safety, Vienna, Austria
| | - Jonathan Simantzik
- Medical and Life Sciences Faculty, Furtwangen University, Villingen-Schwenningen, Germany
| | - Matthias Kohl
- Medical and Life Sciences Faculty, Furtwangen University, Villingen-Schwenningen, Germany
| | - Martina Scheiber
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Sabine Lichtenegger
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Ralf Ehricht
- InfectoGnostics Research Campus, Centre for Applied Research, Jena, Germany
- Leibniz-Institute of Photonic Technology (Leibniz-IPHT), Jena, Germany
- Friedrich Schiller University Jena, Institute of Physical Chemistry, Jena, Germany
| | - Eva Leitner
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | | | - Ivo Steinmetz
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| |
|
29
|
Xia Y, Li X, Wu Z, Nie C, Cheng Z, Sun Y, Liu L, Zhang T. Strategies and tools in illumina and nanopore-integrated metagenomic analysis of microbiome data. iMeta 2023; 2:e72. [PMID: 38868337 PMCID: PMC10989838 DOI: 10.1002/imt2.72] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/06/2022] [Revised: 11/10/2022] [Accepted: 11/28/2022] [Indexed: 06/14/2024]
Abstract
The metagenomic strategy serves as the foundation for the ecological exploration of novel bioresources (e.g., industrial enzymes and bioactive molecules) and biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. Recent advancements in sequencing technology have fostered rapid development in the field of microbiome research, where an increasing number of studies have applied both Illumina short reads (SRs) and nanopore long reads (LRs) in their metagenomic workflow. However, given the high complexity of an environmental microbiome data set and the bioinformatic challenges caused by the unique features of these sequencing technologies, integrating SRs and LRs is not as straightforward as one might assume. The fast renewal of existing tools and growing diversity of new algorithms make access to this field even more difficult. Therefore, here we systematically summarized the complete workflow, from DNA extraction to data processing strategies, for applying Illumina and nanopore-integrated metagenomics to the investigation of environmental microbiomes. Overall, this review aims to provide a timely knowledge framework for researchers who are interested in or are struggling with the integration of SRs and LRs in their metagenomic analysis. The discussions presented will facilitate improved ecological understanding of community functionalities and assembly of natural, engineered, and human microbiomes, benefiting researchers from multiple disciplines.
Affiliation(s)
- Yu Xia
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
- State Environmental Protection Key Laboratory of Integrated Surface Water-Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
- Guangdong Provincial Key Laboratory of Soil and Groundwater Pollution Control, School of Environmental Science and Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Xiang Li
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Ziqi Wu
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Cailong Nie
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Zhanwen Cheng
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Yuhong Sun
- School of Environmental Science and Engineering, College of Engineering, Southern University of Science and Technology, Shenzhen, China
| | - Lei Liu
- Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
| | - Tong Zhang
- Environmental Microbiome Engineering and Biotechnology Laboratory, The University of Hong Kong, Hong Kong SAR, China
|
30
|
Kovaka S, Ou S, Jenike KM, Schatz MC. Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing. Nat Methods 2023; 20:12-16. [PMID: 36635537 PMCID: PMC10068675 DOI: 10.1038/s41592-022-01716-8] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Indexed: 01/13/2023]
Abstract
The year 2022 will be remembered as the turning point for accurate long-read sequencing, which now establishes the gold standard for speed and accuracy at competitive costs. We discuss the key bioinformatics techniques needed to power long reads across application areas and close with our vision for long-read sequencing over the coming years.
Affiliation(s)
- Sam Kovaka
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Shujun Ou
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Molecular Genetics, Ohio State University, Columbus, OH, USA
| | - Katharine M Jenike
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
|
31
|
Arumugam K, Bessarab I, Haryono MAS, Williams RBH. Recovery and Analysis of Long-Read Metagenome-Assembled Genomes. Methods Mol Biol 2023; 2649:235-259. [PMID: 37258866 DOI: 10.1007/978-1-0716-3072-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
The development of long-read nucleic acid sequencing is beginning to have a substantial impact on the conduct of metagenome analysis, particularly in relation to the problem of recovering the genomes of member species of complex microbial communities. Here we outline bioinformatics workflows for the recovery and characterization of complete genomes from long-read metagenome data, along with some complementary procedures for comparing cognate draft genomes and gene quality obtained from short-read and long-read sequencing.
Affiliation(s)
- Krithika Arumugam
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Irina Bessarab
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| | - Mindia A S Haryono
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| | - Rohan B H Williams
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
|
32
|
Jin H, Quan K, He Q, Kwok LY, Ma T, Li Y, Zhao F, You L, Zhang H, Sun Z. A high-quality genome compendium of the human gut microbiome of Inner Mongolians. Nat Microbiol 2023; 8:150-161. [PMID: 36604505 DOI: 10.1038/s41564-022-01270-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Accepted: 10/13/2022] [Indexed: 01/07/2023]
Abstract
Metagenome-based resources have revealed the diversity and function of the human gut microbiome, but further understanding is limited by insufficient genome quality and a lack of samples from typically understudied populations. Here we used hybrid long-read PromethION and short-read HiSeq sequencing to characterize the faecal microbiota of 60 Inner Mongolian individuals (n = 180 samples over three time points) who were part of a probiotic yogurt intervention trial. We present the Inner Mongolian Gut Genome catalogue, comprising 802 closed and 5,927 high-quality metagenome-assembled genomes. This approach achieved high genome continuity and substantially increased the resolution of genomic elements, including ribosomal RNA operons, metabolic gene clusters, prophages and insertion sequences. Particularly, we report the ribosomal RNA operon copy numbers for uncultured species, over 12,000 previously undescribed gut prophages and the distribution of insertion sequence elements across gut bacteria. Overall, these data provide a high-quality, large-scale resource for studying the human gut microbiota.
Affiliation(s)
- Hao Jin
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Keyu Quan
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Qiuwen He
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Lai-Yu Kwok
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Teng Ma
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Yalin Li
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Feiyan Zhao
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Lijun You
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Heping Zhang
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
| | - Zhihong Sun
- Inner Mongolia Key Laboratory of Dairy Biotechnology and Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Products Processing, Ministry of Agriculture and Rural Affairs, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
- Key Laboratory of Dairy Biotechnology and Engineering, Ministry of Education, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China
|
33
|
Liu L, Yang Y, Deng Y, Zhang T. Nanopore long-read-only metagenomics enables complete and high-quality genome reconstruction from mock and complex metagenomes. Microbiome 2022; 10:209. [PMID: 36457010 PMCID: PMC9716684 DOI: 10.1186/s40168-022-01415-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 06/07/2022] [Accepted: 11/07/2022] [Indexed: 05/31/2023]
Abstract
BACKGROUND The accurate and comprehensive analyses of genome-resolved metagenomics largely depend on the reconstruction of reference-quality (complete and high-quality) genomes from diverse microbiomes. Closing gaps in draft genomes has come within reach with the inclusion of Nanopore long reads; however, genome quality improvement requires extensive and time-consuming high-accuracy short-read polishing. RESULTS Here, we introduce NanoPhase, an open-source tool to reconstruct reference-quality genomes from complex metagenomes using only Nanopore long reads. Using Kit 9 and Q20+ chemistries, we first evaluated the feasibility of NanoPhase using a ZymoBIOMICS gut microbiome standard (including 21 strains), then sequenced the complex activated sludge microbiome and reconstructed 275 MAGs with median completeness of ~90%. As a result, NanoPhase improved the MAG contiguity (median MAG N50: 735 Kb, 44-86X compared to conventional short-read-based methods) while maintaining high accuracy, allowing for a full and accurate investigation of target microbiomes. Additionally, leveraging these high-contiguity reference-quality genomes, we identified 165 prophages within 111 MAGs, with 5 as active prophages, indicating that prophages were a neglected source of genetic diversity within microbial populations and an influence on microbial composition in the activated sludge microbiome. CONCLUSIONS Our results demonstrated that NanoPhase enables reference-quality genome reconstruction from complex metagenomes directly using only Nanopore long reads. Furthermore, besides the 16S rRNA genes and biosynthetic gene clusters, the generated high-accuracy and high-contiguity MAGs improved the host identification of critical mobile genetic elements, e.g., prophage, serving as a genomic blueprint to investigate the microbial potential and ecology in the activated sludge ecosystem.
Affiliation(s)
- Lei Liu
- Environmental Microbiome Engineering and Biotechnology Laboratory, Center for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
| | - Yu Yang
- Environmental Microbiome Engineering and Biotechnology Laboratory, Center for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
| | - Yu Deng
- Environmental Microbiome Engineering and Biotechnology Laboratory, Center for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
| | - Tong Zhang
- Environmental Microbiome Engineering and Biotechnology Laboratory, Center for Environmental Engineering Research, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
|
34
|
Iyengar BR, Wagner A. Bacterial Hsp90 predominantly buffers but does not potentiate the phenotypic effects of deleterious mutations during fluorescent protein evolution. Genetics 2022; 222:iyac154. [PMID: 36227141 PMCID: PMC9713429 DOI: 10.1093/genetics/iyac154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Accepted: 09/26/2022] [Indexed: 12/13/2022] Open
Abstract
Chaperones facilitate the folding of other ("client") proteins and can thus affect the adaptive evolution of these clients. Specifically, chaperones affect the phenotype of proteins via two opposing mechanisms. On the one hand, they can buffer the effects of mutations in proteins and thus help preserve an ancestral, premutation phenotype. On the other hand, they can potentiate the effects of mutations and thus enhance the phenotypic changes caused by a mutation. We study how the bacterial Hsp90 chaperone (HtpG) affects the evolution of green fluorescent protein. To this end, we performed directed evolution of green fluorescent protein under low and high cellular concentrations of Hsp90. Specifically, we evolved green fluorescent protein under both stabilizing selection for its ancestral (green) phenotype and directional selection toward a new (cyan) phenotype. While Hsp90 affected the rate of adaptive evolution only transiently, it did affect the phenotypic effects of mutations that occurred during adaptive evolution. Specifically, Hsp90 allowed strongly deleterious mutations to accumulate in evolving populations by buffering their effects. Our observations show that the role of a chaperone for adaptive evolution depends on the organism and the trait being studied.
Affiliation(s)
- Bharat Ravi Iyengar
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
- Institute for Evolution and Biodiversity, Westfalian Wilhelms-University of Münster, 48149 Münster, Germany
| | - Andreas Wagner
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, 8057 Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge-Batiment Genopode, 1015 Lausanne, Switzerland
- The Santa Fe Institute, Santa Fe, NM 87501, USA
- Stellenbosch Institute for Advanced Study (STIAS), Wallenberg Research Centre at Stellenbosch University, 7600 Stellenbosch, South Africa
|
35
|
Holt GS, Batty LE, Alobaidi BKS, Smith HE, Oud MS, Ramos L, Xavier MJ, Veltman JA. Phasing of de novo mutations using a scaled-up multiple amplicon long-read sequencing approach. Hum Mutat 2022; 43:1545-1556. [PMID: 36047340 PMCID: PMC9826063 DOI: 10.1002/humu.24450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/06/2022] [Revised: 08/11/2022] [Accepted: 08/18/2022] [Indexed: 01/11/2023]
Abstract
De novo mutations (DNMs) play an important role in severe genetic disorders that reduce fitness. To better understand their role in disease, it is important to determine the parent-of-origin and timing of mutational events that give rise to these mutations, especially in sex-specific developmental disorders such as male infertility. However, currently available short-read sequencing approaches are not ideally suited for phasing, as this requires long continuous DNA strands that span both the DNM and one or more informative single-nucleotide polymorphisms. To overcome these challenges, we optimized and implemented a multiplexed long-read sequencing approach using the Oxford Nanopore Technologies MinION platform. We focused on improving target amplification, integrating long-read sequence data with high-quality short-read sequence data, and developing an anchored phasing computational method. This approach handled the inherent phasing challenges of long-range target amplification and the normal accumulation of sequencing error associated with long-read sequencing. In total, 77 of 109 DNMs (71%) were successfully phased and their parent-of-origin identified. The majority of phased DNMs were prezygotic (90%), the accuracy of which is highlighted by an average mutant allele frequency of 49.6% and standard error of 0.84%. This study demonstrates the benefits of employing an integrated short-read and long-read sequencing approach for large-scale DNM phasing.
Affiliation(s)
- Giles S. Holt
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Lois E. Batty
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Bilal K. S. Alobaidi
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Hannah E. Smith
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Manon S. Oud
- Department of Human Genetics, Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, The Netherlands
| | - Liliana Ramos
- Department of Obstetrics and Gynecology, Division of Reproductive Medicine, Radboudumc, Nijmegen, The Netherlands
| | - Miguel J. Xavier
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| | - Joris A. Veltman
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
|
36
|
Srikakulam N, Sridevi G, Pandi G. High-quality reference transcriptome construction improves RNA-seq quantification in Oryza sativa indica. Front Genet 2022; 13:995072. [PMID: 36246658 PMCID: PMC9558114 DOI: 10.3389/fgene.2022.995072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 09/02/2022] [Indexed: 11/13/2022] Open
Abstract
The Reference Transcriptomic Dataset (RTD) is an accurate and comprehensive collection of transcripts originating from a given organism. It holds the key to precise transcript quantification and downstream analysis of differential expression and regulation. Currently, transcriptome annotations for most crop plants are far from complete. For example, Oryza sativa indica (O. sativa indica) is reported to have 40,759 transcripts in the Ensembl database without alternative transcript isoforms and alternative splicing (AS) events. To generate a high-quality RTD, we conducted RNA sequencing of rice leaf samples collected at various time points during Rhizoctonia solani infection. The obtained reads were analyzed by adopting a recently developed computational analysis pipeline to assemble an RTD with increased transcript and AS diversity for O. sativa indica (IndicaRTD). After stringent quality filtering, the newly constructed transcriptome annotation comprised 122,968 non-redundant transcripts from 53,695 genes. This study identified many novel transcripts, compared to Ensembl deposited data, that are important for regulating molecular and physiological processes in the plant system. The assembled IndicaRTD should allow fast quantification of transcript and gene expression with high precision.
Affiliation(s)
- Nagesh Srikakulam
- Laboratory of RNA Biology and Epigenomics, Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
- *Correspondence: Nagesh Srikakulam; Gopal Pandi
| | - Ganapathi Sridevi
- Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
| | - Gopal Pandi
- Laboratory of RNA Biology and Epigenomics, Department of Plant Biotechnology, School of Biotechnology, Madurai Kamaraj University, Madurai, India
- *Correspondence: Nagesh Srikakulam; Gopal Pandi
|
37
|
Fu X, Zan XY, Sun L, Tan M, Cui FJ, Liang YY, Meng LJ, Sun WJ. Functional Characterization and Structural Basis of the β-1,3-Glucan Synthase CMGLS from Mushroom Cordyceps militaris. J Agric Food Chem 2022; 70:8725-8737. [PMID: 35816703 DOI: 10.1021/acs.jafc.2c03410] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 06/15/2023]
Abstract
β-1,3-Glucan synthases play key roles in glucan synthesis, cell wall assembly, and growth of fungi. However, their multi-transmembrane domains (over 14 TMHs) and large molecular masses (over 100 kDa) significantly hamper understanding of their catalytic characteristics and mechanisms. In the present study, the 5841-bp gene CMGLS encoding the 221.7 kDa membrane-bound β-1,3-glucan synthase CMGLS in Cordyceps militaris was cloned, identified, and structurally analyzed. CMGLS was partially purified with a specific activity of 87.72 pmol/min/μg, a purification fold of 121, and a yield of 10.16% using a product-entrapment purification method. CMGLS showed a strict specificity to UDP-glucose with a Km value of 84.28 μM at pH 7.0 and synthesized β-1,3-glucan with a maximum degree of polymerization (DP) of 70. With the assistance of AlphaFold and molecular docking, the 3D structure of CMGLS and its binding features with substrate UDP-glucose were proposed for the first time to our knowledge. UDP-glucose potentially bound to at least 11 residues via hydrogen bonds, π-stacking, and salt bridges, and Arg 1436 was predicted as a key residue directly interacting with the moieties of glucose, phosphate, and the ribose ring on UDP-glucose. These findings open an avenue to recognize and understand the glucan synthesis process and catalytic mechanism of β-1,3-glucan synthases in mushrooms.
Affiliation(s)
- Xin Fu
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Xin-Yi Zan
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Lei Sun
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Ming Tan
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Feng-Jie Cui
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
- Jiangxi Provincial Engineering and Technology Center for Food Additives Bio-production, Dexing 334221, P.R. China
| | - Ying-Ying Liang
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Li-Juan Meng
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
| | - Wen-Jing Sun
- School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, P.R. China
- Jiangxi Provincial Engineering and Technology Center for Food Additives Bio-production, Dexing 334221, P.R. China
|
38
|
Zhang R, Kuo R, Coulter M, Calixto CPG, Entizne JC, Guo W, Marquez Y, Milne L, Riegler S, Matsui A, Tanaka M, Harvey S, Gao Y, Wießner-Kroh T, Paniagua A, Crespi M, Denby K, Hur AB, Huq E, Jantsch M, Jarmolowski A, Koester T, Laubinger S, Li QQ, Gu L, Seki M, Staiger D, Sunkar R, Szweykowska-Kulinska Z, Tu SL, Wachter A, Waugh R, Xiong L, Zhang XN, Conesa A, Reddy ASN, Barta A, Kalyna M, Brown JWS. A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis. Genome Biol 2022; 23:149. [PMID: 35799267 PMCID: PMC9264592 DOI: 10.1186/s13059-022-02711-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/17/2021] [Accepted: 06/15/2022] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Accurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single-molecule long-read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation, or incomplete cDNA synthesis. RESULTS We present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 169,000 transcripts, twice that of the best current Arabidopsis transcriptome, and includes over 1500 novel genes. Seventy-eight percent of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We develop novel methods to determine splice junctions and transcription start and end sites accurately. Mismatch profiles around splice junctions provide a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identify high-confidence transcription start and end sites and remove fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provides higher resolution of transcript expression profiling and identifies cold-induced differential transcription start and polyadenylation site usage. CONCLUSIONS AtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage analysis from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single-molecule sequencing analysis from any species.
Affiliation(s)
- Runxuan Zhang
- Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK.
| | - Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, EH25 9RG, UK
| | - Max Coulter
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
| | - Cristiane P G Calixto
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
- Present address: Institute of Biosciences, University of São Paulo, São Paulo, 05508-090, Brazil
| | - Juan Carlos Entizne
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
| | - Wenbin Guo
- Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
| | - Yamile Marquez
- Centre for Genomic Regulation, C/ Dr. Aiguader 88, 08003, Barcelona, Spain
| | - Linda Milne
- Information and Computational Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
| | - Stefan Riegler
- Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria
- Present address: Institute of Science and Technology Austria, Am Campus 1, 3400, Klosterneuburg, Austria
| | - Akihiro Matsui
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
| | - Maho Tanaka
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
| | - Sarah Harvey
- Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
| | - Yubang Gao
- College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Theresa Wießner-Kroh
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany
| | - Alejandro Paniagua
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
| | - Martin Crespi
- French National Centre for Scientific Research | CNRS INRAE-Universities of Paris Saclay and Paris, Institute of Plant Sciences Paris Saclay IPS2, Rue de Noetzlin, 91192, Gif sur Yvette, France
| | - Katherine Denby
- Centre for Novel Agricultural Products (CNAP), Department of Biology, University of York Wentworth Way, York, YO10 5DD, UK
| | - Asa Ben Hur
- Department of Computer Science, Colorado State University, 1873 Campus Delivery, Fort Collins, CO, 80523-1873, USA
| | - Enamul Huq
- Department of Molecular Biosciences, University of Texas at Austin, 100 East 24th St., Austin, TX, 78712-1095, USA
| | - Michael Jantsch
- Department of Cell and Developmental Biology, Center for Anatomy and Cell Biology, Medical University of Vienna, Schwarzspanierstrasse 17 A-1090, Vienna, Austria
| | - Artur Jarmolowski
- Department of Gene Expression, Adam Mickiewicz University, Poznań, Poland
| | - Tino Koester
- RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
| | - Sascha Laubinger
- Institut für Biologie und Umweltwissenschaften (IBU), Carl von Ossietzky Universität Oldenburg, Carl von Ossietzky-Str. 9-11, 26111, Oldenburg, Germany
- Institute of Biology, Department of Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
| | - Qingshun Quinn Li
- Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
- Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, 361102, Fujian, China
| | - Lianfeng Gu
- College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Motoaki Seki
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa, 230-0045, Japan
| | - Dorothee Staiger
- RNA Biology and Molecular Physiology, Faculty for Biology, Bielefeld University, Universitaetsstrasse 25, 33615, Bielefeld, Germany
| | - Ramanjulu Sunkar
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Shih-Long Tu
- Institute of Plant and Microbial Biology, Academia Sinica, Taipei, Taiwan
| | - Andreas Wachter
- Center for Plant Molecular Biology (ZMBP), University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany
- Present address: Institute for Molecular Physiology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 17, 55128, Mainz, Germany
| | - Robbie Waugh
- Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
| | - Liming Xiong
- Department of Biology, Hong Kong Baptist University, Hong Kong, China
| | - Xiao-Ning Zhang
- Biology Department, School of Arts and Sciences, St. Bonaventure University, 3261 West State Road, St. Bonaventure, NY, 14778, USA
| | - Ana Conesa
- Institute for Integrative Systems Biology (CSIC-UV), Spanish National Research Council, Paterna, Valencia, Spain
| | - Anireddy S N Reddy
- Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO, 80523, USA
| | - Andrea Barta
- Max F. Perutz Laboratories, Medical University of Vienna, Center of Medical Biochemistry, Dr.-Bohr-Gasse 9/3, A-1030, Vienna, Austria
| | - Maria Kalyna
- Institute of Molecular Plant Biology, Department of Applied Genetics and Cell Biology, University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria
| | - John W S Brown
- Plant Sciences Division, School of Life Sciences, University of Dundee at The James Hutton Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
- Cell and Molecular Sciences, James Hutton Institute, Dundee, DD2 5DA, Scotland, UK
|
39
|
Sereika M, Kirkegaard RH, Karst SM, Michaelsen TY, Sørensen EA, Wollenberg RD, Albertsen M. Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing. Nat Methods 2022; 19:823-826. [PMID: 35789207 PMCID: PMC9262707 DOI: 10.1038/s41592-022-01539-7] [Citation(s) in RCA: 139] [Impact Index Per Article: 69.5] [Received: 11/10/2021] [Accepted: 05/24/2022] [Indexed: 12/26/2022]
Abstract
Long-read Oxford Nanopore sequencing has democratized microbial genome sequencing and enables the recovery of highly contiguous microbial genomes from isolates or metagenomes. However, to obtain near-finished genomes it has been necessary to include short-read polishing to correct insertions and deletions derived from homopolymer regions. Here, we show that Oxford Nanopore R10.4 can be used to generate near-finished microbial genomes from isolates or metagenomes without short-read or reference polishing.
Affiliation(s)
- Mantas Sereika
- Center for Microbial Communities, Aalborg University, Aalborg, Denmark
| | - Rasmus Hansen Kirkegaard
- Center for Microbial Communities, Aalborg University, Aalborg, Denmark; Joint Microbiome Facility, University of Vienna, Vienna, Austria
| | - Mads Albertsen
- Center for Microbial Communities, Aalborg University, Aalborg, Denmark.
|
40
|
Arévalo MT, Karavis MA, Katoski SE, Harris JV, Hill JM, Deshpande SV, Roth PA, Liem AT, Bernhards RC. A Rapid, Whole Genome Sequencing Assay for Detection and Characterization of Novel Coronavirus (SARS-CoV-2) Clinical Specimens Using Nanopore Sequencing. Front Microbiol 2022; 13:910955. [PMID: 35733956 PMCID: PMC9207459 DOI: 10.3389/fmicb.2022.910955] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/01/2022] [Accepted: 05/09/2022] [Indexed: 12/22/2022] Open
Abstract
A new human coronavirus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged at the end of 2019 in Wuhan, China that caused a range of disease severities; including fever, shortness of breath, and coughing. This disease, now known as coronavirus disease 2019 (COVID-19), quickly spread throughout the world, and was declared a pandemic by the World Health Organization in March of 2020. As the disease continues to spread, providing rapid characterization has proven crucial to better inform the design and execution of control measures, such as decontamination methods, diagnostic tests, antiviral drugs, and prophylactic vaccines for long-term control. Our work at the United States Army’s Combat Capabilities Development Command Chemical Biological Center (DEVCOM CBC) is focused on engineering workflows to efficiently identify, characterize, and evaluate the threat level of any potential biological threat in the field and more remote, lower resource settings, such as forward operating bases. While we have successfully established untargeted sequencing approaches for detection of pathogens for rapid identification, our current work entails a more in-depth sequencing analysis for use in evolutionary monitoring. We are developing and validating a SARS-CoV-2 nanopore sequencing assay, based on the ARTIC protocol. The standard ARTIC, Illumina, and nanopore sequencing protocols for SARS-CoV-2 are elaborate and time consuming. The new protocol integrates Oxford Nanopore Technology’s Rapid Sequencing Kit following targeted RT-PCR of RNA extracted from human clinical specimens. This approach decreases sample manipulations and preparation times. Our current bioinformatics pipeline utilizes Centrifuge as the classifier for quick identification of SARS-CoV-2 and RAMPART software for verification and mapping of reads to the full SARS-CoV-2 genome. 
ARTIC rapid sequencing results, of previous RT-PCR confirmed patient samples, showed that the modified protocol produces high quality data, with up to 98.9% genome coverage at >1,000x depth for samples with presumably higher viral loads. Furthermore, whole genome assembly and subsequent mutational analysis of six of these sequences identified existing and unique mutations to this cluster, including three in the Spike protein: V308L, P521R, and D614G. This work suggests that an accessible, portable, and relatively fast sample-to-sequence process to characterize viral outbreaks is feasible and effective.
Affiliation(s)
- Maria T. Arévalo
- Defense Threat Reduction Agency, Aberdeen Proving Ground, MD, United States
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
- *Correspondence: Maria T. Arévalo
| | - Mark A. Karavis
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
| | - Sarah E. Katoski
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
| | - Jacquelyn V. Harris
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
| | - Samir V. Deshpande
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
| | - R. Cory Bernhards
- United States Army Combat Capabilities Development Command Chemical Biological Center, Aberdeen Proving Ground, MD, United States
|
41
|
Mc Cartney AM, Shafin K, Alonge M, Bzikadze AV, Formenti G, Fungtammasan A, Howe K, Jain C, Koren S, Logsdon GA, Miga KH, Mikheenko A, Paten B, Shumate A, Soto DC, Sović I, Wood JMD, Zook JM, Phillippy AM, Rhie A. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods 2022; 19:687-695. [PMID: 35361931 PMCID: PMC9812399 DOI: 10.1038/s41592-022-01440-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 07/13/2021] [Accepted: 03/04/2022] [Indexed: 01/07/2023]
Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
Affiliation(s)
- Ann M. Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH; Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA, India
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA
| | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA; Digital BioLogic d.o.o., Ivanić-Grad, Croatia
| | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
|
42
|
Merfin: improved variant filtering, assembly evaluation and polishing via k-mer validation. Nat Methods 2022; 19:696-704. [PMID: 35361932 PMCID: PMC9745813 DOI: 10.1038/s41592-022-01445-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 07/13/2021] [Accepted: 03/07/2022] [Indexed: 12/15/2022]
Abstract
Variant calling has been widely used for genotyping and for improving the consensus accuracy of long-read assemblies. Variant calls are commonly hard-filtered with user-defined cutoffs. However, it is impossible to define a single set of optimal cutoffs, as the calls heavily depend on the quality of the reads, the variant caller of choice and the quality of the unpolished assembly. Here, we introduce Merfin, a k-mer based variant-filtering algorithm for improved accuracy in genotyping and genome assembly polishing. Merfin evaluates each variant based on the expected k-mer multiplicity in the reads, independently of the quality of the read alignment and variant caller's internal score. Merfin increased the precision of genotyped calls in several benchmarks, improved consensus accuracy and reduced frameshift errors when applied to human and nonhuman assemblies built from Pacific Biosciences HiFi and continuous long reads or Oxford Nanopore reads, including the first complete human genome. Moreover, we introduce assembly quality and completeness metrics that account for the expected genomic copy numbers.
|
43
|
Ko KKK, Chng KR, Nagarajan N. Metagenomics-enabled microbial surveillance. Nat Microbiol 2022; 7:486-496. [PMID: 35365786 DOI: 10.1038/s41564-022-01089-w] [Citation(s) in RCA: 71] [Impact Index Per Article: 35.5] [Received: 08/29/2021] [Accepted: 02/22/2022] [Indexed: 12/13/2022]
Abstract
Lessons learnt from the COVID-19 pandemic include increased awareness of the potential for zoonoses and emerging infectious diseases that can adversely affect human health. Although emergent viruses are currently in the spotlight, we must not forget the ongoing toll of morbidity and mortality owing to antimicrobial resistance in bacterial pathogens and to vector-borne, foodborne and waterborne diseases. Population growth, planetary change, international travel and medical tourism all contribute to the increasing frequency of infectious disease outbreaks. Surveillance is therefore of crucial importance, but the diversity of microbial pathogens, coupled with resource-intensive methods, compromises our ability to scale-up such efforts. Innovative technologies that are both easy to use and able to simultaneously identify diverse microorganisms (viral, bacterial or fungal) with precision are necessary to enable informed public health decisions. Metagenomics-enabled surveillance methods offer the opportunity to improve detection of both known and yet-to-emerge pathogens.
Affiliation(s)
- Karrie K K Ko
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Department of Microbiology, Singapore General Hospital, Singapore, Singapore; Department of Molecular Pathology, Singapore General Hospital, Singapore, Singapore; Duke-NUS Medical School, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
| | - Kern Rei Chng
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; National Centre for Food Science, Singapore Food Agency, Singapore, Singapore
| | - Niranjan Nagarajan
- Laboratory of Metagenomic Technologies and Microbial Systems, Genome Institute of Singapore, Singapore, Singapore; Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
44
|
Hönemann M, Viehweger A, Dietze N, Johnke J, Rodloff AC. Leclercia pneumoniae sp. nov., a bacterium isolated from clinical specimen in Leipzig, Germany. Int J Syst Evol Microbiol 2022; 72. [DOI: 10.1099/ijsem.0.005293] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Strain 49125T was isolated from an infant with pneumonia and septicaemia at the Leipzig University Hospital. Phenotypic and genomic traits were investigated. The strain's biochemical profile and its MALDI-TOF spectrogram did not differ from comparative samples of Leclercia adecarboxylata, thus far the sole member of the Leclercia species. A circular genome with a size of 4.4 Mbp and a G+C content of 55.0 mol% was reconstructed using hybrid Illumina and Nanopore sequencing. Phylogenetic analysis was based on 172 marker genes and validated using a k-mer-based search against a large genome collection including subsequent in silico DNA–DNA hybridization. Whole genome average nucleotide identity to any described species was below 95%, suggesting that strain 49125T represents a new species, for which we propose the name Leclercia pneumoniae sp. nov. with the type strain 49125T (=LMG 32245T=DSM 112336T).
Affiliation(s)
- Mario Hönemann
- Institute for Medical Microbiology and Virology, Virology Section, Leipzig University, Johannisallee 30, 04103 Leipzig, Germany
| | - Adrian Viehweger
- Institute for Medical Microbiology and Virology, Microbiology Section, Leipzig University, Liebigstraße 21, 04103 Leipzig, Germany
| | - Nadine Dietze
- Institute for Medical Microbiology and Virology, Microbiology Section, Leipzig University, Liebigstraße 21, 04103 Leipzig, Germany
| | - Julia Johnke
- Department of Evolutionary Ecology and Genetics, Zoological Institute, CAU Kiel, Am Botanischen Garten 1-9, 24118, Kiel, Germany
| | - Arne C. Rodloff
- Institute for Medical Microbiology and Virology, Microbiology Section, Leipzig University, Liebigstraße 21, 04103 Leipzig, Germany
|
45
|
Gabed N, Verret F, Peticca A, Kryvoruchko I, Gastineau R, Bosson O, Séveno J, Davidovich O, Davidovich N, Witkowski A, Kristoffersen JB, Benali A, Ioannou E, Koutsaviti A, Roussis V, Gâteau H, Phimmaha S, Leignel V, Badawi M, Khiar F, Francezon N, Fodil M, Pasetto P, Mouget JL. What Was Old Is New Again: The Pennate Diatom Haslea ostrearia (Gaillon) Simonsen in the Multi-Omic Age. Mar Drugs 2022; 20:md20040234. [PMID: 35447907 PMCID: PMC9033121 DOI: 10.3390/md20040234] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/07/2022] [Revised: 03/08/2022] [Accepted: 03/18/2022] [Indexed: 02/04/2023] Open
Abstract
The marine pennate diatom Haslea ostrearia has long been known for its characteristic blue pigment marennine, which is responsible for the greening of invertebrate gills, a natural phenomenon of great importance for the oyster industry. For two centuries, this taxon was considered unique; however, the recent description of a new blue Haslea species revealed unsuspected biodiversity. Marennine-like pigments are natural blue dyes that display various biological activities—e.g., antibacterial, antioxidant and antiproliferative—with a great potential for applications in the food, feed, cosmetic and health industries. Regarding fundamental prospects, researchers use model organisms as standards to study cellular and physiological processes in other organisms, and there is a growing and crucial need for more, new and unconventional model organisms to better correspond to the diversity of the tree of life. The present work, thus, advocates for establishing H. ostrearia as a new model organism by presenting its pros and cons—i.e., the interesting aspects of this peculiar diatom (representative of benthic-epiphytic phytoplankton, with original behavior and chemodiversity, controlled sexual reproduction, fundamental and applied-oriented importance, reference genome, and transcriptome will soon be available); it will also present the difficulties encountered before this becomes a reality as it is for other diatom models (the genetics of the species in its infancy, the transformation feasibility to be explored, the routine methods needed to cryopreserve strains of interest).
Affiliation(s)
- Noujoud Gabed
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research (HCMR), Gournes Pediados, 71003 Heraklion, Greece; (N.G.); (J.B.K.); (A.B.)
- Oran High School of Biological Sciences (ESSBO), Cellular and Molecular Biology Department, Oran 31000, Algeria
- Laboratoire d’Aquaculture et Bioremediation AquaBior, Université d’Oran 1, Oran 31000, Algeria
| | - Frédéric Verret
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research (HCMR), Gournes Pediados, 71003 Heraklion, Greece; (N.G.); (J.B.K.); (A.B.)
- Correspondence: Tel.: +30-2810-337-852
| | - Aurélie Peticca
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Igor Kryvoruchko
- Department of Biology, United Arab Emirates University (UAEU), Al Ain P.O. Box 15551, United Arab Emirates
| | - Romain Gastineau
- Institute of Marine and Environmental Sciences, University of Szczecin, Mickiewicza 16, 70-383 Szczecin, Poland; (R.G.); (N.D.); (A.W.)
| | - Orlane Bosson
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Julie Séveno
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Olga Davidovich
- Karadag Scientific Station, Natural Reserve of the Russian Academy of Sciences, Kurortnoe, 98188 Feodosiya, Russia
| | - Nikolai Davidovich
- Institute of Marine and Environmental Sciences, University of Szczecin, Mickiewicza 16, 70-383 Szczecin, Poland; (R.G.); (N.D.); (A.W.)
- Karadag Scientific Station, Natural Reserve of the Russian Academy of Sciences, Kurortnoe, 98188 Feodosiya, Russia
| | - Andrzej Witkowski
- Institute of Marine and Environmental Sciences, University of Szczecin, Mickiewicza 16, 70-383 Szczecin, Poland; (R.G.); (N.D.); (A.W.)
| | - Jon Bent Kristoffersen
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research (HCMR), Gournes Pediados, 71003 Heraklion, Greece; (N.G.); (J.B.K.); (A.B.)
| | - Amel Benali
- Institute of Marine Biology, Biotechnology and Aquaculture, Hellenic Centre for Marine Research (HCMR), Gournes Pediados, 71003 Heraklion, Greece; (N.G.); (J.B.K.); (A.B.)
- Laboratoire d’Aquaculture et Bioremediation AquaBior, Université d’Oran 1, Oran 31000, Algeria
- Laboratoire de Génétique Moléculaire et Cellulaire, Université des Sciences et de la Technologie d’Oran Mohamed BOUDIAF-USTO-MB, BP 1505, El M’naouer, Oran 31000, Algeria
| | - Efstathia Ioannou
- Section of Pharmacognosy and Chemistry of Natural Products, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece; (E.I.); (A.K.); (V.R.)
| | - Aikaterini Koutsaviti
- Section of Pharmacognosy and Chemistry of Natural Products, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece; (E.I.); (A.K.); (V.R.)
| | - Vassilios Roussis
- Section of Pharmacognosy and Chemistry of Natural Products, Department of Pharmacy, National and Kapodistrian University of Athens, Panepistimiopolis Zografou, 15771 Athens, Greece; (E.I.); (A.K.); (V.R.)
| | - Hélène Gâteau
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Suliya Phimmaha
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Vincent Leignel
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Myriam Badawi
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Feriel Khiar
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Nellie Francezon
- Institut des Molécules et Matériaux du Mans, UMR CNRS 6283, Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (N.F.); (P.P.)
| | - Mostefa Fodil
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
| | - Pamela Pasetto
- Institut des Molécules et Matériaux du Mans, UMR CNRS 6283, Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (N.F.); (P.P.)
| | - Jean-Luc Mouget
- Laboratoire Biologie des Organismes, Stress, Santé, Environnement (BiOSSE), Le Mans Université, Avenue Olivier Messiaen, 72085 Le Mans, France; (A.P.); (O.B.); (J.S.); (H.G.); (S.P.); (V.L.); (M.B.); (F.K.); (M.F.); (J.-L.M.)
|
46
|
Iyengar BR, Wagner A. GroEL/S overexpression helps to purge deleterious mutations and reduce genetic diversity during adaptive protein evolution. Mol Biol Evol 2022; 39:6540901. [PMID: 35234895 PMCID: PMC9188349 DOI: 10.1093/molbev/msac047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
Chaperones are proteins that help other proteins fold. They also affect the adaptive evolution of their client proteins by buffering the effect of deleterious mutations and increasing the genetic diversity of evolving proteins. We study how the bacterial chaperone GroE (GroEL + GroES) affects the evolution of green fluorescent protein (GFP). To this end we subjected GFP to multiple rounds of mutation and selection for its color phenotype in four replicate E. coli populations, and studied its evolutionary dynamics through high-throughput sequencing and mutant engineering. We evolved GFP both under stabilizing selection for its ancestral (green) phenotype, and to directional selection for a new (cyan) phenotype. We did so both under low and high expression of the chaperone GroE. In contrast to previous work, we observe that GroE does not just buffer but also helps purge deleterious (fluorescence reducing) mutations from evolving populations. In doing so, GroE helps reduce the genetic diversity of evolving populations. In addition, it causes phenotypic heterogeneity in mutants with the same genotype, helping to enhance their fluorescence in some cells, and reducing it in others. Our observations show that chaperones can affect adaptive evolution in more than one way.
|
47
|
Petrone JR, Muñoz-Beristain A, Glusberger PR, Russell JT, Triplett EW. Unamplified, Long-Read Metagenomic Sequencing Approach to Close Endosymbiont Genomes of Low-Biomass Insect Populations. Microorganisms 2022; 10:microorganisms10030513. [PMID: 35336091 PMCID: PMC8948638 DOI: 10.3390/microorganisms10030513] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/17/2021] [Revised: 01/23/2022] [Accepted: 02/23/2022] [Indexed: 02/04/2023] Open
Abstract
With the current advancements in DNA sequencing technology, the limiting factor in long-read metagenomic assemblies is now the quantity and quality of input DNA. Although these requirements can be met through the use of axenic bacterial cultures or large amounts of biological material, insect systems that contain unculturable bacteria or that contain a low amount of available DNA cannot fully utilize the benefits of third-generation sequencing. The citrus greening disease insect vector Diaphorina citri is an example that exhibits both of these limitations. Although endosymbiont genomes have mostly been closed after the short-read sequencing of amplified template DNA, creating de novo long-read genomes from the unamplified DNA of an insect population may benefit communities using bioinformatics to study insect pathosystems. Here all four genomes of the infected D. citri microbiome were sequenced to closure using unamplified template DNA and two long-read sequencing technologies. Avoiding amplification bias and using long reads to assemble the bacterial genomes allowed for the circularization of the Wolbachia endosymbiont of Diaphorina citri for the first time and paralleled the annotation context of all four reference genomes without utilizing a traditional hybrid assembly. The strategies detailed here are suitable for the sequencing of other insect systems for which the input DNA, time, and cost are an issue.
|
48
|
Artuso I, Lucidi M, Visaggio D, Capecchi G, Lugli GA, Ventura M, Visca P. Genome diversity of domesticated Acinetobacter baumannii ATCC 19606 T strains. Microb Genom 2022; 8. [PMID: 35084299 PMCID: PMC8914354 DOI: 10.1099/mgen.0.000749] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 01/08/2023] Open
Abstract
Acinetobacter baumannii has emerged as an important opportunistic pathogen worldwide, being responsible for large outbreaks for nosocomial infections, primarily in intensive care units. A. baumannii ATCC 19606T is the species type strain, and a reference organism in many laboratories due to its low virulence, amenability to genetic manipulation and extensive antibiotic susceptibility. We wondered if frequent propagation of A. baumannii ATCC 19606T in different laboratories may have driven micro- and macro-evolutionary events that could determine inter-laboratory differences of genome-based data. By combining Illumina MiSeq, MinION and Sanger technologies, we generated a high-quality whole-genome sequence of A. baumannii ATCC 19606T, then performed a comparative genome analysis between A. baumannii ATCC 19606T strains from several research laboratories and a reference collection. Differences between publicly available ATCC 19606T genome sequences were observed, including SNPs, macro- and micro-deletions, and the uneven presence of a 52 kb prophage belonging to genus Vieuvirus. Two plasmids, pMAC and p1ATCC19606, were invariably detected in all tested strains. The presence of a putative replicase, a replication origin containing four 22-mer direct repeats, and a toxin-antitoxin system implicated in plasmid stability were predicted by in silico analysis of p1ATCC19606, and experimentally confirmed. This work refines the sequence, structure and functional annotation of the A. baumannii ATCC 19606T genome, and highlights some remarkable differences between domesticated strains, likely resulting from genetic drift.
Affiliation(s)
- Irene Artuso
- Department of Science, Roma Tre University, Viale G. Marconi 446, 00146 Rome, Italy
| | - Massimiliano Lucidi
- Department of Science, Roma Tre University, Viale G. Marconi 446, 00146 Rome, Italy
| | - Daniela Visaggio
- Department of Science, Roma Tre University, Viale G. Marconi 446, 00146 Rome, Italy; Santa Lucia Foundation IRCCS, Via Ardeatina 306-354, 00179 Rome, Italy
| | - Giulia Capecchi
- Department of Science, Roma Tre University, Viale G. Marconi 446, 00146 Rome, Italy
| | - Gabriele Andrea Lugli
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11a, 43124 Parma, Italy
| | - Marco Ventura
- Laboratory of Probiogenomics, Department of Chemistry, Life Sciences, and Environmental Sustainability, University of Parma, Parco Area delle Scienze 11a, 43124 Parma, Italy
| | - Paolo Visca
- Department of Science, Roma Tre University, Viale G. Marconi 446, 00146 Rome, Italy; Santa Lucia Foundation IRCCS, Via Ardeatina 306-354, 00179 Rome, Italy
| |
|
49
|
Kress WJ, Soltis DE, Kersey PJ, Wegrzyn JL, Leebens-Mack JH, Gostel MR, Liu X, Soltis PS. Green plant genomes: What we know in an era of rapidly expanding opportunities. Proc Natl Acad Sci U S A 2022; 119:e2115640118. [PMID: 35042803 PMCID: PMC8795535 DOI: 10.1073/pnas.2115640118] [Citation(s) in RCA: 51] [Impact Index Per Article: 25.5] [Indexed: 01/01/2023]
Abstract
Green plants play a fundamental role in ecosystems, human health, and agriculture. As de novo genomes are being generated for all known eukaryotic species as advocated by the Earth BioGenome Project, increasing genomic information on green land plants is essential. However, setting standards for the generation and storage of the complex set of genomes that characterize the green lineage of life is a major challenge for plant scientists. Such standards will need to accommodate the immense variation in green plant genome size, transposable element content, and structural complexity while enabling research into the molecular and evolutionary processes that have resulted in this enormous genomic variation. Here we provide an overview and assessment of the current state of knowledge of green plant genomes. To date fewer than 300 complete chromosome-scale genome assemblies representing fewer than 900 species have been generated across the estimated 450,000 to 500,000 species in the green plant clade. These genomes range in size from 12 Mb to 27.6 Gb and are biased toward agricultural crops with large branches of the green tree of life untouched by genomic-scale sequencing. Locating suitable tissue samples of most species of plants, especially those taxa from extreme environments, remains one of the biggest hurdles to increasing our genomic inventory. Furthermore, the annotation of plant genomes is at present undergoing intensive improvement. It is our hope that this fresh overview will help in the development of genomic quality standards for a cohesive and meaningful synthesis of green plant genomes as we scale up for the future.
Affiliation(s)
- W John Kress
- National Museum of Natural History, Smithsonian Institution, Department of Botany, Washington, DC 20013-7012
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
- Arnold Arboretum, Harvard University, Boston, MA 02130
| | - Douglas E Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Biodiversity Institute, University of Florida, Gainesville, FL 32611
- Department of Biology, University of Florida, Gainesville, FL 32611
| | - Paul J Kersey
- Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, United Kingdom
| | - Jill L Wegrzyn
- Department of Ecology and Evolutionary Biology, Institute for Systems Genomics: Computational Biology Core, University of Connecticut, Storrs, CT 06269-3214
| | - James H Leebens-Mack
- Department of Plant Biology, 2101 Miller Plant Sciences, University of Georgia, Athens, GA 30602-7271
| | - Morgan R Gostel
- Botanical Research Institute of Texas, Fort Worth, TX 76107-3400
| | - Xin Liu
- China National GeneBank, BGI-Shenzhen, Shenzhen 518120, China
| | - Pamela S Soltis
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611
- Biodiversity Institute, University of Florida, Gainesville, FL 32611
| |
|
50
|
Neupane S, Bonilla SI, Manalo AM, Pelz-Stelinski KS. Complete de novo assembly of Wolbachia endosymbiont of Diaphorina citri Kuwayama (Hemiptera: Liviidae) using long-read genome sequencing. Sci Rep 2022; 12:125. [PMID: 34996906 PMCID: PMC8741817 DOI: 10.1038/s41598-021-03184-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2021] [Accepted: 11/26/2021] [Indexed: 01/23/2023]
Abstract
Wolbachia, a gram-negative α-proteobacterium, is an endosymbiont found in some arthropods and nematodes. Diaphorina citri Kuwayama, the vector of ‘Candidatus Liberibacter asiaticus’ (CLas), are naturally infected with a strain of Wolbachia (wDi), which has been shown to colocalize with the bacteria pathogens CLas, the pathogen associated with huanglongbing (HLB) disease of citrus. The relationship between wDi and CLas is poorly understood in part because the complete genome of wDi has not been available. Using high-quality long-read PacBio circular consensus sequences, we present the largest complete circular wDi genome among supergroup-B members. The assembled circular chromosome is 1.52 megabases with 95.7% genome completeness with contamination of 1.45%, as assessed by checkM. We identified Insertion Sequences (ISs) and prophage genes scattered throughout the genomes. The proteins were annotated using Pfam, eggNOG, and COG that assigned unique domains and functions. The wDi genome was compared with previously sequenced Wolbachia genomes using pangenome and phylogenetic analyses. The availability of a complete circular chromosome of wDi will facilitate understanding of its role within the insect vector, which may assist in developing tools for disease management. This information also provides a baseline for understanding phylogenetic relationships among Wolbachia of other insect vectors.
Affiliation(s)
- Surendra Neupane
- Entomology and Nematology Department, Citrus Research and Education Center/IFAS, University of Florida, Lake Alfred, Florida, 33850, USA
| | - Sylvia I Bonilla
- Entomology and Nematology Department, Citrus Research and Education Center/IFAS, University of Florida, Lake Alfred, Florida, 33850, USA
| | - Andrew M Manalo
- Entomology and Nematology Department, Citrus Research and Education Center/IFAS, University of Florida, Lake Alfred, Florida, 33850, USA
| | - Kirsten S Pelz-Stelinski
- Entomology and Nematology Department, Citrus Research and Education Center/IFAS, University of Florida, Lake Alfred, Florida, 33850, USA
| |
|