1
Herzog KS, Wu R, Hawdon JM, Nejsum P, Fauver JR. Assessing de novo parasite genomes assembled using only Oxford Nanopore Technologies MinION data. iScience 2024; 27:110614. [PMID: 39211578 PMCID: PMC11357801 DOI: 10.1016/j.isci.2024.110614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 06/09/2024] [Accepted: 07/26/2024] [Indexed: 09/04/2024]
Abstract
In this study, we assessed the quality of de novo genome assemblies for three species of parasitic nematodes (Brugia malayi, Trichuris trichiura, and Ancylostoma caninum) generated using only Oxford Nanopore Technologies MinION data. Assemblies were compared to current reference genomes and against additional assemblies that were supplemented with short-read Illumina data through polishing or hybrid assembly approaches. For each species, assemblies generated using only MinION data had similar or superior measures of contiguity, completeness, and gene content. In terms of gene composition, depending on the species, between 88.9 and 97.6% of complete coding sequences predicted in MinION data only assemblies were identical to those predicted in assemblies polished with Illumina data. Polishing MinION data only assemblies with Illumina data therefore improved gene-level accuracy to a degree. Furthermore, modified DNA extraction and library preparation protocols produced sufficient genomic DNA from B. malayi and T. trichiura to generate de novo assemblies from individual specimens.
Affiliation(s)
- Kaylee S. Herzog
  - Department of Epidemiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Rachel Wu
  - Department of Epidemiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- John M. Hawdon
  - Department of Microbiology, Immunology, and Tropical Medicine, The George Washington University, Washington, DC 20037, USA
- Peter Nejsum
  - Department of Clinical Medicine, Aarhus University, 8200 Aarhus, Denmark
- Joseph R. Fauver
  - Department of Epidemiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
2
Vuruputoor VS, Starovoitov A, Cai Y, Liu Y, Rahmatpour N, Hedderson TA, Wilding N, Wegrzyn JL, Goffinet B. Crossroads of assembling a moss genome: navigating contaminants and horizontal gene transfer in the moss Physcomitrellopsis africana. G3 (Bethesda) 2024; 14:jkae104. [PMID: 38781445 PMCID: PMC11228847 DOI: 10.1093/g3journal/jkae104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/03/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
The first chromosome-scale reference genome of the rare narrow-endemic African moss Physcomitrellopsis africana (P. africana) is presented here. Assembled from 73 × Oxford Nanopore Technologies (ONT) long reads and 163 × Beijing Genomics Institute (BGI)-seq short reads, the 414 Mb reference comprises 26 chromosomes and 22,925 protein-coding genes [Benchmarking Universal Single-Copy Ortholog (BUSCO) scores: C:94.8% (D:13.9%)]. This genome holds 2 genes that withstood rigorous filtration of microbial contaminants, have no homolog in other land plants, and are thus interpreted as resulting from 2 unique horizontal gene transfers (HGTs) from microbes. Further, P. africana shares 176 of the 273 published HGT candidates identified in Physcomitrium patens (P. patens), but lacks 98 of these, highlighting that perhaps as many as 91 genes were acquired in P. patens in the last 40 million years following its divergence from its common ancestor with P. africana. These observations suggest rather continuous gene gains via HGT followed by potential losses during the diversification of the Funariaceae. Our findings showcase both dynamic flux in plant HGTs over evolutionarily "short" timescales, alongside enduring impacts of successful integrations, like those still functionally maintained in extant P. africana. Furthermore, this study describes the informatic processes employed to distinguish contaminants from candidate HGT events.
Affiliation(s)
- Vidya S Vuruputoor
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Andrew Starovoitov
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Yuqing Cai
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
  - Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake 518004, China
- Yang Liu
  - State Key Laboratory of Agricultural Genomics, BGI-Shenzhen, Shenzhen 518083, China
  - Key Laboratory of Southern Subtropical Plant Diversity, Fairy Lake 518004, China
- Nasim Rahmatpour
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
- Terry A Hedderson
  - Department of Biological Sciences, Bolus Herbarium, University of Cape Town, Private Bag, 7701 Rondebosch, South Africa
- Nicholas Wilding
  - UMR PVBMT, BP 7151, Université de La Réunion, chemin de l’IRAT, 97410 Saint-Pierre, La Réunion, France
  - Missouri Botanical Garden, P.O. Box 299, St. Louis, MO 63166-0299, USA
- Jill L Wegrzyn
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
  - Institute for Systems Genomics, University of Connecticut, Storrs, CT 06269, USA
- Bernard Goffinet
  - Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
3
Fava S, Sollitto M, Racaku M, Iannucci A, Benazzo A, Ancona L, Gratton P, Florian F, Pallavicini A, Ciofi C, Cesaroni D, Gerdol M, Sbordoni V, Bertorelle G, Trucchi E. Chromosome-Level Reference Genome of the Ponza Grayling (Hipparchia sbordonii), an Italian Endemic and Endangered Butterfly. Genome Biol Evol 2024; 16:evae136. [PMID: 39023104 PMCID: PMC11255612 DOI: 10.1093/gbe/evae136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/13/2024] [Indexed: 07/20/2024]
Abstract
Islands are crucial evolutionary hotspots, providing unique opportunities for differentiation of novel biodiversity and long-term segregation of endemic species. Islands are also fragile ecosystems, where biodiversity is more exposed to environmental and anthropogenic pressures than on continents. The Ponza grayling, Hipparchia sbordonii, is an endemic butterfly species that is currently found only on two tiny islands of the Pontine archipelago, off the coast of Italy, occupying an area smaller than 10 km². It has been classified as Endangered (IUCN) because of the extremely limited area of occurrence, population fragmentation, and the recent demographic decline. Thanks to a combination of different assemblers of long and short genomic reads, bulk transcriptome RNAseq, and synteny analysis with phylogenetically close butterflies, we produced a highly contiguous, chromosome-scale annotated reference genome for the Ponza grayling, including 28 autosomes and the Z sex chromosome. The final assembly spanned 388.61 Mb with a contig N50 of 14.5 Mb and a BUSCO completeness score of 98.5%. Synteny analysis using four other butterfly species revealed high collinearity with Hipparchia semele and highlighted 10 intrachromosomal inversions longer than 10 kb, of which two appeared on the lineage leading to H. sbordonii. Our results show that a chromosome-scale reference genome is attainable even when chromatin conformation data may be impractical or present specific technical challenges. The high-quality genomic resource for H. sbordonii opens up new opportunities for the accurate assessment of genetic diversity and genetic load and for the investigation of the genomic novelties characterizing the evolutionary path of this endemic island species.
Affiliation(s)
- Sebastiano Fava
  - Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
- Marco Sollitto
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Mbarsid Racaku
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Andrea Benazzo
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Lorena Ancona
  - Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
- Paolo Gratton
  - Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Fiorella Florian
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Claudio Ciofi
  - Department of Biology, University of Florence, Florence, Italy
- Marco Gerdol
  - Department of Life Sciences, University of Trieste, Trieste, Italy
- Valerio Sbordoni
  - Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- Giorgio Bertorelle
  - Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy
- Emiliano Trucchi
  - Department of Life and Environmental Sciences, Marche Polytechnic University, Ancona, Italy
4
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. Microbiology (Reading) 2024; 170:001469. [PMID: 38916949 PMCID: PMC11261854 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/23/2024] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by continued developments in sequencing technology, is rapidly providing insights into many aspects of microbiology and is becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with Canu and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
5
Dimens PV, Jones KL, Margulies D, Scholey V, Cusatti S, McPeak B, Hildahl TE, Saillant EAE. Genomic resources for the Yellowfin tuna Thunnus albacares. Mol Biol Rep 2024; 51:232. [PMID: 38281308 DOI: 10.1007/s11033-023-09117-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 12/06/2023] [Indexed: 01/30/2024]
Abstract
BACKGROUND The Yellowfin tuna (Thunnus albacares) is a large tuna exploited by major fisheries in tropical and subtropical waters of all oceans except the Mediterranean Sea. Genomic studies of population structure, adaptive variation or the genetic basis of phenotypic traits are needed to inform fisheries management but are currently limited by the lack of a reference genome for this species. Here we report a draft genome assembly and a linkage map for use in genomic studies of T. albacares. METHODS AND RESULTS Illumina and PacBio SMRT sequencing were used in combination to generate a hybrid assembly that comprises 743,073,847 base pairs contained in 2,661 scaffolds. The assembly has an N50 of 351,587 and complete and partial BUSCO scores of 86.47% and 3.63%, respectively. Double-digest restriction associated DNA (ddRAD) was used to genotype the 2 parents and 164 of their F1 offspring resulting from a controlled breeding cross, retaining 19,469 biallelic single nucleotide polymorphism (SNP) loci. The SNP loci were used to construct a linkage map that features 24 linkage groups that represent the 24 chromosomes of yellowfin tuna. The male and female maps span 1,243.8 cM and 1,222.9 cM, respectively. The map was used to anchor the assembly in 24 super-scaffolds that contain 79% of the yellowfin tuna genome. Gene prediction identified 46,992 putative genes, 20,203 of which could be annotated via gene ontology. CONCLUSIONS The draft reference will be valuable for interpreting studies of genome-wide variation in T. albacares and other Scombroid species.
Affiliation(s)
- Pavel V Dimens
  - School of Ocean Science and Engineering, The University of Southern Mississippi, Ocean Springs, MS, 39564, USA
- Daniel Margulies
  - Inter-American Tropical Tuna Commission, 8901 La Jolla Shores Drive, La Jolla, CA, 92037, USA
- Vernon Scholey
  - Inter-American Tropical Tuna Commission, 8901 La Jolla Shores Drive, La Jolla, CA, 92037, USA
- Susana Cusatti
  - Inter-American Tropical Tuna Commission, 8901 La Jolla Shores Drive, La Jolla, CA, 92037, USA
- Brooke McPeak
  - School of Ocean Science and Engineering, The University of Southern Mississippi, Ocean Springs, MS, 39564, USA
- Tami E Hildahl
  - School of Ocean Science and Engineering, The University of Southern Mississippi, Ocean Springs, MS, 39564, USA
- Eric A E Saillant
  - School of Ocean Science and Engineering, The University of Southern Mississippi, Ocean Springs, MS, 39564, USA
6
Dvorianinova EM, Sigova EA, Mollaev TD, Rozhmina TA, Kudryavtseva LP, Novakovskiy RO, Turba AA, Zhernova DA, Borkhert EV, Pushkova EN, Melnikova NV, Dmitriev AA. Comparative Genomic Analysis of Colletotrichum lini Strains with Different Virulence on Flax. J Fungi (Basel) 2023; 10:32. [PMID: 38248942 PMCID: PMC10817032 DOI: 10.3390/jof10010032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 12/04/2023] [Accepted: 12/24/2023] [Indexed: 01/23/2024]
Abstract
Colletotrichum lini is a flax fungal pathogen. The genus comprises differently virulent strains, leading to significant yield losses. However, no attempts had been made to investigate the molecular mechanisms of C. lini pathogenicity from high-quality genome assemblies before this study. In this work, we sequenced the genomes of three C. lini strains of high (#390-1), medium (#757), and low (#771) virulence. We obtained more than 100× genome coverage with Oxford Nanopore Technologies reads (N50 = 12.1, 6.1, 5.0 kb) and more than 50× genome coverage with Illumina data (150 + 150 bp). Several assembly strategies were tested. The final assemblies were obtained using the Canu-Racon ×2-Medaka-Polca scheme. The assembled genomes had a size of 54.0-55.3 Mb, 26-32 contigs, N50 values > 5 Mb, and BUSCO completeness > 96%. A comparative genomic analysis showed high similarity among mitochondrial and nuclear genomes. However, a rearrangement event and the loss of a 0.7 Mb contig were revealed. After genome annotation with Funannotate, secreted proteins were selected using SignalP, and candidate effectors were predicted among them using EffectorP. The analysis of the InterPro annotations of predicted effectors revealed unique protein categories in each strain. The assembled genomes and the conducted comparative analysis extend the knowledge of the genetic diversity of C. lini and form the basis for establishing the molecular mechanisms of its pathogenicity.
Affiliation(s)
- Ekaterina M. Dvorianinova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Elizaveta A. Sigova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
  - Moscow Institute of Physics and Technology, Moscow 141701, Russia
- Timur D. Mollaev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
  - Agrarian and Technological Institute, Peoples Friendship University of Russia (RUDN University), Moscow 117198, Russia
- Tatiana A. Rozhmina
  - Federal Research Center for Bast Fiber Crops, Torzhok 172002, Russia
- Roman O. Novakovskiy
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Anastasia A. Turba
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Daiana A. Zhernova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
  - Faculty of Biology, Lomonosov Moscow State University, Moscow 119234, Russia
- Elena V. Borkhert
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Elena N. Pushkova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Nataliya V. Melnikova
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
- Alexey A. Dmitriev
  - Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia
7
Schelkunov MI. Mabs, a suite of tools for gene-informed genome assembly. BMC Bioinformatics 2023; 24:377. [PMID: 37794322 PMCID: PMC10548655 DOI: 10.1186/s12859-023-05499-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Accepted: 09/26/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND Despite constantly improving genome sequencing methods, error-free eukaryotic genome assembly has not yet been achieved. Among other kinds of problems of eukaryotic genome assembly are so-called "haplotypic duplications", which may manifest themselves as cases of alleles being mistakenly assembled as paralogues. Haplotypic duplications are dangerous because they create illusions of gene family expansions and, thus, may lead scientists to incorrect conclusions about genome evolution and functioning. RESULTS Here, I present Mabs, a suite of tools that serve as parameter optimizers of the popular genome assemblers Hifiasm and Flye. By optimizing the parameters of Hifiasm and Flye, Mabs tries to create genome assemblies with the genes assembled as accurately as possible. Tests on 6 eukaryotic genomes showed that in 6 out of 6 cases, Mabs created assemblies with more accurately assembled genes than those generated by Hifiasm and Flye when they were run with default parameters. When assemblies of Mabs, Hifiasm and Flye were postprocessed by a popular tool for haplotypic duplication removal, Purge_dups, genes were better assembled by Mabs in 5 out of 6 cases. CONCLUSIONS Mabs is useful for making high-quality genome assemblies. It is available at https://github.com/shelkmike/Mabs.
8
de Almeida FM, de Campos TA, Pappas Jr GJ. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. F1000Res 2023; 12:1205. [PMID: 37970066 PMCID: PMC10646344 DOI: 10.12688/f1000research.139488.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/16/2023] [Indexed: 11/17/2023]
Abstract
Background: Advancements in DNA sequencing technology have transformed the field of bacterial genomics, allowing for faster and more cost effective chromosome level assemblies compared to a decade ago. However, transforming raw reads into a complete genome model is a significant computational challenge due to the varying quality and quantity of data obtained from different sequencing instruments, as well as intrinsic characteristics of the genome and desired analyses. To address this issue, we have developed a set of container-based pipelines using Nextflow, offering both common workflows for inexperienced users and high levels of customization for experienced ones. Their processing strategies are adaptable based on the sequencing data type, and their modularity enables the incorporation of new components to address the community's evolving needs. Methods: These pipelines consist of three parts: quality control, de novo genome assembly, and bacterial genome annotation. In particular, the genome annotation pipeline provides a comprehensive overview of the genome, including standard gene prediction and functional inference, as well as predictions relevant to clinical applications such as virulence and resistance gene annotation, secondary metabolite detection, prophage and plasmid prediction, and more. Results: The annotation results are presented in reports, genome browsers, and a web-based application that enables users to explore and interact with the genome annotation results. Conclusions: Overall, our user-friendly pipelines offer a seamless integration of computational tools to facilitate routine bacterial genomics research. The effectiveness of these is illustrated by examining the sequencing data of a clinical sample of Klebsiella pneumoniae.
Affiliation(s)
- Felipe Marques de Almeida
  - Programa de Pós-graduação em Biologia Molecular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
  - Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Tatiana Amabile de Campos
  - Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
  - Programa de Pós-graduação em Biologia Microbiana, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
- Georgios Joannis Pappas Jr
  - Programa de Pós-graduação em Biologia Molecular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
  - Departamento de Biologia Celular, Universidade de Brasília, Brasília, DF, 70910-900, Brazil
9
Mochizuki T, Sakamoto M, Tanizawa Y, Nakayama T, Tanifuji G, Kamikawa R, Nakamura Y. A practical assembly guideline for genomes with various levels of heterozygosity. Brief Bioinform 2023; 24:bbad337. [PMID: 37798248 PMCID: PMC10555665 DOI: 10.1093/bib/bbad337] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/06/2023] [Accepted: 09/03/2023] [Indexed: 10/07/2023]
Abstract
Although current long-read sequencing technologies produce long reads that facilitate genome assembly, they have high sequence error rates. While various assemblers with different perspectives have been developed, no systematic evaluation of long-read assemblers for diploid genomes with varying heterozygosity has been performed. Here, we evaluated a series of processes, including the estimation of genome characteristics such as genome size and heterozygosity, de novo assembly, polishing, and removal of allelic contigs, using six genomes with various heterozygosity levels. We evaluated five long-read-only assemblers (Canu, Flye, miniasm, NextDenovo and Redbean) and five hybrid assemblers that combine short and long reads (HASLR, MaSuRCA, Platanus-allee, SPAdes and WENGAN) and proposed a concrete guideline for the construction of haplotype representation according to the degree of heterozygosity, followed by polishing and purging haplotigs, using stable and high-performance assemblers: Redbean, Flye and MaSuRCA.
Affiliation(s)
- Mika Sakamoto
  - Genome Informatics Laboratory, National Institute of Genetics
- Takuro Nakayama
  - Division of Life Sciences Center for Computational Sciences, University of Tsukuba, Japan
- Goro Tanifuji
  - Department of Zoology, National Museum of Nature and Science
10
Erdos Z, Studholme DJ, Raymond B, Sharma MD. De novo genome assembly of Akanthomyces muscarius, a biocontrol agent of insect agricultural pests. Access Microbiol 2023; 5:acmi000568.v3. [PMID: 37424543 PMCID: PMC10323777 DOI: 10.1099/acmi.0.000568.v3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 03/29/2023] [Indexed: 07/11/2023]
Abstract
The entomopathogenic fungus Akanthomyces muscarius is commonly used in agriculture to manage insect pests. Besides its use as a commercially important biological control agent, it also presents a potential model for studying host-pathogen interactions and the evolution of virulence in a laboratory setting. Here, we describe the first high-quality genome sequence for A. muscarius. We used long- and short-read sequencing to assemble a sequence of 36.1 Mb with an N50 of 4.9 Mb. Genome annotation predicted 12,347 genes, with 96.6% completeness based on the core Hypocreales gene set. The high-quality assembly and annotation of A. muscarius presented in this study provide an essential tool for future research on this commercially important species.
Affiliation(s)
- Zoltan Erdos
  - Ecology and Conservation, University of Exeter, Penryn, TR10 9FE, UK
- Ben Raymond
  - Ecology and Conservation, University of Exeter, Penryn, TR10 9FE, UK
11
Luo J, Guan T, Chen G, Yu Z, Zhai H, Yan C, Luo H. SLHSD: hybrid scaffolding method based on short and long reads. Brief Bioinform 2023; 24:7152317. [PMID: 37141142 DOI: 10.1093/bib/bbad169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 01/08/2023] [Accepted: 04/12/2023] [Indexed: 05/05/2023]
Abstract
In genome assembly, scaffolding can obtain more complete and continuous scaffolds. Current scaffolding methods usually adopt one type of read to construct a scaffold graph and then orient and order contigs. However, scaffolding with the strengths of two or more types of reads seems to be a better solution to some tricky problems. Combining the advantages of different types of data is significant for scaffolding. Here, a hybrid scaffolding method (SLHSD) is presented that simultaneously leverages the precision of short reads and the length advantage of long reads. Building an optimal scaffold graph is an important foundation for getting scaffolds. SLHSD uses a new algorithm that combines long and short read alignment information to determine whether to add an edge and how to calculate the edge weight in a scaffold graph. In addition, SLHSD develops a strategy to ensure that edges with high confidence can be added to the graph with priority. Then, a linear programming model is used to detect and remove remaining false edges in the graph. We compared SLHSD with other scaffolding methods on five datasets. Experimental results show that SLHSD outperforms other methods. The open-source code of SLHSD is available at https://github.com/luojunwei/SLHSD.
Affiliation(s)
- Junwei Luo
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Ting Guan
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Guolin Chen
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Zhonghua Yu
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Haixia Zhai
  - School of Software, Henan Polytechnic University, Jiaozuo 454003, China
- Chaokun Yan
  - School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
- Huimin Luo
  - School of Computer and Information Engineering, Henan University, Kaifeng 475001, China
12
Sereika M, Petriglieri F, Jensen TBN, Sannikov A, Hoppe M, Nielsen PH, Marshall IPG, Schramm A, Albertsen M. Closed genomes uncover a saltwater species of Candidatus Electronema and shed new light on the boundary between marine and freshwater cable bacteria. ISME J 2023; 17:561-569. [PMID: 36697964 PMCID: PMC10030654 DOI: 10.1038/s41396-023-01372-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 11/07/2022] [Revised: 01/11/2023] [Accepted: 01/13/2023] [Indexed: 01/26/2023]
Abstract
Cable bacteria of the Desulfobulbaceae family are centimeter-long filamentous bacteria, which are capable of conducting long-distance electron transfer. Currently, all cable bacteria are classified into two candidate genera: Candidatus Electronema, typically found in freshwater environments, and Candidatus Electrothrix, typically found in saltwater environments. This taxonomic framework is based on both 16S rRNA gene sequences and metagenome-assembled genome (MAG) phylogenies. However, most of the currently available MAGs are highly fragmented, incomplete, and thus likely miss key genes essential for deciphering the physiology of cable bacteria. Also, a closed, circular genome of cable bacteria has not been published yet. To address this, we performed Nanopore long-read and Illumina short-read shotgun sequencing of selected environmental samples and a single-strain enrichment of Ca. Electronema aureum. We recovered multiple cable bacteria MAGs, including two circular and one single-contig. Phylogenomic analysis, also confirmed by 16S rRNA gene-based phylogeny, classified one circular MAG and the single-contig MAG as novel species of cable bacteria, which we propose to name Ca. Electronema halotolerans and Ca. Electrothrix laxa, respectively. The Ca. Electronema halotolerans, despite belonging to the previously recognized freshwater genus of cable bacteria, was retrieved from brackish-water sediment. Metabolic predictions showed several adaptations to a high salinity environment, similar to the "saltwater" Ca. Electrothrix species, indicating how Ca. Electronema halotolerans may be the evolutionary link between marine and freshwater cable bacteria lineages.
Affiliation(s)
- Mantas Sereika
- Center for Microbial Communities, Aalborg University, Aalborg, Denmark
| | - Artur Sannikov
- Center for Electromicrobiology, Aarhus University, Aarhus, Denmark
| | - Morten Hoppe
- Center for Electromicrobiology, Aarhus University, Aarhus, Denmark
| | - Ian P G Marshall
- Center for Electromicrobiology, Aarhus University, Aarhus, Denmark
| | - Andreas Schramm
- Center for Electromicrobiology, Aarhus University, Aarhus, Denmark
| | - Mads Albertsen
- Center for Microbial Communities, Aalborg University, Aalborg, Denmark.
|
13
|
Thippabhotla S, Liu B, Podgorny A, Yooseph S, Yang Y, Zhang J, Zhong C. Integrated de novo gene prediction and peptide assembly of metagenomic sequencing data. NAR Genom Bioinform 2023; 5:lqad023. [PMID: 36915411 PMCID: PMC10006731 DOI: 10.1093/nargab/lqad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2021] [Revised: 12/03/2022] [Accepted: 02/18/2023] [Indexed: 03/16/2023] Open
Abstract
Metagenomics is the study of all genomic content contained in given microbial communities. Metagenomic functional analysis aims to quantify protein families and reconstruct metabolic pathways from the metagenome. It plays a central role in understanding the interaction between the microbial community and its host or environment. De novo functional analysis, which allows the discovery of novel protein families, remains challenging for high-complexity communities. There are currently three main approaches for recovering novel genes or proteins: de novo nucleotide assembly, gene calling and peptide assembly. Unfortunately, their information dependency has been overlooked, and each has been formulated as an independent problem. In this work, we develop a sophisticated workflow called integrated Metagenomic Protein Predictor (iMPP), which leverages the information dependencies for better de novo functional analysis. iMPP contains three novel modules: a hybrid assembly graph generation module, a graph-based gene calling module, and a peptide assembly-based refinement module. iMPP significantly improved the existing gene calling sensitivity on unassembled metagenomic reads, achieving a 92-97% recall rate at a high precision level (>85%). iMPP further allowed for more sensitive and accurate peptide assembly, recovering more reference proteins and delivering more hypothetical protein sequences. The high performance of iMPP can provide a more comprehensive and unbiased view of the microbial communities under investigation. iMPP is freely available from https://github.com/Sirisha-t/iMPP.
Affiliation(s)
- Sirisha Thippabhotla
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
| | - Ben Liu
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
| | - Adam Podgorny
- Center for Computational Biology, The University of Kansas, Lawrence, KS 66045, USA
| | - Shibu Yooseph
- Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
| | - Youngik Yang
- National Marine Biodiversity Institute of Korea, 101-75, Jangsan-ro, Janghang-eup, Seochun-gun, Chungchungnam-do, 33662, South Korea
| | - Jun Zhang
- Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS 66160, USA
- Department of Cancer Biology, University of Kansas Cancer Center, Kansas City, KS 66160, USA
| | - Cuncong Zhong
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS 66045, USA
|
14
|
Sekino M, Hashimoto K, Nakamichi R, Yamamoto M, Fujinami Y, Sasaki T. Introgressive hybridization in the west Pacific pen shells (genus Atrina): Restricted interspecies gene flow within the genome. Mol Ecol 2023; 32:2945-2963. [PMID: 36855846 DOI: 10.1111/mec.16908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 02/03/2023] [Accepted: 02/14/2023] [Indexed: 03/02/2023]
Abstract
A compelling interest in marine biology is to elucidate how species boundaries between sympatric free-spawning marine invertebrates such as bivalve molluscs are maintained in the face of potential hybridization. Hybrid zones provide the natural resources for us to study the underlying genetic mechanisms of reproductive isolation between hybridizing species. Against this backdrop, we examined the occurrence of introgressive hybridization (introgression) between two bivalves distributed in the western Pacific margin, Atrina japonica and Atrina lischkeana, based on single-nucleotide polymorphisms (SNPs) derived from restriction site-associated DNA sequencing. Using 1066 ancestry-informative SNP sites, we also investigated the extent of introgression within the genome to search for SNP sites with reduced interspecies gene flow. A series of our individual-level clustering analyses including the principal component analysis, Bayesian model-based clustering, and triangle plotting based on ancestry-heterozygosity relationships for an admixed population sample from the Seto Inland Sea (Japan) consistently suggested the presence of specimens with varying degrees of genomic admixture, thereby implying that the two species are not completely isolated. The Bayesian genomic cline analysis identified 10 SNP sites with reduced introgression, each of which was located within a genic region or an intergenic region physically close to a functional gene. No, or very few, heterozygotes were observed at these sites in the hybrid zone, suggesting that selection acts against heterozygotes. Accordingly, we raised the possibility that the SNP sites are within genomic regions that are incompatible between the two species. Our finding of restricted interspecies gene flow at certain genomic regions gives new insight into the maintenance of species boundaries in hybridizing broadcast-spawning molluscs.
Affiliation(s)
- Masashi Sekino
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa, Japan
| | - Kazumasa Hashimoto
- Fisheries Technology Institute, Japan Fisheries Research and Education Agency, Nagasaki, Japan
| | - Reiichiro Nakamichi
- Fisheries Resources Institute, Japan Fisheries Research and Education Agency, Yokohama, Kanagawa, Japan
| | - Masayuki Yamamoto
- Fisheries Division, Kagawa Prefectural Government, Takamatsu, Kagawa, Japan
| | - Yuichiro Fujinami
- Goto Field Station, Fisheries Technology Institute, Japan Fisheries Research and Education Agency, Nagasaki, Japan
| | - Takenori Sasaki
- The University Museum, The University of Tokyo, Tokyo, Japan
|
15
|
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:v15020587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023] Open
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
| | - Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
| | - Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
| | - Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
| | - Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
|
16
|
McLay TGB, Murphy DJ, Holmes GD, Mathews S, Brown GK, Cantrill DJ, Udovicic F, Allnutt TR, Jackson CJ. A genome resource for Acacia, Australia's largest plant genus. PLoS One 2022; 17:e0274267. [PMID: 36240205 PMCID: PMC9565413 DOI: 10.1371/journal.pone.0274267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2022] [Accepted: 08/24/2022] [Indexed: 11/05/2022] Open
Abstract
Acacia (Leguminosae, Caesalpinioideae, mimosoid clade) is the largest and most widespread genus of plants in the Australian flora, occupying and dominating a diverse range of environments, with an equally diverse range of forms. For a genus of its size and importance, Acacia currently has surprisingly few genomic resources. Acacia pycnantha, the golden wattle, is a woody shrub or tree occurring in south-eastern Australia and is the country's floral emblem. To assemble a genome for A. pycnantha, we generated long-read sequences using Oxford Nanopore Technology, 10x Genomics Chromium linked reads, and short-read Illumina sequences, and produced an assembly spanning 814 Mb, with a scaffold N50 of 2.8 Mb, and 98.3% of complete Embryophyta BUSCOs. Genome annotation predicted 47,624 protein-coding genes, with 62.3% of the genome predicted to comprise transposable elements. Evolutionary analyses indicated a shared genome duplication event in the Caesalpinioideae, and conflict in the relationships between Cercis (subfamily Cercidoideae) and subfamilies Caesalpinioideae and Papilionoideae (pea-flowered legumes). Comparative genomics identified a suite of expanded and contracted gene families in A. pycnantha, and these were annotated with both GO terms and KEGG functional categories. One expanded gene family of particular interest is involved in flowering time and may be associated with the characteristic synchronous flowering of Acacia. This genome assembly and annotation will be a valuable resource for all studies involving Acacia, including the evolution, conservation, breeding, invasiveness, and physiology of the genus, and for comparative studies of legumes.
Affiliation(s)
- Todd G. B. McLay
- Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
- School of BioSciences, The University of Melbourne, Parkville, Victoria, Australia
- Centre for Australian Biodiversity Research, CSIRO, Black Mountain, Australian Capital Territory, Australia
| | - Daniel J. Murphy
- Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
| | - Gareth D. Holmes
- Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
| | - Sarah Mathews
- Centre for Australian Biodiversity Research, CSIRO, Black Mountain, Australian Capital Territory, Australia
- Department of Biological Sciences, Louisiana State University, Baton Rouge, Louisiana, United States of America
| | - Gillian K. Brown
- Queensland Herbarium, Department of Environment and Science, Toowong, Queensland, Australia
| | - Frank Udovicic
- Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
| | - Chris J. Jackson
- Royal Botanic Gardens Victoria, South Yarra, Victoria, Australia
|
17
|
Greenberg G, Shomorony I. Improving bacterial genome assembly using a test of strand orientation. Bioinformatics 2022; 38:ii34-ii41. [PMID: 36124787 DOI: 10.1093/bioinformatics/btac516] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/25/2022] Open
Abstract
SUMMARY: The complexity of genome assembly is due in large part to the presence of repeats. In particular, large reverse-complemented repeats can lead to incorrect inversions of large segments of the genome. To detect and correct such inversions in finished bacterial genomes, we propose a statistical test based on tetranucleotide frequency (TNF), which determines whether two segments from the same genome are of the same or opposite orientation. In most cases, the test neatly partitions the genome into two segments of roughly equal length with seemingly opposite orientations. This corresponds to the segments between the DNA replication origin and terminus, which were previously known to have distinct nucleotide compositions. We show that, in several cases where this balanced partition is not observed, the test identifies a potential inverted misassembly, which is validated by the presence of a reverse-complemented repeat at the boundaries of the inversion. After inverting the sequence between the repeat, the balance of the misassembled genome is restored. Our method identifies 31 potential misassemblies in the NCBI database, several of which are further supported by a reassembly of the read data.
AVAILABILITY AND IMPLEMENTATION: A github repository is available at https://github.com/gcgreenberg/Oriented-TNF.git.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Grant Greenberg
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
| | - Ilan Shomorony
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
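The idea behind the strand-orientation test in the Greenberg and Shomorony abstract above can be illustrated with a small sketch. This is not the authors' statistical test (their implementation is in the repository linked in the abstract); it is a simplified nearest-profile comparison, assuming uppercase A/C/G/T input and a plain Euclidean distance between tetranucleotide frequency (TNF) vectors. All function names here are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumption: sequences use uppercase A/C/G/T; 4-mers with other symbols are ignored.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 tetranucleotides


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tnf(seq):
    """Return the normalized tetranucleotide frequency vector (256 entries)."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def same_orientation(seg_a, seg_b):
    """Heuristic (not the published test): seg_b is called 'same orientation'
    if its TNF profile is at least as close to seg_a's profile as it is to
    the profile of seg_a's reverse complement."""
    profile_b = tnf(seg_b)
    d_forward = euclidean(tnf(seg_a), profile_b)
    d_reverse = euclidean(tnf(reverse_complement(seg_a)), profile_b)
    return d_forward <= d_reverse
```

On real genomes the strand-specific compositional signal is weak, so a practical test would need long segments and a proper significance threshold, which this sketch omits.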
|
18
|
Whole genome assembly of the armored loricariid catfish Ancistrus triradiatus highlights herbivory signatures. Mol Genet Genomics 2022; 297:1627-1642. [PMID: 36006456 PMCID: PMC9596584 DOI: 10.1007/s00438-022-01947-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2022] [Accepted: 08/12/2022] [Indexed: 11/01/2022]
Abstract
The catfish Ancistrus triradiatus belongs to the species-rich family Loricariidae. Loricariids display remarkable traits such as herbivory, a benthic lifestyle, the absence of scales but the presence of dermal bony plates. They are exported as ornamental fish worldwide, with escaped fishes becoming a threat locally. Although genetic and phylogenetic studies are continuously increasing and developmental genetic investigations are underway, no genome assembly has been formally proposed for Loricariidae yet. We report a high-quality genome assembly of Ancistrus triradiatus using long and short reads, and a newly assembled transcriptome. The genome assembly is composed of 9530 scaffolds, including 85.6% of ray-finned fish BUSCOs, and 26,885 predicted protein-coding genes. The genomic GC content is higher than in other catfishes, reflecting the higher metabolism associated with herbivory. The examination of the SCPP gene family indicates that the genes presumably triggering scale loss when absent, are present in the scaleless A. triradiatus, questioning their explanatory role. The analysis of the opsin gene repertoire revealed that gene losses associated to the nocturnal lifestyle of catfishes were not entirely found in A. triradiatus, as the UV-sensitive opsin 5 is present. Finally, most gene family expansions were related to immunity except the gamma crystallin gene family which controls pupil shape and sub-aquatic vision. Thus, the genome of A. triradiatus reveals that fish herbivory may be related to the photic zone habitat, conditions metabolism, photoreception and visual functions. This genome is the first for the catfish suborder Loricarioidei and will serve as backbone for future genetic, developmental and conservation studies.
|
19
|
Mgwatyu Y, Cornelissen S, van Heusden P, Stander A, Ranketse M, Hesse U. Establishing MinION Sequencing and Genome Assembly Procedures for the Analysis of the Rooibos (Aspalathus linearis) Genome. Plants (Basel, Switzerland) 2022; 11:2156. [PMID: 36015459 PMCID: PMC9416007 DOI: 10.3390/plants11162156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 08/08/2022] [Accepted: 08/14/2022] [Indexed: 11/17/2022]
Abstract
While plant genome analysis is gaining speed worldwide, few plant genomes have been sequenced and analyzed on the African continent. Yet, this information holds the potential to transform diverse industries as it unlocks medicinally and industrially relevant biosynthesis pathways for bioprospecting. Considering that South Africa is home to the highly diverse Cape Floristic Region, local establishment of methods for plant genome analysis is essential. Long-read sequencing is becoming standard procedure for plant genome research, as these reads can span repetitive regions of the DNA, substantially facilitating reassembly of a contiguous genome. With the MinION, Oxford Nanopore offers a cost-efficient sequencing method to generate long reads; however, DNA purification protocols must be adapted for each plant species to generate ultra-pure DNA, essential for these analyses. Here, we describe a cost-effective procedure for the extraction and purification of plant DNA and evaluate diverse genome assembly approaches for the reconstruction of the genome of rooibos (Aspalathus linearis), an endemic South African medicinal plant widely used for tea production. We discuss the pros and cons of nine tested assembly programs, specifically Redbean and NextDenovo, which generated the most contiguous assemblies, and Flye, which produced an assembly closest to the predicted genome size.
Affiliation(s)
- Yamkela Mgwatyu
- Department of Biotechnology, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
| | - Stephanie Cornelissen
- Agricultural Research Council, Biotechnology Platform, 100 Old Soutpans Road, Onderstepoort 0110, South Africa
| | - Peter van Heusden
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
| | - Allison Stander
- Department of Biotechnology, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
| | - Mary Ranketse
- Agricultural Research Council, Biotechnology Platform, 100 Old Soutpans Road, Onderstepoort 0110, South Africa
| | - Uljana Hesse
- Department of Biotechnology, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
- Institute for Microbial Biotechnology and Metagenomics, University of the Western Cape, Robert Sobukwe Road, Bellville 7535, South Africa
|
20
|
A high-quality genome of the dobsonfly Neoneuromus ignobilis reveals molecular convergences in aquatic insects. Genomics 2022; 114:110437. [PMID: 35902070 DOI: 10.1016/j.ygeno.2022.110437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 07/03/2022] [Accepted: 07/21/2022] [Indexed: 11/22/2022]
Abstract
Neoneuromus ignobilis is an archaic holometabolous aquatic predatory insect. However, a lack of genomic resources hinders the use of whole genome sequencing to explore the genetic basis and molecular mechanisms of its adaptive evolution. Here, we provide a high-contiguity, chromosome-level genome assembly of N. ignobilis using high-coverage Nanopore and PacBio reads with the Hi-C technique. The final assembly is 480.67 Mb in size, containing 12 telomere-ended pseudochromosomes with only 17 gaps. We compared 42 hexapod species genomes including six independent lineages comprising 11 aquatic insects, and found convergent expansions of long wavelength-sensitive and blue-sensitive opsins, thermal stress response TRP channels, and sulfotransferases in aquatic insects, which may be related to their aquatic adaptation. We also detected strong nonrandom signals of convergent amino acid substitutions in aquatic insects. Collectively, our comparative genomic analysis revealed evidence of molecular convergence in aquatic insects in both gene family evolution and amino acid substitutions.
|
21
|
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome-assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
|
22
|
Li H, Matsuda H, Tsuboyama A, Munakata R, Sugiyama A, Yazaki K. Inventory of ATP-binding cassette proteins in Lithospermum erythrorhizon as a model plant producing divergent secondary metabolites. DNA Res 2022; 29:6596041. [PMID: 35640979 PMCID: PMC9195045 DOI: 10.1093/dnares/dsac016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/18/2021] [Accepted: 05/26/2022] [Indexed: 02/07/2023] Open
Abstract
ATP-binding cassette (ABC) proteins are the largest membrane transporter family in plants. In addition to transporting organic substances, these proteins function as ion channels and molecular switches. The development of multiple genes encoding ABC proteins has been associated with their various biological roles. Plants utilize many secondary metabolites to adapt to environmental stresses and to communicate with other organisms, with many ABC proteins thought to be involved in metabolite transport. Lithospermum erythrorhizon is regarded as a model plant for studying secondary metabolism, as cells in culture yielded high concentrations of meroterpenes and phenylpropanoids. Analysis of the genome and transcriptomes of L. erythrorhizon showed expression of genes encoding 118 ABC proteins, similar to other plant species. The number of expressed proteins in the half-size ABCA and full-size ABCB subfamilies was ca. 50% lower in L. erythrorhizon than in Arabidopsis, whereas there was no significant difference in the numbers of other expressed ABC proteins. Because many ABCG proteins are involved in the export of organic substances, members of this subfamily may play important roles in the transport of secondary metabolites that are secreted into apoplasts.
Affiliation(s)
- Hao Li
- Research Institute for Sustainable Humanosphere, Kyoto University, Uji 611-0011, Japan
| | - Hinako Matsuda
- Research Institute for Sustainable Humanosphere, Kyoto University, Uji 611-0011, Japan
| | - Ai Tsuboyama
- Research Institute for Sustainable Humanosphere, Kyoto University, Uji 611-0011, Japan
| | - Ryosuke Munakata
- Research Institute for Sustainable Humanosphere, Kyoto University, Uji 611-0011, Japan
| | - Akifumi Sugiyama
- Research Institute for Sustainable Humanosphere, Kyoto University, Uji 611-0011, Japan
| | - Kazufumi Yazaki
- To whom correspondence should be addressed. Tel. +81 774 38 3617.
|
23
|
Huang F, Xiao L, Gao M, Vallely EJ, Dybvig K, Atkinson TP, Waites KB, Chong Z. B-assembler: a circular bacterial genome assembler. BMC Genomics 2022; 23:361. [PMID: 35546658 PMCID: PMC9092672 DOI: 10.1186/s12864-022-08577-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Accepted: 04/21/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Accurate bacteria genome de novo assembly is fundamental to understand the evolution and pathogenesis of new bacteria species. The advent and popularity of Third-Generation Sequencing (TGS) enables assembly of bacteria genomes at an unprecedented speed. However, most current TGS assemblers were specifically designed for human or other species that do not have a circular genome. Besides, the repetitive DNA fragments in many bacterial genomes plus the high error rate of long sequencing data make it still very challenging to accurately assemble their genomes even with a relatively small genome size. Therefore, there is an urgent need for the development of an optimized method to address these issues.
RESULTS: We developed B-assembler, which is capable of assembling bacterial genomes when there are only long reads or a combination of short and long reads. B-assembler takes advantage of the structural resolving power of long reads and the accuracy of short reads if applicable. It first selects and corrects the ultra-long reads to get an initial contig. Then, it collects the reads overlapping with the ends of the initial contig. This two-round assembling procedure along with optimized error correction enables a high-confidence and circularized genome assembly. Benchmarked on both synthetic and real sequencing data of several species of bacterium, the results show that both long-read-only and hybrid-read modes can accurately assemble circular bacterial genomes free of structural errors and have fewer small errors compared to other assemblers.
CONCLUSIONS: B-assembler provides a better solution to bacterial genome assembly, which will facilitate downstream bacterial genome analysis.
Affiliation(s)
- Fengyuan Huang
  - Informatics Institute, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Genetics, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Li Xiao
  - Department of Medicine, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Min Gao
  - Informatics Institute, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Medicine, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Ethan J Vallely
  - Informatics Institute, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Kevin Dybvig
  - Department of Genetics, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Pediatrics, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
- T Prescott Atkinson
  - Department of Pediatrics, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Ken B Waites
  - Department of Pathology, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35233, USA
- Zechen Chong
  - Informatics Institute, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of Genetics, Heersink School of Medicine, the University of Alabama at Birmingham, Birmingham, AL 35294, USA
24
Nabeshima K, Nakajima N, Ogata M, Onuma M. Draft genome sequence data of Indian rhinoceros, Rhinoceros unicornis. Data Brief 2022; 41:107857. [PMID: 35141371 PMCID: PMC8814301 DOI: 10.1016/j.dib.2022.107857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2021] [Revised: 01/13/2022] [Accepted: 01/18/2022] [Indexed: 11/18/2022] Open
Abstract
The Indian rhinoceros (Rhinoceros unicornis) is a large herbivore found in northern India and southern Nepal. It is a critically endangered species, with an estimated population of approximately 3,600 in the wild. Genetic factors, such as the loss of genetic diversity and the accumulation of deleterious variations, are critical risk factors for the extinction of endangered species, such as the Indian rhinoceros. To support the conservation efforts of the Indian rhinoceros, we assembled its draft genome. The new genomic data will enable the study of functional genes associated with the ecological and physiological characteristics of the Indian rhinoceros and help us establish more effective conservation measures. The muscles of an Indian rhinoceros that died from prostration at a zoo were collected, and the samples were stored at the National Institute for Environmental Studies (Tsukuba, Japan). Sequence data were obtained using an Illumina NovaSeq 6000 platform for short reads and an Oxford Nanopore Technologies PromethION for long reads. We generated approximately 235.2 Gbp of data. From these sequences, we assembled a 2,375,051,758 bp genome consisting of 7,615 contigs. The genome data are available from the National Center for Biotechnology Information BioProject database under accession number BOSQ00000000.
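The figures reported in this abstract allow two quick back-of-the-envelope checks: the mean contig length and a rough sequencing depth. The short calculation below is only a sketch. It assumes the assembly length is a fair proxy for genome size and that the 235.2 Gbp total pools the short and long reads, so the depth value is a combined, approximate number and not a statistic reported by the authors. Variable names are illustrative.

```python
# Values taken from the abstract above.
assembly_bp = 2_375_051_758   # total assembled length in bp
n_contigs = 7_615             # number of contigs
raw_data_bp = 235.2e9         # approximate total sequence data (235.2 Gbp)

# Mean contig length: total assembled bases divided by contig count.
mean_contig_bp = assembly_bp / n_contigs

# Rough combined depth, ASSUMING assembly length approximates genome size.
approx_depth = raw_data_bp / assembly_bp

print(f"mean contig length: {mean_contig_bp:,.0f} bp")
print(f"approximate combined depth: {approx_depth:.1f}x")
```

The mean is not the same as N50, which cannot be derived from these two numbers alone because it depends on the full contig length distribution.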
25
Bessette M, Ste‐Croix DT, Brodeur J, Mimee B, Gagnon A. Population genetic structure of the carrot weevil (Listronotus oregonensis) in North America. Evol Appl 2022; 15:300-315. [PMID: 35233249 PMCID: PMC8867704 DOI: 10.1111/eva.13343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/25/2021] [Accepted: 12/30/2021] [Indexed: 11/29/2022] Open
Abstract
Population genetic studies of insect pests enhance our ability to anticipate problems in agroecosystems, such as pest outbreaks, insecticide resistance, or expansions of the host range. This study focuses on geographic distance and host plant selection as potential determinants of genetic differentiation of the carrot weevil Listronotus oregonensis, a major pest of several apiaceous crops in North America. To undertake genetic studies on this species, we assembled the first complete genome sequence for L. oregonensis. Then, we used both haplotype discrimination with mitochondrial DNA (mtDNA) and a genotyping-by-sequencing (GBS) approach to characterize the genetic population structure. A total of 220 individuals were sampled from 17 localities in the provinces of Québec, Ontario, and Nova Scotia (Canada) and the state of Ohio (USA). Our results showed significant genetic differences between distant populations across North America, indicating that geographic distance is an important factor of differentiation for the carrot weevil. Furthermore, the GBS analysis revealed more distinct clusters than the COI analysis between Québec and Nova Scotia populations, suggesting a recent differentiation in the latter province. In contrast, we found no clear evidence of population structure associated with the four cultivated apiaceous plants tested (carrot, parsley, celery, and celeriac) using populations from Québec. This first characterization of the genetic structure of the carrot weevil contributes to a better understanding of the gene flow of the species and helps to adapt local pest management measures to better control this agricultural pest.
Affiliation(s)
- Marianne Bessette
  - Saint‐Jean‐sur‐Richelieu Research and Development Centre, Agriculture and Agri‐Food Canada, Saint‐Jean‐sur‐Richelieu, QC, Canada
  - Département de sciences biologiques, Institut de recherche en biologie végétale, Université de Montréal, Montreal, QC, Canada
- Dave T. Ste‐Croix
  - Saint‐Jean‐sur‐Richelieu Research and Development Centre, Agriculture and Agri‐Food Canada, Saint‐Jean‐sur‐Richelieu, QC, Canada
- Jacques Brodeur
  - Département de sciences biologiques, Institut de recherche en biologie végétale, Université de Montréal, Montreal, QC, Canada
- Benjamin Mimee
  - Saint‐Jean‐sur‐Richelieu Research and Development Centre, Agriculture and Agri‐Food Canada, Saint‐Jean‐sur‐Richelieu, QC, Canada
- Annie‐Ève Gagnon
  - Saint‐Jean‐sur‐Richelieu Research and Development Centre, Agriculture and Agri‐Food Canada, Saint‐Jean‐sur‐Richelieu, QC, Canada
26
Pucker B, Irisarri I, de Vries J, Xu B. Plant genome sequence assembly in the era of long reads: Progress, challenges and future directions. Quantitative Plant Biology 2022; 3:e5. [PMID: 37077982 PMCID: PMC10095996 DOI: 10.1017/qpb.2021.18] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 06/29/2021] [Revised: 11/24/2021] [Accepted: 12/21/2021] [Indexed: 05/03/2023]
Abstract
Third-generation long-read sequencing is transforming plant genomics. Oxford Nanopore Technologies and Pacific Biosciences offer competing long-read sequencing technologies that enable plant scientists to investigate even large and complex plant genomes. Sequencing projects can be conducted by single research groups, and sequences of smaller plant genomes can be completed within days. This has also led to increased large-scale investigation of genomes from multiple species to address fundamental questions about the origin and evolution of land plants. Increased accessibility of sequencing devices and user-friendly software allows more researchers to get involved in genomics. Current challenges are accurately resolving diploid or polyploid genome sequences and better accounting for intra-specific diversity by switching from single reference genome sequences to a pangenome graph.
Affiliation(s)
- Boas Pucker
  - Department of Plant Sciences, University of Cambridge, Cambridge, United Kingdom
  - Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
  - Author for correspondence: Boas Pucker
- Iker Irisarri
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
  - Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
- Jan de Vries
  - Department of Applied Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Göttingen, Germany
  - Campus Institute Data Science (CIDAS), University of Goettingen, Göttingen, Germany
  - Department of Applied Bioinformatics, Göttingen Center for Molecular Biosciences (GZMB), University of Goettingen, Göttingen, Germany
- Bo Xu
  - State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing, China
27
Kajitani R, Noguchi H, Gotoh Y, Ogura Y, Yoshimura D, Okuno M, Toyoda A, Kuwahara T, Hayashi T, Itoh T. MetaPlatanus: a metagenome assembler that combines long-range sequence links and species-specific features. Nucleic Acids Res 2021; 49:e130. [PMID: 34570223 PMCID: PMC8682757 DOI: 10.1093/nar/gkab831] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/04/2021] [Revised: 09/05/2021] [Accepted: 09/09/2021] [Indexed: 12/27/2022] Open
Abstract
De novo metagenome assembly is effective for assembling multiple draft genomes, including those of uncultured organisms. However, heterogeneity in the metagenome hinders assembly and introduces interspecies misassemblies that are deleterious for downstream analysis. To address this, we developed a hybrid metagenome assembler, MetaPlatanus. As its characteristic function, it first assembles basic contigs from accurate short reads and then iteratively utilizes long-range sequence links, species-specific sequence compositions, and coverage depth. Binning information is also used to improve contiguity. Benchmarking on mock datasets consisting of known bacteria with long reads or mate pairs revealed the high contiguity of MetaPlatanus assemblies, with a few interspecies misassemblies. For published human gut data with nanopore reads from portable sequencers, MetaPlatanus assembled many biologically important elements, such as coding genes, gene clusters, viral sequences, and over-half bacterial genomes. In the benchmark with published human saliva data with high-throughput nanopore reads, the superiority of MetaPlatanus was considerably more evident. We found that some high-abundance bacterial genomes were assembled as near-complete only by MetaPlatanus. Furthermore, MetaPlatanus can circumvent the highly fragmented assemblies and frequent interspecies misassemblies produced by the other tools. Overall, the study demonstrates that MetaPlatanus could be an effective approach for exploring large-scale structures in metagenomes.
Affiliation(s)
- Rei Kajitani
  - School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Hideki Noguchi
  - Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Yasuhiro Gotoh
  - Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan
- Yoshitoshi Ogura
  - Division of Microbiology, Department of Infectious Medicine, Kurume University School of Medicine, Asahi-machi, Kurume, Fukuoka 830-0011, Japan
- Dai Yoshimura
  - School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
- Miki Okuno
  - Division of Microbiology, Department of Infectious Medicine, Kurume University School of Medicine, Asahi-machi, Kurume, Fukuoka 830-0011, Japan
- Atsushi Toyoda
  - Advanced Genomics Center, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan; Comparative Genomics Laboratory, National Institute of Genetics, Mishima, Shizuoka 411-8540, Japan
- Tomomi Kuwahara
  - Department of Molecular Microbiology, Faculty of Medicine, Kagawa University, Miki-cho, Kita-gun, Kagawa 761-0793, Japan
- Tetsuya Hayashi
  - Department of Bacteriology, Graduate School of Medical Sciences, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan
- Takehiko Itoh
  - School of Life Science and Technology, Tokyo Institute of Technology, Meguro-ku, Tokyo 152-8550, Japan
28
Galata V, Busi SB, Kunath BJ, de Nies L, Calusinska M, Halder R, May P, Wilmes P, Laczny CC. Functional meta-omics provide critical insights into long- and short-read assemblies. Brief Bioinform 2021; 22:bbab330. [PMID: 34453168 PMCID: PMC8575027 DOI: 10.1093/bib/bbab330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 07/13/2021] [Accepted: 07/26/2021] [Indexed: 11/12/2022] Open
Abstract
Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.
Affiliation(s)
- Valentina Galata
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Susheel Bhanu Busi
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Benoît Josef Kunath
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Laura de Nies
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Magdalena Calusinska
  - BioSystems and Bioprocessing Engineering, Luxembourg Institute of Science and Technology, Rue du Brill 41, Belvaux L-4422, Luxembourg
- Rashi Halder
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Paul Wilmes
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
- Cédric Christian Laczny
  - Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
29
Díaz-Viraqué F, Greif G, Berná L, Robello C. Nanopore Long Read DNA Sequencing of Protozoan Parasites: Hybrid Genome Assembly of Trypanosoma cruzi. Methods Mol Biol 2021; 2369:3-13. [PMID: 34313980 DOI: 10.1007/978-1-0716-1681-9_1] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 04/05/2023]
Abstract
Due to highly repetitive genome sequences, short-read-based Trypanosoma cruzi genome assemblies are extremely fragmented. Contiguous trypanosomatid genome assemblies have resulted from the advent of third-generation sequencing technologies. Long reads span several to hundreds of kbp, allowing correct assembly of repeated and low-complexity DNA regions. However, these techniques have higher error rates. Hybrid assembly strategies that combine error-prone long reads with much more accurate Illumina short reads represent a very convenient approach for enhancing genome completeness. Here, we describe how to perform a hybrid assembly for genomic analysis of protozoan pathogens using Illumina and Oxford Nanopore sequencing.
Affiliation(s)
- Florencia Díaz-Viraqué
  - Laboratorio de Interacciones Hospedero-Patógeno-UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Gonzalo Greif
  - Laboratorio de Interacciones Hospedero-Patógeno-UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
- Luisa Berná
  - Laboratorio de Interacciones Hospedero-Patógeno-UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Sección Biomatemática-Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Carlos Robello
  - Laboratorio de Interacciones Hospedero-Patógeno-UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
  - Departamento de Bioquímica, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
30
Liu L, Tumi L, Suni ML, Arakaki M, Wang ZF, Ge XJ. Draft genome of Puya raimondii (Bromeliaceae), the Queen of the Andes. Genomics 2021; 113:2537-2546. [PMID: 34089785 DOI: 10.1016/j.ygeno.2021.05.042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/27/2020] [Revised: 05/16/2021] [Accepted: 05/31/2021] [Indexed: 01/20/2023]
Abstract
Puya raimondii, the Queen of the Andes, is an endangered high Andean species in the Bromeliaceae family. Here, we report its first genome to promote its conservation and evolutionary study. Comparative genomics showed that P. raimondii diverged from Ananas comosus about 14.8 million years ago, and that long terminal repeats likely contributed to the diversification of the genus in the last 3.5 million years. The gene families related to plant reproductive development and stress responses significantly expanded in the genome. At the same time, gene families involved in disease defense, photosynthesis and carbohydrate metabolism significantly contracted, which may be an evolutionary strategy to adapt to the harsh conditions in the high Andes. The demographic history analysis revealed that the P. raimondii population size sharply declined in the Pleistocene and then increased in the Holocene. We also designed and tested 46 pairs of universal primers for amplifying orthologous single-copy nuclear genes in Puya species.
Affiliation(s)
- Lu Liu
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; University of Chinese Academy of Sciences, Beijing, China
- Liscely Tumi
  - Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Mery L Suni
  - Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Monica Arakaki
  - Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima, Peru
- Zheng-Feng Wang
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; Center of Plant Ecology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China; South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
- Xue-Jun Ge
  - Key Laboratory of Plant Resources Conservation and Sustainable Utilization and Guangdong Provincial Key Laboratory of Applied Botany, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China; Center of Conservation Biology, Core Botanical Gardens, Chinese Academy of Sciences, Guangzhou, China; South China Botanical Garden, Chinese Academy of Sciences, Guangzhou, China
31
Gatter T, von Löhneysen S, Fallmann J, Drozdova P, Hartmann T, Stadler PF. LazyB: fast and cheap genome assembly. Algorithms Mol Biol 2021; 16:8. [PMID: 34074310 PMCID: PMC8168326 DOI: 10.1186/s13015-021-00186-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/31/2021] [Accepted: 05/06/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, "hybrid" methods that integrate short- and long-read data have been devised to address this need. RESULTS: LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort. CONCLUSIONS: LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements. AVAILABILITY: The LazyB prototype is available at https://github.com/TGatter/LazyB.
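One step named in this abstract, extracting a contig as a maximum weight path once the overlap graph has been reduced to a directed acyclic graph, is a textbook dynamic program over a topological order. The sketch below shows only that generic technique. It is not taken from the LazyB code base, it ignores the proper-interval-graph properties the authors exploit, and the function name, the edge format, and the assumption of non-negative overlap weights are all illustrative.

```python
from collections import defaultdict, deque


def max_weight_path(edges):
    """Return (total_weight, path) for the heaviest path in a DAG.

    edges: iterable of (source, target, weight) tuples; weights are
    assumed non-negative (e.g. overlap scores between reads).
    Raises ValueError if the graph contains a cycle.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for u, v, w in edges:
        graph[u].append((v, w))
        indegree[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0, []

    # Kahn's algorithm: repeatedly emit nodes with no remaining predecessors.
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Relax edges in topological order; best[n] is the heaviest path ending at n.
    best = {n: 0 for n in nodes}
    pred = {n: None for n in nodes}
    for u in order:
        for v, w in graph[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                pred[v] = u

    # Walk predecessors back from the best endpoint to recover the path.
    end = max(nodes, key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = pred[node]
    return best[end], path[::-1]


# Toy overlap graph: reads r1..r4 with hypothetical overlap weights.
toy_edges = [("r1", "r2", 5), ("r2", "r3", 4), ("r1", "r3", 7), ("r3", "r4", 2)]
print(max_weight_path(toy_edges))
```

In the toy graph the direct edge from `r1` to `r3` is heavier than either single edge it bypasses, yet the route through `r2` wins because path weights are summed. This is why a global path criterion can keep reads that a purely local edge filter would discard.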
Affiliation(s)
- Thomas Gatter
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Sarah von Löhneysen
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Jörg Fallmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Polina Drozdova
  - Institute of Biology, Irkutsk State University, RU-664003, Irkutsk, Russia
- Tom Hartmann
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
- Peter F Stadler
  - Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, 04107, Leipzig, Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, 1090, Vienna, Austria
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
32
Weng YM, Francoeur CB, Currie CR, Kavanaugh DH, Schoville SD. A high-quality carabid genome assembly provides insights into beetle genome evolution and cold adaptation. Mol Ecol Resour 2021; 21:2145-2165. [PMID: 33938156 DOI: 10.1111/1755-0998.13409] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/23/2020] [Revised: 04/13/2021] [Accepted: 04/26/2021] [Indexed: 12/13/2022]
Abstract
The hyperdiverse order Coleoptera comprises a staggering ~25% of known species on Earth. Despite recent breakthroughs in next generation sequencing, there remains a limited representation of beetle diversity in assembled genomes. Most notably, the ground beetle family Carabidae, comprising more than 40,000 described species, has not been studied in a comparative genomics framework using whole genome data. Here we generate a high-quality genome assembly for Nebria riversi, to examine sources of novelty in the genome evolution of beetles, as well as genetic changes associated with specialization to high-elevation alpine habitats. In particular, this genome resource provides a foundation for expanding comparative molecular research into mechanisms of insect cold adaptation. Comparison to other beetles shows a strong signature of genome compaction, with N. riversi possessing a relatively small genome (~147 Mb) compared to other beetles, with associated reductions in repeat element content and intron length. Small genome size is not, however, associated with fewer protein-coding genes, and an analysis of gene family diversity shows significant expansions of genes associated with cellular membranes and membrane transport, as well as protein phosphorylation and muscle filament structure. Finally, our genomic analyses show that these high-elevation beetles have endosymbiotic Spiroplasma, with several metabolic pathways (e.g., propanoate biosynthesis) that might complement N. riversi, although its role as a beneficial symbiont or as a reproductive parasite remains equivocal.
Affiliation(s)
- Yi-Ming Weng
  - Department of Entomology, University of Wisconsin - Madison, Madison, WI, USA
- Charlotte B Francoeur
  - Department of Bacteriology, University of Wisconsin - Madison, Madison, WI, USA; Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin - Madison, Madison, WI, USA
- Cameron R Currie
  - Department of Bacteriology, University of Wisconsin - Madison, Madison, WI, USA; Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin - Madison, Madison, WI, USA
- David H Kavanaugh
  - Department of Entomology, California Academy of Sciences, San Francisco, CA, USA
- Sean D Schoville
  - Department of Entomology, University of Wisconsin - Madison, Madison, WI, USA
33
Patro R, Salmela L. Algorithms meet sequencing technologies - 10th edition of the RECOMB-Seq workshop. iScience 2021; 24:101956. [PMID: 33437938 PMCID: PMC7788091 DOI: 10.1016/j.isci.2020.101956] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
Abstract
DNA and RNA sequencing is a core technology in biological and medical research. The high throughput of these technologies and the consistent development of new experimental assays and biotechnologies demand the continuous development of methods to analyze the resulting data. The RECOMB Satellite Workshop on Massively Parallel Sequencing brings together leading researchers in computational genomics to discuss emerging frontiers in algorithm development for massively parallel sequencing data. The 10th meeting in this series, RECOMB-Seq 2020, was scheduled to be held in Padua, Italy, but due to the ongoing COVID-19 pandemic, the meeting was held virtually instead. The online workshop featured keynote talks by Paola Bonizzoni and Zamin Iqbal, two highlight talks, ten regular talks, and three short talks. Seven of the works presented in the workshop are featured in this edition of iScience, and many of the talks are available online on the RECOMB-Seq 2020 YouTube channel.
Affiliation(s)
- Rob Patro
  - Department of Computer Science and Center for Bioinformatics and Computational Biology, University of Maryland, MD, USA
- Leena Salmela
  - Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland