1
Uppal S, Waterworth SC, Nick A, Vogel H, Flórez LV, Kaltenpoth M, Kwan JC. Repeated horizontal acquisition of lagriamide-producing symbionts in Lagriinae beetles. bioRxiv 2024:2024.01.23.576914. [PMID: 39026795 PMCID: PMC11257431 DOI: 10.1101/2024.01.23.576914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2024]
Abstract
Microbial symbionts associate with multicellular organisms on a continuum from facultative associations to mutual codependency. In some of the oldest intracellular symbioses there is exclusive vertical symbiont transmission, and co-diversification of symbiotic partners over millions of years. Such symbionts often undergo genome reduction due to low effective population sizes, frequent population bottlenecks, and reduced purifying selection. Here, we describe multiple independent acquisition events of closely related defensive symbionts followed by genome erosion in a group of Lagriinae beetles. Previous work in Lagria villosa revealed the dominant genome-eroded symbiont of the genus Burkholderia produces the antifungal compound lagriamide and protects the beetle's eggs and larvae from antagonistic fungi. Here, we use metagenomics to assemble 11 additional genomes of lagriamide-producing symbionts from seven different host species within Lagriinae from five countries, to unravel the evolutionary history of this symbiotic relationship. In each host species, we detected one dominant genome-eroded Burkholderia symbiont encoding the lagriamide biosynthetic gene cluster (BGC). Surprisingly, however, we did not find evidence for host-symbiont co-diversification, or for monophyly of the lagriamide-producing symbionts. Instead, our analyses support at least four independent acquisition events of lagriamide-encoding symbionts and subsequent genome erosion in each of these lineages. By contrast, a clade of plant-associated relatives retained large genomes but secondarily lost the lagriamide BGC. In conclusion, our results reveal a dynamic evolutionary history with multiple independent symbiont acquisitions characterized by a high degree of specificity. They highlight the importance of the specialized metabolite lagriamide for the establishment and maintenance of this defensive symbiosis.
2
Mejias-Gomez O, Braghetto M, Sørensen MKD, Madsen AV, Guiu LS, Kristensen P, Pedersen LE, Goletz S. Deep mining of antibody phage-display selections using Oxford Nanopore Technologies and Dual Unique Molecular Identifiers. N Biotechnol 2024; 80:56-68. [PMID: 38354946 DOI: 10.1016/j.nbt.2024.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2023] [Revised: 02/05/2024] [Accepted: 02/11/2024] [Indexed: 02/16/2024]
Abstract
Antibody phage-display technology identifies antibody-antigen interactions through multiple panning rounds, but traditional screening gives no information on enrichment or diversity throughout the process. This results in the loss of valuable binders. Next Generation Sequencing can overcome this problem. We introduce a high-accuracy long-read sequencing method based on the recent Oxford Nanopore Technologies (ONT) Q20+ chemistry in combination with dual unique molecular identifiers (UMIs) and an optimized bioinformatic analysis pipeline to monitor the selections. We identified binders from two single-domain antibody libraries selected against a model protein. Traditional colony-picking was compared with our ONT-UMI method. ONT-UMI enabled monitoring of diversity and enrichment before and after each selection round. By combining phage antibody selections with ONT-UMIs, deep mining of output selections is possible. The approach provides an alternative to traditional screening, enabling diversity quantification after each selection round and rare binder recovery, even when the dominating binder was >99% abundant. Moreover, it can give insights into binding motifs for further affinity maturation and specificity optimizations. Our results demonstrate a platform for future data-guided selection strategies.
Affiliation(s)
- Oscar Mejias-Gomez
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Marta Braghetto
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Morten Kielsgaard Dziegiel Sørensen
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Andreas Visbech Madsen
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Laura Salse Guiu
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Peter Kristensen
- Department of Chemistry and Bioscience, Section for Bioscience and Engineering, Aalborg University, Aalborg, Denmark
- Lasse Ebdrup Pedersen
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark.
- Steffen Goletz
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark.
3
Watson KJ, Bromley RE, Sparklin BC, Gasser MT, Bhattacharya T, Lebov JF, Tyson T, Dai N, Teigen LE, Graf KT, Foster JM, Michalski M, Bruno VM, Lindsey AR, Corrêa IR, Hardy RW, Newton IL, Dunning Hotopp JC. Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs. Life Sci Alliance 2024; 7:e202302201. [PMID: 38030223 PMCID: PMC10687253 DOI: 10.26508/lsa.202302201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 11/16/2023] [Accepted: 11/20/2023] [Indexed: 12/01/2023] Open
Abstract
RNA modifications, such as methylation, can be detected with Oxford Nanopore Technologies direct RNA sequencing. One commonly used tool for detecting 5-methylcytosine (m5C) modifications is Tombo, which uses an "Alternative Model" to detect putative modifications from a single sample. We examined direct RNA sequencing data from diverse taxa including viruses, bacteria, fungi, and animals. The algorithm consistently identified an m5C at the central position of a GCU motif. However, it also identified an m5C in the same motif in fully unmodified in vitro transcribed RNA, suggesting that this is a frequent false prediction. In the absence of further validation, several published predictions of m5C in a GCU context should be reconsidered, including those from human coronavirus and human cerebral organoid samples.
Affiliation(s)
- Kaylee J Watson
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Robin E Bromley
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Benjamin C Sparklin
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Mark T Gasser
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Tamanash Bhattacharya
- https://ror.org/01kg8sb98 Department of Biology, Indiana University, Bloomington, IN, USA
- Jarrett F Lebov
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Tyonna Tyson
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Nan Dai
- https://ror.org/04ywg3445 New England Biolabs, Ipswich, MA, USA
- Laura E Teigen
- https://ror.org/05w22af52 Department of Biology, University of Wisconsin Oshkosh, Oshkosh, WI, USA
- Karen T Graf
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- Jeremy M Foster
- https://ror.org/04ywg3445 New England Biolabs, Ipswich, MA, USA
- Michelle Michalski
- https://ror.org/05w22af52 Department of Biology, University of Wisconsin Oshkosh, Oshkosh, WI, USA
- Vincent M Bruno
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- https://ror.org/04rq5mt64 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
- Amelia RI Lindsey
- https://ror.org/01kg8sb98 Department of Biology, Indiana University, Bloomington, IN, USA
- Ivan R Corrêa
- https://ror.org/04ywg3445 New England Biolabs, Ipswich, MA, USA
- Richard W Hardy
- https://ror.org/01kg8sb98 Department of Biology, Indiana University, Bloomington, IN, USA
- Irene LG Newton
- https://ror.org/01kg8sb98 Department of Biology, Indiana University, Bloomington, IN, USA
- Julie C Dunning Hotopp
- https://ror.org/04rq5mt64 Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
- https://ror.org/04rq5mt64 Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD, USA
- https://ror.org/04rq5mt64 Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD, USA
4
Hopkins BR, Angus-Henry A, Kim BY, Carlisle JA, Thompson A, Kopp A. Decoupled evolution of the Sex Peptide gene family and Sex Peptide Receptor in Drosophilidae. bioRxiv 2023:2023.06.29.547128. [PMID: 37425821 PMCID: PMC10327216 DOI: 10.1101/2023.06.29.547128] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/11/2023]
Abstract
Across internally fertilising species, males transfer ejaculate proteins that trigger wide-ranging changes in female behaviour and physiology. Much theory has been developed to explore the drivers of ejaculate protein evolution. The accelerating availability of high-quality genomes now allows us to test how these proteins are evolving at fine taxonomic scales. Here, we use genomes from 264 species to chart the evolutionary history of Sex Peptide (SP), a potent regulator of female post-mating responses in Drosophila melanogaster. We infer that SP first evolved in the Drosophilinae subfamily and has followed markedly different evolutionary trajectories in different lineages. Outside of the Sophophora-Lordiphosa, SP exists largely as a single-copy gene with independent losses in several lineages. Within the Sophophora-Lordiphosa, the SP gene family has repeatedly and independently expanded. Up to seven copies, collectively displaying extensive sequence variation, are present in some species. Despite these changes, SP expression remains restricted to the male reproductive tract. Alongside, we document considerable interspecific variation in the presence and morphology of seminal microcarriers that, despite the critical role SP plays in microcarrier assembly in D. melanogaster, appear to be independent of changes in the presence/absence or sequence of SP. We end by providing evidence that SP's evolution is decoupled from that of its receptor, SPR, in which we detect no evidence of correlated diversifying selection. Collectively, our work describes the divergent evolutionary trajectories that a novel gene has taken following its origin and finds a surprisingly weak coevolutionary signal between a supposedly sexually antagonistic protein and its receptor.
Affiliation(s)
- Ben R. Hopkins
- Department of Evolution and Ecology, University of California – Davis, CA, USA
- Aidan Angus-Henry
- Department of Evolution and Ecology, University of California – Davis, CA, USA
- Jolie A. Carlisle
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA
- Ammon Thompson
- Department of Evolution and Ecology, University of California – Davis, CA, USA
- Artyom Kopp
- Department of Evolution and Ecology, University of California – Davis, CA, USA
5
Bloemen B, Gand M, Vanneste K, Marchal K, Roosens NHC, De Keersmaecker SCJ. Development of a portable on-site applicable metagenomic data generation workflow for enhanced pathogen and antimicrobial resistance surveillance. Sci Rep 2023; 13:19656. [PMID: 37952062 PMCID: PMC10640560 DOI: 10.1038/s41598-023-46771-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Accepted: 11/04/2023] [Indexed: 11/14/2023] Open
Abstract
Rapid, accurate and comprehensive diagnostics are essential for outbreak prevention and pathogen surveillance. Real-time, on-site metagenomics on miniaturized devices, such as Oxford Nanopore Technologies MinION sequencing, could provide a promising approach. However, current sample preparation protocols often require substantial equipment and dedicated laboratories, limiting their use. In this study, we developed a rapid on-site applicable DNA extraction and library preparation approach for nanopore sequencing, using portable devices. The optimized method consists of a portable mechanical lysis approach followed by magnetic bead-based DNA purification and automated sequencing library preparation, and resulted in a throughput comparable to a current optimal, laboratory-based protocol using enzymatic digestion to lyse cells. By using spike-in reference communities, we compared the on-site method with other workflows, and demonstrated reliable taxonomic profiling, despite method-specific biases. We also demonstrated the added value of long-read sequencing by recovering reads containing full-length antimicrobial resistance genes, and attributing them to a host species based on the additional genomic information they contain. Our method may provide a rapid, widely-applicable approach for microbial detection and surveillance in a variety of on-site settings.
Affiliation(s)
- Bram Bloemen
- Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Department of Information Technology, IDLab, Ghent University, IMEC, 9052, Ghent, Belgium
- Mathieu Gand
- Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Kevin Vanneste
- Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Kathleen Marchal
- Department of Information Technology, IDLab, Ghent University, IMEC, 9052, Ghent, Belgium
- Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052, Ghent, Belgium
- Nancy H C Roosens
- Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Sigrid C J De Keersmaecker
- Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium.
6
Greenberg G, Ravi AN, Shomorony I. LexicHash: sequence similarity estimation via lexicographic comparison of hashes. Bioinformatics 2023; 39:btad652. [PMID: 37878809 PMCID: PMC10628434 DOI: 10.1093/bioinformatics/btad652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 10/11/2023] [Accepted: 10/23/2023] [Indexed: 10/27/2023] Open
Abstract
MOTIVATION: Pairwise sequence alignment is a heavy computational burden, particularly in the context of third-generation sequencing technologies. This issue is commonly addressed by approximately estimating sequence similarities using a hash-based method such as MinHash. In MinHash, all k-mers in a read are hashed and the minimum hash value, the min-hash, is stored. Pairwise similarities can then be estimated by counting the number of min-hash matches between a pair of reads, across many distinct hash functions. The choice of the parameter k controls an important tradeoff in the task of identifying alignments: larger k-values give greater confidence in the identification of alignments (high precision) but can lead to many missing alignments (low recall), particularly in the presence of significant noise. RESULTS: In this work, we introduce LexicHash, a new similarity estimation method that is effectively independent of the choice of k and attains the high precision of large-k and the high sensitivity of small-k MinHash. LexicHash is a variant of MinHash with a carefully designed hash function. When estimating the similarity between two reads, instead of simply checking whether min-hashes match (as in standard MinHash), one checks how "lexicographically similar" the LexicHash min-hashes are. In our experiments on 40 PacBio datasets, the area under the precision-recall curves obtained by LexicHash had an average improvement of 20.9% over MinHash. Additionally, the LexicHash framework lends itself naturally to an efficient search of the largest alignments, yielding an O(n) time algorithm, and circumventing the seemingly fundamental O(n^2) scaling associated with pairwise similarity search. AVAILABILITY AND IMPLEMENTATION: LexicHash is available on GitHub at https://github.com/gcgreenberg/LexicHash.
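The MinHash procedure described in this abstract, and the idea of comparing min-hashes lexicographically rather than for exact equality, can be illustrated with a small Python sketch. This is not the authors' implementation (see the GitHub repository above for that). The function names, the use of BLAKE2b as the hash family, the random 2-bit masks, and the "longest shared prefix" score are all assumptions chosen for illustration; reads are assumed to contain only A, C, G, T and to be at least k bases long.

```python
import hashlib
import random

ENC = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2-bit base encoding (illustrative)

def kmers(read, k):
    """All overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def minhash_sketch(read, k, num_hashes):
    """Standard MinHash: one minimum hash value per seeded hash function."""
    sketch = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        sketch.append(min(
            hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest()
            for km in kmers(read, k)
        ))
    return sketch

def minhash_similarity(a, b):
    """Fraction of hash functions whose min-hashes match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def lexic_sketch(read, k, num_masks):
    """Assumed lexicographic variant: each 'hash function' is a random mask,
    and the min-hash is the k-mer that is smallest after XOR with the mask."""
    sketch = []
    for seed in range(num_masks):
        rng = random.Random(seed)
        mask = [rng.randrange(4) for _ in range(k)]
        sketch.append(min(
            tuple(ENC[c] ^ m for c, m in zip(km, mask))
            for km in kmers(read, k)
        ))
    return sketch

def common_prefix_len(x, y):
    """Number of leading positions at which x and y agree."""
    n = 0
    for p, q in zip(x, y):
        if p != q:
            break
        n += 1
    return n

def lexic_score(a, b):
    """Longest shared min-hash prefix over all masks (assumed scoring rule)."""
    return max(common_prefix_len(x, y) for x, y in zip(a, b))
```

In this sketch, `minhash_similarity` only rewards exact min-hash matches, so its behavior depends strongly on k, whereas `lexic_score` grades partial agreement by prefix length, which is one way to read the abstract's claim that the method is effectively independent of the choice of k.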
Affiliation(s)
- Grant Greenberg
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Aditya Narayan Ravi
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
- Ilan Shomorony
- Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
7
Leung W, Torosin N, Cao W, Reed LK, Arrigo C, Elgin SCR, Ellison CE. Long-read genome assemblies for the study of chromosome expansion: Drosophila kikkawai, Drosophila takahashii, Drosophila bipectinata, and Drosophila ananassae. G3 (Bethesda) 2023; 13:jkad191. [PMID: 37611223 PMCID: PMC10542312 DOI: 10.1093/g3journal/jkad191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Revised: 08/01/2023] [Accepted: 08/04/2023] [Indexed: 08/25/2023]
Abstract
Flow cytometry estimates of genome sizes among species of Drosophila show a 3-fold variation, ranging from ∼127 Mb in Drosophila mercatorum to ∼400 Mb in Drosophila cyrtoloma. However, the assembled portion of the Muller F element (orthologous to the fourth chromosome in Drosophila melanogaster) shows a nearly 14-fold variation in size, ranging from ∼1.3 Mb to >18 Mb. Here, we present chromosome-level long-read genome assemblies for 4 Drosophila species with expanded F elements ranging in size from 2.3 to 20.5 Mb. Each Muller element is present as a single scaffold in each assembly. These assemblies will enable new insights into the evolutionary causes and consequences of chromosome size expansion.
Affiliation(s)
- Wilson Leung
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Nicole Torosin
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Weihuan Cao
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Laura K Reed
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, AL 35487, USA
- Cindy Arrigo
- Department of Biology, New Jersey City University, Jersey City, NJ 07305, USA
- Sarah C R Elgin
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Christopher E Ellison
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
8
Esteller-Cucala P, Palmada-Flores M, Kuderna LFK, Fontsere C, Serres-Armero A, Dabad M, Torralvo M, Faella A, Ferrández-Peral L, Llovera L, Fornas O, Julià E, Ramírez E, González I, Hecht J, Lizano E, Juan D, Marquès-Bonet T. Y chromosome sequence and epigenomic reconstruction across human populations. Commun Biol 2023; 6:623. [PMID: 37296226 PMCID: PMC10256797 DOI: 10.1038/s42003-023-05004-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Accepted: 05/31/2023] [Indexed: 06/12/2023] Open
Abstract
Recent advances in long-read sequencing technologies have allowed the generation and curation of more complete genome assemblies, enabling the analysis of traditionally neglected chromosomes, such as the human Y chromosome (chrY). Native DNA was sequenced on a MinION Oxford Nanopore Technologies sequencing device to generate genome assemblies for seven major chrY human haplogroups. We analyzed and compared the chrY enrichment of sequencing data obtained using two different selective sequencing approaches: adaptive sampling and flow cytometry chromosome sorting. We show that adaptive sampling can produce data to create assemblies comparable to chromosome sorting while being a less expensive and time-consuming technique. We also assessed haplogroup-specific structural variants, which would be otherwise difficult to study using short-read sequencing data only. Finally, we took advantage of this technology to detect and profile epigenetic modifications among the considered haplogroups. Altogether, we provide a framework to study complex genomic regions with a simple, fast, and affordable methodology that could be applied to larger population genomics datasets.
Affiliation(s)
- Paula Esteller-Cucala
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain.
- Marc Palmada-Flores
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Lukas F K Kuderna
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Claudia Fontsere
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Aitor Serres-Armero
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Marc Dabad
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain
- María Torralvo
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Armida Faella
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Luis Ferrández-Peral
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Laia Llovera
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Oscar Fornas
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain
- Eva Julià
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Erika Ramírez
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Irene González
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Jochen Hecht
- Centre for Genomic Regulation (CRG), Barcelona Institute for Science and Technology (BIST), Doctor Aiguader 88, Barcelona, Spain
- Esther Lizano
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain
- David Juan
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain
- Tomàs Marquès-Bonet
- Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Doctor Aiguader 88, Barcelona, Spain.
- CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Baldiri i Reixac 4, Barcelona, Spain.
- Universitat Pompeu Fabra (UPF), Doctor Aiguader 88, Barcelona, Spain.
- Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Edifici ICTA-ICP, Cerdanyola del Vallès, Spain.
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Passeig Lluís Companys 23, Barcelona, Spain.
9
Spealman P, De T, Chuong JN, Gresham D. Best Practices in Microbial Experimental Evolution: Using Reporters and Long-Read Sequencing to Identify Copy Number Variation in Experimental Evolution. J Mol Evol 2023; 91:356-368. [PMID: 37012421 PMCID: PMC10275804 DOI: 10.1007/s00239-023-10102-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 02/21/2023] [Indexed: 04/05/2023]
Abstract
Copy number variants (CNVs), comprising gene amplifications and deletions, are a pervasive class of heritable variation. CNVs play a key role in rapid adaptation in both natural and experimental evolution. However, despite the advent of new DNA sequencing technologies, detection and quantification of CNVs in heterogeneous populations has remained challenging. Here, we summarize recent advances in the use of CNV reporters, which provide a facile means of quantifying de novo CNVs at a specific locus in the genome, and of nanopore sequencing, which resolves the often complex structures of CNVs. We provide guidance for the engineering and analysis of CNV reporters and practical guidelines for single-cell analysis of CNVs using flow cytometry. We summarize recent advances in nanopore sequencing, discuss the utility of this technology, and provide guidance for the bioinformatic analysis of these data to define the molecular structure of CNVs. The combination of reporter systems for tracking and isolating CNV lineages and long-read DNA sequencing for characterizing CNV structures enables unprecedented resolution of the mechanisms by which CNVs are generated and their evolutionary dynamics.
Affiliation(s)
- Pieter Spealman
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Titir De
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Julie N Chuong
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- David Gresham
- Department of Biology, New York University, New York, NY, 10003, USA.
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
10
Leung W, Torosin N, Cao W, Reed LK, Arrigo C, Elgin SCR, Ellison CE. Long-read genome assemblies for the study of chromosome expansion: Drosophila kikkawai, Drosophila takahashii, Drosophila bipectinata, and Drosophila ananassae. bioRxiv 2023:2023.05.22.541758. [PMID: 37292993 PMCID: PMC10245892 DOI: 10.1101/2023.05.22.541758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Flow cytometry estimates of genome sizes among species of Drosophila show a 3-fold variation, ranging from ∼127 Mb in Drosophila mercatorum to ∼400 Mb in Drosophila cyrtoloma. However, the assembled portion of the Muller F Element (orthologous to the fourth chromosome in Drosophila melanogaster) shows a nearly 14-fold variation in size, ranging from ∼1.3 Mb to >18 Mb. Here, we present chromosome-level long-read genome assemblies for four Drosophila species with expanded F Elements ranging in size from 2.3 Mb to 20.5 Mb. Each Muller Element is present as a single scaffold in each assembly. These assemblies will enable new insights into the evolutionary causes and consequences of chromosome size expansion.
Affiliation(s)
- Wilson Leung
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Nicole Torosin
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Weihuan Cao
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
- Laura K Reed
- Department of Biological Sciences, The University of Alabama, Tuscaloosa, Alabama, 35487, USA
- Cindy Arrigo
- Department of Biology, New Jersey City University, Jersey City, NJ 07305, USA
- Sarah C R Elgin
- Department of Biology, Washington University in St. Louis, St. Louis, MO 63130, USA
- Christopher E Ellison
- Department of Genetics and Human Genetics Institute of New Jersey, Rutgers University, Piscataway, NJ 08854, USA
11
Watson KJ, Bromley RE, Sparklin BC, Gasser MT, Bhattacharya T, Lebov JF, Tyson T, Teigen LE, Graf KT, Michalski M, Bruno VM, Lindsey ARI, Hardy RW, Newton ILG, Hotopp JCD. Common Analysis of Direct RNA SequencinG CUrrently Leads to Misidentification of 5-Methylcytosine Modifications at GCU Motifs. bioRxiv 2023:2023.05.03.539298. [PMID: 37205495 PMCID: PMC10187288 DOI: 10.1101/2023.05.03.539298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2023]
Abstract
RNA modifications, such as methylation, can be detected with Oxford Nanopore Technologies direct RNA sequencing. One commonly used tool for detecting 5-methylcytosine (m5C) modifications is Tombo, which uses an "Alternative Model" to detect putative modifications from a single sample. We examined direct RNA sequencing data from diverse taxa including viruses, bacteria, fungi, and animals. The algorithm consistently identified a 5-methylcytosine at the central position of a GCU motif. However, it also identified a 5-methylcytosine in the same motif in fully unmodified in vitro transcribed RNA, suggesting that this is a frequent false prediction. In the absence of further validation, several published predictions of 5-methylcytosine in human coronavirus and human cerebral organoid RNA in a GCU context should be reconsidered.
Collapse
Affiliation(s)
- Kaylee J. Watson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
| | - Robin E. Bromley
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Benjamin C. Sparklin
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Mark T. Gasser
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Jarrett F. Lebov
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Tyonna Tyson
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Laura E. Teigen
- Department of Biology, University of Wisconsin Oshkosh, Oshkosh, WI, USA
- Karen T. Graf
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Michelle Michalski
- Department of Biology, University of Wisconsin Oshkosh, Oshkosh, WI, USA
- Vincent M. Bruno
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Department of Microbiology & Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Julie C. Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Department of Microbiology & Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA
- Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201, USA
12
Yang C, Lo T, Nip KM, Hafezqorani S, Warren RL, Birol I. Characterization and simulation of metagenomic nanopore sequencing data with Meta-NanoSim. Gigascience 2023; 12:giad013. [PMID: 36939007 PMCID: PMC10025935 DOI: 10.1093/gigascience/giad013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Revised: 01/19/2023] [Accepted: 02/17/2023] [Indexed: 03/21/2023]
Abstract
BACKGROUND Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.
Affiliation(s)
- Chen Yang
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Theodora Lo
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Ka Ming Nip
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- Saber Hafezqorani
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Bioinformatics Graduate Program, University of British Columbia, Genome Sciences Centre, BCCA 100-570 West 7th Avenue, Vancouver, BC, V5Z 4S6, Canada
- René L Warren
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Inanc Birol
- Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, V5Z 4S6, Canada
- Department of Medical Genetics, University of British Columbia, Life Sciences Centre Room 1364 – 2350 Health Science Mall Vancouver, BC V6T 1Z3, Canada
13
Abstract
A perfect bacterial genome assembly is one where the assembled sequence is an exact match for the organism's genome: each replicon sequence is complete and contains no errors. While this has been difficult to achieve in the past, improvements in long-read sequencing, assemblers, and polishers have brought perfect assemblies within reach. Here, we describe our recommended approach for assembling a bacterial genome to perfection using a combination of Oxford Nanopore Technologies long reads and Illumina short reads: Trycycler long-read assembly, Medaka long-read polishing, Polypolish short-read polishing, followed by other short-read polishing tools and manual curation. We also discuss potential pitfalls one might encounter when assembling challenging genomes, and we provide an online tutorial with sample data (github.com/rrwick/perfect-bacterial-genome-tutorial).
14
Firtina C, Park J, Alser M, Kim JS, Cali D, Shahroodi T, Ghiasi N, Singh G, Kanellopoulos K, Alkan C, Mutlu O. BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis. NAR Genom Bioinform 2023; 5:lqad004. [PMID: 36685727 PMCID: PMC9853099 DOI: 10.1093/nargab/lqad004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 12/16/2022] [Accepted: 01/10/2023] [Indexed: 01/22/2023]
Abstract
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds leads to either (i) increased use of costly sequence alignment or (ii) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, that can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×-83.9× (on average 19.3×), has a lower memory footprint by 0.9×-14.1× (on average 3.8×), and finds higher quality overlaps leading to more accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×-4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.
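The SimHash idea mentioned in this abstract can be sketched in a few lines. The version below is a generic, textbook-style SimHash over the k-mers of a seed, not BLEND's implementation: the 32-bit width, the use of BLAKE2b as the per-item hash, and all function names are assumptions chosen for illustration. Each item votes +1 or -1 on every bit, and the output keeps the bits with a positive total, so collections that share most items tend to agree on most (sometimes all) bits.

```python
import hashlib

BITS = 32  # assumed hash width for this sketch


def item_hash(item: str) -> int:
    """Deterministic 32-bit hash of one item (here: one k-mer)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=4).digest()
    return int.from_bytes(digest, "big")


def simhash(items) -> int:
    """Combine item hashes by per-bit majority vote."""
    counters = [0] * BITS
    for item in items:
        h = item_hash(item)
        for bit in range(BITS):
            counters[bit] += 1 if (h >> bit) & 1 else -1
    result = 0
    for bit, total in enumerate(counters):
        if total > 0:  # keep the bit only if most items set it
            result |= 1 << bit
    return result


def kmers(seq: str, k: int) -> list[str]:
    """All overlapping k-mers of seq, treated as the items of a seed."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")
```

Comparing `simhash(kmers(s1, 5))` and `simhash(kmers(s2, 5))` with `hamming` gives a quick similarity signal; an exact lookup succeeds only when the distance is zero, which is the case a fuzzy seed match relies on.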
Affiliation(s)
- Can Firtina
- To whom correspondence should be addressed. Tel: +41 44 632 64 29;
- Jisung Park
- ETH Zurich, Zurich 8092, Switzerland, POSTECH, Pohang 37673, Republic of Korea
- Can Alkan
- Bilkent University, Ankara 06800, Turkey
- Onur Mutlu
- Correspondence may also be addressed to Onur Mutlu. Tel: +41 44 632 64 29;
15
Ono Y, Hamada M, Asai K. PBSIM3: a simulator for all types of PacBio and ONT long reads. NAR Genom Bioinform 2022; 4:lqac092. [PMID: 36465498 PMCID: PMC9713900 DOI: 10.1093/nargab/lqac092] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/10/2022] [Revised: 11/02/2022] [Accepted: 11/12/2022] [Indexed: 12/03/2022]
Abstract
Long-read sequencers, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencers, have improved their read length and accuracy, thereby opening up unprecedented research. Many tools and algorithms have been developed to analyze long reads, and rapid progress in PacBio and ONT has further accelerated their development. Together with the development of high-throughput sequencing technologies and their analysis tools, many read simulators have been developed and effectively utilized. PBSIM is one of the popular long-read simulators. In this study, we developed PBSIM3 with three new functions: error models for long reads, multi-pass sequencing for high-fidelity read simulation and transcriptome sequencing simulation. Therefore, PBSIM3 is now able to meet a wide range of long-read simulation requirements.
Affiliation(s)
- Yukiteru Ono
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Michiaki Hamada
- Department of Electrical Engineering and Bioscience, Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 63-520, 3-4-1, Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
- Institute for Medical-Oriented Structural Biology, Waseda University, 2-2, Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan
- Graduate School of Medicine, Nippon Medical School, 1-1-5, Sendagi, Bunkyo-ku, Tokyo, 113-8602, Japan
- Kiyoshi Asai
- Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa 277-8561, Japan
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-3-26, Aomi, Koto-ku, 135-0064 Tokyo, Japan
16
DNA read count calibration for single-molecule, long-read sequencing. Sci Rep 2022; 12:17257. [PMID: 36319642 PMCID: PMC9626564 DOI: 10.1038/s41598-022-21606-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 09/29/2022] [Indexed: 11/17/2022]
Abstract
There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave differently during sequencing. We have used two library preparations each for Pacific Biosciences (PacBio) and Oxford Nanopore Technologies NGS to determine their suitability for quantitative assessment of varying sized DNAs. Equimolar length standards were generated from E. coli genomic DNA. Both PacBio library preparations provided a consistent length dependence though with a complex pattern. This method is sufficiently sensitive that differences in genomic copy number between DNA from E. coli grown in exponential and stationary phase conditions could be detected. The transposase-based Oxford Nanopore library preparation provided a predictable length dependence, but the random sequence starts caused the loss of original length information. The ligation-based approach retained length information but read frequency was more variable. Modeling of E. coli versus lambda read frequency via cubic spline smoothing showed that the shorter genome could be used as a suitable internal spike-in for DNAs in the 200 bp to 10 kb range, allowing meaningful QC to be carried out with AAV preparations.
17
Tvedte ES, Gasser M, Zhao X, Tallon LJ, Sadzewicz L, Bromley RE, Chung M, Mattick J, Sparklin BC, Dunning Hotopp JC. Accumulation of endosymbiont genomes in an insect autosome followed by endosymbiont replacement. Curr Biol 2022; 32:2786-2795.e5. [PMID: 35671755 DOI: 10.1016/j.cub.2022.05.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 04/12/2022] [Accepted: 05/10/2022] [Indexed: 12/01/2022]
Abstract
Eukaryotic genomes can acquire bacterial DNA via lateral gene transfer (LGT) [1]. A prominent source of LGT is Wolbachia [2], a widespread endosymbiont of arthropods and nematodes that is transmitted maternally through female germline cells [3,4]. The DNA transfer from the Wolbachia endosymbiont wAna to Drosophila ananassae is extensive [5-7] and has been localized to chromosome 4, contributing to chromosome expansion in this lineage [6]. As has happened frequently with claims of bacteria-to-eukaryote LGT, the contribution of wAna transfers to the expanded size of D. ananassae chromosome 4 has been specifically contested [8] owing to an assembly where Wolbachia sequences were classified as contaminants and removed [9]. Here, long-read sequencing with DNA from a Wolbachia-cured line enabled assembly of 4.9 Mbp of nuclear Wolbachia transfers (nuwts) in D. ananassae and a 24-kbp nuclear mitochondrial transfer. The nuwts are <8,000 years old in at least two locations in chromosome 4 with at least one whole-genome integration followed by rapid extensive duplication of most of the genome with regions that have up to 10 copies. The genes in nuwts are accumulating small indels and mobile element insertions. Among the highly duplicated genes are cifA and cifB, two genes associated with Wolbachia-mediated Drosophila cytoplasmic incompatibility. The wAna strain that was the source of nuwts was subsequently replaced by a different wAna endosymbiont. Direct RNA Nanopore sequencing of Wolbachia-cured lines identified nuwt transcripts, including spliced transcripts, but functionality, if any, remains elusive.
Affiliation(s)
- Eric S Tvedte
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Mark Gasser
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Xuechu Zhao
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Luke J Tallon
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Lisa Sadzewicz
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Robin E Bromley
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Matthew Chung
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- John Mattick
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Benjamin C Sparklin
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA
- Julie C Dunning Hotopp
- Institute for Genome Sciences, University of Maryland School of Medicine, Health Sciences Facility III #2106, 670 West Baltimore Street, Baltimore, MD 21201, USA; Department of Microbiology and Immunology, University of Maryland School of Medicine, Baltimore, MD 21201, USA; Greenebaum Cancer Center, University of Maryland School of Medicine, Baltimore, MD 21201, USA.
18
Zhang X, Liu CG, Yang SH, Wang X, Bai FW, Wang Z. Benchmarking of long-read sequencing, assemblers and polishers for yeast genome. Brief Bioinform 2022; 23:6576452. [PMID: 35511110 DOI: 10.1093/bib/bbac146] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/27/2021] [Revised: 03/26/2022] [Accepted: 03/31/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND The long reads of third-generation sequencing significantly benefit the quality of de novo genome assembly. However, its relatively high single-base error rate has been criticized. Currently, sequencing accuracy and throughput continue to improve, and many advanced tools are constantly emerging. PacBio HiFi sequencing and Oxford Nanopore Technologies (ONT) PromethION are two up-to-date platforms with low error rates and ultralong high-throughput reads. Therefore, there is an urgent need to select the appropriate sequencing platforms, depths and genome assembly tools for high-quality genomes in the era of explosive data production. METHODS We performed 455 (7 assemblers with 4 polishing pipelines or without polishing on 13 subsets with different depths) and 88 (4 assemblers with or without polishing on 11 subsets with different depths) de novo assemblies of Yeast S288C on high-coverage ONT and HiFi datasets, respectively. The assembly quality was evaluated by Quality Assessment Tool (QUAST), Benchmarking Universal Single-Copy Orthologs (BUSCO) and the newly proposed Comprehensive_score (C_score). In addition, we applied four preferable pipelines to assemble the genome of nonreference yeast strains. RESULTS The assembler plays an essential role in genome construction, especially for low-depth datasets. For ONT datasets, Flye is superior to other tools through C_score evaluation. Polishing with Pilon and Medaka improves the accuracy and continuity of the preassemblies, respectively, and their combination pipeline worked well in most quality metrics. For HiFi datasets, Flye and NextDenovo performed better than other tools, and polishing is also necessary. Sufficient data depth is required for high-quality genome construction by ONT (>80X) and HiFi (>20X) datasets.
Affiliation(s)
- Xue Zhang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Science of the Ministry of Education, Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders of the Ministry of Education, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Chen-Guang Liu
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Science of the Ministry of Education, Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders of the Ministry of Education, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shi-Hui Yang
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China
- Xia Wang
- State Key Laboratory of Biocatalysis and Enzyme Engineering, Environmental Microbial Technology Center of Hubei Province, and School of Life Sciences, Hubei University, Wuhan, 430062, China
- Feng-Wu Bai
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Science of the Ministry of Education, Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders of the Ministry of Education, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Zhuo Wang
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Science of the Ministry of Education, Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders of the Ministry of Education, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
19
Bankevich A, Bzikadze AV, Kolmogorov M, Antipov D, Pevzner PA. Multiplex de Bruijn graphs enable genome assembly from long, high-fidelity reads. Nat Biotechnol 2022; 40:1075-1081. [PMID: 35228706 DOI: 10.1038/s41587-022-01220-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 09/27/2021] [Accepted: 01/11/2022] [Indexed: 11/09/2022]
Abstract
Although most existing genome assemblers are based on de Bruijn graphs, the construction of these graphs for large genomes and large k-mer sizes has remained elusive. This algorithmic challenge has become particularly pressing with the emergence of long, high-fidelity (HiFi) reads that have been recently used to generate a semi-manual telomere-to-telomere assembly of the human genome. To enable automated assemblies of long, HiFi reads, we present the La Jolla Assembler (LJA), a fast algorithm using the Bloom filter, sparse de Bruijn graphs and disjointig generation. LJA reduces the error rate in HiFi reads by three orders of magnitude, constructs the de Bruijn graph for large genomes and large k-mer sizes and transforms it into a multiplex de Bruijn graph with varying k-mer sizes. Compared to state-of-the-art assemblers, our algorithm not only achieves five-fold fewer misassemblies but also generates more contiguous assemblies. We demonstrate the utility of LJA via the automated assembly of a human genome that completely assembled six chromosomes.
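To make the role of the k-mer size concrete, here is a minimal textbook de Bruijn graph construction. It is not the LJA algorithm (no Bloom filter, sparse graph, disjointigs, or multiplex transformation); the function names and the toy read are assumptions for illustration only. Nodes are (k-1)-mers and each k-mer in a read contributes one edge.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return dict(graph)


def branching_nodes(graph):
    """Nodes with more than one successor, i.e. where the path is ambiguous."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)
```

On the toy read `ACGACT`, the graph built with k=3 branches at node `AC` because the dinucleotide repeats, while the graph built with k=4 has no branching node. This is the basic reason larger k-mer sizes, which long accurate reads make usable, can yield more contiguous assemblies.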
Affiliation(s)
- Anton Bankevich
- Department of Computer Science and Engineering, University of California, San Diego, San Diego CA, USA.
- Andrey V Bzikadze
- Program in Bioinformatics and Systems Biology, University of California, San Diego, San Diego CA, USA
- Mikhail Kolmogorov
- Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz CA, USA
- Dmitry Antipov
- Center for Algorithmic Biotechnology, Institute for Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
- Pavel A Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, San Diego CA, USA.
20
Tegha G, Ciccone EJ, Krysiak R, Kaphatika J, Chikaonda T, Ndhlovu I, van Duin D, Hoffman I, Juliano JJ, Wang J. Genomic epidemiology of Escherichia coli isolates from a tertiary referral center in Lilongwe, Malawi. Microb Genom 2021; 7:mgen000490. [PMID: 33295867 PMCID: PMC8115906 DOI: 10.1099/mgen.0.000490] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/29/2020] [Accepted: 11/17/2020] [Indexed: 02/06/2023]
Abstract
Antimicrobial resistance (AMR) is a global threat, including in sub-Saharan Africa. However, little is known about the genetics of resistant bacteria in the region. In Malawi, there is growing concern about increasing rates of antimicrobial resistance to most empirically used antimicrobials. The highly drug resistant Escherichia coli sequence type (ST) 131, which is associated with the extended spectrum β-lactamase blaCTX-M-15, has been increasing in prevalence globally. Previous data from isolates collected between 2006 and 2013 in southern Malawi have revealed the presence of ST131 and the blaCTX-M-15 gene in the country. We performed whole genome sequencing (WGS) of 58 clinical E. coli isolates at Kamuzu Central Hospital, a tertiary care centre in central Malawi, collected from 2012 to 2018. We used Oxford Nanopore Technologies (ONT) sequencing, which was performed in Malawi. We show that ST131 is observed more often (14.9% increasing to 32.8%) and that the blaCTX-M-15 gene is occurring at a higher frequency (21.3% increasing to 44.8%). Phylogenetics indicates that isolates are highly related between the central and southern geographic regions and confirms that ST131 isolates are contained in a single group. All AMR genes, including blaCTX-M-15, were widely distributed across sequence types. We also identified an increased number of ST410 isolates, which in this study tend to carry a plasmid-located copy of blaCTX-M-15 gene at a higher frequency than blaCTX-M-15 occurs in ST131. This study confirms the expanding nature of ST131 and the wide distribution of the blaCTX-M-15 gene in Malawi. We also highlight the feasibility of conducting longitudinal genomic epidemiology studies of important bacteria with the sequencing done on site using a nanopore platform that requires minimal infrastructure.
Affiliation(s)
- Emily J. Ciccone
- Division of Infectious Diseases, School of Medicine, University of North Carolina, USA
- David van Duin
- Division of Infectious Diseases, School of Medicine, University of North Carolina, USA
- Irving Hoffman
- Division of Infectious Diseases, School of Medicine, University of North Carolina, USA
- Jonathan J. Juliano
- Division of Infectious Diseases, School of Medicine, University of North Carolina, USA
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina, USA
- Curriculum in Genetics and Molecular Biology, School of Medicine, University of North Carolina, USA
- Jeremy Wang
- Department of Genetics, School of Medicine, University of North Carolina, USA