1. Wattanasombat S, Tongjai S. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline. F1000Res 2024; 13:556. [PMID: 38984017 PMCID: PMC11231628 DOI: 10.12688/f1000research.149577.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/14/2024] [Indexed: 07/11/2024]
Abstract
Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment. Results: Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime. Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
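The read-length thresholds reported in this abstract can be pictured as a simple pre-assembly filter. The following is a minimal sketch, not code from the HIV-64148 pipeline: the function name `filter_reads_by_length`, the in-memory `(name, sequence)` representation, and the toy reads are assumptions made for illustration. Only the 2,000-nucleotide minimum is taken from the abstract.

```python
from typing import Iterable, Iterator, Tuple

# (read name, nucleotide sequence); a hypothetical in-memory representation
Read = Tuple[str, str]

# Minimum read length the abstract reports as required for quality assembly
MIN_READ_LENGTH = 2000


def filter_reads_by_length(
    reads: Iterable[Read], min_length: int = MIN_READ_LENGTH
) -> Iterator[Read]:
    """Yield only reads whose sequence has at least min_length nucleotides."""
    for name, seq in reads:
        if len(seq) >= min_length:
            yield name, seq


# Toy reads for illustration only (not real sequencing data)
toy_reads = [
    ("read_short", "ACGT" * 250),   # 1,000 nt, below the threshold
    ("read_ok", "ACGT" * 500),      # 2,000 nt, exactly at the threshold
    ("read_long", "ACGT" * 1000),   # 4,000 nt
]
kept = [name for name, _ in filter_reads_by_length(toy_reads)]
print(kept)
```

Passing `min_length=4000` instead would apply the stricter length that the abstract associates with further quality gains.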
Affiliation(s)
- Sara Wattanasombat
- Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Siripong Tongjai
- Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand

2. Wambach JA, Wegner DJ, Kitzmiller J, White FV, Heins HB, Yang P, Paul AJ, Granadillo JL, Eghtesady P, Kuklinski C, Turner T, Fairman K, Stone K, Wilson T, Breman A, Smith J, Schroeder MC, Neidich JA, Whitsett JA, Cole FS. Homozygous, Intragenic Tandem Duplication of SFTPB Causes Neonatal Respiratory Failure. Am J Respir Cell Mol Biol 2024; 70:78-80. [PMID: 38156804 PMCID: PMC10768837 DOI: 10.1165/rcmb.2023-0156le] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/03/2024]
Affiliation(s)
- Ping Yang
- Washington University School of Medicine, St. Louis, Missouri
- Tiffany Turner
- Indiana University School of Medicine, Indianapolis, Indiana
- Korre Fairman
- Indiana University School of Medicine, Indianapolis, Indiana
- Kristyne Stone
- Indiana University School of Medicine, Indianapolis, Indiana
- Amy Breman
- Indiana University School of Medicine, Indianapolis, Indiana

3. Spealman P, De T, Chuong JN, Gresham D. Best Practices in Microbial Experimental Evolution: Using Reporters and Long-Read Sequencing to Identify Copy Number Variation in Experimental Evolution. J Mol Evol 2023; 91:356-368. [PMID: 37012421 PMCID: PMC10275804 DOI: 10.1007/s00239-023-10102-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Accepted: 02/21/2023] [Indexed: 04/05/2023]
Abstract
Copy number variants (CNVs), comprising gene amplifications and deletions, are a pervasive class of heritable variation. CNVs play a key role in rapid adaptation in both natural, and experimental, evolution. However, despite the advent of new DNA sequencing technologies, detection and quantification of CNVs in heterogeneous populations has remained challenging. Here, we summarize recent advances in the use of CNV reporters that provide a facile means of quantifying de novo CNVs at a specific locus in the genome, and nanopore sequencing, for resolving the often complex structures of CNVs. We provide guidance for the engineering and analysis of CNV reporters and practical guidelines for single-cell analysis of CNVs using flow cytometry. We summarize recent advances in nanopore sequencing, discuss the utility of this technology, and provide guidance for the bioinformatic analysis of these data to define the molecular structure of CNVs. The combination of reporter systems for tracking and isolating CNV lineages and long-read DNA sequencing for characterizing CNV structures enables unprecedented resolution of the mechanisms by which CNVs are generated and their evolutionary dynamics.
Affiliation(s)
- Pieter Spealman
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Titir De
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- Julie N Chuong
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA
- David Gresham
- Department of Biology, New York University, New York, NY, 10003, USA
- Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA

4. Mohammadi MM, Bavi O, Jamali Y. DNA sequencing via molecular dynamics simulation with functionalized graphene nanopore. J Mol Graph Model 2023; 122:108467. [PMID: 37028198 DOI: 10.1016/j.jmgm.2023.108467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 04/03/2023]
Abstract
Through this research, functionalized graphene nanopores are used to verify how effective such an apparatus for DNA sequencing is. The circular symmetric pores are functionalized with hydrogen and a hydroxyl group bonded with carbon atoms of the pore rim. Plus, two adenine bases are also put at the rim perimeter to verify whether such a combination would lead to base detection. A homopolymer of single-stranded DNA (ssDNA) is pulled through a nanopore using steered molecular dynamics (SMD) simulation. Pulling force profile, moving fashion of ssDNA in irreversible DNA pulling as well as the base orientation during translocation relative to the graphene plane, called beta angle, are assessed. Based on the studied parameters, SMD force, and base orientation, the hydrogenated and hydroxylated pores do not show a clear distinction between bases, while the adenine-functionalized pore can distinguish between adenine and cytosine. Therefore, there may be some hope for achieving single-base sequencing, while further research is needed.

5. Chen P, Sun Z, Wang J, Liu X, Bai Y, Chen J, Liu A, Qiao F, Chen Y, Yuan C, Sha J, Zhang J, Xu LQ, Li J. Portable nanopore-sequencing technology: Trends in development and applications. Front Microbiol 2023; 14:1043967. [PMID: 36819021 PMCID: PMC9929578 DOI: 10.3389/fmicb.2023.1043967] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/14/2022] [Accepted: 01/03/2023] [Indexed: 02/04/2023]
Abstract
Sequencing technology is the most commonly used technology in molecular biology research and an essential pillar for the development and applications of molecular biology. Since 1977, when the first generation of sequencing technology opened the door to interpreting the genetic code, sequencing technology has been developing for three generations. It has applications in all aspects of life and scientific research, such as disease diagnosis, drug target discovery, pathological research, species protection, and SARS-CoV-2 detection. However, the first- and second-generation sequencing technology relied on fluorescence detection systems and DNA polymerization enzyme systems, which increased the cost of sequencing technology and limited its scope of applications. The third-generation sequencing technology performs PCR-free and single-molecule sequencing, but it still depends on the fluorescence detection device. To break through these limitations, researchers have made arduous efforts to develop a new advanced portable sequencing technology represented by nanopore sequencing. Nanopore technology has the advantages of small size and convenient portability, independent of biochemical reagents, and direct reading using physical methods. This paper reviews the research and development process of nanopore sequencing technology (NST) from the laboratory to commercially viable tools; discusses the main types of nanopore sequencing technologies and their various applications in solving a wide range of real-world problems. In addition, the paper collates the analysis tools necessary for performing different processing tasks in nanopore sequencing. Finally, we highlight the challenges of NST and its future research and application directions.
Affiliation(s)
- Pin Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Zepeng Sun
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jiawei Wang
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Xinlong Liu
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yun Bai
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Jiang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Anna Liu
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Feng Qiao
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Yang Chen
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China
- Chenyan Yuan
- Clinical Laboratory, Southeast University Zhongda Hospital, Nanjing, China
- Jingjie Sha
- School of Mechanical Engineering, Southeast University, Nanjing, China
- Jinghui Zhang
- School of Computer Science and Technology, Southeast University, Nanjing, China
- Li-Qun Xu (corresponding author)
- China Mobile (Chengdu) Industrial Research Institute, Chengdu, China
- Jian Li (corresponding author)
- Key Laboratory of DGHD, MOE, School of Life Science and Technology, Southeast University, Nanjing, China

6. Cai D, Shang J, Sun Y. HaploDMF: viral haplotype reconstruction from long reads via deep matrix factorization. Bioinformatics 2022; 38:5360-5367. [PMID: 36308467 PMCID: PMC9750122 DOI: 10.1093/bioinformatics/btac708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 10/06/2022] [Accepted: 10/25/2022] [Indexed: 12/24/2022]
Abstract
MOTIVATION: Lacking strict proofreading mechanisms, many RNA viruses can generate progeny with slightly changed genomes. Being able to characterize highly similar genomes (i.e. haplotypes) in one virus population helps study the viruses' evolution and their interactions with the host/other microbes. High-throughput sequencing data has become the major source for characterizing viral populations. However, the inherent limitation on read length by next-generation sequencing makes complete haplotype reconstruction difficult. RESULTS: In this work, we present a new tool named HaploDMF that can construct complete haplotypes using third-generation sequencing (TGS) data. HaploDMF utilizes a deep matrix factorization model with an adapted loss function to learn latent features from aligned reads automatically. The latent features are then used to cluster reads of the same haplotype. Unlike existing tools whose performance can be affected by the overlap size between reads, HaploDMF is able to achieve highly robust performance on data with different coverage, haplotype number and error rates. In particular, it can generate more complete haplotypes even when the sequencing coverage drops in the middle. We benchmark HaploDMF against the state-of-the-art tools on simulated and real sequencing TGS data on different viruses. The results show that HaploDMF competes favorably against all others. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of HaploDMF are available at https://github.com/dhcai21/HaploDMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dehan Cai
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Jiayu Shang
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Yanni Sun (corresponding author)

7. Bruels CC, Littel HR, Daugherty AL, Stafki S, Estrella EA, McGaughy ES, Truong D, Badalamenti JP, Pais L, Ganesh VS, O'Donnell-Luria A, Stalker HJ, Wang Y, Collins C, Behlmann A, Lemmers RJLF, van der Maarel SM, Laine R, Ghosh PS, Darras BT, Zingariello CD, Pacak CA, Kunkel LM, Kang PB. Diagnostic capabilities of nanopore long-read sequencing in muscular dystrophy. Ann Clin Transl Neurol 2022; 9:1302-1309. [PMID: 35734998 PMCID: PMC9380148 DOI: 10.1002/acn3.51612] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/19/2022] [Revised: 06/08/2022] [Accepted: 06/09/2022] [Indexed: 11/05/2022]
Abstract
Many individuals with muscular dystrophies remain genetically undiagnosed despite clinical diagnostic testing, including exome sequencing. Some may harbor previously undetected structural variants (SVs) or cryptic splice sites. We enrolled 10 unrelated families: nine had muscular dystrophy but lacked complete genetic diagnoses and one had an asymptomatic DMD duplication. Nanopore genomic long-read sequencing identified previously undetected pathogenic variants in four individuals: an SV in DMD, an SV in LAMA2, and two single nucleotide variants in DMD that alter splicing. The DMD duplication in the asymptomatic individual was in tandem. Nanopore sequencing may help streamline genetic diagnostic approaches for muscular dystrophy.
Affiliation(s)
- Christine C Bruels
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Hannah R Littel
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Audrey L Daugherty
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Seth Stafki
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Elicia A Estrella
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts; Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts
- Emily S McGaughy
- Division of Pediatric Neurology, Department of Pediatrics, University of Florida College of Medicine, Gainesville, Florida, 32610
- Don Truong
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Jonathan P Badalamenti
- University of Minnesota Genomics Center, University of Minnesota, Minneapolis, Minnesota, 55455
- Lynn Pais
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts; Program in Medical and Population Genetics, Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts; Analytic and Translational Genetics Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Vijay S Ganesh
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts; Program in Medical and Population Genetics, Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts; Analytic and Translational Genetics Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts; Department of Neurology, Brigham and Women's Hospital, Boston, Massachusetts
- Anne O'Donnell-Luria
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts; Program in Medical and Population Genetics, Center for Mendelian Genomics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts; Analytic and Translational Genetics Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
- Heather J Stalker
- Division of Genetics, Department of Pediatrics, University of Florida College of Medicine, Gainesville, Florida, 32610
- Yang Wang
- PerkinElmer Genomics, Pittsburgh, Pennsylvania
- Regina Laine
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts
- Partha S Ghosh
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts
- Basil T Darras
- Department of Neurology, Boston Children's Hospital, Boston, Massachusetts
- Carla D Zingariello
- Division of Pediatric Neurology, Department of Pediatrics, University of Florida College of Medicine, Gainesville, Florida, 32610
- Christina A Pacak
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455
- Louis M Kunkel
- Division of Genetics and Genomics, Boston Children's Hospital, Boston, Massachusetts
- Peter B Kang
- Paul and Sheila Wellstone Muscular Dystrophy Center and Department of Neurology, University of Minnesota Medical School, Minneapolis, Minnesota, 55455; Institute for Translational Neuroscience, University of Minnesota Medical School, Minneapolis, Minnesota, 55455

8. de la Morena-Barrio B, Orlando C, Sanchis-Juan A, García JL, Padilla J, de la Morena-Barrio ME, Puruunen M, Stouffs K, Cifuentes R, Borràs N, Bravo-Pérez C, Benito R, Cuenca-Guardiola J, Vicente V, Vidal F, Hernández-Rivas JM, Ouwehand W, Jochmans K, Corral J. Molecular Dissection of Structural Variations Involved in Antithrombin Deficiency. J Mol Diagn 2022; 24:462-475. [PMID: 35218943 DOI: 10.1016/j.jmoldx.2022.01.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/29/2021] [Revised: 11/15/2021] [Accepted: 01/11/2022] [Indexed: 12/30/2022]
Abstract
Inherited antithrombin deficiency, the most severe form of thrombophilia, is predominantly caused by variants in SERPINC1. Few causal structural variants have been described, usually detected by multiplex ligation-dependent probe amplification or cytogenetic arrays, which only define the gain or loss and the approximate size and location. This study has done a complete dissection of the structural variants affecting SERPINC1 of 39 unrelated patients with antithrombin deficiency using multiplex ligation-dependent probe amplification, comparative genome hybridization array, long-range PCR, and whole genome nanopore sequencing. Structural variants, in all cases only affecting one allele, were deleterious and caused a severe type I deficiency. Most defects were deletions affecting exons of SERPINC1 (82.1%), but the whole cohort was heterogeneous, as tandem duplications, deletion of introns, or retrotransposon insertions were also detected. Their size was also variable, ranging from 193 bp to 8 Mb, and in 54% of the cases involved neighboring genes. All but two structural variants had repetitive elements and/or microhomologies in their breakpoints, suggesting a common mechanism of formation. This study also suggested regions recurrently involved in structural variants causing antithrombin deficiency and found three structural variants with a founder effect: the insertion of a retrotransposon, duplication of exon 6, and a 20-gene deletion. Finally, nanopore sequencing was determined to be the most appropriate method to identify and characterize all structural variants at nucleotide level, independently of their size or type.
Affiliation(s)
- Belén de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- Christelle Orlando
- Department of Haematology, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
- Alba Sanchis-Juan
- Department of Haematology, University of Cambridge, National Health Service (NHS) Blood and Transplant Centre, Cambridge, United Kingdom; National Institute for Health Research BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Juan L García
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
- José Padilla
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- María E de la Morena-Barrio
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- Marija Puruunen
- National Heart, Lung, and Blood Institute Framingham Heart Study, Framingham, Massachusetts
- Katrien Stouffs
- Center for Medical Genetics, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
- Rosa Cifuentes
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- Nina Borràs
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Medicina Transfusional, Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Instituto Carlos III, Barcelona, Spain
- Carlos Bravo-Pérez
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- Rocio Benito
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
- Javier Cuenca-Guardiola
- Departamento de Informática y Sistemas, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria-Arrixaca, Murcia, Spain
- Vicente Vicente
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain
- Francisco Vidal
- Laboratori de Coagulopaties Congènites, Banc de Sang i Teixits, Barcelona, Medicina Transfusional, Vall d'Hebron Institut de Recerca, Universitat Autònoma de Barcelona, Centro de Investigación Biomédica en Red de Enfermedades Cardiovasculares, Instituto Carlos III, Barcelona, Spain
- Jesús M Hernández-Rivas
- Cancer Research Center (Instituto Universitario de Biología Molecular y Celular del Cáncer) Consejo Superior de Investigaciones Científicas-University of Salamanca, Salamanca, Spain; Instituto de Investigación Biomédica, Department of Hematology, University Hospital of Salamanca, Department of Medicine, University of Salamanca, Salamanca, Spain
- Willem Ouwehand
- Department of Haematology, University of Cambridge, National Health Service (NHS) Blood and Transplant Centre, Cambridge, United Kingdom; National Institute for Health Research BioResource, Cambridge University Hospitals NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Kristin Jochmans
- Department of Haematology, Vrije Universiteit Brussel, Universitair Ziekenhuis Brussel, Brussels, Belgium
- Javier Corral
- Servicio de Hematología y Oncología Médica, Hospital Universitario Morales Meseguer, Centro Regional de Hemodonación, Universidad de Murcia, Instituto Murciano de Investigación Biosanitaria, Centro de Investigación Biomédica en Red de Enfermedades Raras, Murcia, Spain

9. Lang J, Sun J, Yang Z, He L, He Y, Chen Y, Huang L, Li P, Li J, Qin L. Nano2NGS-Muta: a framework for converting nanopore sequencing data to NGS-liked sequencing data for hotspot mutation detection. NAR Genom Bioinform 2022; 4:lqac033. [PMID: 35464239 PMCID: PMC9022462 DOI: 10.1093/nargab/lqac033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/14/2021] [Revised: 03/30/2022] [Accepted: 04/13/2022] [Indexed: 12/12/2022]
Abstract
Nanopore sequencing, also known as single-molecule real-time sequencing, is a third/fourth generation sequencing technology that enables deciphering single DNA/RNA molecules without the polymerase chain reaction. Although nanopore sequencing has made significant progress in scientific research and clinical practice, its application has been limited compared with next-generation sequencing (NGS) due to its specific design principle and data characteristics, especially in hotspot mutation detection. Therefore, we developed Nano2NGS-Muta as a data analysis framework for hotspot mutation detection based on long reads from nanopore sequencing. Nano2NGS-Muta is characterized by applying nanopore sequencing data to NGS-liked data analysis pipelines. Long reads can be converted into short reads and then processed through existing NGS analysis pipelines in combination with statistical methods for hotspot mutation detection. Nano2NGS-Muta not only effectively avoids false positive/negative results caused by non-random errors and unexpected insertions-deletions (indels) of nanopore sequencing data and improves the detection accuracy of hotspot mutations compared to conventional nanopore sequencing data analysis algorithms, but also breaks the barriers of data analysis methods between short-read sequencing and long-read sequencing. We hope Nano2NGS-Muta can serve as a reference method for nanopore sequencing data and promote a wider application scope of nanopore sequencing technology in scientific research and clinical practice.
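The conversion step this abstract mentions, turning long reads into short reads for an NGS-style pipeline, can be illustrated with a sliding window. This is only a sketch of the general idea and not the Nano2NGS-Muta implementation: the function name `split_long_read`, the 150-nucleotide fragment length, the 75-nucleotide step, and the fragment naming scheme are all hypothetical choices, and base qualities are ignored for brevity.

```python
from typing import List, Tuple


def split_long_read(
    name: str, seq: str, fragment_len: int = 150, step: int = 75
) -> List[Tuple[str, str]]:
    """Cut one long read into overlapping fixed-length fragments.

    fragment_len and step are hypothetical defaults, not values from the paper.
    A read no longer than fragment_len is returned as a single fragment.
    """
    if fragment_len <= 0 or step <= 0:
        raise ValueError("fragment_len and step must be positive")
    if len(seq) <= fragment_len:
        return [(f"{name}_0", seq)]

    last_start = len(seq) - fragment_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # extra window so the read's tail is covered

    # Name each fragment after its start offset within the parent read
    return [(f"{name}_{s}", seq[s:s + fragment_len]) for s in starts]


# Toy 400 nt read for illustration only
toy_seq = "ACGT" * 100
fragments = split_long_read("read1", toy_seq)
print(len(fragments), [frag_name for frag_name, _ in fragments])
```

Overlapping windows are used here so that a variant near a fragment boundary still appears in the interior of a neighbouring fragment; whether the published framework does the same is not stated in the abstract.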
Affiliation(s)
- Jidong Lang
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Jiguo Sun
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Zhi Yang
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Lei He
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Yu He
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Yanmei Chen
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Lei Huang
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Ping Li
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Jialin Li
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China
- Liu Qin
- Bioinformatics and Product Development Department, Qitan Technology (Beijing) Co., Ltd, Beijing 100192, China

10. Cai D, Sun Y. Reconstructing viral haplotypes using long reads. Bioinformatics 2022; 38:2127-2134. [PMID: 35157018 DOI: 10.1093/bioinformatics/btac089] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/11/2021] [Revised: 01/19/2022] [Accepted: 02/08/2022] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Most RNA viruses lack strict proofreading during replication. Coupled with a high replication rate, some RNA viruses can form a virus population containing a group of genetically related but different haplotypes. Characterizing the haplotype composition in a virus population is thus important to understand viruses' evolution. Many attempts have been made to reconstruct viral haplotypes using next-generation sequencing (NGS) reads. However, the short length of NGS reads cannot cover distant single-nucleotide variants, making it difficult to reconstruct complete or near-complete haplotypes. Given the fast developments of third-generation sequencing technologies, a new opportunity has arisen for reconstructing full-length haplotypes with long reads. RESULTS: In this work, we developed a new tool, RVHaplo, to reconstruct haplotypes for known viruses from long reads. We tested it rigorously on both simulated and real viral sequencing data and compared it against other popular haplotype reconstruction tools. The results demonstrated that RVHaplo outperforms the state-of-the-art tools for viral haplotype reconstruction from long reads. In particular, RVHaplo can reconstruct the rare (1% abundance) haplotypes that other tools usually missed. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of RVHaplo are available at https://github.com/dhcai21/RVHaplo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Dehan Cai
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
- Yanni Sun
- Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR, China
|
11
|
Ben Khedher M, Ghedira K, Rolain JM, Ruimy R, Croce O. Application and Challenge of 3rd Generation Sequencing for Clinical Bacterial Studies. Int J Mol Sci 2022; 23:1395. [PMID: 35163319 PMCID: PMC8835973 DOI: 10.3390/ijms23031395] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 12/08/2021] [Revised: 01/20/2022] [Accepted: 01/24/2022] [Indexed: 02/04/2023] Open
Abstract
Over the past 25 years, the powerful combination of genome sequencing and bioinformatics analysis has played a crucial role in interpreting information encoded in bacterial genomes. High-throughput sequencing technologies have paved the way towards understanding an increasingly wide range of biological questions. This revolution has enabled advances in areas ranging from genome composition to how proteins interact with nucleic acids. It has also created unprecedented opportunities through the integration of genomic data into clinics for the diagnosis of genetic traits associated with disease. Since then, these technologies have continued to evolve, and recently, long-read sequencing has overcome previous limitations in terms of accuracy, thus expanding its applications in genomics, transcriptomics and metagenomics. In this review, we describe a brief history of the bacterial genome sequencing revolution and its application in public health and molecular epidemiology. We present a chronology that encompasses the various technological developments: whole-genome shotgun sequencing, high-throughput sequencing, and long-read sequencing. We mainly discuss the application of next-generation sequencing to decipher bacterial genomes. We also highlight how long-read sequencing technologies go beyond the limitations of traditional short-read sequencing. We intend to provide a description of the guiding principles of third-generation sequencing applications and ongoing improvements in the field of microbial medical research.
Affiliation(s)
- Mariem Ben Khedher
- Bacteriology Laboratory, Archet 2 Hospital, CHU Nice, 06000 Nice, France
- Institute for Research on Cancer and Aging Nice (IRCAN), CNRS, INSERM, Université Côte d’Azur, 06108 Nice, France
- Kais Ghedira
- Laboratory of Bioinformatics, Biomathematics and Biostatistics, Institute Pasteur of Tunis, Tunis 1002, Tunisia
- Jean-Marc Rolain
- IRD, APHM, MEPHI, IHU-Méditerranée Infection, Aix Marseille Université, 13005 Marseille, France
- Raymond Ruimy
- Bacteriology Laboratory, Archet 2 Hospital, CHU Nice, 06000 Nice, France
- Centre Méditerranéen de Médecine Moléculaire (C3M), INSERM, Université Côte D’Azur, 06108 Nice, France
- Olivier Croce
- Institute for Research on Cancer and Aging Nice (IRCAN), CNRS, INSERM, Université Côte d’Azur, 06108 Nice, France
|
12
|
Hoang MTV, Irinyi L, Hu Y, Schwessinger B, Meyer W. Long-Reads-Based Metagenomics in Clinical Diagnosis With a Special Focus on Fungal Infections. Front Microbiol 2022; 12:708550. [PMID: 35069461 PMCID: PMC8770865 DOI: 10.3389/fmicb.2021.708550] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/12/2021] [Accepted: 12/03/2021] [Indexed: 12/12/2022] Open
Abstract
Identification of the causative infectious agent is essential in the management of infectious diseases, with the ideal diagnostic method being rapid, accurate, and informative, while remaining cost-effective. Traditional diagnostic techniques rely on culturing and cell propagation to isolate and identify the causative pathogen. These techniques are limited by the ability and the time required to grow or propagate an agent in vitro, and by the fact that identification based on morphological traits is non-specific, insensitive, and reliant on technical expertise. The evolution of next-generation sequencing has revolutionized genomic studies by generating more data at a lower cost. Sequencing technologies are divided into short- and long-read technologies, depending on the length of reads generated during sequencing runs. Long-read sequencing, also called third-generation sequencing, emerged commercially through the instruments released by Pacific Biosciences and Oxford Nanopore Technologies. Although they rely on different sequencing chemistries, with the former being more accurate, both platforms can generate ultra-long sequence reads. Long-read sequencing is capable of entirely spanning previously established genomic identification regions or potentially small whole genomes, drastically improving the accuracy of the identification of pathogens directly from clinical samples. Long-read sequencing may also provide additional important clinical information, such as antimicrobial resistance profiles and epidemiological data, from a single sequencing run. While initial applications of long-read sequencing in clinical diagnosis showed that it could be a promising diagnostic technique, they have also highlighted the need for further optimization. In this review, we show the potential long-read sequencing has in clinical diagnosis of fungal infections and discuss the pros and cons of its implementation.
Affiliation(s)
- Minh Thuy Vi Hoang
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Laszlo Irinyi
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
- Yiheng Hu
- Research School of Biology, Australian National University, Canberra, ACT, Australia
- Wieland Meyer
- Molecular Mycology Research Laboratory, Centre for Infectious Diseases and Microbiology, Faculty of Medicine and Health, Sydney Medical School, Westmead Clinical School, The University of Sydney, Sydney, NSW, Australia
- Westmead Institute for Medical Research, Westmead, NSW, Australia
- Sydney Infectious Disease Institute, The University of Sydney, Sydney, NSW, Australia
- Westmead Hospital (Research and Education Network), Westmead, NSW, Australia
|
13
|
Mohammadi MM, Bavi O. DNA sequencing: an overview of solid-state and biological nanopore-based methods. Biophys Rev 2021; 14:99-110. [PMID: 34840616 PMCID: PMC8609259 DOI: 10.1007/s12551-021-00857-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/31/2021] [Accepted: 10/14/2021] [Indexed: 12/23/2022] Open
Abstract
Sequencing has been a topic of significant interest since its emergence and has become increasingly important over time. Impressive achievements have been obtained in this field, especially in relation to DNA and RNA sequencing. Since the first achievements by Sanger and colleagues in the 1950s, many sequencing techniques have been developed, while others have disappeared. DNA sequencing has undergone three generations of major evolution. Each generation has its own specifications, which are mentioned briefly. Among these generations, nanopore sequencing has its own exciting characteristics, which are given more attention here. Among the pioneering technologies used by third-generation techniques, nanopores, either biological or solid-state, have been studied extensively, both experimentally and theoretically. All sequencing technologies have their own advantages and disadvantages, and nanopores are no exception. Research aimed at overcoming these obstacles is also outlined. In this review, biological and solid-state nanopores are elaborated on, and their applications are also discussed briefly.
Affiliation(s)
- Mohammad M Mohammadi
- Department of Mechanical and Aerospace Engineering, Shiraz University of Technology, Shiraz, 71557-13876 Iran
- Omid Bavi
- Department of Mechanical and Aerospace Engineering, Shiraz University of Technology, Shiraz, 71557-13876 Iran
|
14
|
Nanopore sequencing technology, bioinformatics and applications. Nat Biotechnol 2021; 39:1348-1365. [PMID: 34750572 PMCID: PMC8988251 DOI: 10.1038/s41587-021-01108-x] [Citation(s) in RCA: 453] [Impact Index Per Article: 151.0] [Received: 12/09/2019] [Accepted: 09/22/2021] [Indexed: 12/13/2022]
Abstract
Rapid advances in nanopore technologies for sequencing single long DNA and RNA molecules have led to substantial improvements in accuracy, read length and throughput. These breakthroughs have required extensive development of experimental and bioinformatics methods to fully exploit nanopore long reads for investigations of genomes, transcriptomes, epigenomes and epitranscriptomes. Nanopore sequencing is being applied in genome assembly, full-length transcript detection and base modification detection and in more specialized areas, such as rapid clinical diagnoses and outbreak surveillance. Many opportunities remain for improving data quality and analytical approaches through the development of new nanopores, base-calling methods and experimental protocols tailored to particular applications.
|
15
|
Dorado G, Gálvez S, Rosales TE, Vásquez VF, Hernández P. Analyzing Modern Biomolecules: The Revolution of Nucleic-Acid Sequencing - Review. Biomolecules 2021; 11:1111. [PMID: 34439777 PMCID: PMC8393538 DOI: 10.3390/biom11081111] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 06/16/2021] [Revised: 07/12/2021] [Accepted: 07/23/2021] [Indexed: 02/06/2023] Open
Abstract
Recent developments have revolutionized the study of biomolecules. Among them are molecular markers and the amplification and sequencing of nucleic acids. Sequencing is classified into three generations. The first allows the sequencing of small DNA fragments. The second increases throughput while reducing turnaround time and cost, and is therefore more convenient for sequencing full genomes and transcriptomes. The third generation is currently pushing technology to its limits, being able to sequence single molecules without prior amplification, which was previously impossible. It also represents a new revolution, allowing researchers to sequence RNA directly, without prior reverse transcription. These technologies are having a significant impact on different areas, such as medicine, agronomy, ecology and biotechnology. Additionally, the study of biomolecules is revealing interesting evolutionary information. That includes deciphering what makes us human, including phenomena like non-coding RNA expansion. All this is redefining the concepts of gene and transcript. Basic analyses and applications are now facilitated by new genome editing tools, such as CRISPR. All these developments in general, and nucleic-acid sequencing in particular, are opening an exciting new era of biomolecule analyses and applications, including personalized medicine and the diagnosis and prevention of diseases in humans and other animals.
Affiliation(s)
- Gabriel Dorado
- Dep. Bioquímica y Biología Molecular, Campus Rabanales C6-1-E17, Campus de Excelencia Internacional Agroalimentario (ceiA3), Universidad de Córdoba, 14071 Córdoba, Spain
- Sergio Gálvez
- Dep. Lenguajes y Ciencias de la Computación, Boulevard Louis Pasteur 35, Universidad de Málaga, 29071 Málaga, Spain
- Teresa E. Rosales
- Laboratorio de Arqueobiología, Avda. Universitaria s/n, Universidad Nacional de Trujillo, 13011 Trujillo, Peru
- Víctor F. Vásquez
- Centro de Investigaciones Arqueobiológicas y Paleoecológicas Andinas Arqueobios, Martínez de Companón 430-Bajo 100, Urbanización San Andres, 13088 Trujillo, Peru
- Pilar Hernández
- Instituto de Agricultura Sostenible (IAS), Consejo Superior de Investigaciones Científicas (CSIC), Alameda del Obispo s/n, 14080 Córdoba, Spain
|