1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
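As a minimal illustration of the two ideas this abstract names, the Python sketch below counts overlapping k-mers in a sequence and lists the k-mers that never occur in it. The function names (`count_kmers`, `absent_kmers`) and the toy sequence are hypothetical and are not taken from the review; real nullomer analyses are run against whole genomes or proteomes rather than a ten-base string.

```python
from collections import Counter
from itertools import product


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def absent_kmers(sequence: str, k: int, alphabet: str = "ACGT") -> list[str]:
    """List all k-mers over the alphabet that never occur in the sequence."""
    observed = count_kmers(sequence, k)
    candidates = ("".join(letters) for letters in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in observed]


# Hypothetical toy input, for illustration only.
toy_sequence = "ACGTACGTTG"
counts = count_kmers(toy_sequence, 2)
missing = absent_kmers(toy_sequence, 2)
```

Enumerating every possible k-mer grows as 4^k for DNA, so this brute-force listing only suits small k; it is meant to show the concept, not a scalable method.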
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Shelton WJ, Zandpazandi S, Nix JS, Gokden M, Bauer M, Ryan KR, Wardell CP, Vaske OM, Rodriguez A. Long-read sequencing for brain tumors. Front Oncol 2024; 14:1395985. [PMID: 38915364 PMCID: PMC11194609 DOI: 10.3389/fonc.2024.1395985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 05/27/2024] [Indexed: 06/26/2024] Open
Abstract
Brain tumors and genomics share a long history: glioblastoma was the first cancer studied by The Cancer Genome Atlas. Continuous advances in sequencing technologies over the decades have aided the molecular characterization of brain tumors for diagnosis, prognosis, and treatment. Since the WHO classification of CNS tumors incorporated molecular biomarkers in 2016, the genomics of brain tumors has been integrated into diagnostic criteria. Long-read sequencing, also known as third-generation sequencing, is an emerging technique that allows longer DNA segments to be sequenced, leading to improved detection of structural variants and epigenetic modifications. These capabilities are opening the way to better characterization of brain tumors. Here, we present a comprehensive summary of the state of the art of third-generation sequencing as applied to brain tumor diagnosis, prognosis, and treatment. We discuss the advantages and potential new implementations of long-read sequencing in clinical paradigms for neuro-oncology patients.
Affiliation(s)
- William J. Shelton
  - Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Sara Zandpazandi
  - Department of Neurosurgery, Medical University of South Carolina, Charleston, SC, United States
- J Stephen Nix
  - Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Murat Gokden
  - Department of Pathology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Michael Bauer
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Katie Rose Ryan
  - Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Christopher P. Wardell
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, United States
- Olena Morozova Vaske
  - Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, CA, United States
- Analiz Rodriguez
  - Department of Neurosurgery, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR, United States
3
Wang R, Chen J. NmTHC: a hybrid error correction method based on a generative neural machine translation model with transfer learning. BMC Genomics 2024; 25:573. [PMID: 38849740 PMCID: PMC11157743 DOI: 10.1186/s12864-024-10446-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 05/22/2024] [Indexed: 06/09/2024] Open
Abstract
BACKGROUND The single-pass long reads generated by third-generation sequencing technology exhibit a relatively high error rate, whereas circular consensus sequencing (CCS) produces shorter reads. Thus, it is effective to reduce the error rate of long reads algorithmically with the help of homologous, high-accuracy, low-cost short reads from next-generation sequencing (NGS) technology. METHODS In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves, for error correction. Akin to natural language sequences, a long read can be regarded as a special "genetic language" and processed with the ideas of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The long reads before and after correction are regarded as sentences in the source and target languages of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can be used to predict the corrected long read. RESULTS NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, PacBio and Nanopore. Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting in the six benchmark datasets, showing that it enhances alignment identity without sacrificing the length advantage of long reads.
CONCLUSION NmTHC adopts the generative Neural Machine Translation (NMT) model to transform hybrid error correction tasks into machine translation problems and provides a novel perspective for solving long-read error correction problems with the ideas of Natural Language Processing (NLP). Moreover, the proposed methodology is sequencing-technology-independent and can produce more precise reads.
Affiliation(s)
- Rongshu Wang
  - Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
- Jianhua Chen
  - Department of Electronic Engineering, Information School, Yunnan University, Kunming, Yunnan, China
4
Wattanasombat S, Tongjai S. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline. F1000Res 2024; 13:556. [PMID: 38984017 PMCID: PMC11231628 DOI: 10.12688/f1000research.149577.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/14/2024] [Indexed: 07/11/2024] Open
Abstract
Background Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment. Results Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime. Conclusions The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
Affiliation(s)
- Sara Wattanasombat
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Siripong Tongjai
  - Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
5
Szakállas N, Barták BK, Valcz G, Nagy ZB, Takács I, Molnár B. Can long-read sequencing tackle the barriers, which the next-generation could not? A review. Pathol Oncol Res 2024; 30:1611676. [PMID: 38818014 PMCID: PMC11137202 DOI: 10.3389/pore.2024.1611676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Accepted: 04/30/2024] [Indexed: 06/01/2024]
Abstract
The large-scale heterogeneity of genetic diseases has necessitated deeper examination of nucleotide sequence alterations, aiding the discovery of new drug targets. The emergence of new sequencing techniques was essential for obtaining more interpretable genomic data. In contrast to earlier short reads, longer reads can provide better insight into potentially health-threatening genetic abnormalities. Long reads offer more accurate variant identification and genome assembly, indicating advances in nucleotide defect-related studies. In this review, we introduce the historical background of sequencing technologies and show their benefits and limits. Furthermore, we highlight the differences between short- and long-read approaches, including their unique advances and difficulties in methodologies and evaluation. Additionally, we provide a detailed description of the corresponding bioinformatics and the current applications.
Affiliation(s)
- Nikolett Szakállas
  - Department of Biological Physics, Faculty of Science, Eötvös Loránd University, Budapest, Hungary
- Barbara K. Barták
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Gábor Valcz
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
  - HUN-REN-SU Translational Extracellular Vesicle Research Group, Budapest, Hungary
- Zsófia B. Nagy
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- István Takács
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Béla Molnár
  - Department of Internal Medicine and Oncology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
6
Li H, Durbin R. Genome assembly in the telomere-to-telomere era. Nat Rev Genet 2024:10.1038/s41576-024-00718-w. [PMID: 38649458 DOI: 10.1038/s41576-024-00718-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/27/2024] [Indexed: 04/25/2024]
Abstract
Genome sequences largely determine the biology and encode the history of an organism, and de novo assembly - the process of reconstructing the genome sequence of an organism from sequencing reads - has been a central problem in bioinformatics for four decades. Until recently, genomes were typically assembled into fragments of a few megabases at best, but now technological advances in long-read sequencing enable the near-complete assembly of each chromosome - also known as telomere-to-telomere assembly - for many organisms. Here, we review recent progress on assembly algorithms and protocols, with a focus on how to derive near-telomere-to-telomere assemblies. We also discuss the additional developments that will be required to resolve remaining assembly gaps and to assemble non-diploid genomes.
Affiliation(s)
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Richard Durbin
  - Department of Genetics, Cambridge University, Cambridge, UK
7
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024] Open
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
8
Heath HD, Peng S, Szmatola T, Bellone RR, Kalbfleisch T, Petersen JL, Finno CJ. A Comprehensive Allele Specific Expression Resource for the Equine Transcriptome. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.12.31.573798. [PMID: 38260378 PMCID: PMC10802363 DOI: 10.1101/2023.12.31.573798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Background Allele-specific expression (ASE) analysis provides a nuanced view of cis-regulatory mechanisms affecting gene expression. Results In this work, we introduce and highlight the significance of an equine ASE analysis, containing integrated long- and short-read RNA sequencing data, along with insight from histone modification data, from four healthy Thoroughbreds (2 mares and 2 stallions) across 9 tissues. Conclusions This valuable publicly accessible resource is poised to facilitate investigations into regulatory variation in equine tissues and foster a deeper understanding of the impact of allelic imbalance in equine health and disease at the molecular level.
9
Schäfer L, Jehle JA, Kleespies RG, Wennmann JT. A practical guide and Galaxy workflow to avoid inter-plasmidic repeat collapse and false gene loss in Unicycler's hybrid assemblies. Microb Genom 2024; 10:001173. [PMID: 38197876 PMCID: PMC10868617 DOI: 10.1099/mgen.0.001173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 12/18/2023] [Indexed: 01/11/2024] Open
Abstract
Generating complete, high-quality genome assemblies is key for any downstream analysis, such as comparative genomics. For bacterial genome assembly, various algorithms and fully automated pipelines exist, which are free-of-charge and easily accessible. However, these assembly tools often cannot unambiguously resolve a bacterial genome, for example due to the presence of sequence repeat structures on the chromosome or on plasmids. Then, a more sophisticated approach and/or manual curation is needed. Such modifications can be challenging, especially for non-bioinformaticians, because they are generally not considered as a straightforward process. In this study, we propose a standardized approach for manual genome completion focusing on the popular hybrid assembly pipeline Unicycler. The provided Galaxy workflow addresses two weaknesses in Unicycler's hybrid assemblies: (i) collapse of inter-plasmidic repeats and (ii) false loss of single-copy sequences. To demonstrate and validate how to detect and resolve these assembly errors, we use two genomes from the Bacillus cereus group. By applying the proposed pipeline following an automated assembly, the genome sequence quality can be significantly improved.
Affiliation(s)
- Lea Schäfer
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Johannes A. Jehle
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Regina G. Kleespies
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Jörg T. Wennmann
  - Julius Kühn Institute (JKI) – Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
10
Guo Y, Feng X, Li H. Evaluation of haplotype-aware long-read error correction with hifieval. Bioinformatics 2023; 39:btad631. [PMID: 37851384 PMCID: PMC10612404 DOI: 10.1093/bioinformatics/btad631] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 09/18/2023] [Accepted: 10/17/2023] [Indexed: 10/19/2023] Open
Abstract
SUMMARY The PacBio High-Fidelity (HiFi) sequencing technology produces long reads with >99% accuracy. It has enabled the development of a new generation of de novo sequence assemblers, which all have sequencing error correction (EC) as the first step. As HiFi is a new data type, this critical step has not been evaluated before. Here, we introduce hifieval, a new command-line tool for measuring over- and under-corrections produced by EC algorithms. We assessed the accuracy of the EC components of existing HiFi assemblers on the CHM13 and the HG002 datasets and further investigated the performance of EC methods in challenging regions such as homopolymer regions, centromeric regions, and segmental duplications. Hifieval will help HiFi assemblers to improve EC and assembly quality in the long run. AVAILABILITY AND IMPLEMENTATION The source code is available at https://github.com/magspho/hifieval.
Affiliation(s)
- Yujie Guo
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, United States
- Xiaowen Feng
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States
- Heng Li
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, United States
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02215, United States
11
Lee C, Polo RO, Zaheer R, Van Domselaar G, Zovoilis A, McAllister TA. Evaluation of metagenomic assembly methods for the detection and characterization of antimicrobial resistance determinants and associated mobilizable elements. J Microbiol Methods 2023; 213:106815. [PMID: 37699502 DOI: 10.1016/j.mimet.2023.106815] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/11/2023] [Revised: 08/31/2023] [Accepted: 08/31/2023] [Indexed: 09/14/2023]
Abstract
Antimicrobial resistance genes (ARGs) can be transferred between members of a bacterial population by mobile genetic elements (MGE). Understanding the risk of these transfer events is important in monitoring and predicting antimicrobial resistance (AMR), especially in the context of a One Health Continuum. However, there is no universally accepted method for detection of ARGs and MGEs, and especially for determining their linkages. This study used publicly available shotgun metagenomic DNA short-read (Illumina, 100 bp paired-end) sequence data from samples across the One Health Continuum (including beef cattle composite feces from feedlots, catch basin water at feedlots, agricultural soil from feedlot manured surrounding fields, and urban/municipal sewage influent from two municipal wastewater treatment plants) to develop a workflow to identify and associate ARGs and MGEs. ARG- and MGE-based targeted-assemblies with available short-read data were unable to meet this analysis goal. In contrast, de novo assembly of contigs provided enough sequence context to associate ARGs and MGEs, without compromising discovery rate. However, to estimate the relative abundance of these elements, unassembled sequence data must still be used.
Affiliation(s)
- Catrione Lee
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Government of Canada, 5403 1st Avenue South, Lethbridge, AB T1J 4B1, Canada
  - Department of Chemistry and Biochemistry, University of Lethbridge, 4401 University Drive West, Lethbridge, AB T3M 2L7, Canada
- Rodrigo Ortega Polo
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Government of Canada, 5403 1st Avenue South, Lethbridge, AB T1J 4B1, Canada
- Rahat Zaheer
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Government of Canada, 5403 1st Avenue South, Lethbridge, AB T1J 4B1, Canada
- Gary Van Domselaar
  - National Microbiology Laboratory, Public Health Agency of Canada, Government of Canada, 1015 Arlington Street, Winnipeg, MB R3E 3R2, Canada
- Athanasios Zovoilis
  - Department of Chemistry and Biochemistry, University of Lethbridge, 4401 University Drive West, Lethbridge, AB T3M 2L7, Canada
- Tim A McAllister
  - Lethbridge Research and Development Centre, Agriculture and Agri-Food Canada, Government of Canada, 5403 1st Avenue South, Lethbridge, AB T1J 4B1, Canada
12
Wang J, Veldsman WP, Fang X, Huang Y, Xie X, Lyu A, Zhang L. Benchmarking multi-platform sequencing technologies for human genome assembly. Brief Bioinform 2023; 24:bbad300. [PMID: 37594299 DOI: 10.1093/bib/bbad300] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/25/2023] [Revised: 07/12/2023] [Accepted: 07/26/2023] [Indexed: 08/19/2023] Open
Abstract
Genome assembly is a computational technique that involves piecing together deoxyribonucleic acid (DNA) fragments generated by sequencing technologies to create a comprehensive and precise representation of the entire genome. Generating a high-quality human reference genome is a crucial prerequisite for comprehending human biology, and it is also vital for downstream genomic variation analysis. Many efforts have been made over the past few decades to create a complete and gapless reference genome for humans by using a diverse range of advanced sequencing technologies. Several available tools are aimed at enhancing the quality of haploid and diploid human genome assemblies; they cover contig assembly, polishing of contig errors, scaffolding and variant phasing. Selecting the appropriate tools and technologies remains a daunting task, although several studies have investigated the pros and cons of different assembly strategies. The goal of this paper was to benchmark various strategies for human genome assembly by combining sequencing technologies and tools on two publicly available samples (NA12878 and NA24385) from Genome in a Bottle. We then compared their performances in terms of continuity, accuracy, completeness, variant calling and phasing. We observed that PacBio HiFi long reads are the optimal choice for generating an assembly with low base errors. On the other hand, we were able to produce the most continuous contigs with Oxford Nanopore long reads, but they may require further polishing to improve quality. We recommend using short reads, rather than the long reads themselves, to improve the base accuracy of contigs from Oxford Nanopore long reads. Hi-C is the best choice for chromosome-level scaffolding because it can capture the longest-range DNA connectedness compared to 10× linked reads and Bionano optical maps. However, a combination of multiple technologies can be used to further improve the quality and completeness of genome assembly.
For human diploid genome assembly, hifiasm is the best tool when using PacBio HiFi and Hi-C data. Looking to the future, we expect that further advancements in human diploid assemblers will leverage the power of PacBio HiFi reads and other technologies with long-range DNA connectedness to enable the generation of high-quality, chromosome-level and haplotype-resolved human genome assemblies.
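The abstract compares assemblies on continuity but does not say which metric was used. One widely used continuity measure is N50, sketched below in Python as an illustrative assumption rather than the authors' method; the function name `n50` and both sets of contig lengths are hypothetical.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of an assembly.

    N50 is the length of the contig at which, walking from the longest
    contig downward, the running total first reaches half of the summed
    contig length. An empty assembly yields 0.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths with the same total size (290) but
# different continuity.
assembly_a = [100, 80, 50, 30, 20, 10]
assembly_b = [200, 40, 30, 10, 10]
n50_a = n50(assembly_a)
n50_b = n50(assembly_b)
```

Two assemblies of equal total size can differ sharply in N50, which is why continuity is reported separately from completeness and base accuracy.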
Affiliation(s)
- Jingjing Wang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Werner Pieter Veldsman
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Aiping Lyu
  - School of Chinese Medicine, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
- Lu Zhang
  - Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
  - Institute for Research and Continuing Education, Hong Kong Baptist University, Shenzhen, China
13
van Dijk EL, Naquin D, Gorrichon K, Jaszczyszyn Y, Ouazahrou R, Thermes C, Hernandez C. Genomics in the long-read sequencing era. Trends Genet 2023; 39:649-671. [PMID: 37230864 DOI: 10.1016/j.tig.2023.04.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 02/10/2023] [Revised: 04/21/2023] [Accepted: 04/25/2023] [Indexed: 05/27/2023]
Abstract
Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy, and bioinformatics tools have improved substantially. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies, focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.
Affiliation(s)
- Erwin L van Dijk
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France.
- Delphine Naquin
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Kévin Gorrichon
- National Center of Human Genomics Research (CNRGH), 91000 Évry-Courcouronnes, France
- Yan Jaszczyszyn
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Rania Ouazahrou
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Claude Thermes
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Céline Hernandez
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
14
|
Ojala T, Häkkinen AE, Kankuri E, Kankainen M. Current concepts, advances, and challenges in deciphering the human microbiota with metatranscriptomics. Trends Genet 2023; 39:686-702. [PMID: 37365103 DOI: 10.1016/j.tig.2023.05.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/02/2022] [Revised: 05/24/2023] [Accepted: 05/25/2023] [Indexed: 06/28/2023]
Abstract
Metatranscriptomics refers to the analysis of the collective microbial transcriptome of a sample. Its increased utilization for the characterization of human-associated microbial communities has enabled the discovery of many disease-state related microbial activities. Here, we review the principles of metatranscriptomics-based analysis of human-associated microbial samples. We describe strengths and weaknesses of popular sample preparation, sequencing, and bioinformatics approaches and summarize strategies for their use. We then discuss how human-associated microbial communities have recently been examined and how their characterization may change. We conclude that metatranscriptomics insights into human microbiotas under health and disease have not only expanded our knowledge on human health, but also opened avenues for rational antimicrobial drug use and disease management.
Affiliation(s)
- Teija Ojala
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Esko Kankuri
- Department of Pharmacology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Matti Kankainen
- Hematology Research Unit, University of Helsinki, Helsinki, Finland; Laboratory of Genetics, HUS Diagnostic Center, Hospital District of Helsinki and Uusimaa (HUS), Helsinki, Finland.
|
15
|
Ruiz JL, Reimering S, Escobar-Prieto JD, Brancucci NMB, Echeverry DF, Abdi AI, Marti M, Gómez-Díaz E, Otto TD. From contigs towards chromosomes: automatic improvement of long read assemblies (ILRA). Brief Bioinform 2023; 24:bbad248. [PMID: 37406192 PMCID: PMC10359078 DOI: 10.1093/bib/bbad248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 05/24/2023] [Accepted: 06/16/2023] [Indexed: 07/07/2023] Open
Abstract
Recent advances in long read technologies not only enable large consortia to aim to sequence all eukaryotes on Earth, but they also allow individual laboratories to sequence their species of interest with relatively low investment. Long read technologies embody the promise of overcoming scaffolding problems associated with repeats and low complexity sequences, but the number of contigs often far exceeds the number of chromosomes and they may contain many insertion and deletion errors around homopolymer tracts. To overcome these issues, we have implemented the ILRA pipeline to correct long read-based assemblies. Contigs are first reordered, renamed, merged, circularized, or filtered if erroneous or contaminated. Illumina short reads are used subsequently to correct homopolymer errors. We successfully tested our approach by improving the genome sequences of Homo sapiens, Trypanosoma brucei, and Leptosphaeria spp., and by generating four novel Plasmodium falciparum assemblies from field samples. We found that correcting homopolymer tracts reduced the number of genes incorrectly annotated as pseudogenes, but an iterative approach seems to be required to correct more sequencing errors. In summary, we describe and benchmark the performance of our new tool, which improved the quality of novel long read assemblies up to 1 Gbp. The pipeline is available at GitHub: https://github.com/ThomasDOtto/ILRA.
Affiliation(s)
- José Luis Ruiz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Susanne Reimering
- Department for Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Nicolas M B Brancucci
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Department of Medical Parasitology and Infection Biology, Swiss Tropical and Public Health Institute, 4123 Allschwil, Switzerland
- University of Basel, 4001 Basel, Switzerland
- Diego F Echeverry
- Centro Internacional de Entrenamiento e Investigaciones Médicas (CIDEIM), Cali, Colombia
- Departamento de Microbiología, Facultad de Salud, Universidad del Valle, Cali, Colombia
- Matthias Marti
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
- Elena Gómez-Díaz
- Instituto de Parasitología y Biomedicina López-Neyra (IPBLN), Consejo Superior de Investigaciones Científicas, 18016, Granada, Spain
- Thomas D Otto
- School of Infection & Immunity, MVLS, University of Glasgow, Glasgow, UK
|
16
|
Karikari B, Lemay MA, Belzile F. k-mer-Based Genome-Wide Association Studies in Plants: Advances, Challenges, and Perspectives. Genes (Basel) 2023; 14:1439. [PMID: 37510343 PMCID: PMC10379394 DOI: 10.3390/genes14071439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 07/04/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Genome-wide association studies (GWAS) have allowed the discovery of marker-trait associations in crops over recent decades. However, their power is hampered by a number of limitations, with the key one among them being an overreliance on single-nucleotide polymorphisms (SNPs) as molecular markers. Indeed, SNPs represent only one type of genetic variation and are usually derived from alignment to a single genome assembly that may be poorly representative of the population under study. To overcome this, k-mer-based GWAS approaches have recently been developed. k-mer-based GWAS provide a universal way to assess variation due to SNPs, insertions/deletions, and structural variations without having to specifically detect and genotype these variants. In addition, k-mer-based analyses can be used in species that lack a reference genome. However, the use of k-mers for GWAS presents challenges such as data size and complexity, lack of standard tools, and potential detection of false associations. Nevertheless, efforts are being made to overcome these challenges and a general analysis workflow has started to emerge. We identify the priorities for k-mer-based GWAS in years to come, notably in the development of user-friendly programs for their analysis and approaches for linking significant k-mers to sequence variation.
Affiliation(s)
- Benjamin Karikari
- Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- Department of Agricultural Biotechnology, Faculty of Agriculture, Food and Consumer Sciences, University for Development Studies, Tamale P.O. Box TL 1882, Ghana
- Marc-André Lemay
- Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- François Belzile
- Département de Phytologie, Université Laval, Quebec City, QC G1V 0A6, Canada
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
|
17
|
Gable SM, Mendez JM, Bushroe NA, Wilson A, Byars MI, Tollis M. The State of Squamate Genomics: Past, Present, and Future of Genome Research in the Most Speciose Terrestrial Vertebrate Order. Genes (Basel) 2023; 14:1387. [PMID: 37510292 PMCID: PMC10379679 DOI: 10.3390/genes14071387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 06/28/2023] [Accepted: 06/29/2023] [Indexed: 07/30/2023] Open
Abstract
Squamates include more than 11,000 extant species of lizards, snakes, and amphisbaenians, and display a dazzling diversity of phenotypes across their over 200-million-year evolutionary history on Earth. Here, we introduce and define squamates (Order Squamata) and review the history and promise of genomic investigations into the patterns and processes governing squamate evolution, given recent technological advances in DNA sequencing, genome assembly, and evolutionary analysis. We survey the most recently available whole genome assemblies for squamates, including the taxonomic distribution of available squamate genomes, and assess their quality metrics and usefulness for research. We then focus on disagreements in squamate phylogenetic inference, how methods of high-throughput phylogenomics affect these inferences, and demonstrate the promise of whole genomes to settle or sustain persistent phylogenetic arguments for squamates. We review the role transposable elements play in vertebrate evolution, methods of transposable element annotation and analysis, and further demonstrate that through the understanding of the diversity, abundance, and activity of transposable elements in squamate genomes, squamates can be an ideal model for the evolution of genome size and structure in vertebrates. We discuss how squamate genomes can contribute to other areas of biological research such as venom systems, studies of phenotypic evolution, and sex determination. Because they represent more than 30% of the living species of amniote, squamates deserve a genome consortium on par with recent efforts for other amniotes (i.e., mammals and birds) that aim to sequence most of the extant families in a clade.
Affiliation(s)
- Simone M Gable
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Jasmine M Mendez
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Nicholas A Bushroe
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Adam Wilson
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Michael I Byars
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
- Marc Tollis
- School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ 86011, USA
|
18
|
Boßelmann CM, Leu C, Lal D. Technological and computational approaches to detect somatic mosaicism in epilepsy. Neurobiol Dis 2023:106208. [PMID: 37343892 DOI: 10.1016/j.nbd.2023.106208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2023] [Revised: 06/03/2023] [Accepted: 06/16/2023] [Indexed: 06/23/2023] Open
Abstract
Lesional epilepsy is a common and severe disease commonly associated with malformations of cortical development, including focal cortical dysplasia and hemimegalencephaly. Recent advances in sequencing and variant calling technologies have identified several genetic causes, including both short/single nucleotide and structural somatic variation. In this review, we aim to provide a comprehensive overview of the methodological advancements in this field while highlighting the unresolved technological and computational challenges that persist, including ultra-low variant allele fractions in bulk tissue, low availability of paired control samples, spatial variability of mutational burden within the lesion, and the issue of false-positive calls and validation procedures. Information from genetic testing in focal epilepsy may be integrated into clinical care to inform histopathological diagnosis, postoperative prognosis, and candidate precision therapies.
Affiliation(s)
- Christian M Boßelmann
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA
- Costin Leu
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Department of Clinical and Experimental Epilepsy, Institute of Neurology, University College London, London, UK.
- Dennis Lal
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA; Epilepsy Center, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA; Stanley Center for Psychiatric Research, Broad Institute of Harvard and M.I.T., Cambridge, MA, USA; Cologne Center for Genomics (CCG), University of Cologne, Cologne, Germany
|
19
|
Mastrorosa FK, Miller DE, Eichler EE. Applications of long-read sequencing to Mendelian genetics. Genome Med 2023; 15:42. [PMID: 37316925 DOI: 10.1186/s13073-023-01194-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 12/08/2022] [Accepted: 05/18/2023] [Indexed: 06/16/2023] Open
Abstract
Advances in clinical genetic testing, including the introduction of exome sequencing, have uncovered the molecular etiology for many rare and previously unsolved genetic disorders, yet more than half of individuals with a suspected genetic disorder remain unsolved after complete clinical evaluation. A precise genetic diagnosis may guide clinical treatment plans, allow families to make informed care decisions, and permit individuals to participate in N-of-1 trials; thus, there is high interest in developing new tools and techniques to increase the solve rate. Long-read sequencing (LRS) is a promising technology for both increasing the solve rate and decreasing the amount of time required to make a precise genetic diagnosis. Here, we summarize current LRS technologies, give examples of how they have been used to evaluate complex genetic variation and identify missing variants, and discuss future clinical applications of LRS. As costs continue to decrease, LRS will find additional utility in the clinical space fundamentally changing how pathological variants are discovered and eventually acting as a single-data source that can be interrogated multiple times for clinical service.
Affiliation(s)
- Danny E Miller
- Division of Genetic Medicine, Department of Pediatrics, University of Washington and Seattle Children's Hospital, Seattle, WA, 98195, USA
- Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, 98195, USA
- Brotman Baty Institute for Precision Medicine, University of Washington, Seattle, WA, 98195, USA
- Evan E Eichler
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, 98195, USA.
- Howard Hughes Medical Institute, University of Washington, Seattle, WA, 98195, USA.
|
20
|
Ezoe A, Iuchi S, Sakurai T, Aso Y, Tokunaga H, Vu AT, Utsumi Y, Takahashi S, Tanaka M, Ishida J, Ishitani M, Seki M. Fully sequencing the cassava full-length cDNA library reveals unannotated transcript structures and alternative splicing events in regions with a high density of single nucleotide variations, insertions-deletions, and heterozygous sequences. Plant Mol Biol 2023; 112:33-45. [PMID: 37014509 DOI: 10.1007/s11103-023-01346-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Accepted: 02/27/2023] [Indexed: 05/09/2023]
Abstract
The primary transcript structure provides critical insights into protein diversity, transcriptional modification, and functions. Cassava transcript structures are highly diverse because of alternative splicing (AS) events and high heterozygosity. To precisely determine and characterize transcript structures, fully sequencing cloned transcripts is the most reliable method. However, cassava annotations were mainly determined according to fragmentation-based sequencing analyses (e.g., EST and short-read RNA-seq). In this study, we sequenced the cassava full-length cDNA library, which included rare transcripts. We obtained 8,628 non-redundant fully sequenced transcripts and detected 615 unannotated AS events and 421 unannotated loci. The different protein sequences resulting from the unannotated AS events tended to have diverse functional domains, implying that unannotated AS contributes to the truncation of functional domains. The unannotated loci tended to be derived from orphan genes, implying that the loci may be associated with cassava-specific traits. Unexpectedly, individual cassava transcripts were more likely to have multiple AS events than Arabidopsis transcripts, suggestive of the regulated interactions between cassava splicing-related complexes. We also observed that the unannotated loci and/or AS events were commonly in regions with abundant single nucleotide variations, insertions-deletions, and heterozygous sequences. These findings reflect the utility of completely sequenced FLcDNA clones for overcoming cassava-specific annotation-related problems to elucidate transcript structures. Our work provides researchers with transcript structural details that are useful for annotating highly diverse and unique transcripts and alternative splicing events.
Affiliation(s)
- Akihiro Ezoe
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Satoshi Iuchi
- Experimental Plant Division, RIKEN BioResource Research Center, Tsukuba, Ibaraki, 305-0074, Japan
- Tetsuya Sakurai
- Multidisciplinary Science Cluster, Interdisciplinary Science Unit, Kochi University, Nankoku, Kochi, 783-8502, Japan
- Yukie Aso
- Experimental Plant Division, RIKEN BioResource Research Center, Tsukuba, Ibaraki, 305-0074, Japan
- Hiroki Tokunaga
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Tropical Agriculture Research Front, Japan International Research Center for Agricultural Sciences, Ishigaki, Okinawa, 907-0002, Japan
- Anh Thu Vu
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Yoshinori Utsumi
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Satoshi Takahashi
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Maho Tanaka
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Junko Ishida
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan
- Manabu Ishitani
- International Center for Tropical Agriculture (CIAT), Km 17, Recta Cali-Palmira Apartado Aéreo 6713, Cali, Colombia
- Motoaki Seki
- Plant Genomic Network Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Kanagawa, 230-0045, Japan.
- Plant Epigenome Regulation Laboratory, RIKEN Cluster for Pioneering Research, 2-1 Hirosawa, Wako, Saitama, 351-0198, Japan.
- Kihara Institute for Biological Research, Yokohama City University, 641-12 Maioka-cho, Totsuka-ku, Yokohama, Kanagawa, 244-0813, Japan.
|
21
|
Mejias-Gomez O, Madsen AV, Pedersen LE, Kristensen P, Goletz S. Eliminating OFF-frame clones in randomized gene libraries: An improved split β-lactamase enrichment system. N Biotechnol 2023; 75:13-20. [PMID: 36889578 DOI: 10.1016/j.nbt.2023.03.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 02/20/2023] [Accepted: 03/04/2023] [Indexed: 03/08/2023]
Abstract
Large, randomized libraries are a key technology for many biotechnological applications. While genetic diversity is the main parameter most libraries direct their resources on, less focus is devoted to ensuring functional IN-frame expression. This study describes a faster and more efficient system based on a split β-lactamase complementation for removal of OFF-frame clones and increase of functional diversity, suitable for construction of randomized libraries. The gene of interest is inserted between two fragments of the β-lactamase gene, conferring resistance to β-lactam drugs only upon expression of an inserted IN-frame gene without stop codons or frameshifts. The preinduction-free system was capable of eliminating OFF-frame clones in starting mixtures of as little as 1% IN-frame clones and enriching to about 70% IN-frame clones, even when their starting rate was as low as 0.001%. The curation system was verified by constructing a single-domain antibody phage display library using trinucleotide phosphoramidites for randomizing a complementary determining region, while eliminating OFF-frame clones and maximizing functional diversity.
Affiliation(s)
- Oscar Mejias-Gomez
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Andreas V Madsen
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Lasse E Pedersen
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark
- Peter Kristensen
- Department of Chemistry and Bioscience, Section for Bioscience and Engineering, Aalborg University, Aalborg, Denmark
- Steffen Goletz
- Department of Biotechnology and Biomedicine, Section for Protein Science and Biotherapeutics, Technical University of Denmark, Kongens Lyngby, Denmark.
|
22
|
Mak QXC, Wick RR, Holt JM, Wang JR. Polishing De Novo Nanopore Assemblies of Bacteria and Eukaryotes With FMLRC2. Mol Biol Evol 2023; 40:7069220. [PMID: 36869750 PMCID: PMC10015616 DOI: 10.1093/molbev/msad048] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/14/2022] [Revised: 01/20/2023] [Accepted: 02/21/2023] [Indexed: 03/05/2023] Open
Abstract
As the accuracy and throughput of nanopore sequencing improve, it is increasingly common to perform long-read first de novo genome assemblies followed by polishing with accurate short reads. We briefly introduce FMLRC2, the successor to the original FM-index Long Read Corrector (FMLRC), and illustrate its performance as a fast and accurate de novo assembly polisher for both bacterial and eukaryotic genomes.
Affiliation(s)
- Q X Charles Mak
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Ryan R Wick
- Centre for Pathogen Genomics, University of Melbourne, Melbourne, Australia
- Jeremy R Wang
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
23
|
Liang C, Wagstaff J, Aharony N, Schmit V, Manheim D. Managing the Transition to Widespread Metagenomic Monitoring: Policy Considerations for Future Biosurveillance. Health Secur 2023; 21:34-45. [PMID: 36629860 PMCID: PMC9940815 DOI: 10.1089/hs.2022.0029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/12/2023] Open
Abstract
The technological possibilities and future public health importance of metagenomic sequencing have received extensive attention, but there has been little discussion about the policy and regulatory issues that need to be addressed if metagenomic sequencing is adopted as a key technology for biosurveillance. In this article, we introduce metagenomic monitoring as a possible path to eventually replacing current infectious disease monitoring models. Many key enablers are technological, whereas others are not. We therefore highlight key policy challenges and implementation questions that need to be addressed for "widespread metagenomic monitoring" to be possible. Policymakers must address pitfalls like fragmentation of the technological base, private capture of benefits, privacy concerns, the usefulness of the system during nonpandemic times, and how the future systems will enable better response. If these challenges are addressed, the technological and public health promise of metagenomic sequencing can be realized.
Affiliation(s)
- Chelsea Liang
- Chelsea Liang is an Independent Researcher, University of New South Wales, School of Biotechnology and Biomolecular Sciences, Sydney, Australia
- James Wagstaff
- James Wagstaff, PhD, is a Research Fellow, Future of Humanity Institute, University of Oxford, Oxford, UK
- Noga Aharony
- Noga Aharony, MS, is a PhD Student, Department of Systems Biology, Columbia University, New York, NY
- Virginia Schmit
- Virginia Schmit, PhD, is Director of Research, 1DaySooner, DE, and a Policy Specialist, National Institute of Allergy and Infectious Diseases, Bethesda, MD
- David Manheim
- David Manheim, PhD, is Head of Policy and Research, ALTER, Rehovot, Israel; Lead Researcher, 1DaySooner, Claymont, DE; Visiting Researcher, Humanities and Arts Department, Technion – Israel Institute of Technology, Haifa, Israel. Address correspondence to: David B. Manheim, 8734 First Avenue, Silver Spring, MD 20910
|
24
|
Nguyen TV, Vander Jagt CJ, Wang J, Daetwyler HD, Xiang R, Goddard ME, Nguyen LT, Ross EM, Hayes BJ, Chamberlain AJ, MacLeod IM. In it for the long run: perspectives on exploiting long-read sequencing in livestock for population scale studies of structural variants. Genet Sel Evol 2023; 55:9. [PMID: 36721111 PMCID: PMC9887926 DOI: 10.1186/s12711-023-00783-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 01/23/2023] [Indexed: 02/02/2023] Open
Abstract
Studies have demonstrated that structural variants (SV) play a substantial role in the evolution of species and have an impact on Mendelian traits in the genome. However, unlike small variants (< 50 bp), it has been challenging to accurately identify and genotype SV at the population scale using short-read sequencing. Long-read sequencing technologies are becoming competitively priced and can address several of the disadvantages of short-read sequencing for the discovery and genotyping of SV. In livestock species, analysis of SV at the population scale still faces challenges due to the lack of resources, high costs, technological barriers, and computational limitations. In this review, we summarize recent progress in the characterization of SV in the major livestock species, the obstacles that still need to be overcome, as well as the future directions in this growing field. It seems timely that research communities pool resources to build global population-scale long-read sequencing consortiums for the major livestock species for which the application of genomic tools has become cost-effective.
Affiliation(s)
- Tuan V. Nguyen
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Christy J. Vander Jagt
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Jianghui Wang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
- Hans D. Daetwyler
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Ruidong Xiang
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia; Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia
- Michael E. Goddard
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia; Faculty of Veterinary & Agricultural Science, The University of Melbourne, Parkville, VIC 3052, Australia
- Loan T. Nguyen
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072, Australia
- Elizabeth M. Ross
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072, Australia
- Ben J. Hayes
- Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St Lucia, QLD 4072, Australia
- Amanda J. Chamberlain
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia; School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Iona M. MacLeod
- Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC 3083, Australia
|
25
|
Firtina C, Park J, Alser M, Kim JS, Cali D, Shahroodi T, Ghiasi N, Singh G, Kanellopoulos K, Alkan C, Mutlu O. BLEND: a fast, memory-efficient and accurate mechanism to find fuzzy seed matches in genome analysis. NAR Genom Bioinform 2023; 5:lqad004. [PMID: 36685727 PMCID: PMC9853099 DOI: 10.1093/nargab/lqad004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 12/16/2022] [Accepted: 01/10/2023] [Indexed: 01/22/2023] Open
Abstract
Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for finding exact-matching seeds as the conventional hashing methods assign distinct hash values for different seeds, including highly similar seeds. Finding only exact-matching seeds causes either (i) increasing the use of the costly sequence alignment or (ii) limited sensitivity. We introduce BLEND, the first efficient and accurate mechanism that can identify both exact-matching and highly similar seeds with a single lookup of their hash values, called fuzzy seed matches. BLEND (i) utilizes a technique called SimHash, that can generate the same hash value for similar sets, and (ii) provides the proper mechanisms for using seeds as sets with the SimHash technique to find fuzzy seed matches efficiently. We show the benefits of BLEND when used in read overlapping and read mapping. For read overlapping, BLEND is faster by 2.4×-83.9× (on average 19.3×), has a lower memory footprint by 0.9×-14.1× (on average 3.8×), and finds higher quality overlaps leading to accurate de novo assemblies than the state-of-the-art tool, minimap2. For read mapping, BLEND is faster by 0.8×-4.1× (on average 1.7×) than minimap2. Source code is available at https://github.com/CMU-SAFARI/BLEND.
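The SimHash idea named in the abstract above can be illustrated with a minimal Python sketch. This is not BLEND's implementation: the choice of k-mers as set items, the `blake2b` base hash, the 64-bit width, and all function names are assumptions made only for illustration. Each item votes on every bit of the final hash, so two seeds whose item sets overlap heavily tend to produce hash values that differ in few bits and can coincide.

```python
import hashlib


def kmers(seq, k):
    """Return all overlapping substrings of length k (hypothetical item set for a seed)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hash64(item):
    """64-bit base hash of one item; blake2b is an arbitrary illustrative choice."""
    digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(items, bits=64):
    """SimHash: every item votes +1/-1 per bit; the sign of each tally gives the output bit."""
    tallies = [0] * bits
    for item in items:
        h = hash64(item)
        for b in range(bits):
            tallies[b] += 1 if (h >> b) & 1 else -1
    value = 0
    for b in range(bits):
        if tallies[b] > 0:
            value |= 1 << b
    return value


def hamming(a, b):
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")
```

A seed would be hashed with `simhash(kmers(seed, k))`, and two seeds compared with `hamming`. Requiring identical hash values, as a single hash-table lookup does, is stricter than thresholding the Hamming distance; the sketch makes no claim about how often similar seeds collide in practice.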
Affiliation(s)
- Can Firtina
- To whom correspondence should be addressed. Tel: +41 44 632 64 29;
| | - Jisung Park
- ETH Zurich, Zurich 8092, Switzerland,POSTECH, Pohang 37673, Republic of Korea
| | - Can Alkan
- Bilkent University, Ankara 06800, Turkey
| | - Onur Mutlu
- Correspondence may also be addressed to Onur Mutlu. Tel: +41 44 632 64 29;
| |
|
26
|
Hassan S, Bahar R, Johan MF, Mohamed Hashim EK, Abdullah WZ, Esa E, Abdul Hamid FS, Zulkafli Z. Next-Generation Sequencing (NGS) and Third-Generation Sequencing (TGS) for the Diagnosis of Thalassemia. Diagnostics (Basel) 2023; 13:diagnostics13030373. [PMID: 36766477 PMCID: PMC9914462 DOI: 10.3390/diagnostics13030373] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 12/07/2022] [Revised: 01/11/2023] [Accepted: 01/16/2023] [Indexed: 01/20/2023] Open
Abstract
Thalassemia is one of the most heterogeneous diseases, with more than a thousand mutation types recorded worldwide. Molecular diagnosis of thalassemia by conventional PCR-based DNA analysis is time- and resource-consuming owing to the phenotype variability, disease complexity, and molecular diagnostic test limitations. Moreover, genetic counseling must be backed up by an extensive diagnosis of the thalassemia-causing phenotype and the possible genetic modifiers. Data coming from advanced molecular techniques such as targeted sequencing by next-generation sequencing (NGS) and third-generation sequencing (TGS) are more appropriate and valuable for DNA analysis of thalassemia. While NGS is superior to TGS at variant calling thanks to its lower error rates, the longer reads of TGS permit haplotype phasing that is superior for variant discovery in the homologous genes and for CNV calling. The emergence of many cutting-edge machine learning-based bioinformatics tools has improved the accuracy of variant and CNV calling. Constant improvement of these sequencing and bioinformatics tools will enable precise thalassemia detection, especially for CNVs and the homologous HBA and HBG genes. In conclusion, laboratories transitioning from conventional DNA analysis to NGS or TGS and following the guidelines towards a single assay will contribute to a better diagnostic approach to thalassemia.
Affiliation(s)
- Syahzuwan Hassan
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Institute for Medical Research, Shah Alam 40170, Malaysia
| | - Rosnah Bahar
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
| | - Muhammad Farid Johan
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
| | - Wan Zaidah Abdullah
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
| | - Ezalia Esa
- Institute for Medical Research, Shah Alam 40170, Malaysia
| | - Zefarina Zulkafli
- Department of Hematology, School of Medical Sciences, Health Campus, Universiti Sains Malaysia, Kubang Kerian 16150, Malaysia
- Correspondence:
| |
|
27
|
Zhou Y, Lauschke VM. Challenges Related to the Use of Next-Generation Sequencing for the Optimization of Drug Therapy. Handb Exp Pharmacol 2023; 280:237-260. [PMID: 35792943 DOI: 10.1007/164_2022_596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Over the last decade, next-generation sequencing (NGS) methods have become increasingly used in various areas of human genomics. In routine clinical care, their use is already implemented in oncology to profile the mutational landscape of a tumor, as well as in rare disease diagnostics. However, their utilization in pharmacogenomics is largely lagging behind. Recent population-scale genome data has revealed that human pharmacogenes carry a plethora of rare genetic variations that are not interrogated by conventional array-based profiling methods, and it is estimated that these variants could explain around 30% of the genetically encoded functional pharmacogenetic variability. To interpret the impact of such variants on drug response, a multitude of computational tools have been developed, but, while there have been major advancements, it remains to be shown whether their accuracy is sufficient to improve personalized pharmacogenetic recommendations in robust trials. In addition, conventional short-read sequencing methods face difficulties in the interrogation of complex pharmacogenes, and high NGS test costs require stringent evaluations of cost-effectiveness to decide about reimbursement by national healthcare programs. Here, we illustrate current challenges and discuss future directions toward the clinical implementation of NGS to inform genotype-guided decision-making.
Affiliation(s)
- Yitian Zhou
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden
| | - Volker M Lauschke
- Department of Physiology and Pharmacology, Karolinska Institutet, Stockholm, Sweden.
- Dr Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany.
- University of Tuebingen, Tuebingen, Germany.
| |
|
28
|
Li Q, Yan B, Lam TW, Luo R. Assembly-free discovery of human novel sequences using long reads. DNA Res 2022; 29:6779932. [DOI: 10.1093/dnares/dsac039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2022] [Revised: 10/19/2022] [Accepted: 10/27/2022] [Indexed: 11/27/2022] Open
Abstract
DNA sequences that are absent in the human reference genome are classified as novel sequences. The discovery of these missed sequences is crucial for exploring the genomic diversity of populations and understanding the genetic basis of human diseases. However, various DNA lengths of reads generated from different sequencing technologies can significantly affect the results of novel sequences. In this work, we designed an assembly-free novel sequence (AF-NS) approach to identify novel sequences from Oxford Nanopore Technology long reads. Among the newly detected sequences using AF-NS, more than 95% were omitted from those using long-read assemblers and 85% were not present in short reads of Illumina. We identified the common novel sequences among all the samples and revealed their association with the binding motifs of transcription factors. Regarding the placements of the novel sequences, we found about 70% enriched in repeat regions and generated 430 for one specific subpopulation that might be related to their evolution. Our study demonstrates the advance of the assembly-free approach to capture more novel sequences over other assembler based methods. Combining the long-read data with powerful analytical methods can be a robust way to improve the completeness of novel sequences.
Affiliation(s)
- Qiuhui Li
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| | - Bin Yan
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| | - Tak-Wah Lam
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| | - Ruibang Luo
- Department of Computer Science, The University of Hong Kong, Hong Kong, China
| |
|
29
|
Library adaptors with integrated reference controls improve the accuracy and reliability of nanopore sequencing. Nat Commun 2022; 13:6437. [PMID: 36307482 PMCID: PMC9616880 DOI: 10.1038/s41467-022-34028-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 10/11/2022] [Indexed: 12/25/2022] Open
Abstract
Library adaptors are short oligonucleotides that are attached to RNA and DNA samples in preparation for next-generation sequencing (NGS). Adaptors can also include additional functional elements, such as sample indexes and unique molecular identifiers, to improve library analysis. Here, we describe Control Library Adaptors, termed CAPTORs, that measure the accuracy and reliability of NGS. CAPTORs can be integrated within the library preparation of RNA and DNA samples, and their encoded information is retrieved during sequencing. We show how CAPTORs can measure the accuracy of nanopore sequencing, evaluate the quantitative performance of metagenomic and RNA sequencing, and improve normalisation between samples. CAPTORs can also be customised for clinical diagnoses, correcting systematic sequencing errors and improving the diagnosis of pathogenic BRCA1/2 variants in breast cancer. CAPTORs are a simple and effective method to increase the accuracy and reliability of NGS, enabling comparisons between samples, reagents and laboratories, and supporting the use of nanopore sequencing for clinical diagnosis.
|
30
|
Srinivas M, O’Sullivan O, Cotter PD, van Sinderen D, Kenny JG. The Application of Metagenomics to Study Microbial Communities and Develop Desirable Traits in Fermented Foods. Foods 2022; 11:3297. [PMCID: PMC9601669 DOI: 10.3390/foods11203297] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 11/18/2022] Open
Abstract
The microbial communities present within fermented foods are diverse and dynamic, producing a variety of metabolites responsible for the fermentation processes, imparting characteristic organoleptic qualities and health-promoting traits, and maintaining microbiological safety of fermented foods. In this context, it is crucial to study these microbial communities to characterise fermented foods and the production processes involved. High Throughput Sequencing (HTS)-based methods such as metagenomics enable microbial community studies through amplicon and shotgun sequencing approaches. As the field constantly develops, sequencing technologies are becoming more accessible, affordable and accurate with a further shift from short read to long read sequencing being observed. Metagenomics is enjoying wide-spread application in fermented food studies and in recent years is also being employed in concert with synthetic biology techniques to help tackle problems with the large amounts of waste generated in the food sector. This review presents an introduction to current sequencing technologies and the benefits of their application in fermented foods.
Affiliation(s)
- Meghana Srinivas
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- School of Microbiology, University College Cork, T12 CY82 Cork, Ireland
| | - Orla O’Sullivan
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
| | - Paul D. Cotter
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
| | - Douwe van Sinderen
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- School of Microbiology, University College Cork, T12 CY82 Cork, Ireland
| | - John G. Kenny
- Food Biosciences Department, Teagasc Food Research Centre, Moorepark, P61 C996 Cork, Ireland
- APC Microbiome Ireland, University College Cork, T12 CY82 Cork, Ireland
- VistaMilk SFI Research Centre, Fermoy, P61 C996 Cork, Ireland
- Correspondence:
| |
|
31
|
Zeng P, Tian Z, Han Y, Zhang W, Zhou T, Peng Y, Hu H, Cai J. Comparison of ONT and CCS sequencing technologies on the polyploid genome of a medicinal plant showed that high error rate of ONT reads are not suitable for self-correction. Chin Med 2022; 17:94. [PMID: 35945546 PMCID: PMC9364492 DOI: 10.1186/s13020-022-00644-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Accepted: 07/19/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Many medicinal plants are known for their complex genomes with high ploidy, heterozygosity, and repetitive content which pose severe challenges for genome sequencing of those species. Long reads from Oxford nanopore sequencing technology (ONT) or Pacific Biosciences Single Molecule, Real-Time (SMRT) sequencing offer great advantages in de novo genome assembly, especially for complex genomes with high heterozygosity and repetitive content. Currently, multiple allotetraploid species have sequenced their genomes by long-read sequencing. However, we found that a considerable proportion of these genomes (7.9% on average, maximum 23.7%) could not be covered by NGS (Next Generation Sequencing) reads (uncovered region by NGS reads, UCR) suggesting the questionable and low-quality of those area or genomic areas that can't be sequenced by NGS due to sequencing bias. The underlying causes of those UCR in the genome assembly and solutions to this problem have never been studied.
METHODS: In the study, we sequenced the tetraploid genome of Veratrum dahuricum (Turcz.) O. Loes (VDL), a Chinese medicinal plant, with ONT platform and assembled the genome with three strategies in parallel. We compared the qualities, coverage, and heterozygosity of the three ONT assemblies with another released assembly of the same individual using reads from PacBio circular consensus sequencing (CCS) technology, to explore the cause of the UCR.
RESULTS: By mapping the NGS reads against the three ONT assemblies and the CCS assembly, we found that the coverage of those ONT assemblies by NGS reads ranged from 49.15 to 76.31%, much smaller than that of the CCS assembly (99.53%). And alignment between ONT assemblies and CCS assembly showed that most UCR can be aligned with CCS assembly. So, we conclude that the UCRs in ONT assembly are low-quality sequences with a high error rate that can't be aligned with short reads, rather than genomic regions that can't be sequenced by NGS. Further comparison among the intermediate versions of ONT assemblies showed that the most probable origin of those errors is a combination of artificial errors introduced by "self-correction" and initial sequencing error in long reads. We also found that polishing the ONT assembly with CCS reads can correct those errors efficiently.
CONCLUSIONS: Through analyzing genome features and reads alignment, we have found the causes for the high proportion of UCR in ONT assembly of VDL are sequencing errors and additional errors introduced by self-correction. The high error rates of ONT-raw reads make them not suitable for self-correction prior to allotetraploid genome assembly, as the self-correction will introduce artificial errors to > 5% of the UCR sequences. We suggest high-precision CCS reads be used to polish the assembly to correct those errors effectively for polyploid genomes.
Affiliation(s)
- Peng Zeng
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
| | - Zunzhe Tian
- School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
| | - Yuwei Han
- School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
| | - Weixiong Zhang
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China
| | - Tinggan Zhou
- School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
| | - Yingmei Peng
- School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China
| | - Hao Hu
- State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical Sciences, University of Macau, Macau, China.
| | - Jing Cai
- School of Ecology and Environment, Northwestern Polytechnical University, Xi'an, China.
| |
|
32
|
Zhang T, Zhou J, Gao W, Jia Y, Wei Y, Wang G. Complex genome assembly based on long-read sequencing. Brief Bioinform 2022; 23:6657663. [PMID: 35940845 DOI: 10.1093/bib/bbac305] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/06/2022] [Revised: 06/20/2022] [Accepted: 07/06/2022] [Indexed: 11/12/2022] Open
Abstract
High-quality genome chromosome-scale sequences provide an important basis for genomics downstream analysis, especially the construction of haplotype-resolved and complete genomes, which plays a key role in genome annotation, mutation detection, evolutionary analysis, gene function research, comparative genomics and other aspects. However, genome-wide short-read sequencing is difficult to produce a complete genome in the face of a complex genome with high duplication and multiple heterozygosity. The emergence of long-read sequencing technology has greatly improved the integrity of complex genome assembly. We review a variety of computational methods for complex genome assembly and describe in detail the theories, innovations and shortcomings of collapsed, semi-collapsed and uncollapsed assemblers based on long reads. Among the three methods, uncollapsed assembly is the most correct and complete way to represent genomes. In addition, genome assembly is closely related to haplotype reconstruction, that is uncollapsed assembly realizes haplotype reconstruction, and haplotype reconstruction promotes uncollapsed assembly. We hope that gapless, telomere-to-telomere and accurate assembly of complex genomes can be truly routinely achieved using only a simple process or a single tool in the future.
Affiliation(s)
- Tianjiao Zhang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| | - Jie Zhou
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| | - Wentao Gao
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| | - Yuran Jia
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| | - Yanan Wei
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| | - Guohua Wang
- College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
| |
|
33
|
Dmitriev AA, Pushkova EN, Melnikova NV. Plant Genome Sequencing: Modern Technologies and Novel Opportunities for Breeding. Mol Biol 2022. [DOI: 10.1134/s0026893322040045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/23/2023]
|
34
|
Lok S, Lau TNH, Trost B, Tong AHY, Wintle RF, Engstrom MD, Stacy E, Waits LP, Scrafford M, Scherer SW. Chromosomal-level reference genome assembly of the North American wolverine (Gulo gulo luscus): a resource for conservation genomics. G3 GENES|GENOMES|GENETICS 2022; 12:6604289. [PMID: 35674384 PMCID: PMC9339297 DOI: 10.1093/g3journal/jkac138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2022] [Accepted: 05/19/2022] [Indexed: 11/21/2022]
Abstract
We report a chromosomal-level genome assembly of a male North American wolverine (Gulo gulo luscus) from the Kugluktuk region of Nunavut, Canada. The genome was assembled directly from long-reads, comprising: 758 contigs with a contig N50 of 36.6 Mb; contig L50 of 20; base count of 2.39 Gb; and a near complete representation (99.98%) of the BUSCO 5.2.2 set of 9,226 genes. A presumptive chromosomal-level assembly was generated by scaffolding against two chromosomal-level Mustelidae reference genomes, the ermine and the Eurasian river otter, to derive a final scaffold N50 of 144.0 Mb and a scaffold L50 of 7. We annotated a comprehensive set of genes that have been associated with models of aggressive behavior, a trait which the wolverine is purported to have in the popular literature. To support an integrated, genomics-based wildlife management strategy at a time of environmental disruption from climate change, we annotated the principal genes of the innate immune system to provide a resource to study the wolverine’s susceptibility to new infectious and parasitic diseases. As a resource, we annotated genes involved in the modality of infection by the coronaviruses, an important class of viral pathogens of growing concern as shown by the recent spillover infections by severe acute respiratory syndrome coronavirus-2 to naïve wildlife. Tabulation of heterozygous single nucleotide variants in our specimen revealed a heterozygosity level of 0.065%, indicating a relatively diverse genetic pool that would serve as a baseline for the genomics-based conservation of the wolverine, a rare cold-adapted carnivore now under threat.
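The contig and scaffold N50 and L50 figures quoted in the abstract above follow a standard definition: sort the sequences from longest to shortest and walk down the list until the running total reaches half of the assembly size. The length reached at that point is the N50, and the number of sequences used is the L50. The Python sketch below illustrates that definition only; the function name `n50_l50` and the toy lengths are assumptions for illustration and are not taken from the wolverine assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first sequence that pushes the running sum to at least
        # half of the total assembly size defines N50 and L50.
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive values")


# Toy example: total length 32, half is 16, reached after the two 8-unit contigs.
print(n50_l50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints (8, 2)
```

Read this way, the reported contig L50 of 20 means that the 20 longest contigs together hold at least half of the assembled bases, and each of them is at least as long as the contig N50.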
Affiliation(s)
- Si Lok
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Timothy N H Lau
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Brett Trost
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Amy H Y Tong
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, ON M5S 3E1, Canada
| | - Richard F Wintle
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
| | - Mark D Engstrom
- Department of Natural History, Royal Ontario Museum, Toronto, ON M5S 2C6, Canada
| | - Elise Stacy
- Environmental Science Program, University of Idaho, Moscow, ID 83844, USA
- Wildlife Conservation Society, Arctic Beringia, Fairbanks, AK 99709, USA
| | - Lisette P Waits
- Department of Fish and Wildlife, University of Idaho, Moscow, ID 83844, USA
| | - Matthew Scrafford
- Wildlife Conservation Society Canada, Thunder Bay, ON P7A 4K9, Canada
| | - Stephen W Scherer
- The Centre for Applied Genomics, Peter Gilgan Centre for Research and Learning, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, ON M5G 0A4, Canada
- McLaughlin Centre, University of Toronto, Toronto, ON M5G 0A4, Canada
- Department of Molecular Genetics, Faculty of Medicine, University of Toronto, ON M5S 1A8, Canada
| |
|
35
|
Ye S, Yu X, Chen H, Zhang Y, Wu Q, Tan H, Song J, Saqib HSA, Farhadi A, Ikhwanuddin M, Ma H. Full-Length Transcriptome Reconstruction Reveals the Genetic Mechanisms of Eyestalk Displacement and Its Potential Implications on the Interspecific Hybrid Crab (Scylla serrata ♀ × S. paramamosain ♂). BIOLOGY 2022; 11:biology11071026. [PMID: 36101407 PMCID: PMC9312322 DOI: 10.3390/biology11071026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2022] [Revised: 06/26/2022] [Accepted: 06/27/2022] [Indexed: 11/30/2022]
Abstract
Simple Summary: The eyestalk is a key organ in crustaceans that produces neurohormones and regulates a range of physiological functions. Eyestalk displacement was discovered in some first-generation (F1) offspring of the novel interspecific hybrid crab (Scylla serrata ♀ × S. paramamosain ♂). To uncover the genetic mechanism underlying eyestalk displacement and its potential implications, high-quality transcriptome was reconstructed using single-molecule real-time (SMRT) sequencing. A total of 37 significantly differential alternative splicing (DAS) events (17 up-regulated and 20 down-regulated) and 1475 significantly differential expressed transcripts (DETs) (492 up-regulated and 983 down-regulated) were detected in hybrid crabs with displaced eyestalks (DH). The most significant DAS events and DETs were annotated as being endoplasmic reticulum chaperone BiP and leucine-rich repeat protein lrrA-like isoform X2. In addition, the top ten significant gene ontology (GO) terms were related to the cuticle or chitin. Overall, this study highlights the underlying genetic mechanisms of eyestalk displacement and provide useful knowledge for mud crab (Scylla spp.) crossbreeding.
Abstract: The lack of high-quality juvenile crabs is the greatest impediment to the growth of the mud crab (Scylla paramamosain) industry. To obtain high-quality hybrid offspring, a novel hybrid mud crab (S. serrata ♀ × S. paramamosain ♂) was successfully produced in our previous study. Meanwhile, an interesting phenomenon was discovered, that some first-generation (F1) hybrid offspring’s eyestalks were displaced during the crablet stage I. To uncover the genetic mechanism underlying eyestalk displacement and its potential implications, both single-molecule real-time (SMRT) and Illumina RNA sequencing were implemented. Using a two-step collapsing strategy, three high-quality reconstructed transcriptomes were obtained from purebred mud crabs (S. paramamosain) with normal eyestalks (SPA), hybrid crabs with normal eyestalks (NH), and hybrid crabs with displaced eyestalks (DH). In total, 37 significantly differential alternative splicing (DAS) events (17 up-regulated and 20 down-regulated) and 1475 significantly differential expressed transcripts (DETs) (492 up-regulated and 983 down-regulated) were detected in DH. The most significant DAS events and DETs were annotated as being endoplasmic reticulum chaperone BiP and leucine-rich repeat protein lrrA-like isoform X2. In addition, the top ten significant GO terms were related to the cuticle or chitin. Overall, high-quality reconstructed transcriptomes were obtained for the novel interspecific hybrid crab and provided valuable insights into the genetic mechanisms of eyestalk displacement in mud crab (Scylla spp.) crossbreeding.
Affiliation(s)
- Shaopan Ye
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Xiaoyan Yu
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Huiying Chen
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Yin Zhang
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Qingyang Wu
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Huaqiang Tan
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Jun Song
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Hafiz Sohaib Ahmed Saqib
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Ardavan Farhadi
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
| | - Mhd Ikhwanuddin
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
- Institute of Tropical Aquaculture and Fisheries, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
| | - Hongyu Ma
- Guangdong Provincial Key Laboratory of Marine Biotechnology, Shantou University, Shantou 515063, China
- STU-UMT Joint Shellfish Research Laboratory, Shantou University, Shantou 515063, China
- Institute of Tropical Aquaculture and Fisheries, Universiti Malaysia Terengganu, Kuala Nerus, Terengganu 21030, Malaysia
- Correspondence: Tel.: +86-754-86503471
| |
|
36
|
Mc Cartney AM, Shafin K, Alonge M, Bzikadze AV, Formenti G, Fungtammasan A, Howe K, Jain C, Koren S, Logsdon GA, Miga KH, Mikheenko A, Paten B, Shumate A, Soto DC, Sović I, Wood JMD, Zook JM, Phillippy AM, Rhie A. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat Methods 2022; 19:687-695. [PMID: 35361931 PMCID: PMC9812399 DOI: 10.1038/s41592-022-01440-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 07/13/2021] [Accepted: 03/04/2022] [Indexed: 01/07/2023]
Abstract
Advances in long-read sequencing technologies and genome assembly methods have enabled the recent completion of the first telomere-to-telomere human genome assembly, which resolves complex segmental duplications and large tandem repeats, including centromeric satellite arrays in a complete hydatidiform mole (CHM13). Although derived from highly accurate sequences, evaluation revealed evidence of small errors and structural misassemblies in the initial draft assembly. To correct these errors, we designed a new repeat-aware polishing strategy that made accurate assembly corrections in large repeats without overcorrection, ultimately fixing 51% of the existing errors and improving the assembly quality value from 70.2 to 73.9 measured from PacBio high-fidelity and Illumina k-mers. By comparing our results to standard automated polishing tools, we outline common polishing errors and offer practical suggestions for genome projects with limited resources. We also show how sequencing biases in both high-fidelity and Oxford Nanopore Technologies reads cause signature assembly errors that can be corrected with a diverse panel of sequencing technologies.
Affiliation(s)
- Ann M. Mc Cartney
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Kishwar Shafin
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Michael Alonge
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Andrey V. Bzikadze
- Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
| | - Giulio Formenti
- Laboratory of Neurogenetics of Language and The Vertebrate Genome Lab, The Rockefeller University, New York, NY, USA
| | | | | | - Chirag Jain
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH; Department of Computational and Data Sciences, Indian Institute of Science, Bangalore KA, India
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH
| | - Glennis A. Logsdon
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
| | - Karen H. Miga
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alla Mikheenko
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Benedict Paten
- UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA, USA
| | - Alaina Shumate
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
| | - Daniela C. Soto
- Genome Center, MIND Institute, Department of Biochemistry and Molecular Medicine, University of California, Davis, CA, USA
| | - Ivan Sović
- Pacific Biosciences, Menlo Park, CA, USA; Digital BioLogic d.o.o., Ivanić-Grad, Croatia
| | | | - Justin M. Zook
- Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Adam M. Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH (corresponding author)
| | - Arang Rhie
- Genome Informatics Section, Computational and Statistical Genomics Branch, NHGRI, NIH (corresponding author)
| |
|
37
|
Perrin A, Van Goethem C, Thèze C, Puechberty J, Guignard T, Lecardonnel B, Lacourt D, Métay C, Isapof A, Whalen S, Ferreiro A, Arne-Bes MC, Quijano-Roy S, Nectoux J, Leturcq F, Richard P, Larrieux M, Bergougnoux A, Pellestor F, Koenig M, Cossée M. Long-Reads Sequencing Strategy to Localize Variants in TTN Repeated Domains. J Mol Diagn 2022; 24:719-726. [PMID: 35580751 DOI: 10.1016/j.jmoldx.2022.04.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2022] [Revised: 03/23/2022] [Accepted: 04/18/2022] [Indexed: 11/19/2022]
Abstract
Titin protein is responsible for muscle elasticity. The TTN gene, composed of 364 exons, is subjected to extensive alternative splicing and leads to different isoforms expressed in skeletal and cardiac muscle. Variants in TTN are responsible for myopathies with a wide phenotypic spectrum and autosomal dominant or recessive transmission. The I-band coding domain, highly subject to alternative splicing, contains a three-zone block of repeated sequences with 99% homology. Sequencing and localization of variants in these areas are complex when using short-reads sequencing, a second-generation sequencing technique. We have implemented a protocol based on the third-generation sequencing technology (long-reads sequencing). This new method allows us to localize variants in these repeated areas to improve the diagnosis of TTN-related myopathies and offer the analysis of relatives in postnatal or in prenatal screening.
Affiliation(s)
- Aurélien Perrin
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France; PhyMedExp, University of Montpellier, INSERM, CNRS, Montpellier, France
| | - Charles Van Goethem
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France
| | - Corinne Thèze
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France
| | - Jacques Puechberty
- Department of Medical Genetics, Arnaud de Villeneuve Hospital, Montpellier, France
| | - Thomas Guignard
- Laboratoire de Génétique Chromosomique, Plateforme ChromoStem, CHU de Montpellier, Université de Montpellier, Montpellier, France
| | - Bérénice Lecardonnel
- Laboratoire de Génétique Chromosomique, Plateforme ChromoStem, CHU de Montpellier, Université de Montpellier, Montpellier, France
| | - Delphine Lacourt
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France
| | - Corinne Métay
- Assistance Publique-Hôpitaux de Paris (AP-HP), UF Molecular Cardiogenetics and Myogenetics, Sorbonne Université and Sorbonne Université UPMC Paris 06-Inserm UMRS974, Research Center in Myology, Pitié-Salpêtrière Hospital, Paris, France
| | - Arnaud Isapof
- Centre de Référence des Maladies Neuromusculaires Nord/Est/Ile de France, Service de Neuropédiatrie, Hôpital Trousseau, Paris, France
| | - Sandra Whalen
- Genetics and Cytogenetics Department, Centre de Référence Déficiences Intellectuelles de Causes Rares, Pitié-Salpétrière, AP-HP, Paris, France
| | - Ana Ferreiro
- AP-HP, Centre de Référence des Pathologies Neuromusculaires Nord-Est-Ile de France, Institut de Myologie, GHU Pitié-Salpêtrière, Paris, France; Basic and Translational Myology Laboratory, Université de Paris BFA, UMR 8251, CNRS, Paris, France
| | | | - Susana Quijano-Roy
- AP-HP, GH Université Paris-Saclay, Neuromuscular Center, Child Neurology and ICU Department, Raymond Poincare Hospital, Garches, France; Université de Versailles, U1179 INSERM-UVSQ, Montigny, France
| | - Juliette Nectoux
- Service de Génétique et Biologie Moléculaires, Hôpital Cochin, DMU BioPhyGen, AP-HP, Centre-Université de Paris, Paris, France
| | - France Leturcq
- Department of Genetics and Molecular Biology, AP-HP, Cochin Hospital, Paris, France
| | - Pascale Richard
- Assistance Publique-Hôpitaux de Paris (AP-HP), UF Molecular Cardiogenetics and Myogenetics, Sorbonne Université and Sorbonne Université UPMC Paris 06-Inserm UMRS974, Research Center in Myology, Pitié-Salpêtrière Hospital, Paris, France
| | - Marion Larrieux
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France
| | - Anne Bergougnoux
- PhyMedExp, University of Montpellier, INSERM, CNRS, Montpellier, France
| | - Franck Pellestor
- Laboratoire de Génétique Chromosomique, Plateforme ChromoStem, CHU de Montpellier, Université de Montpellier, Montpellier, France
| | - Michel Koenig
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France; PhyMedExp, University of Montpellier, INSERM, CNRS, Montpellier, France
| | - Mireille Cossée
- Molecular Diagnostic Laboratory, Montpellier University Hospital, Montpellier, France; PhyMedExp, University of Montpellier, INSERM, CNRS, Montpellier, France.
| |
|
38
|
The evolution of gene regulation on sex chromosomes. Trends Genet 2022; 38:844-855. [DOI: 10.1016/j.tig.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Revised: 04/07/2022] [Accepted: 04/11/2022] [Indexed: 11/20/2022]
|
39
|
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? Microbiome Research Reports 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
- Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
| | - Halina E. Tegetmeyer
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
| | - Ryan M. Chanyi
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
| |
|
40
|
Mueller RC, Ellström P, Howe K, Uliano-Silva M, Kuo RI, Miedzinska K, Warr A, Fedrigo O, Haase B, Mountcastle J, Chow W, Torrance J, Wood JMD, Järhult JD, Naguib MM, Olsen B, Jarvis ED, Smith J, Eöry L, Kraus RHS. A high-quality genome and comparison of short- versus long-read transcriptome of the palaearctic duck Aythya fuligula (tufted duck). Gigascience 2021; 10:giab081. [PMID: 34927191 PMCID: PMC8685854 DOI: 10.1093/gigascience/giab081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/03/2021] [Revised: 07/15/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND: The tufted duck is a non-model organism that experiences high mortality in highly pathogenic avian influenza outbreaks. It belongs to the same bird family (Anatidae) as the mallard, one of the best-studied natural hosts of low-pathogenic avian influenza viruses. Studies in non-model bird species are crucial to disentangle the role of the host response in avian influenza virus infection in the natural reservoir. Such an endeavour requires a high-quality genome assembly and transcriptome. FINDINGS: This study presents the first high-quality, chromosome-level reference genome assembly of the tufted duck using the Vertebrate Genomes Project pipeline. We sequenced RNA (complementary DNA) from brain, ileum, lung, ovary, spleen, and testis using Illumina short-read and Pacific Biosciences long-read sequencing platforms, which were used for annotation. We found 34 autosomes plus Z and W sex chromosomes in the curated genome assembly, with 99.6% of the sequence assigned to chromosomes. Functional annotation revealed 14,099 protein-coding genes that generate 111,934 transcripts, which implies a mean of 7.9 isoforms per gene. We also identified 246 small RNA families. CONCLUSIONS: This annotated genome contributes to continuing research into the host response in avian influenza virus infections in a natural reservoir. Our findings from a comparison between short-read and long-read reference transcriptomics contribute to a deeper understanding of these competing options. In this study, both technologies complemented each other. We expect this annotation to be a foundation for further comparative and evolutionary genomic studies, including many waterfowl relatives with differing susceptibilities to avian influenza viruses.
Affiliation(s)
- Ralf C Mueller
- Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, 78315, Germany
- Department of Biology, University of Konstanz, Konstanz, 78457, Germany
| | - Patrik Ellström
- Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
| | - Kerstin Howe
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
| | | | - Richard I Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Katarzyna Miedzinska
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Amanda Warr
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Olivier Fedrigo
- Vertebrate Genome Laboratory, The Rockefeller University, New York, 10065, NY
| | - Bettina Haase
- Vertebrate Genome Laboratory, The Rockefeller University, New York, 10065, NY
| | | | - William Chow
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
| | - James Torrance
- Tree of Life, Wellcome Sanger Institute, Cambridge CB10 1SA, UK
| | | | - Josef D Järhult
- Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
| | - Mahmoud M Naguib
- Department of Medical Biochemistry and Microbiology, Zoonosis Science Center, Uppsala University, Uppsala, 75237, Sweden
| | - Björn Olsen
- Department of Medical Sciences, Zoonosis Science Center, Uppsala University, Uppsala, SE-75185, Sweden
| | - Erich D Jarvis
- Vertebrate Genome Laboratory and HHMI, The Rockefeller University, New York, 10065, NY
| | - Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Lél Eöry
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian EH25 9RG, UK
| | - Robert H S Kraus
- Department of Migration, Max Planck Institute of Animal Behavior, Radolfzell, 78315, Germany
- Department of Biology, University of Konstanz, Konstanz, 78457, Germany
| |
|
41
|
Bartalucci N, Romagnoli S, Vannucchi AM. A blood drop through the pore: nanopore sequencing in hematology. Trends Genet 2021; 38:572-586. [PMID: 34906378 DOI: 10.1016/j.tig.2021.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/02/2021] [Revised: 11/09/2021] [Accepted: 11/15/2021] [Indexed: 10/19/2022]
Abstract
The development of new sequencing platforms, technologies, and bioinformatics tools in the past decade fostered key discoveries in human genomics. Among the most recent sequencing technologies, nanopore sequencing (NS) has caught the interest of researchers for its intriguing potential and flexibility. This up-to-date review highlights the recent application of NS in the hematology field, focusing on progress and challenges of the technological approaches employed for the identification of pathologic alterations. The molecular and analytic pipelines developed for the analysis of the whole genome, target regions, and transcriptomics provide evidence of the unparalleled amount of information that could be retrieved by an innovative approach based on long-read sequencing.
Affiliation(s)
- Niccolò Bartalucci
- CRIMM, Center of Research and Innovation of Myeloproliferative Neoplasms, Careggi University Hospital and Department of Experimental and Clinical Medicine, University of Florence, DENOTHE Excellence Center, Florence, Italy
| | - Simone Romagnoli
- CRIMM, Center of Research and Innovation of Myeloproliferative Neoplasms, Careggi University Hospital and Department of Experimental and Clinical Medicine, University of Florence, DENOTHE Excellence Center, Florence, Italy
| | - Alessandro Maria Vannucchi
- CRIMM, Center of Research and Innovation of Myeloproliferative Neoplasms, Careggi University Hospital and Department of Experimental and Clinical Medicine, University of Florence, DENOTHE Excellence Center, Florence, Italy.
| |
|
42
|
Zverinova S, Guryev V. Variant calling: Considerations, practices, and developments. Hum Mutat 2021; 43:976-985. [PMID: 34882898 PMCID: PMC9545713 DOI: 10.1002/humu.24311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Revised: 11/02/2021] [Accepted: 12/03/2021] [Indexed: 11/10/2022]
Abstract
The success of many clinical, association, or population genetics studies critically relies on a properly performed variant calling step. The variety of modern genomics protocols, techniques, and platforms makes our choices of methods and algorithms difficult, and there is no "one size fits all" solution for study design and data analysis. In this review, we discuss considerations that need to be taken into account while designing the study and preparing for the experiments. We outline the variety of variant types that can be detected using sequencing approaches and highlight some specific requirements and basic principles of their detection. Finally, we cover interesting developments that enable variant calling for a broad range of applications in the genomics field. We conclude by discussing technological and algorithmic advances that have the potential to change the ways of calling DNA variants in the near future.
Affiliation(s)
- Stepanka Zverinova
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| | - Victor Guryev
- European Research Institute for the Biology of Ageing, University of Groningen, University Medical Centre Groningen, Groningen, The Netherlands
| |
|
43
|
Chen Z, He X. Application of third-generation sequencing in cancer research. Medical Review (Berlin, Germany) 2021; 1:150-171. [PMID: 37724303 PMCID: PMC10388785 DOI: 10.1515/mr-2021-0013] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/14/2021] [Accepted: 08/09/2021] [Indexed: 09/20/2023]
Abstract
In the past several years, nanopore sequencing technology from Oxford Nanopore Technologies (ONT) and single-molecule real-time (SMRT) sequencing technology from Pacific BioSciences (PacBio) have become available to researchers and are currently being tested for cancer research. These methods offer many advantages over most widely used high-throughput short-read sequencing approaches and allow the comprehensive analysis of transcriptomes by identifying full-length splice isoforms and several other posttranscriptional events. In addition, these platforms enable structural variation characterization at a previously unparalleled resolution and direct detection of epigenetic marks in native DNA and RNA. Here, we present a comprehensive summary of important applications of these technologies in cancer research, including the identification of complex structural variants, alternatively spliced isoforms, fusion transcript events, and exogenous RNA. Furthermore, we discuss the impact of the newly developed nanopore direct RNA sequencing (RNA-Seq) approach in advancing epitranscriptome research in cancer. Although unique challenges remain for these new single-molecule long-read methods, they will unravel many aspects of cancer genome complexity in unprecedented ways and present an encouraging outlook for continued application in an increasing number of different cancer research settings.
Affiliation(s)
- Zhiao Chen
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
| | - Xianghuo He
- Fudan University Shanghai Cancer Center and Institutes of Biomedical Sciences, Fudan University, Shanghai, China
- Department of Oncology, Shanghai Medical College, Fudan University, Shanghai, China
- Key Laboratory of Breast Cancer in Shanghai, Fudan University Shanghai Cancer Center, Fudan University, Shanghai, China
| |
|
44
|
Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data. BMC Genomics 2021; 22:826. [PMID: 34789167 PMCID: PMC8596897 DOI: 10.1186/s12864-021-08082-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/09/2021] [Accepted: 10/13/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: SNP arrays, short- and long-read genome sequencing are genome-wide high-throughput technologies that may be used to assay copy number variants (CNVs) in a personal genome. Each of these technologies comes with its own limitations and biases, many of which are well-known, but not all of them are thoroughly quantified. RESULTS: We assembled an ensemble of public datasets of published CNV calls and raw data for the well-studied Genome in a Bottle individual NA12878. This assembly represents a variety of methods and pipelines used for CNV calling from array, short- and long-read technologies. We then performed cross-technology comparisons regarding their ability to call CNVs. Unlike other studies, we refrained from using a gold standard. Instead, we attempted to validate the CNV calls by the raw data of each technology. CONCLUSIONS: Our study confirms that long-read platforms enable recalling CNVs in genomic regions inaccessible to arrays or short reads. We also found that the reproducibility of a CNV by different pipelines within each technology is strongly linked to other CNV evidence measures. Importantly, the three technologies show distinct public database frequency profiles, which differ depending on what technology the database was built on.
|
45
|
Multi-Omics Analysis of Gene and Protein Candidates Possibly Related to Tetrodotoxin Accumulation in the Skin of Takifugu flavidus. Mar Drugs 2021; 19:md19110639. [PMID: 34822510 PMCID: PMC8621849 DOI: 10.3390/md19110639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Revised: 11/13/2021] [Accepted: 11/14/2021] [Indexed: 11/24/2022]
Abstract
Pufferfish is increasingly regarded by many as a delicacy. However, the tetrodotoxin (TTX) that accumulates in its body can be lethal upon consumption by humans. TTX is known to mainly accumulate in pufferfish skin, but the accumulation mechanisms are poorly understood. In this study, we aimed to explore the possible mechanism of TTX accumulation in the skin of the pufferfish Takifugu flavidus following treatment with TTX. Through liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis, we detected 37.3% of toxin accumulated in the skin at the end of the rearing period (168 h). Transcriptome and proteome analyses revealed the mechanism and pathways of TTX accumulation in the skin of T. flavidus in detail. Gene ontology and the Kyoto Encyclopedia of Genes and Genomes analyses strongly suggest that cardiac muscle contraction and adrenergic signaling in cardiomyocyte pathways play an important role in TTX accumulation. Moreover, some upregulated and downregulated genes, which were determined via RNA-Seq, were verified with qPCR analysis. This study is the first to use multi-omics profiling data to identify novel regulatory network mechanisms of TTX accumulation in the skin of pufferfish.
|
46
|
Galata V, Busi SB, Kunath BJ, de Nies L, Calusinska M, Halder R, May P, Wilmes P, Laczny CC. Functional meta-omics provide critical insights into long- and short-read assemblies. Brief Bioinform 2021; 22:bbab330. [PMID: 34453168 PMCID: PMC8575027 DOI: 10.1093/bib/bbab330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2021] [Revised: 07/13/2021] [Accepted: 07/26/2021] [Indexed: 11/12/2022]
Abstract
Real-world evaluations of metagenomic reconstructions are challenged by distinguishing reconstruction artifacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only and hybrid assembly approaches on four different metagenomic samples of varying complexity. We demonstrate how different assembly approaches affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to assess the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions. We propose a reference-independent solution, which exploits the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.
Affiliation(s)
- Valentina Galata
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Susheel Bhanu Busi
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Benoît Josef Kunath
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Laura de Nies
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Magdalena Calusinska
- BioSystems and Bioprocessing Engineering, Luxembourg Institute of Science and Technology, Rue du Brill 41, Belvaux L-4422, Luxembourg
| | - Rashi Halder
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Patrick May
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Paul Wilmes
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| | - Cédric Christian Laczny
- Luxembourg Centre for Systems Biomedicine, 7, avenue des Hauts-Fourneaux, Esch-sur-Alzette L-4362, Luxembourg
| |
|
47
|
Comparative Analysis of PacBio and Oxford Nanopore Sequencing Technologies for Transcriptomic Landscape Identification of Penaeus monodon. Life (Basel) 2021; 11:life11080862. [PMID: 34440606 PMCID: PMC8399832 DOI: 10.3390/life11080862] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/09/2021] [Revised: 08/07/2021] [Accepted: 08/17/2021] [Indexed: 12/16/2022]
Abstract
With the advantages that long-read sequencing platforms such as Pacific Biosciences (Menlo Park, CA, USA) (PacBio) and Oxford Nanopore Technologies (Oxford, UK) (ONT) can offer, various research fields such as genomics and transcriptomics can exploit their benefits. Selecting an appropriate sequencing platform is undoubtedly crucial for the success of the research outcome, thus there is a need to compare these long-read sequencing platforms and evaluate them for specific research questions. This study aims to compare the performance of PacBio and ONT platforms for transcriptomic analysis by utilizing transcriptome data from three different tissues (hepatopancreas, intestine, and gonads) of the juvenile black tiger shrimp, Penaeus monodon. We compared three important features: (i) main characteristics of the sequencing libraries and their alignment with the reference genome, (ii) transcript assembly features and isoform identification, and (iii) correlation of the quantification of gene expression levels for both platforms. Our analyses suggest that read-length bias and differences in sequencing throughput are highly influential factors when using long reads in transcriptome studies. These comparisons can provide a guideline when designing a transcriptome study utilizing these two long-read sequencing technologies.
|
48
|
Istace B, Belser C, Falentin C, Labadie K, Boideau F, Deniot G, Maillet L, Cruaud C, Bertrand L, Chèvre AM, Wincker P, Rousseau-Gueutin M, Aury JM. Sequencing and Chromosome-Scale Assembly of Plant Genomes, Brassica rapa as a Use Case. Biology 2021; 10:732. [PMID: 34439964 PMCID: PMC8389630 DOI: 10.3390/biology10080732] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/12/2021] [Revised: 07/27/2021] [Accepted: 07/28/2021] [Indexed: 11/29/2022]
Abstract
With the rise of long-read sequencers and long-range technologies, delivering high-quality plant genome assemblies is no longer reserved for large consortia. Both sequencing techniques and computer algorithms have reached a point where chromosome-scale assemblies can be reconstructed at the scale of a single laboratory. Current technologies, in particular long-range technologies, are numerous, and selecting the most promising one for the genome of interest is crucial to obtain optimal results. In this study, we resequenced the genome of the yellow sarson, Brassica rapa cv. Z1, using the Oxford Nanopore PromethION sequencer and assembled the sequencing data using current assemblers. To reconstruct complete chromosomes, we used and compared three long-range scaffolding techniques (optical mapping, Omni-C, and Pore-C sequencing libraries, commercialized by Bionano Genomics, Dovetail Genomics, and Oxford Nanopore Technologies, respectively), as well as a combination of the three, in order to evaluate the capability of each technology.
Affiliation(s)
- Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Caroline Belser
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Cyril Falentin
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Karine Labadie
- Genoscope, Institut François Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (K.L.); (C.C.)
| | - Franz Boideau
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Gwenaëlle Deniot
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Loeiz Maillet
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Corinne Cruaud
- Genoscope, Institut François Jacob, Commissariat à l’Energie Atomique (CEA), Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (K.L.); (C.C.)
| | - Laurie Bertrand
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Anne-Marie Chèvre
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
| | - Mathieu Rousseau-Gueutin
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France; (C.F.); (F.B.); (G.D.); (L.M.); (A.-M.C.); (M.R.-G.)
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 Rue Gaston Crémieux, 91057 Evry, France; (B.I.); (C.B.); (L.B.); (P.W.)
|
49
|
Ciuffreda L, Rodríguez-Pérez H, Flores C. Nanopore sequencing and its application to the study of microbial communities. Comput Struct Biotechnol J 2021; 19:1497-1511. [PMID: 33815688 PMCID: PMC7985215 DOI: 10.1016/j.csbj.2021.02.020] [Citation(s) in RCA: 86] [Impact Index Per Article: 28.7] [Received: 11/18/2020] [Revised: 02/24/2021] [Accepted: 02/27/2021] [Indexed: 12/14/2022] Open
Abstract
Since its introduction, nanopore sequencing has enhanced our ability to study complex microbial samples by making it possible to sequence long reads in real time using inexpensive and portable technologies. The use of long reads has made it possible to address several previously unsolved issues in the field, such as the resolution of complex genomic structures, and has facilitated access to metagenome-assembled genomes (MAGs). Furthermore, the low cost and portability of the platforms, together with the development of rapid protocols and analysis pipelines, have made nanopore technology an attractive and ever-growing tool for real-time in-field sequencing in environmental microbial analysis. This review provides an up-to-date summary of the experimental protocols and bioinformatic tools for the study of microbial communities using nanopore sequencing, highlighting the most important and recent research in the field with a major focus on infectious diseases. An overview of the main approaches, including targeted and shotgun approaches, metatranscriptomics, epigenomics, and epitranscriptomics, is provided, together with an outlook on the major challenges and perspectives for the use of this technology in microbial studies.
Affiliation(s)
- Laura Ciuffreda
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
| | - Héctor Rodríguez-Pérez
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
| | - Carlos Flores
- Research Unit, Hospital Universitario N.S. de Candelaria, Universidad de La Laguna, 38010 Santa Cruz de Tenerife, Spain
- CIBER de Enfermedades Respiratorias, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Genomics Division, Instituto Tecnológico y de Energías Renovables (ITER), 38600 Santa Cruz de Tenerife, Spain
- Instituto de Tecnologías Biomédicas (ITB), Universidad de La Laguna, 38200 Santa Cruz de Tenerife, Spain
|
50
|
Hayrabedyan S, Kostova P, Zlatkov V, Todorova K. Single-cell transcriptomics in the context of long-read nanopore sequencing. BIOTECHNOL BIOTEC EQ 2021. [DOI: 10.1080/13102818.2021.1988868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
Affiliation(s)
- Soren Hayrabedyan
- Laboratory of Reproductive OMICs Technologies, Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
| | - Petya Kostova
- Gynecology Clinic, National Oncology Hospital, Sofia, Bulgaria
| | - Viktor Zlatkov
- Department of Obstetrics and Gynecology, Faculty of Medicine, Medical University of Sofia, Sofia, Bulgaria
| | - Krassimira Todorova
- Laboratory of Reproductive OMICs Technologies, Institute of Biology and Immunology of Reproduction, Bulgarian Academy of Sciences, Sofia, Bulgaria
|