1
|
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are not only due to improvements in sequencing accuracy, but also happening across rapidly changing analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Collapse
Affiliation(s)
- Daniel P Agustinho
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
| | - Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Vipin K Menon
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
- Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
| | - Ginger A Metcalf
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Bioengineering, Rice University, Houston, TX, USA
| | - Fritz J Sedlazeck
- Human Genome Sequencing center, Baylor College of Medicine, Houston, TX, USA.
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
Collapse
|
2
|
Yao Z, Li F, Xie W, Chen J, Wu J, Zhan Y, Wu X, Wang Z, Zhang G. DeepSF-4mC: A deep learning model for predicting DNA cytosine 4mC methylation sites leveraging sequence features. Comput Biol Med 2024; 171:108166. [PMID: 38382385 DOI: 10.1016/j.compbiomed.2024.108166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Revised: 02/15/2024] [Accepted: 02/15/2024] [Indexed: 02/23/2024]
Abstract
N4-methylcytosine (4mC) is a DNA modification involving the addition of a methyl group to the fourth nitrogen atom of the cytosine base. This modification may influence gene regulation, providing potential insights into gene control mechanisms. Traditional laboratory methods for detecting 4mC DNA methylation have limitations, but the rise of artificial intelligence has introduced efficient computational strategies for 4mC site prediction. Despite this progress, challenges persist in terms of model performance and interpretability. To tackle these challenges, we propose DeepSF-4mC, a deep learning model specifically designed for predicting DNA cytosine 4mC methylation sites by leveraging sequence features. Our approach incorporates multiple encoding techniques to enhance prediction accuracy, increase model stability, and reduce the computational resources needed. Leveraging transfer learning, we harness existing models to enhance performance through learned representations or fine-tuning. Ensemble learning techniques combine predictions from multiple models, boosting robustness and accuracy. This research contributes to DNA methylation analysis and lays the groundwork for understanding 4mC's multifaceted role in biological processes. The web server for DeepSF-4mC is accessible at: http://deepsf-4mc.top/and the original code can be found at: https://github.com/754131799/DeepSF-4mC.
Collapse
Affiliation(s)
- Zhaomin Yao
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Fei Li
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Weiming Xie
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Jiaming Chen
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Jiezhang Wu
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Ying Zhan
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Xiaodan Wu
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China
| | - Zhiguo Wang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China.
| | - Guoxu Zhang
- Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning, 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China.
| |
Collapse
|
3
|
Ahsan MU, Gouru A, Chan J, Zhou W, Wang K. A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nat Commun 2024; 15:1448. [PMID: 38365920 PMCID: PMC10873387 DOI: 10.1038/s41467-024-45778-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2023] [Accepted: 02/04/2024] [Indexed: 02/18/2024] Open
Abstract
Oxford Nanopore sequencing can detect DNA methylations from ionic current signal of single molecules, offering a unique advantage over conventional methods. Additionally, adaptive sampling, a software-controlled enrichment method for targeted sequencing, allows reduced representation methylation sequencing that can be applied to CpG islands or imprinted regions. Here we present DeepMod2, a comprehensive deep-learning framework for methylation detection using ionic current signal from Nanopore sequencing. DeepMod2 implements both a bidirectional long short-term memory (BiLSTM) model and a Transformer model and can analyze POD5 and FAST5 signal files generated on R9 and R10 flowcells. Additionally, DeepMod2 can run efficiently on central processing unit (CPU) through model pruning and can infer epihaplotypes or haplotype-specific methylation calls from phased reads. We use multiple publicly available and newly generated datasets to evaluate the performance of DeepMod2 under varying scenarios. DeepMod2 has comparable performance to Guppy and Dorado, which are the current state-of-the-art methods from Oxford Nanopore Technologies that remain closed-source. Moreover, we show a high correlation (r = 0.96) between reduced representation and whole-genome Nanopore sequencing. In summary, DeepMod2 is an open-source tool that enables fast and accurate DNA methylation detection from whole-genome or adaptive sequencing data on a diverse range of flowcell types.
Collapse
Affiliation(s)
- Mian Umair Ahsan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Anagha Gouru
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Biology, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Joe Chan
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
| | - Wanding Zhou
- Center for Computational and Genomic Medicine, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
| | - Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA.
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
| |
Collapse
|
4
|
Yin C, Wang R, Qiao J, Shi H, Duan H, Jiang X, Teng S, Wei L. NanoCon: contrastive learning-based deep hybrid network for nanopore methylation detection. Bioinformatics 2024; 40:btae046. [PMID: 38305428 PMCID: PMC10873575 DOI: 10.1093/bioinformatics/btae046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Revised: 02/15/2024] [Accepted: 01/30/2024] [Indexed: 02/03/2024] Open
Abstract
MOTIVATION 5-Methylcytosine (5mC), a fundamental element of DNA methylation in eukaryotes, plays a vital role in gene expression regulation, embryonic development, and other biological processes. Although several computational methods have been proposed for detecting the base modifications in DNA like 5mC sites from Nanopore sequencing data, they face challenges including sensitivity to noise, and ignoring the imbalanced distribution of methylation sites in real-world scenarios. RESULTS Here, we develop NanoCon, a deep hybrid network coupled with contrastive learning strategy to detect 5mC methylation sites from Nanopore reads. In particular, we adopted a contrastive learning module to alleviate the issues caused by imbalanced data distribution in nanopore sequencing, offering a more accurate and robust detection of 5mC sites. Evaluation results demonstrate that NanoCon outperforms existing methods, highlighting its potential as a valuable tool in genomic sequencing and methylation prediction. In addition, we also verified the effectiveness of our representation learning ability on two datasets by visualizing the dimension reduction of the features of methylation and nonmethylation sites from our NanoCon. Furthermore, cross-species and cross-5mC methylation motifs experiments indicated the robustness and the ability to perform transfer learning of our model. We hope this work can contribute to the community by providing a powerful and reliable solution for 5mC site detection in genomic studies. AVAILABILITY AND IMPLEMENTATION The project code is available at https://github.com/Challis-yin/NanoCon.
Collapse
Affiliation(s)
- Chenglin Yin
- School of Software, Shandong University, Jinan, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Ruheng Wang
- School of Software, Shandong University, Jinan, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Jianbo Qiao
- School of Software, Shandong University, Jinan, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China
| | - Xinbo Jiang
- School of Qilu Transportation, Shandong University, Jinan, China
| | - Saisai Teng
- School of Software, Shandong University, Jinan, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan, China
| |
Collapse
|
5
|
Nielsen TK, Forero-Junco LM, Kot W, Moineau S, Hansen LH, Riber L. Detection of nucleotide modifications in bacteria and bacteriophages: Strengths and limitations of current technologies and software. Mol Ecol 2023; 32:1236-1247. [PMID: 36052951 DOI: 10.1111/mec.16679] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/27/2022]
Abstract
RNA and DNA modifications occur in eukaryotes and prokaryotes, as well as in their viruses, and serve a wide range of functions, from gene regulation to nucleic acid protection. Although the first nucleotide modification was discovered almost 100 years ago, new and unusual modifications are still being described. Nucleotide modifications have also received more attention lately because of their increased significance, but also because new sequencing approaches have eased their detection. Chiefly, third generation sequencing platforms PacBio and Nanopore offer direct detection of modified bases by measuring deviations of the signals. These unusual modifications are especially prevalent in bacteriophage genomes, the viruses of bacteria, where they mostly appear to protect DNA against degradation from host nucleases. In this Opinion article, we highlight and discuss current approaches to detect nucleotide modifications, including hardwares and softwares, and look onward to future applications, especially for studying unusual, rare, or complex genome modifications in bacteriophages. The ability to distinguish between several types of nucleotide modifications may even shed new light on metagenomic studies.
Collapse
Affiliation(s)
- Tue Kjaergaard Nielsen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | | | - Witold Kot
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Sylvain Moineau
- Département de biochimie, de microbiologie, et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Quebec, Canada
| | - Lars Hestbjerg Hansen
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Leise Riber
- Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
| |
Collapse
|