1
Wang Z, Fang Y, Liu Z, Hao N, Zhang HH, Sun X, Que J, Ding H. Adapting Nanopore Sequencing Basecalling Models for Modification Detection via Incremental Learning and Anomaly Detection. bioRxiv 2023:2023.12.19.572431. [PMID: 38187611 PMCID: PMC10769248 DOI: 10.1101/2023.12.19.572431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
We leverage machine learning approaches to adapt nanopore sequencing basecallers for nucleotide modification detection. We first apply the incremental learning technique to improve the basecalling of modification-rich sequences, which are usually of high biological interest. With sequence backbones resolved, we further run anomaly detection on individual nucleotides to determine their modification status. By this means, our pipeline promises the single-molecule, single-nucleotide and sequence-context-free detection of modifications. We benchmark the pipeline using control oligos, and further apply it to the basecalling of densely modified yeast tRNAs and E. coli genomic DNAs, the cross-species detection of N6-methyladenosine (m6A) in mammalian mRNAs, and the simultaneous detection of N1-methyladenosine (m1A) and m6A in human mRNAs. Our IL-AD workflow is available at: https://github.com/wangziyuan66/IL-AD.
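The anomaly-detection step mentioned in this abstract can be illustrated with a generic sketch. The abstract does not specify the detector, so the z-score rule, the function names `fit_baseline` and `is_anomalous`, the threshold and the current values below are all assumptions made for illustration, not the IL-AD implementation.

```python
from statistics import mean, stdev

def fit_baseline(canonical_currents):
    """Summarise signal levels seen for an unmodified nucleotide as (mean, std)."""
    return mean(canonical_currents), stdev(canonical_currents)

def is_anomalous(current, baseline, z_threshold=3.0):
    """Flag a nucleotide whose signal deviates strongly from the canonical baseline.

    Hypothetical rule: an absolute z-score above z_threshold counts as a
    candidate modification.
    """
    mu, sigma = baseline
    return abs(current - mu) / sigma > z_threshold

# Made-up current levels (pA) for the same canonical nucleotide context.
canonical = [80.1, 79.6, 80.4, 79.9, 80.0, 80.3, 79.7]
baseline = fit_baseline(canonical)

# Three observed nucleotides; only the second lies far from the baseline.
calls = [is_anomalous(x, baseline) for x in [80.2, 86.5, 79.8]]
print(calls)
```

Because the baseline is learned only from canonical signal, a detector of this kind needs no labelled examples of the modification itself, which is the general appeal of anomaly detection for modifications that lack training data.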
Affiliation(s)
- Ziyuan Wang
  - Department of Pharmacy Practice and Science, University of Arizona, Tucson, Arizona, USA
  - These authors contributed equally to this work
- Yinshan Fang
  - Columbia Center for Human Development, Department of Medicine, Columbia University Medical Center, New York, New York, USA
  - These authors contributed equally to this work
- Ziyang Liu
  - Department of Pharmacy Practice and Science, University of Arizona, Tucson, Arizona, USA
  - Statistics and Data Science GIDP, University of Arizona, Tucson, Arizona, USA
- Ning Hao
  - Statistics and Data Science GIDP, University of Arizona, Tucson, Arizona, USA
  - Department of Mathematics, University of Arizona, Tucson, Arizona, USA
- Hao Helen Zhang
  - Statistics and Data Science GIDP, University of Arizona, Tucson, Arizona, USA
  - Department of Mathematics, University of Arizona, Tucson, Arizona, USA
- Xiaoxiao Sun
  - Statistics and Data Science GIDP, University of Arizona, Tucson, Arizona, USA
  - Department of Epidemiology and Biostatistics, University of Arizona, Tucson, Arizona, USA
- Jianwen Que
  - Columbia Center for Human Development, Department of Medicine, Columbia University Medical Center, New York, New York, USA
- Hongxu Ding
  - Department of Pharmacy Practice and Science, University of Arizona, Tucson, Arizona, USA
  - Statistics and Data Science GIDP, University of Arizona, Tucson, Arizona, USA
2
Zhang J, Sheng H, Hu C, Li F, Cai B, Ma Y, Wang Y, Ma Y. Effects of DNA Methylation on Gene Expression and Phenotypic Traits in Cattle: A Review. Int J Mol Sci 2023; 24:11882. [PMID: 37569258 PMCID: PMC10419045 DOI: 10.3390/ijms241511882] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 06/26/2023] [Revised: 07/20/2023] [Accepted: 07/22/2023] [Indexed: 08/13/2023] Open
Abstract
Gene expression in cells is determined by the epigenetic state of chromatin. Therefore, the study of epigenetic changes is very important for understanding the regulatory mechanisms of genes at the molecular, cellular, tissue and organ levels. DNA methylation is one of the most studied epigenetic modifications and plays an important role in maintaining genome stability and ensuring normal growth and development. Studies have shown that methylation levels in bovine primordial germ cells, the rearrangement of methylation during embryonic development and abnormal methylation during placental development are all closely related to bovine reproductive processes. In addition, bovine male sterility and the application of assisted reproductive technology are also related to DNA methylation. This review introduces the principle of DNA methylation, the development of its detection methods and their conditions of application, with emphasis on the relationship between DNA methylation dynamics and bovine spermatogenesis, embryonic development, disease resistance and muscle and fat development, to provide a theoretical basis for the future application of DNA methylation in cattle breeding.
Affiliation(s)
- Junxing Zhang
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Hui Sheng
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
- Chunli Hu
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
- Fen Li
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
- Bei Cai
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
- Yanfen Ma
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
- Yachun Wang
  - College of Animal Science and Technology, China Agricultural University, Beijing 100193, China
- Yun Ma
  - Key Laboratory of Ruminant Molecular Cell Breeding of Ningxia Hui Autonomous Region, College of Animal Science and Technology, Ningxia University, Yinchuan 750021, China
3
Nielsen TK, Forero-Junco LM, Kot W, Moineau S, Hansen LH, Riber L. Detection of nucleotide modifications in bacteria and bacteriophages: Strengths and limitations of current technologies and software. Mol Ecol 2023; 32:1236-1247. [PMID: 36052951 DOI: 10.1111/mec.16679] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 02/21/2022] [Revised: 08/25/2022] [Accepted: 08/30/2022] [Indexed: 11/27/2022]
Abstract
RNA and DNA modifications occur in eukaryotes and prokaryotes, as well as in their viruses, and serve a wide range of functions, from gene regulation to nucleic acid protection. Although the first nucleotide modification was discovered almost 100 years ago, new and unusual modifications are still being described. Nucleotide modifications have also received more attention lately, both because of their increased significance and because new sequencing approaches have eased their detection. Chiefly, the third-generation sequencing platforms PacBio and Nanopore offer direct detection of modified bases by measuring deviations of the signals. These unusual modifications are especially prevalent in the genomes of bacteriophages, the viruses of bacteria, where they mostly appear to protect DNA against degradation by host nucleases. In this Opinion article, we highlight and discuss current approaches to detect nucleotide modifications, including hardware and software, and look onward to future applications, especially for studying unusual, rare, or complex genome modifications in bacteriophages. The ability to distinguish between several types of nucleotide modifications may even shed new light on metagenomic studies.
Affiliation(s)
- Tue Kjaergaard Nielsen
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Witold Kot
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Sylvain Moineau
  - Département de biochimie, de microbiologie, et de bio-informatique, Faculté des sciences et de génie, Université Laval, Québec, Quebec, Canada
- Lars Hestbjerg Hansen
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
- Leise Riber
  - Department of Plant and Environmental Sciences, University of Copenhagen, Frederiksberg, Denmark
4
Simultaneous profiling of histone modifications and DNA methylation via nanopore sequencing. Nat Commun 2022; 13:7939. [PMID: 36566265 PMCID: PMC9789962 DOI: 10.1038/s41467-022-35650-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 11/12/2021] [Accepted: 12/15/2022] [Indexed: 12/25/2022] Open
Abstract
The interplay between histone modifications and DNA methylation drives the establishment and maintenance of the cellular epigenomic landscape, but it remains challenging to investigate the complex relationship between these epigenetic marks across the genome. Here we describe a nanopore-sequencing-based method, nanoHiMe-seq, for interrogating the genome-wide localization of histone modifications and DNA methylation from single DNA molecules. nanoHiMe-seq leverages a nonspecific methyltransferase to exogenously label adenine bases proximal to antibody-targeted modified nucleosomes in situ. The labelled adenines and the endogenous methylated CpG sites are simultaneously detected on individual nanopore reads using a hidden Markov model, which is implemented in the nanoHiMe software package. We demonstrate the utility, robustness and sensitivity of nanoHiMe-seq by jointly profiling DNA methylation and histone modifications at low coverage depths, concurrently determining phased patterns of DNA methylation and histone modifications, and probing the intrinsic connectivity between these epigenetic marks across the genome.
5
Nguyen TA, Heng JWJ, Kaewsapsak P, Kok EPL, Stanojević D, Liu H, Cardilla A, Praditya A, Yi Z, Lin M, Aw JGA, Ho YY, Peh KLE, Wang Y, Zhong Q, Heraud-Farlow J, Xue S, Reversade B, Walkley C, Ho YS, Šikić M, Wan Y, Tan MH. Direct identification of A-to-I editing sites with nanopore native RNA sequencing. Nat Methods 2022; 19:833-844. [PMID: 35697834 DOI: 10.1038/s41592-022-01513-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Received: 08/02/2021] [Accepted: 05/02/2022] [Indexed: 12/26/2022]
Abstract
Inosine is a prevalent RNA modification in animals and is formed when an adenosine is deaminated by the ADAR family of enzymes. Traditionally, inosines are identified indirectly as variants from Illumina RNA-sequencing data because they are interpreted as guanosines by cellular machineries. However, this indirect method performs poorly in protein-coding regions where exons are typically short, in non-model organisms with sparsely annotated single-nucleotide polymorphisms, or in disease contexts where unknown DNA mutations are pervasive. Here, we show that Oxford Nanopore direct RNA sequencing can be used to identify inosine-containing sites in native transcriptomes with high accuracy. We trained convolutional neural network models to distinguish inosine from adenosine and guanosine, and to estimate the modification rate at each editing site. Furthermore, we demonstrated their utility on the transcriptomes of human, mouse and Xenopus. Our approach expands the toolkit for studying adenosine-to-inosine editing and can be further extended to investigate other RNA modifications.
Affiliation(s)
- Tram Anh Nguyen
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Jia Wei Joel Heng
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Pornchai Kaewsapsak
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Eng Piew Louis Kok
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Dominik Stanojević
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Hao Liu
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Angelysia Cardilla
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Albert Praditya
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Zirong Yi
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Mingwan Lin
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - National Junior College, Singapore, Singapore
- Jong Ghut Ashley Aw
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Yin Ying Ho
  - Bioprocessing Technology Institute, Agency for Science Technology and Research, Singapore, Singapore
- Kai Lay Esther Peh
  - Bioprocessing Technology Institute, Agency for Science Technology and Research, Singapore, Singapore
- Yuanming Wang
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Qixing Zhong
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
- Jacki Heraud-Farlow
  - St. Vincent's Institute of Medical Research and Department of Medicine, University of Melbourne, Fitzroy, Victoria, Australia
- Shifeng Xue
  - Institute of Molecular and Cell Biology, Agency for Science Technology and Research, Singapore, Singapore
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Bruno Reversade
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - Institute of Molecular and Cell Biology, Agency for Science Technology and Research, Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - Department of Medical Genetics, School of Medicine (KUSoM), Koç University, Istanbul, Turkey
- Carl Walkley
  - St. Vincent's Institute of Medical Research and Department of Medicine, University of Melbourne, Fitzroy, Victoria, Australia
- Ying Swan Ho
  - Bioprocessing Technology Institute, Agency for Science Technology and Research, Singapore, Singapore
- Mile Šikić
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - University of Zagreb, Faculty of Electrical Engineering and Computing, Zagreb, Croatia
- Yue Wan
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Meng How Tan
  - School of Chemical and Biomedical Engineering, Nanyang Technological University, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science Technology and Research, Singapore, Singapore
  - HP-NTU Digital Manufacturing Corporate Lab, Nanyang Technological University, Singapore, Singapore
6
Bailey AD, Talkish J, Ding H, Igel H, Duran A, Mantripragada S, Paten B, Ares M. Concerted modification of nucleotides at functional centers of the ribosome revealed by single-molecule RNA modification profiling. eLife 2022; 11:e76562. [PMID: 35384842 PMCID: PMC9045821 DOI: 10.7554/elife.76562] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 12/21/2021] [Accepted: 04/05/2022] [Indexed: 12/02/2022] Open
Abstract
Nucleotides in RNA and DNA are chemically modified by numerous enzymes that alter their function. Eukaryotic ribosomal RNA (rRNA) is modified at more than 100 locations, particularly at highly conserved and functionally important nucleotides. During ribosome biogenesis, modifications are added at various stages of assembly. The existence of differently modified classes of ribosomes in normal cells is unknown because no method exists to simultaneously evaluate the modification status at all sites within a single rRNA molecule. Using a combination of yeast genetics and nanopore direct RNA sequencing, we developed a reliable method to track the modification status of single rRNA molecules at 37 sites in 18S rRNA and 73 sites in 25S rRNA. We use our method to characterize patterns of modification heterogeneity and identify concerted modification of nucleotides found near functional centers of the ribosome. Distinct, undermodified subpopulations of rRNAs accumulate upon loss of Dbp3 or Prp43 RNA helicases, suggesting overlapping roles in ribosome biogenesis. Modification profiles are surprisingly resistant to change in response to many genetic and acute environmental conditions that affect translation, ribosome biogenesis, and pre-mRNA splicing. The ability to capture single-molecule RNA modification profiles provides new insights into the roles of nucleotide modifications in RNA function.
Affiliation(s)
- Andrew D Bailey
  - Department of Biomolecular Engineering and Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, United States
- Jason Talkish
  - RNA Center and Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, United States
- Hongxu Ding
  - Department of Biomolecular Engineering and Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, United States
  - Department of Pharmacy Practice & Science, College of Pharmacy, University of Arizona, Tucson, United States
- Haller Igel
  - RNA Center and Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, United States
- Benedict Paten
  - Department of Biomolecular Engineering and Santa Cruz Genomics Institute, University of California, Santa Cruz, Santa Cruz, United States
- Manuel Ares
  - RNA Center and Department of Molecular, Cell & Developmental Biology, University of California, Santa Cruz, Santa Cruz, United States
7
Zhao X, Zhang Y, Hang D, Meng J, Wei Z. Detecting RNA modification using direct RNA sequencing: A systematic review. Comput Struct Biotechnol J 2022; 20:5740-5749. [PMID: 36382183 PMCID: PMC9619219 DOI: 10.1016/j.csbj.2022.10.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/10/2022] [Revised: 10/16/2022] [Accepted: 10/16/2022] [Indexed: 11/28/2022] Open
Abstract
Post-transcriptional RNA modifications are involved in a range of important cellular processes, including the regulation of gene expression and fine-tuning of the functions of RNA molecules. To decipher the context-specific functions of these post-transcriptional modifications, it is crucial to accurately determine their transcriptomic locations and modification levels under a given cellular condition. With newly emerged sequencing technologies, especially nanopore direct RNA sequencing, different RNA modifications can be detected simultaneously at single-molecule resolution. Here we provide a systematic review of 15 published RNA modification prediction tools based on direct RNA sequencing data, including their computational models, input–output formats, supported modification types, and reported performances. Finally, we discuss the potential challenges and future improvements of nanopore sequencing-based methods for RNA modification detection.
8
Xie S, Leung AWS, Zheng Z, Zhang D, Xiao C, Luo R, Luo M, Zhang S. Applications and potentials of nanopore sequencing in the (epi)genome and (epi)transcriptome era. Innovation (N Y) 2021; 2:100153. [PMID: 34901902 PMCID: PMC8640597 DOI: 10.1016/j.xinn.2021.100153] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/09/2021] [Accepted: 08/09/2021] [Indexed: 02/08/2023] Open
Abstract
The Human Genome Project opened an era of (epi)genomic research, and also provided a platform for the development of new sequencing technologies. During and after the project, several sequencing technologies have continued to dominate nucleic acid sequencing markets. Currently, Illumina (short-read), PacBio (long-read), and Oxford Nanopore (long-read) are the most popular sequencing technologies. Unlike PacBio or the popular short-read sequencers before it, which, as examples of the second or so-called Next-Generation Sequencing platforms, need to synthesize when sequencing, nanopore technology directly sequences native DNA and RNA molecules. Nanopore sequencing, therefore, avoids converting mRNA into cDNA molecules, which not only allows for the sequencing of extremely long native DNA and full-length RNA molecules but also documents modifications that have been made to those native DNA or RNA bases. In this review on direct DNA sequencing and direct RNA sequencing using Oxford Nanopore technology, we focus on their development and application achievements, discussing their challenges and future perspectives. We also address the problems researchers may encounter when applying these approaches to their research topics, and how to resolve them.
- Nanopore-seq can dissect native DNA/RNA molecules from any organism at unlimited length
- A wide variety of algorithms greatly increase the accuracy of signal decoding in Nanopore-Seq
- Nanopore-Seq significantly facilitates genome assembly and structural variant calling, and can simultaneously detect base modifications
- These advantages ensure its great potential in future medical and agricultural practices
Affiliation(s)
- Shangqian Xie
  - Key Laboratory of Ministry of Education for Genetics and Germplasm Innovation of Tropical Special Trees and Ornamental Plants, College of Forestry, Hainan University, Haikou 570228, China
- Amy Wing-Sze Leung
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Zhenxian Zheng
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Dake Zhang
  - Beijing Advanced Innovation Centre for Biomedical Engineering, Key Laboratory for Biomechanics and Mechanobiology of Ministry of Education, School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China
- Chuanle Xiao
  - State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Centre, Sun Yat-sen University, Guangzhou 510060, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Ming Luo
  - Agriculture and Biotechnology Research Center, Guangdong Provincial Key Laboratory of Applied Botany, Center of Economic Botany, Core Botanical Gardens, South China Botanical Garden, Chinese Academy of Sciences, Guangzhou 510650, China
- Shoudong Zhang
  - School of Life Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong 999077, China
  - Center for Soybean Research of the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, Hong Kong 999077, China
9
Ding H, Anastopoulos I, Bailey AD, Stuart J, Paten B. Towards inferring nanopore sequencing ionic currents from nucleotide chemical structures. Nat Commun 2021; 12:6545. [PMID: 34764310 PMCID: PMC8586022 DOI: 10.1038/s41467-021-26929-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/15/2021] [Accepted: 10/19/2021] [Indexed: 01/07/2023] Open
Abstract
The characteristic ionic currents of nucleotide kmers are commonly used in analyzing nanopore sequencing readouts. We present a graph convolutional network-based deep learning framework for predicting kmer characteristic ionic currents from corresponding chemical structures. We show such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine by correctly predicting 5-methylcytosine-containing DNA 6mers, thus shedding light on the de novo detection of nucleotide modifications.
Affiliation(s)
- Hongxu Ding
  - Department of Biomolecular Engineering, UC Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Ioannis Anastopoulos
  - Department of Biomolecular Engineering, UC Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Andrew D Bailey
  - Department of Biomolecular Engineering, UC Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Joshua Stuart
  - Department of Biomolecular Engineering, UC Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
- Benedict Paten
  - Department of Biomolecular Engineering, UC Santa Cruz, Santa Cruz, CA, USA
  - UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
10
Beyond sequencing: machine learning algorithms extract biology hidden in Nanopore signal data. Trends Genet 2021; 38:246-257. [PMID: 34711425 DOI: 10.1016/j.tig.2021.09.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Received: 07/06/2021] [Revised: 09/01/2021] [Accepted: 09/01/2021] [Indexed: 11/24/2022]
Abstract
Nanopore sequencing provides signal data corresponding to the nucleotide motifs sequenced. Through machine learning-based methods, these signals are translated into long-read sequences that overcome the read size limit of short-read sequencing. However, analyzing the raw nanopore signal data provides many more opportunities beyond just sequencing genomes and transcriptomes: algorithms that use machine learning approaches to extract biological information from these signals allow the detection of DNA and RNA modifications, the estimation of poly(A) tail length, and the prediction of RNA secondary structures. In this review, we discuss how developments in machine learning methodologies contributed to more accurate basecalling and lower error rates, and how these methods enable new biological discoveries. We argue that direct nanopore sequencing of DNA and RNA provides a new dimensionality for genomics experiments and highlight challenges and future directions for computational approaches to extract the additional information provided by nanopore signal data.
11
Identification of differential RNA modifications from nanopore direct RNA sequencing with xPore. Nat Biotechnol 2021; 39:1394-1402. [PMID: 34282325 DOI: 10.1038/s41587-021-00949-w] [Citation(s) in RCA: 117] [Impact Index Per Article: 39.0] [Received: 07/10/2020] [Accepted: 05/10/2021] [Indexed: 12/14/2022]
Abstract
RNA modifications, such as N6-methyladenosine (m6A), modulate functions of cellular RNA species. However, quantifying differences in RNA modifications has been challenging. Here we develop a computational method, xPore, to identify differential RNA modifications from nanopore direct RNA sequencing (RNA-seq) data. We evaluate our method on transcriptome-wide m6A profiling data, demonstrating that xPore identifies positions of m6A sites at single-base resolution, estimates the fraction of modified RNA species in the cell and quantifies the differential modification rate across conditions. We apply xPore to direct RNA-seq data from six cell lines and multiple myeloma patient samples without a matched control sample and find that many m6A sites are preserved across cell types, whereas a subset exhibit significant differences in their modification rates. Our results show that RNA modifications can be identified from direct RNA-seq data with high accuracy, enabling analysis of differential modifications and expression from a single high-throughput experiment.
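The two quantities named in this abstract, the fraction of modified molecules at a site and the differential modification rate across conditions, can be made concrete with a small arithmetic sketch. It assumes per-read modified/unmodified calls are already available, which is a simplification: xPore itself works from signal data, and the function names and read calls below are hypothetical.

```python
def modification_rate(read_calls):
    """Fraction of reads covering one site that are called modified (True)."""
    if not read_calls:
        raise ValueError("no reads cover this site")
    return sum(read_calls) / len(read_calls)

def differential_rate(calls_a, calls_b):
    """Difference in modification rate at one site between two conditions."""
    return modification_rate(calls_a) - modification_rate(calls_b)

# Made-up per-read calls at a single site in two conditions.
wild_type = [True, True, False, True]    # 3 of 4 reads modified
knockout = [False, False, True, False]   # 1 of 4 reads modified

diff = differential_rate(wild_type, knockout)
print(diff)
```

A positive value means the site is more heavily modified in the first condition; a real analysis would also need a statistical test, since rates estimated from few reads are noisy.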
12
Boemo MA. DNAscent v2: detecting replication forks in nanopore sequencing data with deep learning. BMC Genomics 2021; 22:430. [PMID: 34107894 PMCID: PMC8191041 DOI: 10.1186/s12864-021-07736-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 02/17/2021] [Accepted: 05/25/2021] [Indexed: 11/20/2022] Open
Abstract
Background: Measuring DNA replication dynamics with high throughput and single-molecule resolution is critical for understanding both the basic biology behind how cells replicate their DNA and how DNA replication can be used as a therapeutic target for diseases like cancer. In recent years, the detection of base analogues in Oxford Nanopore Technologies (ONT) sequencing reads has become a promising new method to supersede existing single-molecule methods such as DNA fibre analysis: ONT sequencing yields long reads with high throughput, and sequenced molecules can be mapped to the genome using standard sequence alignment software.
Results: This paper introduces DNAscent v2, software that uses a residual neural network to achieve fast, accurate detection of the thymidine analogue BrdU with single-nucleotide resolution. DNAscent v2 also comes equipped with an autoencoder that interprets the pattern of BrdU incorporation on each ONT-sequenced molecule into replication fork direction to call the location of replication origins and termination sites. DNAscent v2 surpasses previous versions of DNAscent in BrdU calling accuracy, origin calling accuracy, speed, and versatility across different experimental protocols. Unlike NanoMod, DNAscent v2 positively identifies BrdU without the need for sequencing unmodified DNA. Unlike RepNano, DNAscent v2 calls BrdU with single-nucleotide resolution and detects more origins than RepNano from the same sequencing data. DNAscent v2 is open-source and available at https://github.com/MBoemo/DNAscent.
Conclusions: This paper shows that DNAscent v2 is the new state-of-the-art in the high-throughput, single-molecule detection of replication fork dynamics. These improvements in DNAscent v2 mark an important step towards measuring DNA replication dynamics in large genomes with single-molecule resolution. Looking forward, the increase in accuracy in single-nucleotide resolution BrdU calls will also allow DNAscent v2 to branch out into other areas of genome stability research, particularly the detection of DNA repair.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-07736-6.
Affiliation(s)
- Michael A Boemo
- Department of Pathology, University of Cambridge, Cambridge, UK.
|
13
|
Martisova A, Holcakova J, Izadi N, Sebuyoya R, Hrstka R, Bartosik M. DNA Methylation in Solid Tumors: Functions and Methods of Detection. Int J Mol Sci 2021; 22:4247. [PMID: 33921911 PMCID: PMC8073724 DOI: 10.3390/ijms22084247] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Received: 04/08/2021] [Revised: 04/16/2021] [Accepted: 04/16/2021] [Indexed: 02/06/2023] Open
Abstract
DNA methylation, i.e., the addition of a methyl group to the 5-carbon of cytosine residues in CpG dinucleotides, is an important epigenetic modification regulating gene expression, and is thus implicated in many cellular processes. Deregulation of DNA methylation is strongly associated with the onset of various diseases, including cancer. Here, we review how DNA methylation affects the process of carcinogenesis and give examples of solid tumors where aberrant DNA methylation is often present. We explain the principles of methods developed for DNA methylation analysis at both the single-gene and whole-genome level, based on (i) sodium bisulfite conversion, (ii) methylation-sensitive restriction enzymes, and (iii) interactions of 5-methylcytosine (5mC) with methyl-binding proteins or antibodies against 5mC. In addition to standard methods, we describe recent advances in next-generation sequencing technologies applied to DNA methylation analysis, as well as in the development of biosensors that represent cheaper and faster alternatives. Most importantly, we highlight not only the advantages, but also the disadvantages and challenges of each method.
|
14
|
Yoluç Y, Ammann G, Barraud P, Jora M, Limbach PA, Motorin Y, Marchand V, Tisné C, Borland K, Kellner S. Instrumental analysis of RNA modifications. Crit Rev Biochem Mol Biol 2021; 56:178-204. [PMID: 33618598 DOI: 10.1080/10409238.2021.1887807] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/15/2022]
Abstract
Organisms from all domains of life invest a substantial amount of energy for the introduction of RNA modifications into nearly all transcripts studied to date. Instrumental analysis of RNA can focus on the modified residues and reveal the function of these epitranscriptomic marks. Here, we will review recent advances and breakthroughs achieved by NMR spectroscopy, sequencing, and mass spectrometry of the epitranscriptome.
Affiliation(s)
- Yasemin Yoluç
- Department of Chemistry, Ludwig Maximilians University, Munich, Germany
| | - Gregor Ammann
- Department of Chemistry, Ludwig Maximilians University, Munich, Germany
| | - Pierre Barraud
- Expression génétique microbienne, UMR 8261, CNRS, Institut de biologie physico-chimique, IBPC, Université de Paris, Paris, France
| | - Manasses Jora
- Department of Chemistry, University of Cincinnati, Cincinnati, OH, USA
| | - Patrick A Limbach
- Department of Chemistry, University of Cincinnati, Cincinnati, OH, USA
| | - Yuri Motorin
- Université de Lorraine, CNRS, UMR7365 IMoPA, Nancy, France
| | - Virginie Marchand
- Université de Lorraine, CNRS, INSERM, Epitranscriptomics and RNA Sequencing Core facility, UM S2008, IBSLor, Nancy, France
| | - Carine Tisné
- Expression génétique microbienne, UMR 8261, CNRS, Institut de biologie physico-chimique, IBPC, Université de Paris, Paris, France
| | - Kayla Borland
- Department of Chemistry, Ludwig Maximilians University, Munich, Germany
| | - Stefanie Kellner
- Department of Chemistry, Ludwig Maximilians University, Munich, Germany.,Institute of Pharmaceutical Chemistry, Goethe-University, Frankfurt, Germany
|
15
|
Analysis of RNA Modifications by Second- and Third-Generation Deep Sequencing: 2020 Update. Genes (Basel) 2021; 12:278. [PMID: 33669207 PMCID: PMC7919787 DOI: 10.3390/genes12020278] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Received: 12/15/2020] [Revised: 02/11/2021] [Accepted: 02/12/2021] [Indexed: 12/14/2022] Open
Abstract
The precise mapping and quantification of the numerous RNA modifications that are present in tRNAs, rRNAs, ncRNAs/miRNAs, and mRNAs remain a major challenge and a top priority of the epitranscriptomics field. After the keystone discoveries of massive m6A methylation in mRNAs, dozens of deep sequencing-based methods and protocols were proposed for the analysis of various RNA modifications, allowing us to considerably extend the list of detectable modified residues. Many of the currently used methods rely on the particular reverse transcription signatures left by RNA modifications in cDNA; these signatures may be naturally present or induced by an appropriate enzymatic or chemical treatment. The newest approaches also include labeling at RNA abasic sites that result from the selective removal of RNA modification or the enhanced cleavage of the RNA ribose-phosphate chain (perhaps also protection from cleavage), followed by specific adapter ligation. Classical affinity/immunoprecipitation-based protocols use either antibodies against modified RNA bases or proteins/enzymes, recognizing RNA modifications. In this survey, we review the most recent achievements in this highly dynamic field, including promising attempts to map RNA modifications by the direct single-molecule sequencing of RNA by nanopores.
|
16
|
Gao Y, Liu X, Wu B, Wang H, Xi F, Kohnen MV, Reddy ASN, Gu L. Quantitative profiling of N6-methyladenosine at single-base resolution in stem-differentiating xylem of Populus trichocarpa using Nanopore direct RNA sequencing. Genome Biol 2021; 22:22. [PMID: 33413586 PMCID: PMC7791831 DOI: 10.1186/s13059-020-02241-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 06/30/2020] [Accepted: 12/14/2020] [Indexed: 02/08/2023] Open
Abstract
There are no comprehensive methods to identify N6-methyladenosine (m6A) at single-base resolution for every single transcript, which is necessary for the estimation of m6A abundance. We develop a new pipeline called Nanom6A for the identification and quantification of m6A modification at single-base resolution using Nanopore direct RNA sequencing based on an XGBoost model. We validate our method using methylated RNA immunoprecipitation sequencing (MeRIP-Seq) and m6A-sensitive RNA-endoribonuclease-facilitated sequencing (m6A-REF-seq), confirming high accuracy. Using this method, we provide a transcriptome-wide quantification of m6A modification in stem-differentiating xylem and reveal that different alternative polyadenylation (APA) usage shows a different ratio of m6A.
Affiliation(s)
- Yubang Gao
- Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Xuqing Liu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Bizhi Wu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Huihui Wang
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Feihu Xi
- Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Markus V Kohnen
- Basic Forestry and Proteomics Research Center, College of Life Science, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
| | - Anireddy S N Reddy
- Department of Biology and Program in Cell and Molecular Biology, Colorado State University, Fort Collins, CO, USA
| | - Lianfeng Gu
- Basic Forestry and Proteomics Research Center, College of Forestry, Fujian Agriculture and Forestry University, Fuzhou, 350002, China.
|
17
|
Cechova M. Probably Correct: Rescuing Repeats with Short and Long Reads. Genes (Basel) 2020; 12:48. [PMID: 33396198 PMCID: PMC7823596 DOI: 10.3390/genes12010048] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/09/2020] [Revised: 12/23/2020] [Accepted: 12/24/2020] [Indexed: 02/07/2023] Open
Abstract
Ever since the introduction of high-throughput sequencing following the human genome project, assembling short reads into a reference of sufficient quality has posed a significant problem, as a large portion of the human genome (an estimated 50-69%) is repetitive. As a result, a sizable proportion of sequencing reads is multi-mapping, i.e., without a unique placement in the genome. The two key parameters for whether or not a read is multi-mapping are the read length and genome complexity. Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". Moreover, identical reads or repeat arrays can be differentiated based on their epigenetic marks, such as methylation patterns, aiding in the assembly process. This is despite the fact that long reads still contain a modest percentage of sequencing errors, disorienting the aligners and assemblers both in accuracy and speed. Here, I review the proposed and implemented solutions to the repeat resolution and the multi-mapping read problem, as well as the downstream consequences of reference choice, repeat masking, and proper representation of sex chromosomes. I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat positioning within pangenomes.
Affiliation(s)
- Monika Cechova
- Genetics and Reproductive Biotechnologies, Veterinary Research Institute, Central European Institute of Technology (CEITEC), 621 00 Brno, Czech Republic
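The dependence of multi-mapping on read length noted in the abstract above can be illustrated with a toy sketch. The function name `multimapping_fraction`, the toy sequence, and the exact-match, single-strand definition of multi-mapping are all illustrative assumptions and are not taken from the review; real aligners tolerate mismatches and search both strands.

```python
from collections import defaultdict


def multimapping_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free reads of `read_length` that occur at more than
    one start position in `genome` (exact matches, forward strand only)."""
    n_reads = len(genome) - read_length + 1
    if read_length <= 0 or n_reads <= 0:
        raise ValueError("read_length must be between 1 and the genome length")

    # Group every possible start position by the read sequence found there.
    starts_by_read = defaultdict(list)
    for start in range(n_reads):
        starts_by_read[genome[start:start + read_length]].append(start)

    # A read is multi-mapping if its sequence has two or more placements.
    multi = sum(len(s) for s in starts_by_read.values() if len(s) > 1)
    return multi / n_reads


# Hypothetical toy genome: one 8-bp repeat unit placed twice between
# unrelated flanking sequence.
repeat_unit = "GATTACAT"
toy_genome = "ACGTACCA" + repeat_unit + "CCTGAGTC" + repeat_unit + "TGCAAGCT"

for length in (4, 8, 16, len(toy_genome)):
    fraction = multimapping_fraction(toy_genome, length)
    print(f"read length {length:2d}: {fraction:.2f} of reads are multi-mapping")
```

Short reads that fit inside the repeat unit have two valid placements, whereas a read that spans the repeat and reaches into unique flanking sequence can be placed unambiguously. This is the intuition behind using long reads to resolve repeats.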
|