1
Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Affiliation(s)
- Biyu Diao
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Jin Luo
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Yu Guo
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
2
Xiong J, Feng T, Yuan BF. [Advances in mapping analysis of ribonucleic acid modifications through sequencing]. Se Pu 2024; 42:632-645. [PMID: 38966972 PMCID: PMC11224946 DOI: 10.3724/sp.j.1123.2023.12025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Indexed: 07/06/2024] Open
Abstract
Over 170 chemical modifications have been discovered in various types of ribonucleic acids (RNAs), including messenger RNA (mRNA), ribosomal RNA (rRNA), transfer RNA (tRNA), and small nuclear RNA (snRNA). These RNA modifications play crucial roles in a wide range of biological processes such as gene expression regulation, RNA stability maintenance, and protein translation. RNA modifications represent a new dimension of gene expression regulation known as the "epitranscriptome". The discovery of RNA modifications and the relevant writers, erasers, and readers provides an important basis for studies on the dynamic regulation and physiological functions of RNA modifications. Owing to the development of detection technologies for RNA modifications, studies on RNA epitranscriptomes have progressed to the single-base resolution, multilayer, and full-coverage stage. Transcriptome-wide methods help discover new RNA modification sites and are of great importance for elucidating the molecular regulatory mechanisms of epitranscriptomics, exploring the disease associations of RNA modifications, and understanding their clinical applications. The existing RNA modification sequencing technologies can be categorized according to the pretreatment approach and sequencing principle as direct high-throughput sequencing, antibody-enrichment sequencing, enzyme-assisted sequencing, chemical labeling-assisted sequencing, metabolic labeling sequencing, and nanopore sequencing technologies. These methods, as well as studies on the functions of RNA modifications, have greatly expanded our understanding of epitranscriptomics. In this review, we summarize the recent progress in RNA modification detection technologies, focusing on the basic principles, advantages, and limitations of different methods. Direct high-throughput sequencing methods do not require complex RNA pretreatment and allow for the mapping of RNA modifications using conventional RNA sequencing methods. 
However, only a few RNA modifications can be analyzed by high-throughput sequencing. Antibody enrichment followed by high-throughput sequencing has emerged as a crucial approach for mapping RNA modifications, significantly advancing the understanding of RNA modifications and their regulatory functions in different species. However, the resolution of antibody-enrichment sequencing is limited to approximately 100-200 bp. Although chemical crosslinking techniques can achieve single-base resolution, these methods are often complex, and the specificity of the antibodies used in these methods has raised concerns. In particular, the issue of off-target binding by the antibodies requires urgent attention. Enzyme-assisted sequencing has improved the accuracy of the localization analysis of RNA modifications and enables stoichiometric detection with single-base resolution. However, the enzymes used in this technique show poor reactivity, specificity, and sequence preference. Chemical labeling sequencing has become a widely used approach for profiling RNA modifications, particularly by altering reverse transcription (RT) signatures such as RT stops, misincorporations, and deletions. Chemical-assisted sequencing provides a sequence-independent RNA modification detection strategy that enables the localization of multiple RNA modifications. Additionally, when combined with the biotin-streptavidin affinity method, low-abundance RNA modifications can be enriched and detected. Nevertheless, the specificity of many chemical reactions remains problematic, and the development of specific reaction probes for particular modifications should continue in the future to achieve the precise localization of RNA modifications. As an indirect localization method, metabolic labeling sequencing specifically localizes the sites at which modifying enzymes act, which is of great significance in the study of RNA modification functions. 
However, this method is limited by the intracellular labeling of RNA and cannot be applied to biological samples such as clinical tissues and blood samples. Nanopore sequencing is a direct RNA-sequencing method that does not require RT or the polymerase chain reaction (PCR). However, challenges in analyzing the data obtained from nanopore sequencing, such as the high rate of false positives, must be resolved. Discussing sequencing analysis methods for various types of RNA modifications is instructive for the future development of novel RNA modification mapping technologies, and will aid studies on the functions of RNA modifications across the entire transcriptome.
3
Zhang Y, Yan H, Wei Z, Hong H, Huang D, Liu G, Qin Q, Rong R, Gao P, Meng J, Ying B. NanoMUD: Profiling of pseudouridine and N1-methylpseudouridine using Oxford Nanopore direct RNA sequencing. Int J Biol Macromol 2024; 270:132433. [PMID: 38759861 DOI: 10.1016/j.ijbiomac.2024.132433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 05/13/2024] [Accepted: 05/14/2024] [Indexed: 05/19/2024]
Abstract
Nanopore direct RNA sequencing provided a promising solution for unraveling the landscapes of modifications on single RNA molecules. Here, we proposed NanoMUD, a computational framework for predicting the RNA pseudouridine modification (Ψ) and its methylated analog N1-methylpseudouridine (m1Ψ), which have critical applications in mRNA vaccination, at single-base and single-molecule resolution from direct RNA sequencing data. Electric signal features were fed into a bidirectional LSTM neural network to achieve improved accuracy and predictive capabilities. Motif-specific models (NNUNN, N = A, C, U or G) were trained based on features extracted from a designed dataset and achieved superior performance on molecule-level modification prediction (Ψ models: min AUC = 0.86, max AUC = 0.99; m1Ψ models: min AUC = 0.87, max AUC = 0.99). We then aggregated read-level predictions for site stoichiometry estimation. Given the observed sequence-dependent bias in model performance, we trained regression models based on the distribution of modification probabilities for sites with known stoichiometry. The distribution-based site stoichiometry estimation method allows unbiased comparison between different contexts. To demonstrate the feasibility of our work, three case studies on both in vitro and in vivo transcribed RNAs were presented. NanoMUD will make a powerful tool to facilitate research on modified therapeutic IVT RNAs and provides useful insight into the landscape and stoichiometry of pseudouridine and N1-methylpseudouridine on in vivo transcribed RNA species.
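As a rough illustration of the aggregation step mentioned in the abstract above, the sketch below turns per-read modification probabilities into a naive site-level stoichiometry estimate by simple thresholding. The function name, the 0.5 cutoff, and the example values are assumptions made for illustration only. The abstract describes regression models trained on the probability distribution, which this sketch does not attempt to reproduce.

```python
from typing import Sequence


def naive_site_stoichiometry(read_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of reads at one site whose modification probability meets the cutoff.

    Hypothetical helper: a thresholding baseline, not the distribution-based
    regression estimator described in the abstract.
    """
    if not read_probs:
        raise ValueError("at least one read-level probability is required")
    # Count reads called as modified under the assumed cutoff.
    modified = sum(1 for p in read_probs if p >= threshold)
    return modified / len(read_probs)


# Made-up read-level probabilities for a single site.
example_probs = [0.9, 0.8, 0.2, 0.1]
estimate = naive_site_stoichiometry(example_probs)
```

A thresholding baseline like this inherits any sequence-dependent bias in the read-level classifier, which is the limitation the abstract gives as motivation for the distribution-based estimator.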
Affiliation(s)
- Yuxin Zhang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Huayuan Yan
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Haifeng Hong
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
- Daiyun Huang
  - Wisdom Lake Academy of Pharmacy, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Guopeng Liu
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
- Qianshan Qin
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
- Rong Rong
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China
- Peng Gao
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; AI University Research Centre, Xi'an Jiaotong-Liverpool University, Suzhou 215123, China; Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Bo Ying
  - Suzhou Abogen Biosciences Co., Ltd., Suzhou 215123, China
4
Ye H, Li T, Rigden DJ, Wei Z. m6ACali: machine learning-powered calibration for accurate m6A detection in MeRIP-Seq. Nucleic Acids Res 2024; 52:4830-4842. [PMID: 38634812 PMCID: PMC11109940 DOI: 10.1093/nar/gkae280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 03/18/2024] [Accepted: 04/04/2024] [Indexed: 04/19/2024] Open
Abstract
We present m6ACali, a novel machine-learning framework aimed at enhancing the accuracy of N6-methyladenosine (m6A) epitranscriptome profiling by reducing the impact of non-specific antibody enrichment in MeRIP-Seq. The calibration model serves as a genomic feature-based classifier that refines the identification of m6A sites, distinguishing those genuinely present from those that can be detected in in-vitro transcribed (IVT) control experiments. We find that m6ACali effectively identifies non-specific binding peaks reported by exomePeak2 and MACS2 in novel MeRIP-Seq datasets without the need for paired IVT controls. The model interpretation revealed that off-target antibody binding sites commonly occur at short exons and short mRNAs, originating from high read coverage regions that share the motif sequence with true m6A sites. We also reveal that the ML strategy can efficiently adjust differentially methylated peaks and other antibody-dependent, base-resolution m6A detection techniques. As a result, m6ACali offers a promising method for the universal enhancement of m6A profiles generated by MeRIP-Seq experiments, elevating the benchmark for omics-level m6A data integration.
Affiliation(s)
- Haokai Ye
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Tenglong Li
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA, USA
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L7 8TX Liverpool, UK
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Life Course and Medical Sciences, University of Liverpool, L7 8TX Liverpool, UK
5
Huang Z, Wang J, Sun B, Qi M, Gao S, Liu H. Neutrophil extracellular trap-associated risk index for predicting outcomes and response to Wnt signaling inhibitors in triple-negative breast cancer. Sci Rep 2024; 14:4232. [PMID: 38379084 PMCID: PMC10879157 DOI: 10.1038/s41598-024-54888-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Accepted: 02/18/2024] [Indexed: 02/22/2024] Open
Abstract
Triple-negative breast cancer (TNBC) is a type of breast cancer with poor prognosis, which is prone to distant metastasis and therapy resistance. The presence of neutrophil extracellular traps (NETs) contributes to the progression of breast cancer and is an efficient predictor of TNBC. We obtained bulk and single-cell RNA sequencing data from public databases. First, we identified five NET-related genes and constructed NET-related subgroups. Then, we constructed a risk index with three pivotal genes based on the differentially expressed genes between subgroups. Patients in the high-risk group had worse prognosis, clinicopathological features, and therapy response than those in the low-risk group. Functional enrichment analysis revealed that the low-risk group was enriched in the Wnt signaling pathway, and surprisingly, the drug sensitivity prediction showed that Wnt signaling pathway inhibitors had higher drug sensitivity in the low-risk group. Finally, in vitro verification experiments based on MDA-MB-231 and BT-549 cells showed that tumor cells with low-risk scores had lower migration, invasion, and proliferative abilities and higher drug sensitivity to Wnt signaling pathway inhibitors. In this study, multi-omics analysis revealed that genes associated with NETs may influence the occurrence, progression, and treatment of TNBC. Moreover, the bioinformatics analysis and cell experiments demonstrated that the risk index could predict the population of TNBC patients likely to benefit from treatment with Wnt signaling pathway inhibitors.
Affiliation(s)
- Zhidong Huang
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Jinhui Wang
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Bo Sun
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Mengyang Qi
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Shuang Gao
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
- Hong Liu
  - The Second Surgical Department of Breast Cancer, Tianjin Medical University Cancer Institute & Hospital, National Clinical Research Center for Cancer, Tianjin, China
  - Tianjin's Clinical Research Center for Cancer, Tianjin, China
  - Key Laboratory of Breast Cancer Prevention and Therapy, Tianjin Medical University, Ministry of Education, Tianjin, China
6
Liang Z, Ye H, Ma J, Wei Z, Wang Y, Zhang Y, Huang D, Song B, Meng J, Rigden DJ, Chen K. m6A-Atlas v2.0: updated resources for unraveling the N6-methyladenosine (m6A) epitranscriptome among multiple species. Nucleic Acids Res 2024; 52:D194-D202. [PMID: 37587690 PMCID: PMC10768109 DOI: 10.1093/nar/gkad691] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 07/17/2023] [Revised: 08/02/2023] [Accepted: 08/10/2023] [Indexed: 08/18/2023] Open
Abstract
N6-Methyladenosine (m6A) is one of the most abundant internal chemical modifications on eukaryote mRNA and is involved in numerous essential molecular functions and biological processes. To facilitate the study of this important post-transcriptional modification, we present here m6A-Atlas v2.0, an updated version of m6A-Atlas. It was expanded to include a total of 797 091 reliable m6A sites from 13 high-resolution technologies and two single-cell m6A profiles. Additionally, three methods (exomePeak2, MACS2 and TRESS) were used to identify >16 million m6A enrichment peaks from 2712 MeRIP-seq experiments covering 651 conditions in 42 species. Quality control results of MeRIP-seq samples were also provided to help users select reliable peaks. We also estimated the condition-specific quantitative m6A profiles (i.e. differential methylation) under 172 experimental conditions for 19 species. Further, to provide insights into potential functional circuitry, the m6A epitranscriptomics were annotated with various genomic features, interactions with RNA-binding proteins and microRNA, potentially linked splicing events and single nucleotide polymorphisms. The collected m6A sites and their functional annotations can be freely queried and downloaded via a user-friendly graphical interface at: http://rnamd.org/m6a.
Affiliation(s)
- Zhanmin Liang
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350004, China
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Haokai Ye
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, Liverpool, UK
- Jiongming Ma
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350004, China
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, Liverpool, UK
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Life Course and Medical Sciences, University of Liverpool, Liverpool L69 7ZB, UK
- Yue Wang
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, UK
- Yuxin Zhang
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350004, China
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, Liverpool, UK
- Daiyun Huang
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Department of Computer Science, University of Liverpool, Liverpool L69 7ZB, UK
- Bowen Song
  - Department of Public Health, School of Medicine & Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing 210023, China
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, Liverpool, UK
- Daniel J Rigden
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, Liverpool, UK
- Kunqi Chen
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian 350004, China
7
Hassan D, Ariyur A, Daulatabad SV, Mir Q, Janga SC. Nm-Nano: a machine learning framework for transcriptome-wide single-molecule mapping of 2´-O-methylation (Nm) sites in nanopore direct RNA sequencing datasets. RNA Biol 2024; 21:1-15. [PMID: 38758523 PMCID: PMC11110688 DOI: 10.1080/15476286.2024.2352192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/01/2024] [Indexed: 05/18/2024] Open
Abstract
2´-O-methylation (Nm) is one of the most abundant modifications found in both mRNAs and noncoding RNAs. It contributes to many biological processes, such as the normal functioning of tRNA, the protection of mRNA against degradation by the decapping and exoribonuclease (DXO) protein, and the biogenesis and specificity of rRNA. Recent advancements in single-molecule sequencing techniques for long-read RNA sequencing data offered by Oxford Nanopore Technologies have enabled the direct detection of RNA modifications from sequencing data. In this study, we propose a bio-computational framework, Nm-Nano, for predicting the presence of Nm sites in direct RNA sequencing data generated from two human cell lines. The Nm-Nano framework integrates two supervised machine learning (ML) models for predicting Nm sites: Extreme Gradient Boosting (XGBoost) and Random Forest (RF) with K-mer embedding. Evaluation on benchmark datasets from direct RNA sequencing of HeLa and HEK293 cell lines demonstrates high accuracy (99% with XGBoost and 92% with RF) in identifying Nm sites. Deploying Nm-Nano on HeLa and HEK293 cell lines reveals genes that are frequently modified with Nm. In HeLa cell lines, 125 genes are identified as frequently Nm-modified, showing enrichment in 30 ontologies related to immune response and cellular processes. In HEK293 cell lines, 61 genes are identified as frequently Nm-modified, with enrichment in processes like glycolysis and protein localization. These findings underscore the diverse regulatory roles of Nm modifications in metabolic pathways, protein degradation, and cellular processes. The source code of Nm-Nano can be freely accessed at https://github.com/Janga-Lab/Nm-Nano.
Affiliation(s)
- Doaa Hassan
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
  - Computers and Systems Department, National Telecommunication Institute, Cairo, Egypt
- Aditya Ariyur
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Swapna Vidhur Daulatabad
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Quoseena Mir
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
- Sarath Chandra Janga
  - Department of Biohealth Informatics, Luddy School of Informatics, Computing, and Engineering, Indiana University Indianapolis (IUI), Indianapolis, Indiana, USA
  - Centre for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana
8
Lai FL, Gao F. LSA-ac4C: A hybrid neural network incorporating double-layer LSTM and self-attention mechanism for the prediction of N4-acetylcytidine sites in human mRNA. Int J Biol Macromol 2023; 253:126837. [PMID: 37709212 DOI: 10.1016/j.ijbiomac.2023.126837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 08/08/2023] [Accepted: 09/01/2023] [Indexed: 09/16/2023]
Abstract
N4-acetylcytidine (ac4C) is a vital constituent of the epitranscriptome and plays a crucial role in the regulation of mRNA expression. Numerous studies have established correlations between ac4C and the incidence, progression and prognosis of various cancers. Therefore, accurately predicting ac4C sites is an important step towards comprehending the biological functions of this modification and devising effective therapeutic interventions. Wet experiments are the primary methods for studying ac4C, but computational methods have emerged as a promising supplement due to their cost-effectiveness and shorter research cycles. However, current models still have inherent limitations in terms of predictive performance and generalization ability. Here, we utilized automated machine learning technology to establish a reliable baseline and constructed a deep hybrid neural network, LSA-ac4C, which combines double-layer Long Short-Term Memory (LSTM) and a self-attention mechanism for accurate ac4C site prediction. Benchmarking comparisons demonstrate that LSA-ac4C exhibits superior performance compared to the current state-of-the-art method, with ACC, MCC and AUROC improving by 2.89%, 5.96% and 1.53%, respectively, on an independent test set. Overall, LSA-ac4C serves as a powerful tool for predicting ac4C sites in human mRNA, thus benefiting research on RNA modification. For the convenience of the research community, a web server has been established at http://tubic.org/ac4C.
Affiliation(s)
- Fei-Liao Lai
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China
- Feng Gao
  - Department of Physics, School of Science, Tianjin University, Tianjin 300072, China; Frontiers Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin 300072, China; SynBio Research Platform, Collaborative Innovation Center of Chemical Science and Engineering (Tianjin), Tianjin 300072, China
9
Ren J, Chen X, Zhang Z, Shi H, Wu S. DPred_3S: identifying dihydrouridine (D) modification on three species epitranscriptome based on multiple sequence-derived features. Front Genet 2023; 14:1334132. [PMID: 38169665 PMCID: PMC10758487 DOI: 10.3389/fgene.2023.1334132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 11/29/2023] [Indexed: 01/05/2024] Open
Abstract
Introduction: Dihydrouridine (D) is a conserved modification of tRNA among all three life domains. D modification enhances the flexibility of a single nucleotide base in the spatial structure and is disease- and evolution-associated. Recent studies have also suggested the presence of dihydrouridine on mRNA. Methods: To identify D in the epitranscriptome, we provided a prediction framework named "DPred_3S" based on a machine learning approach for the D epitranscriptome of three species, which used epitranscriptome sequencing data as training data for the first time. Results: The optimal features were evaluated by the F-score and integration of different features; our model achieved area under the receiver operating characteristic curve (AUROC) scores of 0.955, 0.946, and 0.905 for Saccharomyces cerevisiae, Escherichia coli, and Schizosaccharomyces pombe, respectively. The performances of different machine learning algorithms were also compared in this study. Discussion: The high performance of our model suggests that D sites can be distinguished based on their surrounding sequence, but the lower performance of cross-species prediction may be limited by technique preferences.
Affiliation(s)
- Jinjin Ren
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
  - Fujian Key Laboratory of Tumor Microbiology, Department of Medical Microbiology, Fujian Medical University, Fuzhou, Fujian, China
- Xiaozhen Chen
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- Zhengqian Zhang
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
- Haoran Shi
  - Institute of Applied Microbiology, Research Center for BioSystems, Land Use, and Nutrition (IFZ), Justus-Liebig-University Giessen, Giessen, Germany
- Shuxiang Wu
  - Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou, Fujian, China
  - Fujian Key Laboratory of Tumor Microbiology, Department of Medical Microbiology, Fujian Medical University, Fuzhou, Fujian, China
10
Aslam I, Shah S, Jabeen S, ELAffendi M, A Abdel Latif A, Ul Haq N, Ali G. A CNN based m5c RNA methylation predictor. Sci Rep 2023; 13:21885. [PMID: 38081880 PMCID: PMC10713599 DOI: 10.1038/s41598-023-48751-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2023] [Accepted: 11/29/2023] [Indexed: 12/18/2023] Open
Abstract
Post-transcriptional modifications of RNA play a key role in a variety of biological processes, such as stability and immune tolerance, RNA splicing, protein translation, and RNA degradation. One of these modifications, m5C, participates in cellular functions such as RNA structural stability and translation efficiency and has attracted growing interest among biologists. Detecting RNA m5C methylation sites by biological experiments requires considerable effort, time, and money. Most researchers use pre-processed RNA sequences of 41 nucleotides in which the methylated cytosine is at the center; some of the information around these motifs may therefore be lost. Conventional methods cannot process the RNA sequence directly because of its high dimensionality and thus need optimized techniques for better feature extraction. To address these challenges, this study employs an end-to-end, 1D CNN-based model to classify and interpret m5C methylation sites. Moreover, our aim is to analyze the sequence at its full length, where the methylated cytosine may not be at the center. The proposed architecture showed promising results, outperforming state-of-the-art techniques in terms of sensitivity and accuracy. Our model achieves 96.70% sensitivity and 96.21% accuracy for 41-nucleotide sequences and 96.10% accuracy for full-length sequences.
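The 41-nucleotide, cytosine-centered input described in this abstract can be illustrated with a short sketch. This is not the authors' code: the helper names `centered_window` and `one_hot_encode`, the `N` padding character, and the all-zero encoding for unknown bases are assumptions made for illustration. The result is a length-by-4 matrix of the kind a 1D CNN typically consumes.

```python
NUCLEOTIDES = "ACGU"  # column order of the one-hot matrix (an assumed convention)


def centered_window(seq, center, flank=20):
    """Return the 2*flank+1 window of `seq` centered on index `center`.

    Positions that fall outside the sequence are padded with 'N', so the
    default flank of 20 always yields a 41-nucleotide window.
    """
    padded = "N" * flank + seq + "N" * flank
    # Index `center` in `seq` sits at `center + flank` in `padded`,
    # so the window starts at `center`.
    return padded[center:center + 2 * flank + 1]


def one_hot_encode(seq):
    """Encode an RNA (or DNA) string as a list of 4-element one-hot rows.

    T is treated as U; any other symbol (for example the 'N' padding)
    becomes an all-zero row.
    """
    seq = seq.upper().replace("T", "U")
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]


# Toy example: a short transcript with a candidate cytosine at index 3.
toy_seq = "AAACGGG"
toy_window = centered_window(toy_seq, center=3)
toy_matrix = one_hot_encode(toy_window)  # 41 rows, 4 columns
```

A full-length variant, as the abstract proposes, would skip `centered_window` and encode the whole sequence, which requires the downstream model to handle variable or padded lengths.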
Affiliation(s)
- Irum Aslam
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Sajid Shah
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Saima Jabeen
- College of Engineering, AI Research Center, Alfaisal University, Riyadh, 50927, Saudi Arabia
- Mohammed ELAffendi
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
- Asmaa A Abdel Latif
- Public Health and Community Medicine Department (Industrial Medicine and Occupational Health Specialty), Faculty of Medicine, Menoufia University, Shibîn el Kôm, Egypt
- Nuhman Ul Haq
- Department of Computer Science, COMSATS University Islamabad, Abbottabad Campus, Abbottabad, 22060, KPK, Pakistan
- Gauhar Ali
- EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Rafha, Riyadh, 12435, Saudi Arabia
|
11
|
Yang Y, Liu Z, Lu J, Sun Y, Fu Y, Pan M, Xie X, Ge Q. Analysis approaches for the identification and prediction of N6-methyladenosine sites. Epigenetics 2023; 18:2158284. [PMID: 36562485 PMCID: PMC9980620 DOI: 10.1080/15592294.2022.2158284] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
The global dynamics of a variety of biological processes can be revealed by mapping transcriptional m6A sites, in particular the full-transcriptome m6A profile. Individual m6A sites also contribute to biological function, which can be evaluated using stoichiometric information obtained at single-nucleotide resolution. Currently, m6A sites are identified mainly by experimental and prediction methods, based on high-throughput sequencing and machine learning models, respectively. This review summarizes recent topics and progress in bioinformatics methods for deciphering m6A methylation, including the experimental detection of m6A methylation sites, data analysis techniques, approaches to predicting m6A methylation sites, m6A methylation databases, and the detection of m6A modification in circRNA. Finally, the review briefly discusses development prospects in this area.
Affiliation(s)
- Yuwei Yang
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Zhiyu Liu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Junru Lu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yuqing Sun
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Yue Fu
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Min Pan
- Department of Pathology and Pathophysiology, School of Medicine, Southeast University, Nanjing, China
- Xueying Xie
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
- Qinyu Ge
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, People's Republic of China
|
12
|
Jia J, Wei Z, Cao X. EMDL-ac4C: identifying N4-acetylcytidine based on ensemble two-branch residual connection DenseNet and attention. Front Genet 2023; 14:1232038. [PMID: 37519885 PMCID: PMC10372626 DOI: 10.3389/fgene.2023.1232038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Accepted: 06/29/2023] [Indexed: 08/01/2023]
Abstract
Introduction: N4-acetylcytidine (ac4C) is a critical acetylation modification that has an essential function in protein translation and is associated with a number of human diseases. Methods: Identifying ac4C sites by biological experiments is cumbersome and costly, and the performance of several existing computational models needs to be improved. Therefore, we propose a new deep learning tool, EMDL-ac4C, to predict ac4C sites. It applies simple one-hot encoding to an unbalanced dataset and uses a downsampled ensemble deep learning network to extract important features for identifying ac4C sites. The base learner of this ensemble model consists of a modified DenseNet and Squeeze-and-Excitation Networks. In addition, we add a convolutional residual structure in parallel with the dense block to achieve two-layer feature extraction. Results: The average accuracy (Acc), Matthews correlation coefficient (MCC), and area under the curve (AUC) of EMDL-ac4C on ten independent testing sets are 80.84%, 61.77%, and 87.94%, respectively. Discussion: Multiple experimental comparisons indicate that EMDL-ac4C outperforms existing predictors and greatly improves the predictive performance for ac4C sites. EMDL-ac4C could also provide a valuable reference for subsequent studies. The source code and experimental data are available at: https://github.com/13133989982/EMDLac4C.
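The abstract does not spell out how the downsampled ensemble is built, so the following is only a generic sketch of one common scheme, not the EMDL-ac4C implementation. It assumes the negative class is the majority, splits it into chunks the size of the positive class, and pairs each chunk with all positives so that every base learner sees a balanced training set. The names `balanced_subsets` and `majority_vote` are hypothetical, and hard majority voting is an assumed way of combining base learners (averaging predicted probabilities is an equally plausible alternative).

```python
import random


def balanced_subsets(positives, negatives, seed=0):
    """Split the majority (negative) class into disjoint chunks, each the size
    of the minority (positive) class, and pair every chunk with all positives.

    Returns a list of (positives, negative_chunk) tuples, one per base learner.
    Leftover negatives that do not fill a whole chunk are dropped.
    """
    k = len(positives)
    if k == 0:
        raise ValueError("at least one positive example is required")

    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(negatives)
    rng.shuffle(shuffled)

    subsets = []
    for start in range(0, len(shuffled) - k + 1, k):
        subsets.append((list(positives), shuffled[start:start + k]))
    return subsets


def majority_vote(predictions):
    """Combine 0/1 predictions from the base learners; ties resolve to 0."""
    return 1 if sum(predictions) * 2 > len(predictions) else 0


# Toy example: 3 positive sites and 10 negative sites give 3 balanced subsets.
toy_pos = ["p1", "p2", "p3"]
toy_neg = ["n%d" % i for i in range(10)]
toy_subsets = balanced_subsets(toy_pos, toy_neg)
```

Each tuple in `toy_subsets` would be used to train one base learner; at prediction time the learners' outputs for a site are combined with `majority_vote`.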
Affiliation(s)
- Jianhua Jia
- *Correspondence: Jianhua Jia; Zhangying Wei
|
13
|
Li Y, Ren J, Zhang Z, Weng Y, Zhang J, Zou X, Wu S, Hu H. Modification and Expression of mRNA m6A in the Lateral Habenular of Rats after Long-Term Exposure to Blue Light during the Sleep Period. Genes (Basel) 2023; 14:143. [PMID: 36672884 PMCID: PMC9859551 DOI: 10.3390/genes14010143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Revised: 12/26/2022] [Accepted: 12/30/2022] [Indexed: 01/06/2023]
Abstract
Artificial lighting, especially blue light, is becoming a public-health risk. Excessive exposure to blue light at night has been reported to be associated with brain diseases. However, the mechanisms underlying neuropathy induced by blue light remain unclear. An early anatomical tracing study described the projection of the retina to the lateral habenula (LHb), and more mechanistic reports are available on the role of the LHb in multiple brain functions and neuropsychiatric disorders; however, the LHb is rarely examined in epigenetic studies, particularly those of N6-methyladenosine (m6A). The purpose of our study was to first expose Sprague-Dawley rats to blue light (6.11 ± 0.05 mW/cm², the same irradiance as 200 lx of white light in the control group) for 4 h, and simultaneously provide white light to the control group for the same time to enter a sleep period. The experiment was conducted over 12 weeks. RNA m6A modifications and different mRNA transcriptome profiles were observed in the LHb. We refer to this experimental group as BLS. High-throughput MeRIP-seq and mRNA-seq were performed, and we used bioinformatics to analyze the data. There were 188 genes in the LHb that overlapped between differentially m6A-modified mRNA and differentially expressed mRNA. Kyoto Encyclopedia of Genes and Genomes and gene ontology analyses showed enrichment in neuroactive ligand-receptor interaction, long-term depression, the cyclic guanosine monophosphate-dependent protein kinase G (cGMP-PKG) signaling pathway, and circadian entrainment. The m6A methylation level of the target genes in the BLS group was disordered. In conclusion, this study suggests that mRNA expression and m6A modification in the LHb were abnormal after blue light exposure during the sleep period, and that the methylation levels of target genes related to synaptic plasticity were disturbed. This study offers a theoretical basis for the scientific use of light.
Affiliation(s)
- Yinhan Li
- Fujian Key Laboratory of Environmental Factors and Cancer, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Key Laboratory of Environment and Health, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Jinjin Ren
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350108, China
- Zhaoting Zhang
- School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Yali Weng
- School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Jian Zhang
- School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Xinhui Zou
- School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Siying Wu
- Fujian Key Laboratory of Environmental Factors and Cancer, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Department of Epidemiology and Health Statistics, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Key Laboratory of Environment and Health, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Hong Hu
- Fujian Key Laboratory of Environmental Factors and Cancer, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Key Laboratory of Environment and Health, School of Public Health, Fujian Medical University, Fuzhou 350108, China
- Department of Preventive Medicine, School of Public Health, Fujian Medical University, Fuzhou 350108, China
|
14
|
Integrated Profiles of Transcriptome and mRNA m6A Modification Reveal the Intestinal Cytotoxicity of Aflatoxin B1 on HCT116 Cells. Genes (Basel) 2022; 14:79. [PMID: 36672820 PMCID: PMC9858580 DOI: 10.3390/genes14010079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 12/11/2022] [Accepted: 12/19/2022] [Indexed: 12/28/2022]
Abstract
Aflatoxin B1 (AFB1) is widely prevalent in foods and animal feeds and is one of the most toxic and carcinogenic aflatoxin subtypes. Existing studies have shown that the intestine is targeted by AFB1, and adverse organic effects have been observed. This study aimed to investigate the relationship between AFB1-induced intestinal toxicity and N6-methyladenosine (m6A) RNA methylation, which is involved in the post-transcriptional regulation of mRNA expression. The transcriptome-wide m6A methylome and transcriptome profiles of human intestinal cells treated with AFB1 are presented. Methylated RNA immunoprecipitation sequencing and mRNA sequencing were carried out to determine the differences in m6A methylation and the differentially expressed genes in AFB1-induced intestinal toxicity. The results showed 2289 genes that overlapped between the differentially expressed mRNAs and the differentially m6A-methylated mRNAs. Enrichment analysis of signaling pathways and biological processes showed that these genes were associated with terms related to the cell cycle, endoplasmic reticulum, tight junction, and mitophagy. In conclusion, the study demonstrated that AFB1-induced HCT116 injury was related to disruptions in the m6A methylation levels of target genes and to the abnormal expression of m6A regulators.
|
15
|
Cai L, Liao Z, Li S, Wu R, Li J, Ren F, Zhang H. PLP1 may serve as a potential diagnostic biomarker of uterine fibroids. Front Genet 2022; 13:1045395. [PMID: 36386836 PMCID: PMC9662689 DOI: 10.3389/fgene.2022.1045395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 10/14/2022] [Indexed: 12/05/2022]
Abstract
Objective: We aim to identify the crucial genes or potential biomarkers associated with uterine fibroids (UFs), which may provide clinicians with evidence for diagnostic biomarkers of UFs and reveal the mechanism of their progression. Methods: Gene expression and genome-wide DNA methylation profiles were obtained from the Gene Expression Omnibus (GEO) database. The GSE45189, GSE31699, and GSE593 datasets were included. GEO2R and Venn diagrams were used to analyze the differentially expressed genes (DEGs) and extract the hub genes. Gene Ontology (GO) analysis was performed with the online tool Database for Annotation, Visualization, and Integrated Discovery (DAVID). The mRNA and protein expression of the hub genes was validated by RT-qPCR, western blot, and immunohistochemistry. The receiver operating characteristic (ROC) curve was used to evaluate the diagnostic value. Results: We detected 22 DEGs between UFs and normal myometrium, which were enriched in cell maturation, apoptotic process, hypoxia, protein binding, and cytoplasm for cell composition. By intersecting the differentially expressed mRNA and DNA methylation profiles, 3 hub genes were identified: transmembrane 4 L six family member 1 (TM4SF1), TNF superfamily member 10 (TNFSF10), and proteolipid protein 1 (PLP1). PLP1 was validated to be significantly up-regulated in UFs at both the mRNA and protein levels. The area under the ROC curve (AUC) of PLP1 was 0.956, with a sensitivity of 79.2% and a specificity of 100%. Conclusion: Overall, our results indicate that PLP1 may be a potential diagnostic biomarker for uterine fibroids.
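The sensitivity and specificity figures in this abstract correspond to one operating point on the ROC curve. The sketch below shows how such a point is computed for a chosen cutoff; it is illustrative only, and the function name, the "score at or above the threshold means positive" convention, and the toy expression values are assumptions rather than details from the study.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) when samples with a score at or above
    `threshold` are called positive.

    labels: 0/1 ground truth (1 = disease, 0 = control; hypothetical coding).
    scores: marker measurements, e.g. relative expression levels.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    sensitivity = tp / (tp + fn)   # fraction of true cases detected
    specificity = tn / (tn + fp)   # fraction of controls correctly excluded
    return sensitivity, specificity


# Toy example: one case falls below the cutoff, no control rises above it.
toy_labels = [1, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1]
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
```

Sweeping `threshold` over all observed scores and plotting sensitivity against 1 minus specificity traces out the ROC curve whose area is the reported AUC.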
Affiliation(s)
- Lei Cai
- Reproductive Medicine Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zhiqi Liao
- Reproductive Medicine Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Shiyu Li
- Institute of Digestive Disease and Department of Medicine and Therapeutics, State Key Laboratory of Digestive Diseases, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Hong Kong, China
- Ruxing Wu
- Reproductive Medicine Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Jie Li
- Reproductive Medicine Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Fang Ren
- Department of Gynecology, First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
- *Correspondence: Hanwang Zhang; Fang Ren
- Hanwang Zhang
- Reproductive Medicine Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- *Correspondence: Hanwang Zhang; Fang Ren
|