1
Huang G, Huang X, Jiang J. Deepm6A-MT: A deep learning-based method for identifying RNA N6-methyladenosine sites in multiple tissues. Methods 2024; 226:1-8. [PMID: 38485031 DOI: 10.1016/j.ymeth.2024.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/20/2024] [Accepted: 03/11/2024] [Indexed: 04/13/2024]
Abstract
N6-methyladenosine (m6A) is the most prevalent, abundant, and conserved internal modification in eukaryotic messenger RNAs (mRNAs) and plays a crucial role in cellular processes. Although more than ten methods have been developed for m6A detection over the past decades, there is still room to improve predictive accuracy and efficiency. In this paper, we propose an improved method for predicting m6A modification sites, called Deepm6A-MT, which is based on a bi-directional gated recurrent unit (Bi-GRU) and convolutional neural networks (CNN). Deepm6A-MT has two input channels. One uses an embedding layer followed by the Bi-GRU and then the CNN; the other uses one-hot encoding, dinucleotide one-hot encoding, and nucleotide chemical property codes. We trained and evaluated Deepm6A-MT by both 5-fold cross-validation and an independent test. The empirical tests showed that Deepm6A-MT achieved state-of-the-art performance. In addition, we conducted cross-species and cross-tissue tests to further verify the effectiveness and efficiency of Deepm6A-MT. Finally, for the convenience of academic research, we deployed Deepm6A-MT as a web server, accessible at http://www.biolscience.cn/Deepm6A-MT/.
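The three hand-crafted encodings named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the chemical property mapping (ring structure, functional group, hydrogen bonding) is a commonly used convention that is assumed here rather than taken from the paper.

```python
# Illustrative encodings for an RNA sequence window; not the authors' implementation.
NUCLEOTIDES = "ACGU"

# Assumed chemical property code: (ring structure, functional group, hydrogen bonding).
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "U": (0, 0, 1)}


def one_hot(seq):
    """Encode each nucleotide as a 4-dimensional indicator vector."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq]


def dinucleotide_one_hot(seq):
    """Encode each overlapping dinucleotide as a 16-dimensional indicator vector."""
    pairs = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]
    return [
        [1 if seq[i:i + 2] == p else 0 for p in pairs]
        for i in range(len(seq) - 1)
    ]


def chemical_property(seq):
    """Encode each nucleotide by its three chemical property bits."""
    return [list(NCP[base]) for base in seq]
```

A window of length n yields n rows for the one-hot and chemical property encodings and n - 1 rows for the dinucleotide encoding, so a model that combines them would need to pad or concatenate along the feature axis; how the paper does this is not stated in the abstract.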
Affiliation(s)
- Guohua Huang
- School of Information Technology and Administration, Hunan University of Finance and Economics, Changsha, Hunan 410205, China.
- Xiaohong Huang
- College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
- Jinyun Jiang
- College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
2
He T, Gao Z, Lin L, Zhang X, Zou Q. Prognostic signature analysis and survival prediction of esophageal cancer based on N6-methyladenosine associated lncRNAs. Brief Funct Genomics 2024; 23:239-248. [PMID: 37465899 DOI: 10.1093/bfgp/elad028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 06/27/2023] [Accepted: 07/04/2023] [Indexed: 07/20/2023]
Abstract
Esophageal cancer (ESCA) has a poor prognosis. Long non-coding RNA (lncRNA) affects cell proliferation. However, the prognostic role of N6-methyladenosine (m6A)-associated lncRNAs (m6A-lncRNAs) in ESCA remains unknown. Univariate Cox analysis was applied to identify prognosis-related m6A-lncRNAs, based on which the samples were clustered. Wilcoxon rank and Chi-square tests were adopted to compare the clinical traits, survival, pathway activity and immune infiltration in different clusters; overall survival, clinical traits (N stage), tumor-invasive immune cells and pathway activity were found to differ significantly. Through a least absolute shrinkage and selection operator and proportional hazard (Lasso-Cox) model, five m6A-lncRNAs were selected to construct the prognostic signature (m6A-lncSig) and risk score. To investigate the link between risk score and clinical traits or immunological microenvironments, the Chi-square test and Spearman correlation analysis were utilized. Risk score was found to be associated with N stage, tumor stage, different clusters, macrophages M2, B cells naive and T cells CD4 memory resting. Risk score and tumor stage were found to be independent prognostic variables. The constructed nomogram model had high accuracy in predicting prognosis. The obtained m6A-lncSig could serve as a potential prognostic biomarker for ESCA patients. This study offers a theoretical foundation for clinical diagnosis and prognosis of ESCA.
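A risk score built from a Lasso-Cox signature is commonly computed as a coefficient-weighted sum of expression values, with samples then split into high-risk and low-risk groups at the median. The sketch below assumes that convention; the abstract does not state the formula or the cutoff, and the gene names and coefficients are hypothetical placeholders, not values from the paper.

```python
from statistics import median

# Hypothetical coefficients for five placeholder lncRNAs (not values from the paper).
COEFFICIENTS = {"lnc1": 0.42, "lnc2": -0.31, "lnc3": 0.18, "lnc4": 0.27, "lnc5": -0.12}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the coefficient-weighted sum of expression values for one sample."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {
        sample: ("high" if score > cutoff else "low")
        for sample, score in scores.items()
    }
```

The median split is only one possible grouping rule; an optimal cutoff chosen on survival data would be a drop-in replacement for `split_by_median`.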
Affiliation(s)
- Ting He
- School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
- Zhipeng Gao
- Beidahuang Industry Group General Hospital, Harbin 150000, China
- Ling Lin
- Yucai School Attached to Sichuan Chengdu No. 7 High School, Chengdu 610503, China
- Xu Zhang
- School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China
- School of Mathematics and Statistics, Southwest University, Chongqing 400715, China
3
Lin L, Long Y, Liu J, Deng D, Yuan Y, Liu L, Tan B, Qi H. FRP-XGBoost: Identification of ferroptosis-related proteins based on multi-view features. Int J Biol Macromol 2024; 262:130180. [PMID: 38360239 DOI: 10.1016/j.ijbiomac.2024.130180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 02/17/2024]
Abstract
Ferroptosis represents a novel form of programmed cell death. Pan-cancer bioinformatics analysis indicates that identifying and modulating ferroptosis offer innovative approaches for preventing and treating diverse tumor pathologies. However, the precise detection of ferroptosis-related proteins via conventional wet-laboratory techniques remains a formidable challenge, largely due to the constraints of existing methodologies. These traditional approaches are not only labor-intensive but also costly. Consequently, more sophisticated and efficient computational tools are needed to facilitate the detection of these proteins. In this paper, we present FRP-XGBoost, a machine learning method for predicting ferroptosis-related proteins based on XGBoost and multi-view features. We explored four types of protein feature extraction methods and evaluated their effectiveness in predicting ferroptosis-related proteins using six of the most commonly used traditional classifiers. To enhance the representational power of the hybrid features, we employed a two-step feature selection technique to identify the optimal subset of features. Subsequently, we constructed a prediction model using the XGBoost algorithm. FRP-XGBoost achieved an accuracy of 96.74% in 10-fold cross-validation and an accuracy of 91.52% in an independent test. The implementation source code of FRP-XGBoost is available at https://github.com/linli5417/FRP-XGBoost.
Affiliation(s)
- Li Lin
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Yao Long
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Jinkai Liu
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Dongliang Deng
- Department of Oncology, Chongqing Traditional Chinese Medicine Hospital, Chongqing 400021, China
- Yu Yuan
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Lubin Liu
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
- Bin Tan
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
- Hongbo Qi
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China; Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China.
4
Ju H, Bai J, Jiang J, Che Y, Chen X. Comparative evaluation and analysis of DNA N4-methylcytosine methylation sites using deep learning. Front Genet 2023; 14:1254827. [PMID: 37671040 PMCID: PMC10476523 DOI: 10.3389/fgene.2023.1254827] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 07/31/2023] [Indexed: 09/07/2023]
Abstract
DNA N4-methylcytosine (4mC) is significantly involved in biological processes, such as DNA expression, repair, and replication. Therefore, accurate prediction methods are urgently needed. Deep learning methods have transformed applications that previously required sequencing expertise into engineering challenges that can be solved without such expertise. Here, we compare a variety of state-of-the-art deep learning models on six benchmark datasets to evaluate their performance in 4mC methylation site detection. We visualize the statistical analysis of the datasets and the performance of different deep learning models. We conclude that deep learning can greatly expand the potential of methylation site prediction.
Affiliation(s)
- Hong Ju
- Heilongjiang Agricultural Engineering Vocational College, Harbin, China
- Jie Bai
- Engineering Research Center of Integration and Application of Digital Learning Technology, Ministry of Education, Hangzhou, China
- Jing Jiang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yusheng Che
- Heilongjiang Agricultural Engineering Vocational College, Harbin, China
- Xin Chen
- Department of Neurosurgical Laboratory, The First Affiliated Hospital of Harbin Medical University, Harbin, China
5
Zhang Y, Zhan L, Li J, Jiang X, Yin L. Insights into N6-methyladenosine (m6A) modification of noncoding RNA in tumor microenvironment. Aging (Albany NY) 2023; 15:3857-3889. [PMID: 37178254 PMCID: PMC10449301 DOI: 10.18632/aging.204679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 04/15/2023] [Indexed: 05/15/2023]
Abstract
N6-methyladenosine (m6A) is the most abundant RNA modification in eukaryotes, and it participates in the regulation of pathophysiological processes in various diseases, including malignant tumors, by regulating the expression and function of both coding and non-coding RNAs (ncRNAs). A growing number of studies have demonstrated that m6A modification regulates the production, stability, and degradation of ncRNAs and that ncRNAs also regulate the expression of m6A-related proteins. The tumor microenvironment (TME) refers to the internal and external environment of tumor cells, which is composed of numerous tumor stromal cells, immune cells, immune factors, and inflammatory factors that are closely related to tumor occurrence and development. Recent studies have suggested that crosstalk between m6A modifications and ncRNAs plays an important role in the biological regulation of the TME. In this review, we summarize and analyze the effects of m6A modification-associated ncRNAs on the TME from various perspectives, including tumor proliferation, angiogenesis, invasion and metastasis, and immune escape. We show that m6A-related ncRNAs may not only serve as detection markers in tumor tissue samples, but can also be packaged into exosomes and secreted into body fluids, thus exhibiting potential as markers for liquid biopsy. This review provides a deeper understanding of the relationship between m6A-related ncRNAs and the TME, which is of great significance to the development of new strategies for precise tumor therapy.
Affiliation(s)
- YanJun Zhang
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
- Lijuan Zhan
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
- Jing Li
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
- Xue Jiang
- College of Pharmacy and Traditional Chinese Medicine, Jiangsu College of Nursing, Huaian, Jiangsu 223005, China
- Li Yin
- Department of Biopharmaceutics, Yulin Normal University, Guangxi, Yulin 537000, China
- Bioengineering and Technology Center for Native Medicinal Resources Development, Yulin Normal University, Yulin 537000, China
6
Ao C, Jiao S, Wang Y, Yu L, Zou Q. Biological Sequence Classification: A Review on Data and General Methods. Research (Washington, D.C.) 2022; 2022:0011. [PMID: 39285948 PMCID: PMC11404319 DOI: 10.34133/research.0011] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 07/25/2022] [Accepted: 10/25/2022] [Indexed: 09/19/2024]
Abstract
With the rapid development of biotechnology, the number of biological sequences has grown exponentially. The continuous expansion of biological sequence data promotes the application of machine learning in biological sequences to construct predictive models for mining biological sequence information. There are many branches of biological sequence classification research. In this review, we mainly focus on the function and modification classification of biological sequences based on machine learning. Sequence-based prediction and analysis are the basic tasks to understand the biological functions of DNA, RNA, proteins, and peptides. However, there are hundreds of classification models developed for biological sequences, and the quite varied specific methods seem dizzying at first glance. Here, we aim to establish a long-term support website (http://lab.malab.cn/~acy/BioseqData/home.html), which provides readers with detailed information on the classification method and download links to relevant datasets. We briefly introduce the steps to build an effective model framework for biological sequence data. In addition, a brief introduction to single-cell sequencing data analysis methods and applications in biology is also included. Finally, we discuss the current challenges and future perspectives of biological sequence classification research.
Affiliation(s)
- Chunyan Ao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Shihu Jiao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
7
Wang H, Zhao S, Cheng Y, Bi S, Zhu X. MTDeepM6A-2S: A two-stage multi-task deep learning method for predicting RNA N6-methyladenosine sites of Saccharomyces cerevisiae. Front Microbiol 2022; 13:999506. [PMID: 36274691 PMCID: PMC9579691 DOI: 10.3389/fmicb.2022.999506] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/21/2022] [Accepted: 09/16/2022] [Indexed: 11/13/2022]
Abstract
N6-methyladenosine (m6A) is one of the most important RNA modifications and is involved in many biological activities. Computational methods have been developed to detect m6A sites due to their high efficiency and low cost. Saccharomyces cerevisiae is one of the most widely used model organisms, and many methods have been developed for predicting its m6A sites. However, the generalization of these methods has been hampered by the limited size of the benchmark datasets. On the other hand, over 60,000 low-resolution m6A sites and more than 10,000 base-resolution m6A sites of Saccharomyces cerevisiae are recorded in RMBase and m6A-Atlas, respectively. The base-resolution m6A sites are often obtained from low-resolution results by post-calibration. In view of this, we propose a two-stage deep learning method, named MTDeepM6A-2S, to predict RNA m6A sites of Saccharomyces cerevisiae based on RNA sequence information. In the first stage, a multi-task model with a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) deep framework was built to not only detect the low-resolution m6A sites but also assign a reasonable probability to each predicted site. In the second stage, a transfer-learning strategy was used to build the model to predict the base-resolution m6A sites from those low-resolution m6A sites. The effectiveness of our model was validated on both training and independent test sets. The results show that our model outperforms other state-of-the-art models on the independent test set, which indicates that our model holds high potential to become a useful tool for epitranscriptomics analysis.
8
N(6)-methyladenosine modification: A vital role of programmed cell death in myocardial ischemia/reperfusion injury. Int J Cardiol 2022; 367:11-19. [PMID: 36002042 DOI: 10.1016/j.ijcard.2022.08.042] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/15/2022] [Revised: 07/08/2022] [Accepted: 08/19/2022] [Indexed: 11/20/2022]
Abstract
N(6)-methyladenosine (m6A) modification is closely associated with myocardial ischemia/reperfusion injury (MIRI). As the most common modification among RNA modifications, the reversible m6A modification is processed by methylase ("writers") and demethylase ("erasers"). The biological effects of RNA modified by m6A are regulated under the corresponding RNA binding proteins (RBPs) ("readers"). m6A modification regulates the whole process of RNA, including transcription, processing, splicing, nuclear export, stability, degradation, and translation. Programmed cell death (PCD) is a regulated mechanism that maintains the internal environment's stability. PCD plays an essential role in MIRI, including apoptosis, autophagy, pyroptosis, ferroptosis, and necroptosis. However, the relationship between PCD modified with m6A and MIRI is still not clear. This review summarizes the regulators of m6A modification and their bioeffects on PCD in MIRI.
9
Ma L, He LN, Kang S, Gu B, Gao S, Zuo Z. Advances in detecting N6-methyladenosine modification in circRNAs. Methods 2022; 205:234-246. [PMID: 35878749 DOI: 10.1016/j.ymeth.2022.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/18/2022] [Revised: 07/15/2022] [Accepted: 07/18/2022] [Indexed: 12/14/2022]
Abstract
Circular RNAs (circRNAs) are a class of noncoding RNAs with covalently closed, single-stranded loop structures derived from back-splicing events of linear precursor mRNAs (pre-mRNAs). N6-methyladenosine (m6A), the most abundant epigenetic modification in eukaryotic RNAs, has been shown to play a crucial role in regulating the fate and biological function of circRNAs, thus affecting various physiological and pathological processes. Accurate identification of m6A modification in circRNAs is an essential step to fully elucidate the crosstalk between m6A and circRNAs. In recent years, the rapid development of high-throughput sequencing technology and bioinformatic methodology has propelled the establishment of a multitude of approaches to detect circRNAs and m6A modification, including in vitro-based and in silico methods. Building on this, the research community has begun to develop methods for identifying m6A modification in circRNAs. In this review, we provide a comprehensive review and evaluation of the existing methods for detecting circRNAs, m6A modification, and especially m6A modification in circRNAs, focusing mainly on those based on high-throughput technologies and bioinformatics methodology. This reference can help researchers see where the field is heading.
Affiliation(s)
- Lixia Ma
- State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Li-Na He
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Shiyang Kang
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
- Bianli Gu
- State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China
- Shegan Gao
- State Key Laboratory of Esophageal Cancer Prevention & Treatment, Henan Key Laboratory of Microbiome and Esophageal Cancer Prevention and Treatment, Henan Key Laboratory of Cancer Epigenetics, Cancer Hospital, The First Affiliated Hospital (College of Clinical Medical) of Henan University of Science and Technology, Luoyang, China.
- Zhixiang Zuo
- Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China.
10
Li F, Yin J, Lu M, Yang Q, Zeng Z, Zhang B, Li Z, Qiu Y, Dai H, Chen Y, Zhu F. ConSIG: consistent discovery of molecular signature from OMIC data. Brief Bioinform 2022; 23:6618243. [PMID: 35758241 DOI: 10.1093/bib/bbac253] [Citation(s) in RCA: 49] [Impact Index Per Article: 24.5] [Received: 04/05/2022] [Revised: 05/09/2022] [Accepted: 05/31/2022] [Indexed: 12/12/2022]
Abstract
The discovery of proper molecular signature from OMIC data is indispensable for determining biological state, physiological condition, disease etiology, and therapeutic response. However, the identified signature is reported to be highly inconsistent, and there is little overlap among the signatures identified from different biological datasets. Such inconsistency raises doubts about the reliability of reported signatures and significantly hampers its biological and clinical applications. Herein, an online tool, ConSIG, was constructed to realize consistent discovery of gene/protein signature from any uploaded transcriptomic/proteomic data. This tool is unique in a) integrating a novel strategy capable of significantly enhancing the consistency of signature discovery, b) determining the optimal signature by collective assessment, and c) confirming the biological relevance by enriching the disease/gene ontology. With the increasingly accumulated concerns about signature consistency and biological relevance, this online tool is expected to be used as an essential complement to other existing tools for OMIC-based signature discovery. ConSIG is freely accessible to all users without login requirement at https://idrblab.org/consig/.
Affiliation(s)
- Fengcheng Li
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Jiayi Yin
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Mingkun Lu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Qingxia Yang
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Zhenyu Zeng
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Bing Zhang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunqing Qiu
- State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, 79 QingChun Road, Hangzhou, Zhejiang 310000, China
- Haibin Dai
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China
- Yuzong Chen
- State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Biology, The Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China; Qian Xuesen Collaborative Research Center of Astrochemistry and Space Life Sciences, Institute of Drug Discovery Technology, Ningbo University, Ningbo 315211, China
- Feng Zhu
- College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
11
CNNLSTMac4CPred: A Hybrid Model for N4-Acetylcytidine Prediction. Interdiscip Sci 2022; 14:439-451. [PMID: 35106702 DOI: 10.1007/s12539-021-00500-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/25/2021] [Revised: 12/04/2021] [Accepted: 12/13/2021] [Indexed: 12/23/2022]
Abstract
N4-Acetylcytidine (ac4C) is a highly conserved and widespread post-transcriptional RNA modification that plays versatile roles in cellular processes. Due to the limitations of techniques and knowledge, large-scale identification of ac4C is still a challenging task. RNA sequences are like sentences carrying semantics in natural language. Inspired by the semantics of language, we proposed a hybrid model for ac4C prediction. The model used long short-term memory and a convolutional neural network to extract the semantic features hidden in the sequences. The semantic features and two traditional features (k-nucleotide frequencies and pseudo tri-tuple nucleotide composition) were combined to represent ac4C or non-ac4C sequences. eXtreme Gradient Boosting was used as the learning algorithm. Five-fold cross-validation over the training set consisting of 1160 ac4C and 10,855 non-ac4C sequences obtained an area under the receiver operating characteristic curve (AUROC) of 0.9004, and the independent test over 469 ac4C and 4343 non-ac4C sequences reached an AUROC of 0.8825. The model obtained a sensitivity of 0.6474 in the five-fold cross-validation and 0.6290 in the independent test, outperforming two state-of-the-art methods. The performance of semantic features alone was better than that of k-nucleotide frequencies and pseudo tri-tuple nucleotide composition, implying that ac4C sequences carry semantic information. The proposed hybrid model was implemented as a user-friendly web server, which is freely available to the scientific community: http://47.113.117.61/ac4c/. The presented model and tool are beneficial for identifying ac4C on a large scale.
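Two quantities named in this abstract, k-nucleotide frequencies and sensitivity, can be illustrated with a short sketch. The function names and the default alphabet are assumptions made for illustration; the paper's exact feature definition (for example, which values of k it uses) is not given in the abstract.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGU"):
    """Return normalized counts of every k-mer over the alphabet, in lexical order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:  # windows with unknown symbols are ignored
            counts[window] += 1
    return [counts[kmer] / total for kmer in kmers]


def sensitivity(true_positives, false_negatives):
    """Fraction of actual positive sequences that were predicted as positive."""
    return true_positives / (true_positives + false_negatives)
```

With a strongly imbalanced set such as the one described (roughly one ac4C sequence per nine non-ac4C sequences), sensitivity is worth reporting alongside AUROC because it isolates performance on the rare positive class.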
12
Yu B, Zhang Y, Wang X, Gao H, Sun J, Gao X. Identification of DNA modification sites based on elastic net and bidirectional gated recurrent unit with convolutional neural network. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103566] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/06/2023]
13
He Z, Xu J, Shi H, Wu S. m5CRegpred: Epitranscriptome Target Prediction of 5-Methylcytosine (m5C) Regulators Based on Sequencing Features. Genes (Basel) 2022; 13:genes13040677. [PMID: 35456483 PMCID: PMC9025882 DOI: 10.3390/genes13040677] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/02/2022] [Revised: 04/02/2022] [Accepted: 04/05/2022] [Indexed: 02/04/2023]
Abstract
5-methylcytosine (m5C) is a common post-transcriptional modification observed in a variety of RNAs. m5C has been demonstrated to be important in a variety of biological processes, including RNA structural stability and metabolism. Driven by the importance of m5C modification, many studies on m5C site prediction have been reported. To better understand the upstream and downstream regulation of m5C, we present a bioinformatics framework, m5CRegpred, to predict the substrates of the m5C writer NSUN2 and the m5C readers YBX1 and ALYREF for the first time. After feature comparison, window length selection and algorithm comparison on the mature mRNA model, our model achieved AUROC scores of 0.869, 0.724 and 0.889 for NSUN2, YBX1 and ALYREF, respectively, in an independent test. Our work suggests that the substrates of m5C regulators can be distinguished and may help research on m5C regulators under specific conditions, such as predicting the substrates of hyper- or hypo-expressed m5C regulators in human disease.
Affiliation(s)
- Zhizhou He
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Department of Molecular, Cell, and Developmental Biology, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
| | - Jing Xu
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
| | - Haoran Shi
- Research Center for BioSystems, Land Use, and Nutrition (IFZ), Institute of Applied Microbiology, Justus-Liebig-University Giessen, Heinrich-Buff-Ring 26-32, 35392 Giessen, Germany
- Correspondence: (H.S.); (S.W.)
| | - Shuxiang Wu
- Key Laboratory of Ministry of Education for Gastrointestinal Cancer, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Fujian Key Laboratory of Tumor Microbiology, Department of Medical Microbiology, School of Basic Medical Sciences, Fujian Medical University, Fuzhou 350004, China
- Correspondence: (H.S.); (S.W.)
| |
|
14
|
Le NQK, Ho QT. Deep transformers and convolutional neural network in identifying DNA N6-methyladenine sites in cross-species genomes. Methods 2021; 204:199-206. [PMID: 34915158 DOI: 10.1016/j.ymeth.2021.12.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 11/13/2021] [Revised: 11/30/2021] [Accepted: 12/09/2021] [Indexed: 12/19/2022]
Abstract
As one of the most common post-transcriptional epigenetic modifications, N6-methyladenine (6mA) plays an essential role in various cellular processes and disease pathogenesis. Therefore, accurately identifying 6mA modifications is necessary for a deep understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models were developed with small training datasets. Hence, their practical application in genome-wide detection is quite limited. To overcome these limitations, we present a novel model based on transformer architecture and deep learning to identify DNA 6mA sites in cross-species genomes. The model is constructed on a benchmark dataset and explores features derived from pre-trained transformer word-embedding approaches. Subsequently, a convolutional neural network is employed to learn the generated features and produce the prediction outcomes. As a result, our predictor achieved excellent performance in the independent test, with an accuracy and Matthews correlation coefficient (MCC) of 79.3% and 0.58, respectively. Overall, it achieved better accuracy than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, our model is expected to assist biologists in accurately identifying 6mA sites and formulating novel testable biological hypotheses. We also release source code and datasets freely at https://github.com/khanhlee/bert-dna for front-end users.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan.
| | - Quang-Thai Ho
- College of Information & Communication Technology, Can Tho University, Viet Nam; Department of Computer Science and Engineering, Yuan Ze University, Chung-Li, 32003, Taiwan
| |
|
15
|
Qin S, Mao Y, Chen X, Xiao J, Qin Y, Zhao L. The functional roles, cross-talk and clinical implications of m6A modification and circRNA in hepatocellular carcinoma. Int J Biol Sci 2021; 17:3059-3079. [PMID: 34421350 PMCID: PMC8375232 DOI: 10.7150/ijbs.62767] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 05/15/2021] [Accepted: 07/06/2021] [Indexed: 12/13/2022]
Abstract
Hepatocellular carcinoma (HCC) is one of the leading causes of cancer-related deaths worldwide. HCC has high rates of death and recurrence, as well as very low survival rates. N6-methyladenosine (m6A) is the most abundant modification in eukaryotic RNAs, and circRNAs are a class of circular noncoding RNAs that are generated by back-splicing and they modulate multiple functions in a variety of cellular processes. Although the carcinogenesis of HCC is complex, emerging evidence has indicated that m6A modification and circRNA play vital roles in HCC development and progression. However, the underlying mechanisms governing HCC, their cross-talk, and clinical implications have not been fully elucidated. Therefore, in this paper, we elucidated the biological functions and molecular mechanisms of m6A modification in the carcinogenesis of HCC by illustrating three different regulatory factors ("writer", "eraser", and "reader") of the m6A modification process. Additionally, we dissected the functional roles of circRNAs in various malignant behaviors of HCC, thereby contributing to HCC initiation, progression and relapse. Furthermore, we demonstrated the cross-talk and interplay between m6A modification and circRNA by revealing the effects of the collaboration of circRNA and m6A modification on HCC progression. Finally, we proposed the clinical potential and implications of m6A modifiers and circRNAs as diagnostic biomarkers and therapeutic targets for HCC diagnosis, treatment and prognosis evaluation.
Affiliation(s)
- Sha Qin
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, China; and Department of Pathology, School of Basic Medical Science, Xiangya School of Medicine, Central South University, Changsha, Hunan, China
| | - Yitao Mao
- Department of Radiology, Xiangya Hospital, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Xue Chen
- Early Clinical Trial Center, Hunan Cancer Hospital and The Affiliated Cancer Hospital of Xiangya School of Medicine, Central South University, Changsha, Hunan, China
| | - Juxiong Xiao
- Department of Radiology, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Yan Qin
- Department of Radiology, Xiangya Hospital, Central South University, Changsha, Hunan, China
| | - Luqing Zhao
- Department of Pathology, Xiangya Hospital, Central South University, Changsha, Hunan, China; Department of Pathology, School of Basic Medical Science, Xiangya School of Medicine, Central South University, Changsha, Hunan, China; National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan, China
| |
|
16
|
Wang M, Xie J, Xu S. M6A-BiNP: predicting N6-methyladenosine sites based on bidirectional position-specific propensities of polynucleotides and pointwise joint mutual information. RNA Biol 2021; 18:2498-2512. [PMID: 34161188 PMCID: PMC8632114 DOI: 10.1080/15476286.2021.1930729] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/13/2022]
Abstract
N6-methyladenosine (m6A) plays an important role in various biological processes. Identifying m6A sites is a key step in exploring its biological functions. One of the biggest challenges in identifying m6A sites is how to extract features comprising rich categorical information to distinguish m6A from non-m6A sites. To address this challenge, we propose bidirectional dinucleotide and trinucleotide position-specific propensities. Based on these, we propose two feature-encoding algorithms: Position-Specific Propensities and Pointwise Mutual Information (PSP-PMI) and Position-Specific Propensities and Pointwise Joint Mutual Information (PSP-PJMI). PSP-PMI is based on the bidirectional dinucleotide position-specific propensity and the pointwise mutual information, while PSP-PJMI is based on the bidirectional trinucleotide position-specific propensity and the pointwise joint mutual information proposed in this paper. We introduce parameters α and β in PSP-PMI and PSP-PJMI, respectively, to represent the distance from a nucleotide to its forward or backward adjacent nucleotide or dinucleotide, so as to extract features containing local and global classification information. Finally, we propose the M6A-BiNP predictor based on PSP-PMI or PSP-PJMI and an SVM classifier. The 10-fold cross-validation results on benchmark datasets of non-single-base resolution and single-base resolution demonstrate that PSP-PMI and PSP-PJMI can extract features with strong capabilities to distinguish m6A from non-m6A sites. The M6A-BiNP predictor based on our proposed feature-encoding algorithm PSP-PJMI outperforms the state-of-the-art predictors, and it is so far the best model for identifying m6A and non-m6A sites.
Affiliation(s)
- Mingzhao Wang
- College of Life Sciences, Shaanxi Normal University, Xi'an, China; School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Juanying Xie
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Shengquan Xu
- College of Life Sciences, Shaanxi Normal University, Xi'an, China
| |
|