1
|
Zhang Q, Wei Y, Liu L. GraphPro: An interpretable graph neural network-based model for identifying promoters in multiple species. Comput Biol Med 2024; 180:108974. [PMID: 39096613 DOI: 10.1016/j.compbiomed.2024.108974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2024] [Revised: 07/29/2024] [Accepted: 07/30/2024] [Indexed: 08/05/2024]
Abstract
Promoters are DNA sequences that bind with RNA polymerase to initiate transcription, regulating this process through interactions with transcription factors. Accurate identification of promoters is crucial for understanding gene expression regulation mechanisms and developing therapeutic approaches for various diseases. However, experimental techniques for promoter identification are often expensive, time-consuming, and inefficient, necessitating the development of accurate and efficient computational models for this task. Enhancing the model's ability to recognize promoters across multiple species and improving its interpretability pose significant challenges. In this study, we introduce a novel interpretable model based on graph neural networks, named GraphPro, for multi-species promoter identification. Initially, we encode the sequences using k-tuple nucleotide frequency pattern, dinucleotide physicochemical properties, and dna2vec. Subsequently, we construct two feature extraction modules based on convolutional neural networks and graph neural networks. These modules aim to extract specific motifs from the promoters, learn their dependencies, and capture the underlying structural features of the promoters, providing a more comprehensive representation. Finally, a fully connected neural network predicts whether the input sequence is a promoter. We conducted extensive experiments on promoter datasets from eight species, including Human, Mouse, and Escherichia coli. The experimental results show that the average Sn, Sp, Acc and MCC values of GraphPro are 0.9123, 0.9482, 0.8840 and 0.7984, respectively. Compared with previous promoter identification methods, GraphPro not only achieves better recognition accuracy on multiple species, but also outperforms all previous methods in cross-species prediction ability. Furthermore, by visualizing GraphPro's decision process and analyzing the sequences matching the transcription factor binding motifs captured by the model, we validate its significant advantages in biological interpretability. The source code for GraphPro is available at https://github.com/liuliwei1980/GraphPro.
Collapse
Affiliation(s)
- Qi Zhang
- College of Science, Dalian Jiaotong University, Dalian, 116028, China
| | - Yuxiao Wei
- College of Software, Dalian Jiaotong University, Dalian, 116028, China
| | - Liwei Liu
- College of Science, Dalian Jiaotong University, Dalian, 116028, China.
| |
Collapse
|
2
|
Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Collapse
Affiliation(s)
- Biyu Diao
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Jin Luo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| | - Yu Guo
- Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
| |
Collapse
|
3
|
Hu W, Li Y, Wu Y, Guan L, Li M. A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding. iScience 2024; 27:110030. [PMID: 38868182 PMCID: PMC11167433 DOI: 10.1016/j.isci.2024.110030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2024] [Revised: 04/23/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024] Open
Abstract
Enhancers, genomic DNA elements, regulate neighboring gene expression crucial for biological processes like cell differentiation and stress response. However, current machine learning methods for predicting DNA enhancers often underutilize hidden features in gene sequences, limiting model accuracy. Hence, this article proposes the PDCNN model, a deep learning-based enhancer prediction method. PDCNN extracts statistical nucleotide representations from gene sequences, discerning positional distribution information of nucleotides in modifier-like DNA sequences. With a convolutional neural network structure, PDCNN employs dual convolutional and fully connected layers. The cross-entropy loss function iteratively updates using a gradient descent algorithm, enhancing prediction accuracy. Model parameters are fine-tuned to select optimal combinations for training, achieving over 95% accuracy. Comparative analysis with traditional methods and existing models demonstrates PDCNN's robust feature extraction capability. It outperforms advanced machine learning methods in identifying DNA enhancers, presenting an effective method with broad implications for genomics, biology, and medical research.
Collapse
Affiliation(s)
- Wenxing Hu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Yelin Li
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Yan Wu
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Lixin Guan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| | - Mengshan Li
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
| |
Collapse
|
4
|
Hwang H, Jeon H, Yeo N, Baek D. Big data and deep learning for RNA biology. Exp Mol Med 2024; 56:1293-1321. [PMID: 38871816 PMCID: PMC11263376 DOI: 10.1038/s12276-024-01243-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2024] [Revised: 02/27/2024] [Accepted: 03/05/2024] [Indexed: 06/15/2024] Open
Abstract
The exponential growth of big data in RNA biology (RB) has led to the development of deep learning (DL) models that have driven crucial discoveries. As constantly evidenced by DL studies in other fields, the successful implementation of DL in RB depends heavily on the effective utilization of large-scale datasets from public databases. In achieving this goal, data encoding methods, learning algorithms, and techniques that align well with biological domain knowledge have played pivotal roles. In this review, we provide guiding principles for applying these DL concepts to various problems in RB by demonstrating successful examples and associated methodologies. We also discuss the remaining challenges in developing DL models for RB and suggest strategies to overcome these challenges. Overall, this review aims to illuminate the compelling potential of DL for RB and ways to apply this powerful technology to investigate the intriguing biology of RNA more effectively.
Collapse
Affiliation(s)
- Hyeonseo Hwang
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Hyeonseong Jeon
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
- Genome4me Inc., Seoul, Republic of Korea
| | - Nagyeong Yeo
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
| | - Daehyun Baek
- School of Biological Sciences, Seoul National University, Seoul, Republic of Korea.
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
- Genome4me Inc., Seoul, Republic of Korea.
| |
Collapse
|
5
|
Huang G, Huang X, Jiang J. Deepm6A-MT: A deep learning-based method for identifying RNA N6-methyladenosine sites in multiple tissues. Methods 2024; 226:1-8. [PMID: 38485031 DOI: 10.1016/j.ymeth.2024.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 02/20/2024] [Accepted: 03/11/2024] [Indexed: 04/13/2024] Open
Abstract
N6-methyladenosine (m6A) is the most prevalent, abundant, and conserved internal modification in the eukaryotic messenger RNA (mRNAs) and plays a crucial role in the cellular process. Although more than ten methods were developed for m6A detection over the past decades, there were rooms left to improve the predictive accuracy and the efficiency. In this paper, we proposed an improved method for predicting m6A modification sites, which was based on bi-directional gated recurrent unit (Bi-GRU) and convolutional neural networks (CNN), called Deepm6A-MT. The Deepm6A-MT has two input channels. One is to use an embedding layer followed by the Bi-GRU and then by the CNN, and another is to use one-hot encoding, dinucleotide one-hot encoding, and nucleotide chemical property codes. We trained and evaluated the Deepm6A-MT both by the 5-fold cross-validation and the independent test. The empirical tests showed that the Deepm6A-MT achieved the state of the art performance. In addition, we also conducted the cross-species and the cross-tissues tests to further verify the Deepm6A-MT for effectiveness and efficiency. Finally, for the convenience of academic research, we deployed the Deepm6A-MT to the web server, which is accessed at the URL http://www.biolscience.cn/Deepm6A-MT/.
Collapse
Affiliation(s)
- Guohua Huang
- School of Information Technology and Administration, Hunan University of Finance and Economics, Changsha, Hunan 410205, China.
| | - Xiaohong Huang
- College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
| | - Jinyun Jiang
- College of Information Science and Engineering, Shaoyang University, Shaoyang, Hunan 422000, China
| |
Collapse
|
6
|
Yang M, Zhang S, Zheng Z, Zhang P, Liang Y, Tang S. Employing bimodal representations to predict DNA bendability within a self-supervised pre-trained framework. Nucleic Acids Res 2024; 52:e33. [PMID: 38375921 PMCID: PMC11014357 DOI: 10.1093/nar/gkae099] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/15/2023] [Revised: 01/10/2024] [Accepted: 02/01/2024] [Indexed: 02/21/2024] Open
Abstract
The bendability of genomic DNA, which measures the DNA looping rate, is crucial for numerous biological processes of DNA. Recently, an advanced high-throughput technique known as 'loop-seq' has made it possible to measure the inherent cyclizability of DNA fragments. However, quantifying the bendability of large-scale DNA is costly, laborious, and time-consuming. To close the gap between rapidly evolving large language models and expanding genomic sequence information, and to elucidate the DNA bendability's impact on critical regulatory sequence motifs such as super-enhancers in the human genome, we introduce an innovative computational model, named MIXBend, to forecast the DNA bendability utilizing both nucleotide sequences and physicochemical properties. In MIXBend, a pre-trained language model DNABERT and convolutional neural network with attention mechanism are utilized to construct both sequence- and physicochemical-based extractors for the sophisticated refinement of DNA sequence representations. These bimodal DNA representations are then fed to a k-mer sequence-physicochemistry matching module to minimize the semantic gap between each modality. Lastly, a self-attention fusion layer is employed for the prediction of DNA bendability. In conclusion, the experimental results validate MIXBend's superior performance relative to other state-of-the-art methods. Additionally, MIXBend reveals both novel and known motifs from the yeast. Moreover, MIXBend discovers significant bendability fluctuations within super-enhancer regions and transcription factors binding sites in the human genome.
Collapse
Affiliation(s)
- Minghao Yang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
| | - Shichen Zhang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
| | - Zhihang Zheng
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
| | - Pengfei Zhang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
| | - Yan Liang
- School of Artificial Intelligence, South China Normal University, Foshan 528225, China
| | - Shaojun Tang
- Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511466, China
- Division of Life Science, Hong Kong University of Science and Technology, Hong Kong SAR 999077, China
| |
Collapse
|
7
|
Lei R, Jia J, Qin L, Wei X. iPro2L-DG: Hybrid network based on improved densenet and global attention mechanism for identifying promoter sequences. Heliyon 2024; 10:e27364. [PMID: 38510021 PMCID: PMC10950492 DOI: 10.1016/j.heliyon.2024.e27364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2023] [Revised: 02/24/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024] Open
Abstract
The promoter is a key DNA sequence whose primary function is to control the initiation time and the degree of expression of gene transcription. Accurate identification of promoters is essential for understanding gene expression studies. Traditional sequencing techniques for identifying promoters are costly and time-consuming. Therefore, the development of computational methods to identify promoters has become critical. Since deep learning methods show great potential in identifying promoters, this study proposes a new promoter prediction model, called iPro2L-DG. The iPro2L-DG predictor, based on an improved Densely Connected Convolutional Network (DenseNet) and a Global Attention Mechanism (GAM), is constructed to achieve the prediction of promoters. The promoter sequences are combined feature encoding using C2 encoding and nucleotide chemical property (NCP) encoding. An improved DenseNet extracts advanced feature information from the combined feature encoding. GAM evaluates the importance of advanced feature information in terms of channel and spatial dimensions, and finally uses a Full Connect Neural Network (FNN) to derive prediction probabilities. The experimental results showed that the accuracy of iPro2L-DG in the first layer (promoter identification) was 94.10% with Matthews correlation coefficient value of 0.8833. In the second layer (promoter strength prediction), the accuracy was 89.42% with Matthews correlation coefficient value of 0.7915. The iPro2L-DG predictor significantly outperforms other existing predictors in promoter identification and promoter strength prediction. Therefore, our proposed model iPro2L-DG is the most advanced promoter prediction tool. The source code of the iPro2L-DG model can be found in https://github.com/leirufeng/iPro2L-DG.
Collapse
Affiliation(s)
- Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
| | - Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
| | - Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
| | - Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, Nanchang, 330044, China
| |
Collapse
|
8
|
Jia J, Lei R, Qin L, Wei X. i5mC-DCGA: an improved hybrid network framework based on the CBAM attention mechanism for identifying promoter 5mC sites. BMC Genomics 2024; 25:242. [PMID: 38443802 PMCID: PMC10913688 DOI: 10.1186/s12864-024-10154-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2023] [Accepted: 02/22/2024] [Indexed: 03/07/2024] Open
Abstract
BACKGROUND 5-Methylcytosine (5mC) plays a very important role in gene stability, transcription, and development. Therefore, accurate identification of the 5mC site is of key importance in genetic and pathological studies. However, traditional experimental methods for identifying 5mC sites are time-consuming and costly, so there is an urgent need to develop computational methods to automatically detect and identify these 5mC sites. RESULTS Deep learning methods have shown great potential in the field of 5mC sites, so we developed a deep learning combinatorial model called i5mC-DCGA. The model innovatively uses the Convolutional Block Attention Module (CBAM) to improve the Dense Convolutional Network (DenseNet), which is improved to extract advanced local feature information. Subsequently, we combined a Bidirectional Gated Recurrent Unit (BiGRU) and a Self-Attention mechanism to extract global feature information. Our model can learn feature representations of abstract and complex from simple sequence coding, while having the ability to solve the sample imbalance problem in benchmark datasets. The experimental results show that the i5mC-DCGA model achieves 97.02%, 96.52%, 96.58% and 85.58% in sensitivity (Sn), specificity (Sp), accuracy (Acc) and matthews correlation coefficient (MCC), respectively. CONCLUSIONS The i5mC-DCGA model outperforms other existing prediction tools in predicting 5mC sites, and it is currently the most representative promoter 5mC site prediction tool. The benchmark dataset and source code for the i5mC-DCGA model can be found in https://github.com/leirufeng/i5mC-DCGA .
Collapse
Grants
- Nos. 61761023, 62162032, and 31760315 National Natural Science Foundation of China
- Nos. 61761023, 62162032, and 31760315 National Natural Science Foundation of China
- Nos. 61761023, 62162032, and 31760315 National Natural Science Foundation of China
- Nos. 20202BABL202004 and 20202BAB202007 Natural Science Foundation of Jiangxi Province
- Nos. 20202BABL202004 and 20202BAB202007 Natural Science Foundation of Jiangxi Province
- Nos. 20202BABL202004 and 20202BAB202007 Natural Science Foundation of Jiangxi Province
- GJJ190695 and GJJ212419 Scientific Research Plan of the Department of Education of Jiangxi Province, China
- GJJ190695 and GJJ212419 Scientific Research Plan of the Department of Education of Jiangxi Province, China
- GJJ190695 and GJJ212419 Scientific Research Plan of the Department of Education of Jiangxi Province, China
- GJJ190695 and GJJ212419 Scientific Research Plan of the Department of Education of Jiangxi Province, China
Collapse
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China.
| | - Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China.
| | - Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, 333403, Jingdezhen, China
| | - Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, 330044, Nanchang, China
| |
Collapse
|
9
|
Zhou Y, Wu J, Yao S, Xu Y, Zhao W, Tong Y, Zhou Z. DeepCIP: A multimodal deep learning method for the prediction of internal ribosome entry sites of circRNAs. Comput Biol Med 2023; 164:107288. [PMID: 37542919 DOI: 10.1016/j.compbiomed.2023.107288] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2023] [Revised: 07/05/2023] [Accepted: 07/28/2023] [Indexed: 08/07/2023]
Abstract
Circular RNAs (circRNAs) have been found to have the ability to encode proteins through internal ribosome entry sites (IRESs), which are essential RNA regulatory elements for cap-independent translation. Identification of IRES elements in circRNA is crucial for understanding its function. Previous studies have presented IRES predictors based on machine learning techniques, but they were mainly designed for linear RNA IRES. In this study, we proposed DeepCIP (Deep learning method for CircRNA IRES Prediction), a multimodal deep learning approach that employs both sequence and structural information for circRNA IRES prediction. Our results demonstrate the effectiveness of the sequence and structure models used by DeepCIP in feature extraction and suggest that integrating sequence and structural information efficiently improves the accuracy of prediction. The comparison studies indicate that DeepCIP outperforms other comparative methods on the test set and real circRNA IRES dataset. Furthermore, through the integration of an interpretable analysis mechanism, we elucidate the sequence patterns learned by our model, which align with the previous discovery of motifs that facilitate circRNA translation. Thus, DeepCIP has the potential to enhance the study of the coding potential of circRNAs and contribute to the design of circRNA-based drugs. DeepCIP as a standalone program is freely available at https://github.org/zjupgx/DeepCIP.
Collapse
Affiliation(s)
- Yuxuan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
| | - Jingcheng Wu
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China
| | - Shihao Yao
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
| | - Yulian Xu
- College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
| | - Wenbin Zhao
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China
| | - Yunguang Tong
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; College of Life Sciences, China Jiliang University, Hangzhou, 310018, China; Aoming (Hangzhou) Biomedical Co., Ltd., Hangzhou, 310018, China; Zhejiang University Innovation Institute for Artificial Intelligence in Medicine - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China; China Jiliang University - Aoming (Hangzhou) Biomedical Co., Ltd. Joint Laboratory, Hangzhou, 310018, China.
| | - Zhan Zhou
- Innovation Institute for Artificial Intelligence in Medicine and Zhejiang Provincial Key Laboratory of Anti-Cancer Drug Research, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, China; The Fourth Affiliated Hospital, Zhejiang University School of Medicine, Yiwu, 322000, China.
| |
Collapse
|
10
|
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Collapse
Affiliation(s)
- Yoojoong Kim
- School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
| | - Minhyeok Lee
- School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
| |
Collapse
|
11
|
Jia J, Lei R, Qin L, Wu G, Wei X. iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module. Front Genet 2023; 14:1132018. [PMID: 36936423 PMCID: PMC10014624 DOI: 10.3389/fgene.2023.1132018] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023] Open
Abstract
Enhancers play a crucial role in controlling gene transcription and expression. Therefore, bioinformatics puts many emphases on predicting enhancers and their strength. It is vital to create quick and accurate calculating techniques because conventional biomedical tests take too long time and are too expensive. This paper proposed a new predictor called iEnhancer-DCSV built on a modified densely connected convolutional network (DenseNet) and an improved convolutional block attention module (CBAM). Coding was performed using one-hot and nucleotide chemical property (NCP). DenseNet was used to extract advanced features from raw coding. The channel attention and spatial attention modules were used to evaluate the significance of the advanced features and then input into a fully connected neural network to yield the prediction probabilities. Finally, ensemble learning was employed on the final categorization findings via voting. According to the experimental results on the test set, the first layer of enhancer recognition achieved an accuracy of 78.95%, and the Matthews correlation coefficient value was 0.5809. The second layer of enhancer strength prediction achieved an accuracy of 80.70%, and the Matthews correlation coefficient value was 0.6609. The iEnhancer-DCSV method can be found at https://github.com/leirufeng/iEnhancer-DCSV. It is easy to obtain the desired results without using the complex mathematical formulas involved.
Collapse
Affiliation(s)
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- *Correspondence: Jianhua Jia, ; Rufeng Lei,
| | - Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
- *Correspondence: Jianhua Jia, ; Rufeng Lei,
| | - Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
| | - Genqiang Wu
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, China
| | - Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, Nanchang, China
| |
Collapse
|