1
Tang X, Ji L. Predicting Plant miRNA-lncRNA Interactions via a Deep Learning Method. IEEE Trans Nanobioscience 2023; 22:728-733. [PMID: 37167036 DOI: 10.1109/tnb.2023.3275178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
In recent years, research on miRNA-lncRNA interaction prediction has increased exponentially because it helps elucidate the functional mechanisms of miRNAs and lncRNAs. However, such prediction remains challenging in bioinformatics. Verifying interactions by biological experiments is expensive and time-consuming, and existing prediction models have limitations, such as the need to extract features manually, the potential loss of features during pre-processing, and unresolved long-distance dependencies. Additionally, most current models focus on animal data, so an efficient and accurate plant miRNA-lncRNA interaction prediction model is needed. In this work, a new deep learning model called PmlIPM is presented to infer plant miRNA-lncRNA associations. PmlIPM is a four-step framework comprising Input Embedding, Positional Encoding, Multi-Head Attention and Max Pooling. PmlIPM takes the miRNA and lncRNA sequences as separate inputs to extract sequence features, avoiding the information loss caused by directly splicing the two sequences into a single model input. The attention mechanism gives the model the ability to capture long-distance features. PmlIPM is compared with existing models on 2 benchmark datasets. The results show that our model performs better than other methods and obtains AUC scores of 0.8412, 0.8587, 0.9666 and 0.9225 on the four independent test sets of Arabidopsis lyrata (A.ly), Solanum lycopersicum (S.ly), Brachypodium distachyon (B.di) and Solanum tuberosum (S.tu), respectively.
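The four steps named in this abstract can be illustrated with a minimal, untrained sketch. It is not the authors' implementation: the embedding is a fixed one-hot code rather than a learned table, the attention uses identity projections (Q = K = V) instead of learned weights, and `D_MODEL`, `N_HEADS`, `encode` and `pair_features` are hypothetical names chosen for illustration. It only shows how two sequences can be encoded separately and joined after pooling.

```python
import math

NUCLEOTIDES = "ACGU"
D_MODEL = 8   # hypothetical embedding width (assumption)
N_HEADS = 2   # hypothetical number of attention heads (assumption)


def embed(seq):
    """Toy input embedding: a one-hot code tiled across D_MODEL dimensions."""
    rows = []
    for base in seq:
        idx = NUCLEOTIDES.index(base)  # raises ValueError for non-ACGU symbols
        rows.append([1.0 if d % 4 == idx else 0.0 for d in range(D_MODEL)])
    return rows


def positional_encoding(length):
    """Sinusoidal positional encoding: sin on even dimensions, cos on odd ones."""
    table = []
    for pos in range(length):
        row = []
        for d in range(D_MODEL):
            angle = pos / (10000 ** (2 * (d // 2) / D_MODEL))
            row.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with identity projections (assumption)."""
    scale = math.sqrt(len(x[0]))
    out = []
    for query in x:
        weights = softmax([sum(a * b for a, b in zip(query, key)) / scale for key in x])
        out.append([sum(w * row[d] for w, row in zip(weights, x)) for d in range(len(query))])
    return out


def multi_head_attention(x):
    """Split features into N_HEADS slices, attend per slice, concatenate the heads."""
    head_dim = D_MODEL // N_HEADS
    heads = [
        self_attention([row[h * head_dim:(h + 1) * head_dim] for row in x])
        for h in range(N_HEADS)
    ]
    return [sum((heads[h][i] for h in range(N_HEADS)), []) for i in range(len(x))]


def max_pool(x):
    """Max over sequence positions, giving one fixed-size vector per sequence."""
    return [max(row[d] for row in x) for d in range(D_MODEL)]


def encode(seq):
    """Embedding + positional encoding -> multi-head attention -> max pooling."""
    x = [[e + p for e, p in zip(e_row, p_row)]
         for e_row, p_row in zip(embed(seq), positional_encoding(len(seq)))]
    return max_pool(multi_head_attention(x))


def pair_features(mirna, lncrna):
    """Encode each RNA separately, then concatenate the two pooled vectors."""
    return encode(mirna) + encode(lncrna)
```

Because pooling runs over positions, `encode` returns a vector of length `D_MODEL` regardless of sequence length, so a short miRNA and a long lncRNA produce feature vectors of the same size without being spliced together first.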
2
Sheng N, Huang L, Gao L, Cao Y, Xie X, Wang Y. A Survey of Computational Methods and Databases for lncRNA-MiRNA Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2810-2826. [PMID: 37030713 DOI: 10.1109/tcbb.2023.3264254] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/19/2023]
Abstract
Long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are two prevalent non-coding RNAs in current research. They play critical regulatory roles in the life processes of animals and plants. Studies have shown that lncRNAs can interact with miRNAs to participate in post-transcriptional regulatory processes, mainly in regulating cancer development, metastatic progression, and drug resistance. Additionally, these interactions have significant effects on plant growth, development, and responses to biotic and abiotic stresses. Deciphering the potential relationships between lncRNAs and miRNAs may provide new insights into the biological functions of lncRNAs and miRNAs and into the pathogenesis of complex diseases. However, gathering information on lncRNA-miRNA interactions (LMIs) through biological experiments is expensive and time-consuming. With the accumulation of multi-omics data, computational models are extremely attractive for systematically exploring potential LMIs. To the best of our knowledge, this is the first comprehensive review of computational methods for identifying LMIs. Specifically, we first summarized the available public databases for predicting animal and plant LMIs. Second, we comprehensively reviewed the computational methods for predicting LMIs and classified them into two categories: network-based methods and sequence-based methods. Third, we analyzed the standard evaluation methods and metrics used in LMI prediction. Finally, we pointed out some problems in current studies and discussed future research directions. Relevant databases and the latest advances in LMI prediction are summarized in a GitHub repository, https://github.com/sheng-n/lncRNA-miRNA-interaction-methods, which we will keep updated.
3
SGAEMDA: Predicting miRNA-Disease Associations Based on Stacked Graph Autoencoder. Cells 2022; 11:cells11243984. [PMID: 36552748 PMCID: PMC9776508 DOI: 10.3390/cells11243984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 11/30/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
MicroRNA (miRNA)-disease association (MDA) prediction is critical for disease prevention, diagnosis, and treatment. Traditional MDA wet experiments, however, are inefficient and costly. Therefore, we proposed a multi-layer collaborative unsupervised training base model called SGAEMDA (Stacked Graph Autoencoder-Based Prediction of Potential miRNA-Disease Associations). First, from the original miRNA and disease data, we defined two types of initial features: similarity features and association features. Second, a stacked graph autoencoder is used to learn unsupervised low-dimensional representations of meaningful higher-order similarity features, and we concatenate the association features with the learned low-dimensional representations to obtain the final miRNA-disease pair features. Finally, we used a multilayer perceptron (MLP) to predict scores for unknown miRNA-disease associations. SGAEMDA achieved mean areas under the ROC curve of 0.9585 and 0.9516 in 5-fold and 10-fold cross-validation, respectively, which is significantly higher than the other baseline methods. Furthermore, case studies have shown that SGAEMDA can accurately predict candidate miRNAs for brain, breast, colon, and kidney neoplasms.
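The evaluation metric quoted here, the mean area under the ROC curve across cross-validation folds, can be computed from labels and predicted scores alone. The sketch below uses the pairwise-ranking (Mann-Whitney) formulation of AUC; the function names and the toy scores are hypothetical, and no part of SGAEMDA itself is reproduced.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auc(folds):
    """Average AUC over folds, where each fold is a (labels, scores) pair."""
    values = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(values) / len(values)


# Toy example with made-up scores for two validation folds.
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]),
    ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.2]),
]
print(mean_auc(toy_folds))
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a sort-based rank computation gives the same value more cheaply on large test sets.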
4
Asim MN, Ibrahim MA, Zehe C, Trygg J, Dengel A, Ahmed S. BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction. Interdiscip Sci 2022; 14:841-862. [PMID: 35947255 PMCID: PMC9581873 DOI: 10.1007/s12539-022-00535-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 06/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
Abstract
Background and objective: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation and in cellular metabolic and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly due to the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length. Method: The paper at hand performs an in-depth exploration of diverse copy padding and sequence truncation approaches, and presents a novel idea of utilizing only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, "BoT-Net", which leverages a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradient issues, learning rate decay, and dropout to regularize a precise neural network for lncRNA–miRNA interaction prediction. Results: BoT-Net outperforms the state-of-the-art lncRNA–miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA–protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient. Conclusion: In the benchmark lncRNA–miRNA interaction prediction dataset, the length of the lncRNA sequences varies from 213 residues to 22,743 residues, and in the benchmark lncRNA–protein interaction prediction dataset, lncRNA sequences vary from 15 residues to 1504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of long lncRNA sequences alone contain a highly informative distribution for lncRNA–miRNA interaction prediction, a crucial finding exploited by the proposed BoT-Net approach to optimize the lncRNA fixed-length generation process. Availability: The BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.
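The three fixed-length strategies contrasted in this abstract (copy padding, truncation, and keeping only a starting subregion) can be sketched as plain string operations. The helper names are hypothetical, "copy padding" is interpreted here as repeating the sequence's own residues, and falling back to copy padding for sequences shorter than the window is an assumption of this sketch; the 50-residue default comes from the abstract.

```python
def copy_pad(seq, length):
    """Repeat the sequence's own residues until `length` is reached, then cut."""
    if not seq:
        raise ValueError("cannot pad an empty sequence")
    repeats = -(-length // len(seq))  # ceiling division
    return (seq * repeats)[:length]


def truncate(seq, length, keep="start"):
    """Keep `length` residues from the start or from the end of the sequence."""
    if keep == "start":
        return seq[:length]
    if keep == "end":
        return seq[-length:] if length else ""
    raise ValueError("keep must be 'start' or 'end'")


def fixed_length(seq, length=50):
    """Use only the starting subregion; copy-pad sequences that are too short."""
    if len(seq) >= length:
        return truncate(seq, length, keep="start")
    return copy_pad(seq, length)


print(copy_pad("ACGU", 10))          # ACGUACGUAC
print(truncate("ACGUACGU", 3, "end"))  # CGU
print(len(fixed_length("ACGU" * 500)))  # 50
```

With a 50-residue window, a 22,743-residue lncRNA and a 213-residue one are both reduced to their first 50 residues, so no padding is needed for either and the copy-padding bias described above does not arise.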
Affiliation(s)
- Muhammad Nabeel Asim
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Christoph Zehe
  - Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
- Johan Trygg
  - Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
  - Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
- Andreas Dengel
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
5
Xu M, Chen Y, Lu W, Kong L, Fang J, Li Z, Zhang L, Pian C. SPMLMI: predicting lncRNA-miRNA interactions in humans using a structural perturbation method. PeerJ 2021; 9:e11426. [PMID: 34055486 PMCID: PMC8140594 DOI: 10.7717/peerj.11426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 04/18/2021] [Indexed: 01/06/2023] Open
Abstract
Long non-coding RNA (lncRNA)-microRNA (miRNA) interactions are quickly emerging as important mechanisms underlying the functions of non-coding RNAs. Accordingly, predicting lncRNA-miRNA interactions provides an important basis for understanding the mechanisms of action of ncRNAs. However, the accuracy of the established prediction methods is still limited. In this study, we used structural consistency to measure the predictability of interactive links based on a bilayer network by integrating information for known lncRNA-miRNA interactions, an lncRNA similarity network, and an miRNA similarity network. In particular, by using the structural perturbation method, we proposed a framework called SPMLMI to predict potential lncRNA-miRNA interactions based on the bilayer network. We found that the structural consistency of the bilayer network was higher than that of any single network, supporting the utility of bilayer network construction for the prediction of lncRNA-miRNA interactions. Applying SPMLMI to three real datasets, we obtained areas under the curves of 0.9512 ± 0.0034, 0.8767 ± 0.0033, and 0.8653 ± 0.0021 based on 5-fold cross-validation, suggesting good model performance. In addition, the generalizability of SPMLMI was better than that of the previously established methods. Case studies of two lncRNAs (i.e., SNHG14 and MALAT1) further demonstrated the feasibility and effectiveness of the method. Therefore, SPMLMI is a feasible approach to identify novel lncRNA-miRNA interactions underlying complex biological processes.
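One plausible way to integrate the three inputs named in this abstract into a single bilayer network is a symmetric block adjacency matrix, with the two similarity networks on the diagonal and the known interactions off the diagonal. This layout is an assumption made for illustration, not a detail confirmed by the abstract, and the structural perturbation step itself (which requires an eigendecomposition) is not shown. `bilayer_adjacency` and the toy matrices are hypothetical.

```python
def bilayer_adjacency(lnc_sim, mi_sim, interactions):
    """Assemble [[lnc_sim, A], [A^T, mi_sim]] from plain nested lists.

    lnc_sim: n_l x n_l lncRNA similarity matrix
    mi_sim: n_m x n_m miRNA similarity matrix
    interactions: n_l x n_m matrix A of known lncRNA-miRNA links (0 or 1)
    """
    n_l, n_m = len(lnc_sim), len(mi_sim)
    if len(interactions) != n_l or any(len(row) != n_m for row in interactions):
        raise ValueError("interaction matrix must be n_l x n_m")
    # Top block row: lncRNA similarities followed by interactions.
    top = [list(lnc_sim[i]) + list(interactions[i]) for i in range(n_l)]
    # Bottom block row: transposed interactions followed by miRNA similarities.
    bottom = [
        [interactions[i][j] for i in range(n_l)] + list(mi_sim[j])
        for j in range(n_m)
    ]
    return top + bottom


# Toy data: 2 lncRNAs, 3 miRNAs (values are made up).
lnc_sim = [[1.0, 0.2], [0.2, 1.0]]
mi_sim = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.0]]
known = [[1, 0, 1], [0, 1, 0]]

adjacency = bilayer_adjacency(lnc_sim, mi_sim, known)
for row in adjacency:
    print(row)
```

When both similarity matrices are symmetric, the combined matrix is symmetric too, which is the form spectral methods on undirected networks generally expect.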
Affiliation(s)
- Mingmin Xu
  - College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Yuanyuan Chen
  - College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Wei Lu
  - College of Life Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Lingpeng Kong
  - College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Jingya Fang
  - College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Zutan Li
  - College of Agriculture, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Liangyun Zhang
  - College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
- Cong Pian
  - College of Sciences, Nanjing Agricultural University, Nanjing, Jiangsu, China
6
Kang Q, Meng J, Shi W, Luan Y. Ensemble Deep Learning Based on Multi-level Information Enhancement and Greedy Fuzzy Decision for Plant miRNA-lncRNA Interaction Prediction. Interdiscip Sci 2021; 13:603-614. [PMID: 33900552 DOI: 10.1007/s12539-021-00434-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/13/2021] [Revised: 04/01/2021] [Accepted: 04/16/2021] [Indexed: 12/18/2022]
Abstract
MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are both non-coding RNAs (ncRNAs), and their interactions play important roles in biological processes. Computational methods, such as machine learning and various bioinformatics tools, can predict potential miRNA-lncRNA interactions, which is significant for studying their mechanisms and biological functions. A growing number of RNA interaction predictors for animals have been reported, but they are unreliable for plants because ncRNAs differ between animals and plants. A reliable plant predictor is urgently needed, especially for cross-species prediction. This paper proposes an ensemble deep learning model based on multi-level information enhancement and greedy fuzzy decision (PmliPEMG) for plant miRNA-lncRNA interaction prediction. Fusion complex features, multi-scale convolutional long short-term memory networks, and an attention mechanism are adopted to enhance the sample information at the feature, scale, and model levels, respectively. An ensemble deep learning model is built based on a novel method (greedy fuzzy decision), which greatly improves efficiency. The multi-level information enhancement and greedy fuzzy decision are verified to have positive effects on prediction performance. PmliPEMG can be applied to cross-species prediction. It shows better performance and stronger generalization ability than state-of-the-art predictors and may provide valuable references for related research.
Affiliation(s)
- Qiang Kang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Wenhao Shi
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
7
Yang M, Huang L, Xu Y, Lu C, Wang J. Heterogeneous graph inference with matrix completion for computational drug repositioning. Bioinformatics 2020; 36:5456-5464. [PMID: 33331887 DOI: 10.1093/bioinformatics/btaa1024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/09/2020] [Revised: 11/23/2020] [Accepted: 11/26/2020] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: Emerging evidence shows that traditional drug discovery experiments are time-consuming and costly. Computational drug repositioning plays a critical role in saving time and resources for drug research and discovery. Therefore, developing more accurate and efficient approaches is imperative. Heterogeneous graph inference is a classical method in computational drug repositioning, which has both high convergence precision and fast convergence speed. However, the method has not fully considered the sparsity of the heterogeneous association network. In addition, a rough similarity measure can reduce the performance in identifying drug-associated indications. RESULTS: In this article, we propose a heterogeneous graph inference with matrix completion (HGIMC) method to predict potential indications for approved and novel drugs. First, we use a bounded matrix completion (BMC) model to prefill a part of the missing entries in the original drug-disease association matrix. This step can add more positive and formative drug-disease edges between the drug network and the disease network. Second, a Gaussian radial basis function (GRB) is employed to improve the drug and disease similarities, since the performance of heterogeneous graph inference relies heavily on similarity measures. Next, based on the updated drug-disease associations and the new similarity measures of drugs and diseases, we construct a novel heterogeneous drug-disease network. Finally, HGIMC utilizes the heterogeneous network to infer the scores of unknown association pairs, and then recommends the promising indications for drugs. To evaluate the performance of our method, HGIMC is compared with five state-of-the-art drug repositioning approaches in 10-fold cross-validation and de novo tests. As the numerical results show, HGIMC not only achieves better prediction performance, but also has excellent computational efficiency. In addition, case studies also confirm the effectiveness of our method in practical application. AVAILABILITY: The HGIMC software is freely available at https://github.com/BioinformaticsCSU/HGIMC, https://hub.docker.com/repository/docker/yangmy84/hgimc, and http://doi.org/10.5281/zenodo.4285640. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
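The similarity-refinement step mentioned in this abstract can be illustrated with a standard Gaussian radial basis function kernel, $K_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / (2\sigma^2))$. How HGIMC applies the function exactly is not stated in the abstract, so the sketch below makes two assumptions: each entity is represented by a numeric profile vector (for example, its row in an existing similarity or association matrix), and a single bandwidth `sigma` is used. The function name and the toy profiles are hypothetical.

```python
import math


def gaussian_rbf_similarity(profiles, sigma=1.0):
    """Pairwise Gaussian RBF similarity between profile vectors.

    profiles: list of equal-length numeric vectors, one per drug or disease
    sigma: kernel bandwidth (assumed hyperparameter); larger values smooth more
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(profiles[i], profiles[j])
            )
            value = math.exp(-squared_distance / (2.0 * sigma ** 2))
            kernel[i][j] = kernel[j][i] = value  # the kernel is symmetric
    return kernel


# Toy profiles for three entities (values are made up).
toy_profiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
for row in gaussian_rbf_similarity(toy_profiles):
    print([round(v, 4) for v in row])
```

Identical profiles receive similarity 1 and the value decays smoothly with distance, so the kernel maps a coarse similarity measure onto a bounded, symmetric one in the interval (0, 1].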
Affiliation(s)
- Mengyun Yang
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China
  - School of Science, Shaoyang University, Shaoyang, P.R. China
- Lan Huang
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Yunpei Xu
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Chengqian Lu
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China
- Jianxin Wang
  - The Hunan Provincial Key Lab of Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, P.R. China
8
Wang W, Guan X, Khan MT, Xiong Y, Wei DQ. LMI-DForest: A deep forest model towards the prediction of lncRNA-miRNA interactions. Comput Biol Chem 2020; 89:107406. [PMID: 33120126 DOI: 10.1016/j.compbiolchem.2020.107406] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/16/2020] [Revised: 10/12/2020] [Accepted: 10/15/2020] [Indexed: 02/07/2023]
Abstract
The interactions between miRNAs and long non-coding RNAs (lncRNAs) have been the subject of intensive recent study due to their critical role in gene regulation. Computational prediction of lncRNA-miRNA interactions has become a popular alternative to experimental methods for identifying underlying interactions. It is desirable to develop machine learning-based models for predicting lncRNA-miRNA interactions from experimentally validated interactions between lncRNAs and miRNAs. The accuracy and robustness of existing models based on machine learning techniques leave room for further improvement. Considering that the attributes of lncRNAs and miRNAs are of key importance in the interaction between these two RNAs, a deep learning model, named LMI-DForest, is proposed here by combining the deep forest and autoencoder strategies. Systematic comparison on experimentally validated lncRNA-miRNA interaction datasets demonstrates that the proposed method consistently shows superior performance over the other machine learning models in lncRNA-miRNA interaction prediction.
Affiliation(s)
- Wei Wang
  - School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Xiaoqing Guan
  - Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Muhammad Tahir Khan
  - Institute of Molecular Biology and Biotechnology, The University of Lahore Pakistan, Pakistan
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
  - Peng Cheng Laboratory, Shenzhen, Guangdong, China