1
Daniel Thomas S, Vijayakumar K, John L, Krishnan D, Rehman N, Revikumar A, Kandel Codi JA, Prasad TSK, S S V, Raju R. Machine Learning Strategies in MicroRNA Research: Bridging Genome to Phenome. OMICS: A Journal of Integrative Biology 2024; 28:213-233. [PMID: 38752932 DOI: 10.1089/omi.2024.0047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2024]
Abstract
MicroRNAs (miRNAs) have emerged as a prominent layer of regulation of gene expression. This article offers the salient and current aspects of machine learning (ML) tools and approaches from genome to phenome in miRNA research. First, we underline that the complexity in the analysis of miRNA function ranges from their modes of biogenesis to the target diversity in diverse biological conditions. Therefore, it is imperative to first ascertain the miRNA coding potential of genomes and understand the regulatory mechanisms of their expression. This knowledge enables the efficient classification of miRNA precursors and the identification of their mature forms and respective target genes. Second, because one miRNA can target multiple mRNAs and vice versa, another challenge is the assessment of the miRNA-mRNA target interaction network. Furthermore, long noncoding RNAs (lncRNAs) and circular RNAs (circRNAs) also contribute to this complexity. ML has been used to tackle these challenges at the high-dimensional data level. The present expert review covers more than 100 tools adopting various ML approaches pertaining to, for example, (1) miRNA promoter prediction, (2) precursor classification, (3) mature miRNA prediction, (4) miRNA target prediction, (5) miRNA-lncRNA and miRNA-circRNA interactions, (6) miRNA-mRNA expression profiling, (7) miRNA regulatory module detection, (8) miRNA-disease association, and (9) miRNA essentiality prediction. Taken together, we unpack, critically examine, and highlight the cutting-edge synergy of ML approaches and miRNA research so as to develop a dynamic and microlevel understanding of human health and diseases.
Affiliation(s)
- Sonet Daniel Thomas
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Krithika Vijayakumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Levin John
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Deepak Krishnan
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Niyas Rehman
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Amjesh Revikumar
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Kerala Genome Data Centre, Kerala Development and Innovation Strategic Council, Thiruvananthapuram, Kerala, India
- Jalaluddin Akbar Kandel Codi
  - Department of Surgical Oncology, Yenepoya Medical College, Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
- Vinodchandra S S
  - Department of Computer Science, University of Kerala, Thiruvananthapuram, Kerala, India
- Rajesh Raju
  - Centre for Integrative Omics Data Science (CIODS), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
  - Centre for Systems Biology and Molecular Medicine (CSBMM), Yenepoya (Deemed to Be University), Mangalore, Karnataka, India
2
Ma Y, Zhang H, Jin C, Kang C. Predicting lncRNA-protein interactions with bipartite graph embedding and deep graph neural networks. Front Genet 2023; 14:1136672. [PMID: 36845380 PMCID: PMC9948011 DOI: 10.3389/fgene.2023.1136672] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/03/2023] [Accepted: 01/30/2023] [Indexed: 02/11/2023] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) play crucial roles in numerous biological processes. Investigation of lncRNA-protein interactions contributes to discovering the undetected molecular functions of lncRNAs. In recent years, computational approaches have increasingly replaced the traditional, time-consuming experiments used to uncover unknown associations. However, the heterogeneity of lncRNA-protein associations remains insufficiently explored in association prediction, and it remains challenging to integrate this heterogeneity with graph neural network algorithms. Methods: In this paper, we constructed a deep architecture based on graph neural networks (GNNs) called BiHo-GNN, which is the first to integrate the properties of homogeneous and heterogeneous networks through bipartite graph embedding. Unlike previous research, BiHo-GNN can capture the mechanism of molecular association through the data encoder of heterogeneous networks. We also design a process of mutual optimization between homogeneous and heterogeneous networks, which promotes the robustness of BiHo-GNN. Results: We collected four datasets for predicting lncRNA-protein interactions and compared the performance of current prediction models on a benchmark dataset. BiHo-GNN outperforms existing bipartite graph-based methods. Conclusion: BiHo-GNN integrates the bipartite graph with homogeneous graph networks. Based on this model structure, lncRNA-protein interactions and potential associations can be predicted and discovered accurately.
Affiliation(s)
- Yuzhou Ma
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Han Zhang (corresponding author)
  - College of Artificial Intelligence, Nankai University, Tianjin, China
- Chen Jin
  - College of Computer Science, Nankai University, Tianjin, China
- Chuanze Kang
  - College of Artificial Intelligence, Nankai University, Tianjin, China
3
Zhang H, Wang Y, Pan Z, Sun X, Mou M, Zhang B, Li Z, Li H, Zhu F. ncRNAInter: a novel strategy based on graph neural network to discover interactions between lncRNA and miRNA. Brief Bioinform 2022; 23:6747810. [PMID: 36198065 DOI: 10.1093/bib/bbac411] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/07/2022] [Revised: 08/04/2022] [Accepted: 08/23/2022] [Indexed: 12/14/2022] Open
Abstract
In recent years, many studies have illustrated the significant role that non-coding RNA (ncRNA) plays in biological activities, in which lncRNA, miRNA and especially their interactions have been proved to affect many biological processes. Some in silico methods have been proposed and applied to identify novel lncRNA-miRNA interactions (LMIs), but there are still imperfections in their RNA representation and information extraction approaches, which imply there is still room for further improving their performances. Meanwhile, only a few of them are accessible at present, which limits their practical applications. The construction of a new tool for LMI prediction is thus imperative for the better understanding of their relevant biological mechanisms. This study proposed a novel method, ncRNAInter, for LMI prediction. A comprehensive strategy for RNA representation and an optimized deep learning algorithm of graph neural network were utilized in this study. ncRNAInter was robust and showed better performance of 26.7% higher Matthews correlation coefficient than existing reputable methods for human LMI prediction. In addition, ncRNAInter proved its universal applicability in dealing with LMIs from various species and successfully identified novel LMIs associated with various diseases, which further verified its effectiveness and usability. All source code and datasets are freely available at https://github.com/idrblab/ncRNAInter.
Affiliation(s)
- Hanyu Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Yunxia Wang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ziqi Pan
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Minjie Mou
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Bing Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Honglin Li
  - School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
  - Shanghai Key Laboratory of New Drug Design, East China University of Science and Technology, Shanghai 200237, China
- Feng Zhu
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
4
Wang W, Zhang L, Sun J, Zhao Q, Shuai J. Predicting the potential human lncRNA-miRNA interactions based on graph convolution network with conditional random field. Brief Bioinform 2022; 23:6775599. [PMID: 36305458 DOI: 10.1093/bib/bbac463] [Citation(s) in RCA: 134] [Impact Index Per Article: 67.0] [Received: 07/10/2022] [Revised: 09/10/2022] [Accepted: 09/27/2022] [Indexed: 12/14/2022] Open
Abstract
Long non-coding RNA (lncRNA) and microRNA (miRNA) are two typical types of non-coding RNAs (ncRNAs), and their interaction plays an important regulatory role in many biological processes. Exploring the interactions between unknown lncRNA and miRNA can help us better understand the functional expression between lncRNA and miRNA. At present, the interactions between lncRNA and miRNA are mainly obtained through biological experiments, but such experiments are often time-consuming and labor-intensive, so a computational method that can predict these interactions is needed. In this paper, we propose a method based on a graph convolutional network (GCN) and a conditional random field (CRF) for predicting human lncRNA-miRNA interactions, named GCNCRF. First, we construct a heterogeneous network using the known interactions of lncRNA and miRNA in the LncRNASNP2 database, the lncRNA/miRNA integration similarity network, and the lncRNA/miRNA feature matrix. Second, the initial embedding of nodes is obtained using a GCN. A CRF set in the GCN hidden layer updates the preliminary embeddings so that similar nodes have similar embeddings. At the same time, an attention mechanism is added to the CRF layer to reassign weights to nodes, which better captures the feature information of important nodes and down-weights nodes with less influence. Finally, the final embedding is decoded and scored through the decoding layer. In a 5-fold cross-validation experiment, GCNCRF achieves an area under the receiver operating characteristic curve of 0.947 on the main dataset, a higher prediction accuracy than the other six state-of-the-art methods.
Affiliation(s)
- Wenya Wang
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Li Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Jianqiang Sun
  - School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
- Qi Zhao
  - School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
- Jianwei Shuai
  - Oujiang Laboratory (Zhejiang Lab for Regenerative Medicine, Vision and Brain Health), and Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325001, China
  - Department of Physics, and Fujian Provincial Key Laboratory for Soft Functional Materials Research, Xiamen University, Xiamen, 361005, China
  - National Institute for Data Science in Health and Medicine, and State Key Laboratory of Cellular Stress Biology, Innovation Center for Cell Signaling Network, Xiamen University, Xiamen, 361005, China
5
Wang B, Wang X, Zheng X, Han Y, Du X. JSCSNCP-LMA: a method for predicting the association of lncRNA-miRNA. Sci Rep 2022; 12:17030. [PMID: 36220862 PMCID: PMC9552706 DOI: 10.1038/s41598-022-21243-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 09/26/2022] [Indexed: 12/29/2022] Open
Abstract
Non-coding RNAs (ncRNAs) have long been considered the "white elephant" of the genome because they lack the ability to encode proteins. However, in recent years, more and more biological experiments and clinical reports have shown that ncRNAs account for a large proportion of the RNA in organisms and play a decisive role in biological processes such as gene expression and cell growth and development. Recently, it has been found that short non-coding RNA (miRNA) and long non-coding RNA (lncRNA) can regulate each other, and that this regulation plays an important role in various complex human diseases. In this paper, we used a new method (JSCSNCP-LMA) to predict unknown lncRNA-miRNA associations. This method combined the Jaccard similarity algorithm, the self-tuning spectral clustering similarity algorithm, the cosine similarity algorithm, and known lncRNA-miRNA association networks, and used consistency projection to complete the final prediction. The AUC values of JSCSNCP-LMA in fivefold cross validation (fivefold CV) and leave-one-out cross validation (LOOCV) were 0.9145 and 0.9268, respectively. Comparison with other models demonstrated its superiority and good extensibility. The model was also applied to three different lncRNA-miRNA datasets in the fivefold CV experiment and obtained good results, with AUC values of 0.9145, 0.9662 and 0.9505, respectively. Therefore, JSCSNCP-LMA will help to predict the associations between lncRNA and miRNA.
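As a rough illustration of two of the similarity measures named in this abstract, the sketch below computes Jaccard and cosine similarity between lncRNA association profiles. It is not the authors' implementation: the toy association matrix and the function names are hypothetical, and the self-tuning spectral clustering similarity and the consistency projection step are not shown.

```python
import math

def jaccard_similarity(a, b):
    """Jaccard similarity of two binary association profiles."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def cosine_similarity(a, b):
    """Cosine similarity of two numeric association profiles."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy data: rows are lncRNAs, columns are miRNAs,
# and 1 marks a known lncRNA-miRNA association.
associations = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]

if __name__ == "__main__":
    for i in range(len(associations)):
        for j in range(i + 1, len(associations)):
            jac = jaccard_similarity(associations[i], associations[j])
            cos = cosine_similarity(associations[i], associations[j])
            print(f"lncRNA {i} vs lncRNA {j}: Jaccard={jac:.3f}, cosine={cos:.3f}")
```

Both measures treat two lncRNAs as similar when they share known miRNA partners; the same functions apply to columns if miRNA-miRNA similarity is wanted.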
Affiliation(s)
- Bo Wang
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, People’s Republic of China
- Xinwei Wang
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, People’s Republic of China
- Xiaodong Zheng
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, People’s Republic of China
- Yu Han
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, People’s Republic of China
- Xiaoxin Du
  - College of Computer and Control Engineering, Qiqihar University, Qiqihar, 161006, People’s Republic of China
6
Mehra N, Varmeziar A, Chen X, Kronick O, Fisher R, Kota V, Mitchell CS. Cross-Domain Text Mining to Predict Adverse Events from Tyrosine Kinase Inhibitors for Chronic Myeloid Leukemia. Cancers (Basel) 2022; 14:4686. [PMID: 36230609 PMCID: PMC9563938 DOI: 10.3390/cancers14194686] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 09/04/2022] [Accepted: 09/23/2022] [Indexed: 12/02/2022] Open
Abstract
Tyrosine kinase inhibitors (TKIs) are prescribed for chronic myeloid leukemia (CML) and some other cancers. The objective was to predict and rank TKI-related adverse events (AEs), including under-reported or preclinical AEs, using novel text mining. First, k-means clustering of 2575 clinical CML TKI abstracts separated TKIs by significant (p < 0.05) AE type: gastrointestinal (bosutinib); edema (imatinib); pulmonary (dasatinib); diabetes (nilotinib); cardiovascular (ponatinib). Next, we propose a novel cross-domain text mining method utilizing a knowledge graph, link prediction, and hub node network analysis to predict new relationships. Cross-domain text mining of 30+ million articles via SemNet predicted and ranked known and novel TKI AEs. Three physiology-based tiers were formed using unsupervised rank aggregation feature importance. Tier 1 ranked in the top 1%: hematology (anemia, neutropenia, thrombocytopenia, hypocellular marrow); glucose (diabetes, insulin resistance, metabolic syndrome); iron (deficiency, overload, metabolism), cardiovascular (hypertension, heart failure, vascular dilation); thyroid (hypothyroidism, hyperthyroidism, parathyroid). Tier 2 ranked in the top 5%: inflammation (chronic inflammatory disorder, autoimmune, periodontitis); kidney (glomerulonephritis, glomerulopathy, toxic nephropathy). Tier 3 ranked in the top 10%: gastrointestinal (bowel regulation, hepatitis, pancreatitis); neuromuscular (autonomia, neuropathy, muscle pain); others (secondary cancers, vitamin deficiency, edema). Results suggest proactive TKI patient AE surveillance levels: regular surveillance for tier 1, infrequent surveillance for tier 2, and symptom-based surveillance for tier 3.
Affiliation(s)
- Nidhi Mehra
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Armon Varmeziar
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Xinyu Chen
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Olivia Kronick
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Rachel Fisher
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
- Vamsi Kota
  - Division of Hematology and Oncology, Georgia Cancer Center, Augusta University, Augusta, GA 30912, USA
- Cassie S. Mitchell
  - Laboratory for Pathology Dynamics, Department of Biomedical Engineering, Georgia Institute of Technology, Emory University School of Medicine, Atlanta, GA 30332, USA
  - Center for Machine Learning, Georgia Institute of Technology, Atlanta, GA 30332, USA
7
Asim MN, Ibrahim MA, Zehe C, Trygg J, Dengel A, Ahmed S. BoT-Net: a lightweight bag of tricks-based neural network for efficient LncRNA–miRNA interaction prediction. Interdiscip Sci 2022; 14:841-862. [PMID: 35947255 PMCID: PMC9581873 DOI: 10.1007/s12539-022-00535-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2021] [Revised: 06/16/2022] [Accepted: 07/12/2022] [Indexed: 11/30/2022]
Abstract
Background and objective: Interactions of long non-coding ribonucleic acids (lncRNAs) with micro-ribonucleic acids (miRNAs) play an essential role in gene regulation and in cellular metabolic and pathological processes. Existing purely sequence-based computational approaches lack robustness and efficiency, mainly because of the high length variability of lncRNA sequences. Hence, the prime focus of the current study is to find optimal length trade-offs for lncRNA sequences of highly variable length. Method: The paper performs an in-depth exploration of diverse copy padding and sequence truncation approaches, and presents a novel idea of utilizing only subregions of lncRNA sequences to generate fixed-length lncRNA sequences. Furthermore, it presents a novel bag of tricks-based deep learning approach, “BoT-Net”, which leverages a single-layer long short-term memory network regularized through DropConnect to capture higher-order residue dependencies, pooling to retain the most salient features, normalization to prevent exploding and vanishing gradient issues, learning rate decay, and dropout to regularize a precise neural network for lncRNA–miRNA interaction prediction. Results: BoT-Net outperforms the state-of-the-art lncRNA–miRNA interaction prediction approach by 2%, 8%, and 4% in terms of accuracy, specificity, and Matthews correlation coefficient. Furthermore, a case study analysis indicates that BoT-Net also outperforms the state-of-the-art lncRNA–protein interaction predictor on a benchmark dataset by 10% in accuracy, 19% in sensitivity, 6% in specificity, 14% in precision, and 26% in Matthews correlation coefficient. Conclusion: In the benchmark lncRNA–miRNA interaction prediction dataset, the length of the lncRNA sequence varies from 213 residues to 22,743 residues, and in the benchmark lncRNA–protein interaction prediction dataset, lncRNA sequences vary from 15 residues to 1504 residues. For sequences of such highly variable length, fixed-length generation using copy padding introduces a significant level of bias, which makes a large number of lncRNA sequences nearly identical to each other and eventually derails classifier generalizability. Empirical evaluation reveals that the first 50 residues of long lncRNA sequences alone contain a highly informative distribution for lncRNA–miRNA interaction prediction, a crucial finding exploited by the proposed BoT-Net approach to optimize the lncRNA fixed-length generation process. Availability: BoT-Net web server can be accessed at https://sds_genetic_analysis.opendfki.de/lncmiRNA/.
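The two fixed-length strategies contrasted in this abstract can be sketched as follows. This is an illustrative reading rather than the BoT-Net code: "copy padding" is assumed here to mean repeating a sequence until the target length is reached, and the function names, the pad character `N`, and the example sequences are hypothetical.

```python
def copy_pad(seq, length):
    """Repeat the sequence until it reaches `length`, then cut to size.

    Assumed interpretation of "copy padding"; short sequences end up as
    near-identical repeats, which is the bias the abstract describes.
    """
    if not seq:
        raise ValueError("cannot copy-pad an empty sequence")
    repeats = -(-length // len(seq))  # ceiling division
    return (seq * repeats)[:length]

def start_subregion(seq, length, pad_char="N"):
    """Keep only the first `length` residues, padding shorter sequences."""
    return seq[:length].ljust(length, pad_char)

if __name__ == "__main__":
    short_rna = "ACGU"
    long_rna = "AUGGCUACGUAGCUAGCUAGGCUA"
    print(copy_pad(short_rna, 10))        # fixed length via repetition
    print(start_subregion(long_rna, 10))  # fixed length via the starting region
    print(start_subregion(short_rna, 10)) # short input padded with N
```

With a target length of 50, `start_subregion` corresponds to using only the starting region that the abstract reports as highly informative.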
Affiliation(s)
- Muhammad Nabeel Asim
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Christoph Zehe
  - Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
- Johan Trygg
  - Sartorius Stedim Cellca GmbH, 88471, Laupheim, Baden-Wurttemberg, Germany
  - Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
- Andreas Dengel
  - Department of Computer Science, Technical University of Kaiserslautern, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, 67663, Kaiserslautern, Rhineland-Palatinate, Germany
  - Computational Life Science Cluster (CLiC), Umea University, 90187, Umea, Sweden
8
Song J, Tian S, Yu L, Yang Q, Xing Y, Zhang C, Dai Q, Duan X. MD-MLI: Prediction of miRNA-lncRNA Interaction by Using Multiple Features and Hierarchical Deep Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1724-1733. [PMID: 33125334 DOI: 10.1109/tcbb.2020.3034922] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/11/2023]
Abstract
Long non-coding RNA (lncRNA) can interact with microRNA (miRNA) and plays an important role in inhibiting or activating the expression of target genes and in the occurrence and development of tumors. A growing number of studies address the prediction of miRNA-lncRNA interactions, relying mostly on biological experiments and machine learning methods. These methods suffer from long cycles, high costs, and excessive human intervention. In this paper, a data-driven hierarchical deep learning framework was proposed, composed of a capsule network, an independent recurrent neural network with an attention mechanism, and a bi-directional long short-term memory network. This framework combines the advantages of the different networks and uses multiple sequence-derived features of the original sequence, together with secondary structure features, to mine the dependencies between features and obtain better results. In the experiment, five-fold cross-validation was used to evaluate the performance of the model, and comparison with other models on the Zea mays data set showed better classification performance. In addition, Sorghum, Brachypodium distachyon and bryophyte data sets were used to test the model, and the accuracy reached 0.9850, 0.9859 and 0.9777, respectively, which verified the model's good generalization ability.
9
Jiang H, Huang Y. An effective drug-disease associations prediction model based on graphic representation learning over multi-biomolecular network. BMC Bioinformatics 2022; 23:9. [PMID: 34983364 PMCID: PMC8726520 DOI: 10.1186/s12859-021-04553-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/28/2021] [Accepted: 12/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have been verified by experiments so far. Previous evidence indicates that combining information from multiple sources is conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. RESULTS: In this paper, we proposed a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that the GRLMN model achieved an area under the ROC curve (AUC) of 87.9%, outperforming all previous works in terms of both accuracy and AUC on the benchmark dataset. To further verify the performance of GRLMN, we carried out case studies for two common diseases. In the ranking of drugs predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. CONCLUSIONS: The experimental results show that our model performs well in DDA prediction. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning their role in drug repositioning.
Affiliation(s)
- Hanjing Jiang
  - Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, Institute of Artificial Intelligence, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
- Yabing Huang
  - Department of Pathology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei, China
10
Zhou H, Gao Y, Li X, Shang S, Wang P, Zhi H, Guo S, Sun D, Liu H, Li X, Zhang Y, Ning S. Identifying and characterizing lincRNA genomic clusters reveals its cooperative functions in human cancer. J Transl Med 2021; 19:509. [PMID: 34906173 PMCID: PMC8672572 DOI: 10.1186/s12967-021-03179-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/17/2021] [Accepted: 12/03/2021] [Indexed: 02/01/2023] Open
Abstract
Background: Emerging evidence has revealed that some long intergenic non-coding RNAs (lincRNAs) are likely to form clusters on the same chromosome, and lincRNA genomic clusters might play critical roles in pathophysiological mechanisms. However, lincRNA clustering has rarely been investigated comprehensively, particularly with respect to its functional significance across different cancer types. Methods: In this study, we first constructed a computational method based on a sliding window approach for systematically identifying lincRNA genomic clusters. We then dissected these lincRNA genomic clusters to identify common characteristics in cooperative expression, conservation among divergent species, targeted miRNAs, and CNV frequency. Next, we performed comprehensive analyses of differential expression patterns and overall survival outcomes for patients from The Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) across multiple cancer types. Finally, we explored the underlying mechanisms of lincRNA genomic clusters by functional enrichment analysis, pathway analysis, and drug-target interaction. Results: We identified lincRNA genomic clusters according to the algorithm. Clustered lincRNAs tended to be co-expressed, highly conserved, targeted by more miRNAs, and subject to similar deletion and duplication frequencies, suggesting that lincRNA genomic clusters may exert their effects by acting in combination. We further systematically explored conserved and cancer-specific lincRNA genomic clusters, indicating that they are involved in some important mechanisms of disease occurrence through diverse approaches. Furthermore, lincRNA genomic clusters can serve as biomarkers with potential clinical significance and are involved in specific pathological processes in the development of cancer. Moreover, a lincRNA genomic cluster named Cluster127 in the DLK1-DIO3 imprinted locus was discovered, which contained MEG3, MEG8, MEG9, MIR381HG, LINC02285, AL132709.5, and AL132709.1. Further analysis indicated that Cluster127 may have the potential to predict prognosis in cancer and could act by participating in the regulation of the PI3K-AKT signaling pathway. Conclusions: Clarifying the specific roles of lincRNA genomic clusters in human cancers could be beneficial for understanding the molecular pathogenesis of different cancer types. Supplementary Information: The online version contains supplementary material available at 10.1186/s12967-021-03179-5.
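The abstract does not give the details of its sliding window procedure, so the following is only a generic sketch of how such an approach could group genes on one chromosome. The window size, the minimum gene count, the merging of overlapping windows, and the example coordinates are all assumptions made for illustration, not the authors' parameters.

```python
def find_clusters(positions, window, min_genes):
    """Group gene start positions on one chromosome into clusters.

    A window of `window` bp is anchored at each gene in turn; any window
    holding at least `min_genes` genes is a candidate, and candidates that
    share genes are merged into one cluster.
    """
    pts = sorted(positions)
    spans = []  # half-open index ranges (start, end) into pts
    end = 0
    for start in range(len(pts)):
        end = max(end, start)
        # Advance the right edge while genes still fall inside the window.
        while end < len(pts) and pts[end] - pts[start] <= window:
            end += 1
        if end - start >= min_genes:
            if spans and start < spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # overlaps previous: merge
            else:
                spans.append((start, end))
    return [pts[s:e] for s, e in spans]

if __name__ == "__main__":
    # Hypothetical lincRNA start coordinates (bp) on a single chromosome.
    starts = [100, 200, 350, 5000, 9000, 9100, 9150, 9400]
    for cluster in find_clusters(starts, window=500, min_genes=3):
        print(cluster)
```

In practice the function would be run per chromosome (and possibly per strand), with `window` and `min_genes` chosen from the genome annotation being analyzed.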
Affiliation(s)
- Hanxiao Zhou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Yue Gao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xin Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Shipeng Shang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Peng Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Hui Zhi
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Shuang Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Dailin Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Hongjia Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Yunpeng Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
- Shangwei Ning
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, Heilongjiang, China
11
Zhan WL, Gao N, Tu GL, Tang H, Gao L, Xia Y. LncRNA LINC00689 Promotes the Tumorigenesis of Glioma via Mediation of miR-526b-3p/IGF2BP1 Axis. Neuromolecular Med 2021; 23:383-394. [PMID: 33389570 DOI: 10.1007/s12017-020-08635-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/19/2020] [Accepted: 11/22/2020] [Indexed: 12/14/2022]
Abstract
Glioma ranks first among aggressive brain tumors worldwide. LncRNA LINC00689 has been confirmed to play key roles in the progression of cancers, and LINC00689 is upregulated in glioma. However, the biological function of LINC00689 in glioma is unclear. qRT-PCR was applied to detect the expression of LINC00689 and miR-526b-3p in glioma cells. A dual-luciferase reporter assay was performed to examine the relationship among LINC00689, miR-526b-3p, and insulin-like growth factor 2 mRNA-binding protein 1 (IGF2BP1). Then, the growth, migration, and invasion of glioma cells were detected by colony formation, flow cytometry, and transwell assay, respectively. The expression of p21, cleaved caspase 3, and MAPK signaling-related proteins in glioma cells was tested by western blotting. Finally, a xenograft mouse model was established to detect the effect of LINC00689 on glioma tumor growth in vivo. LINC00689 was upregulated in glioma cells, while miR-526b-3p was downregulated. In addition, LINC00689 bound to miR-526b-3p, and IGF2BP1 was targeted by miR-526b-3p. Moreover, LINC00689 knockdown or upregulation of miR-526b-3p inhibited the proliferation of glioma cells and induced apoptosis. Consistently, the migration and invasion of glioma cells were notably reduced by LINC00689 shRNA or miR-526b-3p mimics. A miR-526b-3p inhibitor or IGF2BP1 upregulation could reverse the effect of LINC00689 knockdown or miR-526b-3p mimics. Finally, knockdown of LINC00689 inhibited glioma tumor growth in vivo by regulating the miR-526b-3p/IGF2BP1/MAPK axis. In conclusion, silencing of LINC00689 could inhibit the tumorigenesis of glioma via mediation of the miR-526b-3p/IGF2BP1 axis. LINC00689 may serve as a new target for the treatment of glioma.
Affiliation(s)
- Wen-Liang Zhan
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
- Ning Gao
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
- Guo-Long Tu
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
- Hong Tang
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
- Ling Gao
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
- Ying Xia
- Department of Neurosurgery, Affiliated Haikou Hospital at Xiangya Medical College, Central South University, No.43, People's Avenue, Haidian Island, Haikou, 570208, Hainan Province, People's Republic of China
12
Zhao C, Qiu Y, Zhou S, Liu S, Zhang W, Niu Y. Graph embedding ensemble methods based on the heterogeneous network for lncRNA-miRNA interaction prediction. BMC Genomics 2020; 21:867. [PMID: 33334307 PMCID: PMC7745483 DOI: 10.1186/s12864-020-07238-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Researchers have discovered that lncRNA-miRNA regulatory paradigms modulate gene expression patterns and drive major cellular processes. Identification of lncRNA-miRNA interactions (LMIs) is critical to reveal the mechanisms of biological processes and complicated diseases. Because conventional wet experiments are time-consuming, labor-intensive, and costly, a few computational methods have been proposed to expedite the identification of lncRNA-miRNA interactions. However, little attention has been paid to fully exploiting the structural and topological information of the lncRNA-miRNA interaction network. RESULTS In this paper, we propose novel lncRNA-miRNA prediction methods using graph embedding and ensemble learning. First, we calculate lncRNA-lncRNA sequence similarity and miRNA-miRNA sequence similarity, and then we combine them with the known lncRNA-miRNA interactions to construct a heterogeneous network. Second, we adopt several graph embedding methods to learn embedded representations of lncRNAs and miRNAs from the heterogeneous network, and construct the ensemble models using two ensemble strategies. For the former, we consider individual graph embedding based models as base predictors and integrate their predictions, and develop a method named GEEL-PI. For the latter, we construct a deep attention neural network (DANN) to integrate various graph embeddings, and present an ensemble method named GEEL-FI. The experimental results demonstrate that both GEEL-PI and GEEL-FI outperform other state-of-the-art methods. The effectiveness of the two ensemble strategies is validated by further experiments. Moreover, the case studies show that GEEL-PI and GEEL-FI can find novel lncRNA-miRNA associations. CONCLUSION The study reveals that methods based on graph embedding and ensemble learning are efficient for integrating heterogeneous information derived from the lncRNA-miRNA interaction network and can achieve better performance on the LMI prediction task. In conclusion, GEEL-PI and GEEL-FI are promising for lncRNA-miRNA interaction prediction.
Affiliation(s)
- Chengshuai Zhao
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Yang Qiu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Shuang Zhou
- School of Computer Science, Wuhan University, Wuhan, 430072, China
- Shichao Liu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Yanqing Niu
- School of Mathematics and Statistics, South-Central University for Nationalities, Wuhan, 430074, China
13
Alam T, Al-Absi HRH, Schmeier S. Deep Learning in LncRNAome: Contribution, Challenges, and Perspectives. Noncoding RNA 2020; 6:E47. [PMID: 33266128 PMCID: PMC7711891 DOI: 10.3390/ncrna6040047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/25/2020] [Revised: 10/27/2020] [Accepted: 11/06/2020] [Indexed: 12/11/2022] Open
Abstract
Long non-coding RNAs (lncRNA), the pervasively transcribed part of the mammalian genome, have played a significant role in changing our protein-centric view of genomes. The abundance of lncRNAs and their diverse roles across cell types have opened numerous avenues for the research community regarding lncRNAome. To discover and understand lncRNAome, many sophisticated computational techniques have been leveraged. Recently, deep learning (DL)-based modeling techniques have been successfully used in genomics due to their capacity to handle large amounts of data and produce relatively better results than traditional machine learning (ML) models. DL-based modeling techniques have now become a choice for many modeling tasks in the field of lncRNAome as well. In this review article, we summarized the contribution of DL-based methods in nine different lncRNAome research areas. We also outlined DL-based techniques leveraged in lncRNAome, highlighting the challenges computational scientists face while developing DL-based models for lncRNAome. To the best of our knowledge, this is the first review article that summarizes the role of DL-based techniques in multiple areas of lncRNAome.
Affiliation(s)
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Hamada R. H. Al-Absi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Sebastian Schmeier
- School of Natural and Computational Sciences, Massey University, Auckland 0632, New Zealand
14
Wang W, Guan X, Khan MT, Xiong Y, Wei DQ. LMI-DForest: A deep forest model towards the prediction of lncRNA-miRNA interactions. Comput Biol Chem 2020; 89:107406. [PMID: 33120126 DOI: 10.1016/j.compbiolchem.2020.107406] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 07/16/2020] [Revised: 10/12/2020] [Accepted: 10/15/2020] [Indexed: 02/07/2023]
Abstract
The interactions between miRNAs and long non-coding RNAs (lncRNAs) have been the subject of intensive recent study owing to their critical role in gene regulation. Computational prediction of lncRNA-miRNA interactions has become a popular alternative to experimental methods for identifying these interactions. It is desirable to develop machine learning-based models that predict lncRNA-miRNA interactions from the experimentally validated interactions between lncRNAs and miRNAs. The accuracy and robustness of existing machine learning models leave room for improvement. Considering that the attributes of lncRNAs and miRNAs are of key importance to the interaction between these two RNAs, a deep learning model named LMI-DForest is proposed here by combining the deep forest and autoencoder strategies. Systematic comparison on experimentally validated lncRNA-miRNA interaction datasets demonstrates that the proposed method consistently shows superior performance over the other machine learning models in lncRNA-miRNA interaction prediction.
Affiliation(s)
- Wei Wang
- School of Mathematical Sciences, Shanghai Jiao Tong University, Shanghai, China
- Xiaoqing Guan
- Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Muhammad Tahir Khan
- Institute of Molecular Biology and Biotechnology, The University of Lahore Pakistan, Pakistan
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China; Peng Cheng Laboratory, Shenzhen, Guangdong, China
15
Yang S, Wang Y, Lin Y, Shao D, He K, Huang L. LncMirNet: Predicting LncRNA-miRNA Interaction Based on Deep Learning of Ribonucleic Acid Sequences. Molecules 2020; 25:E4372. [PMID: 32977679 PMCID: PMC7583909 DOI: 10.3390/molecules25194372] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 08/03/2020] [Revised: 09/19/2020] [Accepted: 09/22/2020] [Indexed: 12/22/2022] Open
Abstract
Long non-coding RNA (lncRNA) and microRNA (miRNA) are both non-coding RNAs that play significant regulatory roles in many life processes. There is accumulating evidence that the interaction patterns between lncRNAs and miRNAs are highly related to cancer development, gene regulation, cellular metabolic processes, etc. At the same time, with the rapid development of RNA sequencing technology, numerous novel lncRNAs and miRNAs have been found, which might help to explore novel regulatory patterns. However, the increasing number of unknown interactions between lncRNAs and miRNAs may hinder the discovery of novel regulatory patterns, and wet experiments to identify potential interactions are costly and time-consuming. Furthermore, few computational tools are available for predicting lncRNA-miRNA interactions at the sequence level. In this paper, we propose a hybrid sequence feature-based model, LncMirNet (lncRNA-miRNA interactions network), to predict lncRNA-miRNA interactions via deep convolutional neural networks (CNN). First, four categories of sequence-based features are introduced to encode lncRNA/miRNA sequences, including k-mer (k = 1, 2, 3, 4), composition transition distribution (CTD), doc2vec, and graph embedding features. Then, to fit the CNN learning pattern, a histogram-dd method is incorporated to fuse multiple types of features into a matrix. Finally, LncMirNet attained excellent performance in comparison with six other state-of-the-art methods on a real dataset collected from lncRNASNP2 via five-fold cross validation. LncMirNet increased accuracy and area under curve (AUC) each by more than 3% over the other tools, and improved the Matthews correlation coefficient (MCC) by more than 6%. These results show that LncMirNet can obtain high confidence in predicting potential interactions between lncRNAs and miRNAs.
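The k-mer part of the feature encoding mentioned in this abstract (k = 1, 2, 3, 4) can be illustrated with a short sketch. The function name `kmer_features`, the normalization by the number of sliding windows, and the conversion of T to U are assumptions made here; the abstract does not specify them. The CTD, doc2vec, and graph embedding features are not covered.

```python
from itertools import product


def kmer_features(seq, ks=(1, 2, 3, 4), alphabet="ACGU"):
    """Return k-mer frequencies of an RNA sequence as one flat list.

    For ks = (1, 2, 3, 4) and a 4-letter alphabet the vector has
    4 + 16 + 64 + 256 = 340 entries. Frequencies are counts divided by
    the number of length-k windows (an assumed normalization).
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input too
    features = []
    for k in ks:
        kmers = ["".join(p) for p in product(alphabet, repeat=k)]
        counts = dict.fromkeys(kmers, 0)
        n_windows = max(len(seq) - k + 1, 0)
        for i in range(n_windows):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows with ambiguous bases such as N
                counts[kmer] += 1
        features.extend(
            counts[m] / n_windows if n_windows else 0.0 for m in kmers
        )
    return features


toy_vector = kmer_features("ACGUACGU")  # toy sequence, not real data
```

Each k contributes a block of 4^k values in lexicographic k-mer order, so vectors from different sequences are directly comparable position by position.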
Affiliation(s)
- Sen Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Yu Lin
- School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Dan Shao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Kai He
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
16
Yi HC, You ZH, Huang DS, Guo ZH, Chan KCC, Li Y. Learning Representations to Predict Intermolecular Interactions on Large-Scale Heterogeneous Molecular Association Network. iScience 2020; 23:101261. [PMID: 32580123 PMCID: PMC7317230 DOI: 10.1016/j.isci.2020.101261] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/03/2019] [Revised: 04/29/2020] [Accepted: 06/08/2020] [Indexed: 02/07/2023] Open
Abstract
Molecular components that are functionally interdependent in human cells constitute molecular association networks. Disease can be caused by the disturbance of multiple molecular interactions. New biomolecular regulatory mechanisms can be revealed by discovering new biomolecular interactions. To this end, a heterogeneous molecular association network is formed by systematically integrating comprehensive associations between miRNAs, lncRNAs, circRNAs, mRNAs, proteins, drugs, microbes, and complex diseases. We propose a machine learning method for predicting intermolecular interactions, named MMI-Pred. More specifically, a network embedding model is developed to fully exploit the network behavior of biomolecules, and attribute features are also calculated. Then, these discriminative features are combined to train a random forest classifier to predict intermolecular interactions. MMI-Pred achieves an accuracy of 93.50% in hybrid association prediction under 5-fold cross-validation. This work provides a systematic landscape and a machine learning method to model and infer complex associations between various biological components.
Affiliation(s)
- Hai-Cheng Yi
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Zhen-Hao Guo
- Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Keith C C Chan
- Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR 999077, China
- Yangming Li
- College of Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
17
Guo ZH, You ZH, Wang YB, Huang DS, Yi HC, Chen ZH. Bioentity2vec: Attribute- and behavior-driven representation for predicting multi-type relationships between bioentities. Gigascience 2020; 9:giaa032. [PMID: 32533701 PMCID: PMC7293023 DOI: 10.1093/gigascience/giaa032] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/07/2019] [Revised: 01/06/2020] [Accepted: 03/13/2020] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND The explosive growth of genomic, chemical, and pathological data provides new opportunities and challenges for humans to thoroughly understand life activities in cells. However, there exist few computational models that aggregate various bioentities to comprehensively reveal the physical and functional landscape of biological systems. RESULTS We constructed a molecular association network, which contains 18 edges (relationships) between 8 nodes (bioentities). Based on this, we propose Bioentity2vec, a new method for representing bioentities, which integrates information about the attributes and behaviors of a bioentity. Applying the random forest classifier, we achieved promising performance on 18 relationships, with an area under the curve of 0.9608 and an area under the precision-recall curve of 0.9572. CONCLUSIONS Our study shows that constructing a network with rich topological and biological information is important for systematic understanding of the biological landscape at the molecular level. Our results show that Bioentity2vec can effectively represent biological entities and provides easily distinguishable information about classification tasks. Our method is also able to simultaneously predict relationships between single types and multiple types, which will accelerate progress in biological experimental research and industrial product development.
Affiliation(s)
- Zhen-Hao Guo
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Yan-Bin Wang
- School of Cyber Science and Technology, Zhejiang University, Hangzhou 310000, Zhejiang, China
- De-Shuang Huang
- Computer Science Department, Tongji University, Shanghai 200000, China
- Hai-Cheng Yi
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zhan-Heng Chen
- XinJiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, No. 40-1, Beijing South Road, Urumqi, Xinjiang, China
- University of Chinese Academy of Sciences, Beijing 100049, China
18
Peng L, Liu F, Yang J, Liu X, Meng Y, Deng X, Peng C, Tian G, Zhou L. Probing lncRNA-Protein Interactions: Data Repositories, Models, and Algorithms. Front Genet 2020; 10:1346. [PMID: 32082358 PMCID: PMC7005249 DOI: 10.3389/fgene.2019.01346] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/27/2019] [Accepted: 12/09/2019] [Indexed: 12/31/2022] Open
Abstract
Identifying lncRNA-protein interactions (LPIs) is vital to understanding various key biological processes. Wet experiments have found a few LPIs, but experimental methods are costly and time-consuming. Therefore, computational methods are increasingly exploited to capture LPI candidates. We introduce relevant data repositories and focus on two types of LPI prediction models: network-based methods and machine learning-based methods. Machine learning-based methods include matrix factorization-based techniques and ensemble learning-based techniques. To assess the performance of computational methods, we compared a subset of LPI prediction models using leave-one-out cross-validation (LOOCV) and five-fold cross-validation. The results show that SFPEL-LPI obtained the best AUC. Although computational models have efficiently unraveled some LPI candidates, many limitations remain. We discuss future directions to further boost LPI predictive performance.
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Fuxing Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Jialiang Yang
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Xiaojun Liu
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Yajie Meng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiaojun Deng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Cheng Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Geng Tian
- Department of Sciences, Genesis (Beijing) Co. Ltd., Beijing, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
19
Zhang W, Tang G, Zhou S, Niu Y. LncRNA-miRNA interaction prediction through sequence-derived linear neighborhood propagation method with information combination. BMC Genomics 2019; 20:946. [PMID: 31856716 PMCID: PMC6923828 DOI: 10.1186/s12864-019-6284-y] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/17/2022] Open
Abstract
BACKGROUND Researchers have discovered that lncRNAs can act as decoys or sponges to regulate the behavior of miRNAs. Identification of lncRNA-miRNA interactions helps to understand the functions of lncRNAs, especially their roles in complicated diseases. Computational methods can save time and reduce cost in identifying lncRNA-miRNA interactions, but only a few such methods exist. RESULTS In this paper, we propose a sequence-derived linear neighborhood propagation method (SLNPM) to predict lncRNA-miRNA interactions. First, we calculate the integrated lncRNA-lncRNA similarity and the integrated miRNA-miRNA similarity by combining known lncRNA-miRNA interactions, lncRNA sequences, and miRNA sequences. We consider two similarity calculation strategies, namely similarity-based information combination (SC) and interaction profile-based information combination (PC). Second, the integrated lncRNA similarity-based graph and the integrated miRNA similarity-based graph are constructed, and label propagation is run on each of the two graphs to score lncRNA-miRNA pairs. Finally, the weighted averages of their outputs are adopted as final predictions. We thus construct two versions of SLNPM: the sequence-derived linear neighborhood propagation method based on similarity information combination (SLNPM-SC) and the sequence-derived linear neighborhood propagation method based on interaction profile information combination (SLNPM-PC). The experimental results show that SLNPM-SC and SLNPM-PC predict lncRNA-miRNA interactions with higher accuracy than other state-of-the-art methods. The case studies demonstrate that SLNPM-SC and SLNPM-PC help to find novel lncRNA-miRNA interactions for given lncRNAs or miRNAs. CONCLUSION The study reveals that known interactions provide the most important information for lncRNA-miRNA interaction prediction, and that the sequences of lncRNAs and miRNAs also provide useful information. In conclusion, SLNPM-SC and SLNPM-PC are promising for lncRNA-miRNA interaction prediction.
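The two-graph propagation and weighted averaging described in this abstract can be sketched generically as follows. This is not the SLNPM implementation: the paper's linear neighborhood similarity is replaced here by a plain row-normalized similarity matrix, and the function names, `alpha`, and `weight` are assumptions for illustration. Propagation uses the standard closed form F = (1 - alpha)(I - alpha W)^(-1) Y of the iteration F <- alpha W F + (1 - alpha) Y.

```python
import numpy as np


def row_normalize(S):
    """Scale each row of a non-negative similarity matrix to sum to 1."""
    S = np.asarray(S, dtype=float)
    sums = S.sum(axis=1, keepdims=True)
    sums[sums == 0] = 1.0  # leave all-zero rows unchanged
    return S / sums


def propagate(W, Y, alpha=0.5):
    """Closed-form label propagation on a graph with transition matrix W."""
    n = W.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * W, Y)


def score_pairs(S_lnc, S_mir, Y, alpha=0.5, weight=0.5):
    """Score lncRNA-miRNA pairs by propagating on both similarity graphs.

    S_lnc: (n, n) lncRNA similarity, S_mir: (m, m) miRNA similarity,
    Y: (n, m) binary matrix of known interactions.
    Returns the weighted average of the two propagation outputs.
    """
    Y = np.asarray(Y, dtype=float)
    F_lnc = propagate(row_normalize(S_lnc), Y, alpha)
    F_mir = propagate(row_normalize(S_mir), Y.T, alpha).T
    return weight * F_lnc + (1 - weight) * F_mir


# Toy matrices (hypothetical values, for illustration only).
toy_Y = np.array([[1, 0], [0, 1], [0, 0]])
toy_S_lnc = np.array([[1.0, 0.2, 0.8], [0.2, 1.0, 0.1], [0.8, 0.1, 1.0]])
toy_S_mir = np.array([[1.0, 0.3], [0.3, 1.0]])
toy_scores = score_pairs(toy_S_lnc, toy_S_mir, toy_Y)
```

In the toy example the third lncRNA has no known interactions, yet it receives non-zero scores because it is similar to the first lncRNA; that is the behavior propagation is meant to provide.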
Affiliation(s)
- Wen Zhang
- College of informatics, Huazhong Agricultural University, Wuhan, 430070 China
- Guifeng Tang
- School of Computer Science, Wuhan University, Wuhan, 430072 China
- Shuang Zhou
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
- Yanqing Niu
- School of Mathematics and Statistics, South-Central University for Nationalities, Wuhan, 430074 China
20
Zhang P, Meng J, Luan Y, Liu C. Plant miRNA-lncRNA Interaction Prediction with the Ensemble of CNN and IndRNN. Interdiscip Sci 2019; 12:82-89. [PMID: 31811618 DOI: 10.1007/s12539-019-00351-w] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 07/27/2019] [Revised: 10/11/2019] [Accepted: 11/19/2019] [Indexed: 12/22/2022]
Abstract
Non-coding RNA (ncRNA) plays an important role in regulating the biological activities of animals and plants, and representative examples are microRNA (miRNA) and long non-coding RNA (lncRNA). Recent research has found that predicting the interaction between miRNA and lncRNA is the primary task for elucidating their functional mechanisms. Owing to the small scale of data, a large amount of noise, and the limitations of human factors, the prediction accuracy and reliability of traditional feature-based classification methods are often affected. Besides, the structure of plant ncRNA is complex. This paper proposes an ensemble deep-learning model, named CIRNN, based on a convolutional neural network (CNN) and an independently recurrent neural network (IndRNN) for predicting the interaction between miRNA and lncRNA in plants. The model uses the CNN to explore the functional features of gene sequences automatically, leverages the IndRNN to obtain the representation of sequence features, and learns the dependencies among sequences; thus, it overcomes the inaccuracy caused by human factors in traditional feature engineering. The experimental results show that the proposed model is superior to shallow machine-learning and existing deep-learning models when dealing with large-scale data, especially for long sequences.
Affiliation(s)
- Peng Zhang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Chanjuan Liu
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
21
Yi HC, You ZH, Guo ZH. Construction and Analysis of Molecular Association Network by Combining Behavior Representation and Node Attributes. Front Genet 2019; 10:1106. [PMID: 31788002 PMCID: PMC6854842 DOI: 10.3389/fgene.2019.01106] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/08/2019] [Accepted: 10/15/2019] [Indexed: 11/13/2022] Open
Abstract
A key aim of post-genomic biomedical research is to understand and model complex biomolecular activities from a systematic perspective. Biomolecular interactions are widespread and interrelated: multiple biomolecules coordinate to sustain life activities, and any disturbance of these complex connections can lead to abnormal life activities or complex diseases. However, many existing studies focus only on individual intermolecular interactions. In this work, we revealed, constructed, and analyzed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among lncRNAs, miRNAs, proteins, drugs, and diseases, in which various associations are interconnected and any type of association can be predicted. We propose Molecular Association Network (MAN)-High-Order Proximity preserved Embedding (HOPE), a novel method based on network representation learning that fully exploits the latent features of biomolecules to accurately predict associations between molecules. More specifically, the network representation learning algorithm HOPE was applied to learn the behavior features of nodes in the association network. Attribute features of nodes were also adopted. Then, a CatBoost machine learning model was trained to predict potential associations between any pair of nodes. The performance of our method was evaluated under five-fold cross validation. A case study predicting miRNA-disease associations was also conducted to verify the prediction capability. MAN-HOPE achieves an accuracy of 93.3% and an area under the receiver operating characteristic curve of 0.9793. The experimental results demonstrate the novelty of our systematic understanding of intermolecular associations and enable systematic exploration of the landscape of molecular interactions that shape specialized cellular functions.
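As a rough illustration of the embedding step named in this abstract, the sketch below computes a HOPE-style embedding by factorizing the Katz proximity matrix S = (I - beta A)^(-1) beta A with a truncated SVD. It is a dense toy version under stated assumptions: the original HOPE algorithm uses a generalized SVD to avoid forming S explicitly, and the function name, the default `beta`, and the toy graph are choices made here, not details from the paper. The CatBoost classifier and the node attribute features are not shown.

```python
import numpy as np


def hope_embedding(A, dim=2, beta=None):
    """Embed nodes so that source @ target.T approximates Katz proximity.

    A: (n, n) adjacency matrix. beta must be below 1 / spectral radius
    for the Katz series to converge; by default half of that bound is used.
    Returns (source, target) embeddings, each of shape (n, dim).
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if beta is None:
        radius = np.max(np.abs(np.linalg.eigvals(A)))
        beta = 0.5 / radius if radius > 0 else 0.5
    # Katz proximity: sum over l >= 1 of (beta * A)^l.
    S = np.linalg.solve(np.eye(n) - beta * A, beta * A)
    U, sigma, Vt = np.linalg.svd(S)
    root = np.sqrt(sigma[:dim])
    return U[:, :dim] * root, Vt[:dim].T * root


# Toy 4-node path graph (hypothetical, for illustration only).
toy_A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
toy_source, toy_target = hope_embedding(toy_A, dim=4, beta=0.2)
```

With `dim` equal to the number of nodes the product of the two embeddings reproduces the Katz matrix exactly; smaller `dim` values give the best low-rank approximation, which is what a downstream classifier would consume as behavior features.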
Affiliation(s)
- Hai-Cheng Yi: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China; University of Chinese Academy of Sciences, Beijing, China
- Zhu-Hong You: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Zhen-Hao Guo: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
|
22
|
Huang YA, Huang ZA, You ZH, Zhu Z, Huang WZ, Guo JX, Yu CQ. Predicting lncRNA-miRNA Interaction via Graph Convolution Auto-Encoder. Front Genet 2019; 10:758. [PMID: 31555320 PMCID: PMC6727066 DOI: 10.3389/fgene.2019.00758] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 01/28/2019] [Accepted: 07/17/2019] [Indexed: 12/14/2022]
Abstract
Interactions between miRNAs and lncRNAs are known to be important for gene regulation. However, the number of known lncRNA-miRNA interactions is still very limited, and few computational tools are available for predicting new ones. Because lncRNAs and miRNAs share internal patterns in their partnerships, unknown lncRNA-miRNA interactions can be predicted from the known ones, which can be treated as a semi-supervised learning problem. The attributes of lncRNAs and miRNAs have been shown to be closely related to their interactions, so effective use of this side information can improve performance, especially when training samples are limited. In view of this, we proposed an end-to-end prediction model called GCLMI (Graph Convolution for novel lncRNA-miRNA Interactions), which combines graph convolution with an auto-encoder. Without any preprocessing of the feature information, the method can incorporate raw node attributes with the topology of the interaction network. On a real dataset collected from a public database, k-fold cross-validation experiments illustrate the robustness and effectiveness of the proposed model. We show that the graph convolution layer designed in the proposed model can effectively integrate the input data by filtering the graph with node features. The proposed model is anticipated to yield high-potential lncRNA-miRNA interactions when users provide different types of numerical features describing lncRNAs or miRNAs, serving as a useful computational tool.
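As a rough illustration of how a graph convolution auto-encoder combines node attributes with network topology, the sketch below applies two graph convolution layers followed by an inner-product decoder. It assumes the widely used symmetric normalization with self-loops and fixed, untrained weights; the abstract does not specify GCLMI's exact layer, so the toy graph, feature values, weight matrices, and function names are illustrative assumptions rather than the authors' implementation.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    deg = [sum(row) for row in a_hat]
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(a_norm, features, weights, relu=True):
    """One graph convolution: a_norm @ features @ weights, optionally with ReLU."""
    hidden = matmul(matmul(a_norm, features), weights)
    if relu:
        hidden = [[max(0.0, v) for v in row] for row in hidden]
    return hidden


def decode(z, i, j):
    """Inner-product decoder: sigmoid of the dot product of two node embeddings."""
    dot = sum(a * b for a, b in zip(z[i], z[j]))
    return 1.0 / (1.0 + math.exp(-dot))


# Toy graph (assumption): nodes 0-1 are lncRNAs, nodes 2-3 are miRNAs.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
# Raw node attributes, two numerical features per node (made-up values).
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
# Fixed weights for illustration; a real model would learn them by
# minimizing a reconstruction loss over the known interactions.
w0 = [[0.6, -0.4], [0.3, 0.9]]
w1 = [[0.5, 0.2], [-0.7, 0.8]]

a_norm = normalize_adjacency(adj)
hidden = gcn_layer(a_norm, features, w0, relu=True)
z = gcn_layer(a_norm, hidden, w1, relu=False)  # linear embedding layer

# Score an lncRNA-miRNA pair with no known edge (lncRNA 0, miRNA 3).
score = decode(z, 0, 3)
print("predicted interaction score:", score)
```

Keeping the final encoder layer linear lets the dot product take either sign, so the decoded score can fall anywhere in (0, 1) rather than being confined to values of 0.5 or above.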
Affiliation(s)
- Yu-An Huang: College of Electronics and Information Engineering, Xijing University, Xi'an, China
- Zhi-An Huang: Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Zhu-Hong You: College of Electronics and Information Engineering, Xijing University, Xi'an, China
- Zexuan Zhu: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Wen-Zhun Huang: College of Electronics and Information Engineering, Xijing University, Xi'an, China
- Jian-Xin Guo: College of Electronics and Information Engineering, Xijing University, Xi'an, China
- Chang-Qing Yu: College of Electronics and Information Engineering, Xijing University, Xi'an, China
|