1
Lan W, Li C, Chen Q, Yu N, Pan Y, Zheng Y, Chen YPP. LGCDA: Predicting CircRNA-Disease Association Based on Fusion of Local and Global Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1413-1422. [PMID: 38607720 DOI: 10.1109/tcbb.2024.3387913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/14/2024]
Abstract
CircRNAs have been shown to be involved in the occurrence of many diseases, and several computational frameworks have been proposed to identify circRNA-disease associations. Although existing computational methods have achieved considerable success, they still need improvement, as their performance may degrade because of data sparsity and memory overflow. To address these problems, we develop a novel computational framework called LGCDA that predicts circRNA-disease associations by fusing local and global features. First, we construct closed local subgraphs using k-hop closed subgraph extraction and label the subgraphs to obtain rich graph-pattern information. Then, local features are extracted using a graph neural network (GNN). In addition, we fuse the Gaussian interaction profile (GIP) kernel and cosine similarity to obtain global features. Finally, circRNA-disease association scores are predicted by a multilayer perceptron (MLP) based on the local and global features. We perform five-fold cross-validation on five datasets for model evaluation, and our model surpasses other advanced methods.
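The global-feature step above combines two standard similarity measures. The sketch below is a minimal Python illustration, not the LGCDA implementation. It computes a GIP kernel over the rows of a toy association matrix, with the bandwidth normalised by the mean squared profile norm (the usual convention), and a cosine similarity over the same rows. The abstract does not say how LGCDA fuses the two, so the weighted average in `fuse_similarities` (and its `weight` parameter) is a hypothetical placeholder.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel between the rows of a binary
    association matrix: K(i, j) = exp(-gamma * ||p_i - p_j||^2), where gamma is
    gamma_prime divided by the mean squared norm of all profiles."""
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(pi, pj)))
             for pj in profiles] for pi in profiles]


def cosine_similarity(profiles):
    """Cosine similarity between rows; rows with zero norm get similarity 0."""
    norms = [math.sqrt(sum(v * v for v in p)) for p in profiles]
    sim = []
    for pi, ni in zip(profiles, norms):
        row = []
        for pj, nj in zip(profiles, norms):
            dot = sum(a * b for a, b in zip(pi, pj))
            row.append(dot / (ni * nj) if ni > 0 and nj > 0 else 0.0)
        sim.append(row)
    return sim


def fuse_similarities(sim_a, sim_b, weight=0.5):
    """Hypothetical fusion rule: element-wise weighted average of two matrices."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(sim_a, sim_b)]


# Toy circRNA x disease association matrix (rows = circRNAs).
assoc = [[1, 0, 1],
         [1, 0, 0],
         [0, 1, 0]]
circ_sim = fuse_similarities(gip_kernel(assoc), cosine_similarity(assoc))
```

Larger entries of `circ_sim` mark circRNAs whose known disease associations are more alike.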
2
Xuan P, Wang W, Cui H, Wang S, Nakaguchi T, Zhang T. Mask-Guided Target Node Feature Learning and Dynamic Detailed Feature Enhancement for lncRNA-Disease Association Prediction. J Chem Inf Model 2024; 64:6662-6675. [PMID: 39112431 DOI: 10.1021/acs.jcim.4c00652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/27/2024]
Abstract
Identifying new relevant long noncoding RNAs (lncRNAs) for various human diseases can facilitate the exploration of the causes and progression of these diseases. Recently, several graph inference methods have been proposed to predict disease-related lncRNAs by exploiting the topological structure and node attributes within graphs. However, these methods did not prioritize the target lncRNA and disease nodes over auxiliary nodes like miRNA nodes, potentially limiting their ability to fully utilize the features of the target nodes. We propose a new method, mask-guided target node feature learning and dynamic detailed feature enhancement for lncRNA-disease association prediction (MDLD), to enhance node feature learning for improved lncRNA-disease association prediction. First, we designed a heterogeneous graph masked transformer autoencoder to guide feature learning, focusing more on the features of target lncRNA (disease) nodes. The target nodes were increasingly masked as training progressed, which helps develop a more robust prediction model. Second, we developed a graph convolutional network with dynamic residuals (GCNDR) to learn and integrate the heterogeneous topology and features of all lncRNA, disease, and miRNA nodes. GCNDR employs an interlayer residual strategy and a residual evolution strategy to mitigate oversmoothing caused by multilayer graph convolution. The interlayer residual strategy estimates the importance of node features learned in the previous GCN encoding layer for nodes in the current encoding layer. Additionally, since there are dependencies in the importance of features of individual lncRNA (disease, miRNA) nodes across multiple encoding layers, a gated recurrent unit-based strategy is proposed to encode these dependencies. Finally, we designed a perspective-level attention mechanism to obtain more informative features of lncRNA and disease node pairs from the perspectives of mask-enhanced and dynamic-enhanced node features. 
Cross-validation experimental results demonstrated that MDLD outperformed 10 other state-of-the-art prediction methods. Ablation experiments and case studies on candidate lncRNAs for three diseases further proved the technical contributions of MDLD and its capability to discover disease-related lncRNAs.
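A generic graph convolution can show the interlayer residual idea. This is a hedged sketch, not GCNDR. It uses the standard symmetric-normalised GCN propagation and, after each layer, adds a fixed fraction `alpha` of the previous layer's features. The abstract instead describes learned importance weights and a GRU-based residual evolution, and neither is reproduced here. All names are illustrative.

```python
import math


def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def normalize_adjacency(adj):
    """Standard GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def gcn_with_residual(adj, features, weights, alpha=0.5):
    """Stack GCN layers; after each layer add alpha * (previous layer's output).

    alpha is a fixed, hypothetical residual weight standing in for the learned,
    node-specific importance described in the abstract. Weight matrices are
    square so that the residual shapes match."""
    prop = normalize_adjacency(adj)
    h = features
    for w in weights:
        # ReLU(P H W) for this layer.
        conv = [[max(0.0, v) for v in row] for row in matmul(matmul(prop, h), w)]
        # Interlayer residual: mix in the previous layer's features.
        h = [[c + alpha * p for c, p in zip(rc, rp)] for rc, rp in zip(conv, h)]
    return h
```

With `alpha=0`, this reduces to a plain multilayer GCN. Larger values keep more of each node's earlier representation, which is the usual argument for why residuals counter oversmoothing.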
Affiliation(s)
- Ping Xuan
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Wei Wang
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
3
Xuan P, Lu S, Cui H, Wang S, Nakaguchi T, Zhang T. Learning Association Characteristics by Dynamic Hypergraph and Gated Convolution Enhanced Pairwise Attributes for Prediction of Disease-Related lncRNAs. J Chem Inf Model 2024; 64:3569-3578. [PMID: 38523267 DOI: 10.1021/acs.jcim.4c00245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
As long non-coding RNAs (lncRNAs) play important roles in the occurrence and development of various human diseases, identifying disease-related lncRNAs can help clarify disease pathogenesis. Most recent lncRNA-disease association prediction methods utilize multi-source data about lncRNAs and diseases. A single lncRNA may participate in multiple disease processes, and multiple lncRNAs are usually involved in the same disease process synergistically. However, previous methods have not fully exploited these biological characteristics to construct informative prediction models. We construct a prediction model based on adaptive hypergraph and gated convolution for lncRNA-disease association prediction (AGLDA). It embeds and encodes the biological characteristics of lncRNA-disease associations, the topological features from the perspective of the entire heterogeneous graph, and the gated enhanced pairwise features. First, a hyperedge construction strategy is designed to reflect the biological characteristic that multiple lncRNAs are involved in multiple disease processes. Each hyperedge has its own biological perspective, and multiple hyperedges help reveal the diverse relationships among multiple lncRNAs and diseases. Second, we encode the biological features of each lncRNA (disease) node using a strategy based on dynamic hypergraph convolutional networks. This strategy can adaptively learn the features of the hyperedges and formulate a dynamically evolving hypergraph topology. Third, a group convolutional network is established to integrate the entire heterogeneous topological structure and multiple types of node attributes within an lncRNA-disease-miRNA graph. Finally, a gated convolutional strategy is proposed to enhance the informative features of lncRNA-disease node pairs. Comparison experiments indicate that AGLDA outperforms seven advanced prediction methods.
The ablation studies confirm the effectiveness of major innovations, and the case studies validate AGLDA's ability in application for discovering potential disease-related lncRNA candidates.
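The sketch below implements one layer of the widely used HGNN hypergraph convolution (Feng et al., 2019) on a toy incidence matrix, for readers unfamiliar with the operation. It is an illustration under simplifying assumptions, not AGLDA's dynamic hypergraph network. Hyperedges here are fixed rather than learned, edge weights default to one, and the nonlinearity is omitted.

```python
import math


def hypergraph_conv(incidence, features, theta, edge_weights=None):
    """One HGNN-style hypergraph convolution (without the nonlinearity):
        X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta
    incidence: n_nodes x n_edges 0/1 matrix H (node v belongs to hyperedge e).
    edge_weights: diagonal of W (defaults to all ones)."""
    n, m = len(incidence), len(incidence[0])
    w = edge_weights if edge_weights is not None else [1.0] * m
    dv = [sum(w[e] * incidence[v][e] for e in range(m)) for v in range(n)]  # node degrees
    de = [sum(incidence[v][e] for v in range(n)) for e in range(m)]         # hyperedge sizes
    # Propagation matrix P = Dv^-1/2 H W De^-1 H^T Dv^-1/2.
    prop = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if dv[i] == 0 or dv[j] == 0:
                continue
            s = sum(incidence[i][e] * w[e] * incidence[j][e] / de[e]
                    for e in range(m) if de[e] > 0)
            prop[i][j] = s / math.sqrt(dv[i] * dv[j])
    # Propagate node features through shared hyperedges, then apply Theta.
    px = [[sum(prop[i][k] * features[k][c] for k in range(n))
           for c in range(len(features[0]))] for i in range(n)]
    return [[sum(px[i][k] * theta[k][c] for k in range(len(theta)))
             for c in range(len(theta[0]))] for i in range(n)]
```

For example, with hyperedges {0, 1} and {1, 2}, a feature placed on node 0 spreads to node 1 (which shares a hyperedge with it) but not to node 2.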
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Department of Computer Science, Shantou University, Shantou 515063, China
- Siyuan Lu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
4
Han GS, Gao Q, Peng LZ, Tang J. Hessian Regularized [Formula: see text]-Nonnegative Matrix Factorization and Deep Learning for miRNA-Disease Associations Prediction. Interdiscip Sci 2024; 16:176-191. [PMID: 38099958 DOI: 10.1007/s12539-023-00594-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 11/05/2023] [Accepted: 11/07/2023] [Indexed: 02/22/2024]
Abstract
Since the identification of microRNAs (miRNAs), empirical research has demonstrated their crucial involvement in the functioning of organisms. Investigating miRNAs significantly supports efforts to prevent, diagnose, and treat complex human diseases. Yet testing every conceivable miRNA-disease association with conventional wet-lab experiments consumes substantial time and resources. Computationally predicting potential miRNA-disease associations therefore provides medical researchers with valuable preliminary insights. We have developed a novel matrix factorization model, Hessian-regularized [Formula: see text] nonnegative matrix factorization combined with deep learning, for predicting miRNA-disease associations, denoted as [Formula: see text]-NMF-DF. In particular, we introduce a novel iterative fusion approach to integrate all similarities, which effectively reduces the sparsity of the initial miRNA-disease association matrix. Additionally, we devise a hybrid model framework that uses deep learning, matrix decomposition, and singular value decomposition to capture the complex nonlinear features of miRNAs and diseases. The prediction performance of the six matrix factorization methods is improved through comparison and analysis, similarity matrix fusion, data preprocessing, and parameter tuning. Under fivefold cross-validation, the AUC and AUPR obtained by the new matrix factorization model are comparable to or better than those of other matrix factorization models. Finally, we select three diseases (lung tumor, bladder tumor, and breast tumor) for case analysis and further extend the deep learning-based matrix factorization model. The results show that the proposed hybrid algorithm combining matrix factorization with deep learning can predict miRNAs related to different diseases with high accuracy.
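As background for the factorization component, the sketch below shows plain nonnegative matrix factorization with the classic Lee-Seung multiplicative updates. It deliberately omits the Hessian regularization and the term elided as "[Formula: see text]" in the title, as well as the deep learning and SVD components. The reconstructed matrix `W H` simply assigns a score to every miRNA-disease pair. The toy matrix, rank, and iteration count are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Plain NMF, V ~ W H, with Lee-Seung multiplicative updates for the
    squared Frobenius loss. Entries stay nonnegative because every update
    multiplies by a ratio of nonnegative quantities."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# Toy miRNA x disease association matrix; W H scores every pair, including
# the unobserved (zero) entries that are candidate associations.
assoc = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
w, h = nmf(assoc, rank=2)
scores = matmul(w, h)
```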
Affiliation(s)
- Guo-Sheng Han
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Qi Gao
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Ling-Zhi Peng
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
- Jing Tang
  - Department of Mathematics and Computational Science, Xiangtan University, Xiangtan, 411105, China
  - Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, 411105, China
5
Xu F, Liu S, Zhao A, Shang M, Wang Q, Jiang S, Cheng Q, Chen X, Zhai X, Zhang J, Wang X, Yan J. iFLAS: positive-unlabeled learning facilitates full-length transcriptome-based identification and functional exploration of alternatively spliced isoforms in maize. The New Phytologist 2024; 241:2606-2620. [PMID: 38291701 DOI: 10.1111/nph.19554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Accepted: 01/06/2024] [Indexed: 02/01/2024]
Abstract
The advent of full-length transcriptome sequencing technologies has accelerated the discovery of novel splicing isoforms. However, existing alternative splicing (AS) tools are either tailored to short-read RNA-Seq data or designed for human and animal studies. Differences in AS patterns between plants and animals still make it challenging to reliably identify and functionally explore novel isoforms in plants. Here, we developed integrated full-length alternative splicing analysis (iFLAS), a plant-optimized AS toolkit that introduces a semi-supervised machine learning method known as positive-unlabeled (PU) learning to accurately identify novel isoforms. iFLAS also enables the investigation of AS functions from various perspectives, such as differential AS, poly(A) tail length, and allele-specific AS (ASAS) analyses. By applying iFLAS to three full-length transcriptome sequencing datasets, we systematically identified and functionally characterized maize (Zea mays) AS patterns. We found that intron retention not only introduces premature termination codons, resulting in lower isoform expression levels, but may also regulate the length of the 3' UTR and poly(A) tail, thereby affecting the functional differentiation of isoforms. Moreover, we observed distinct ASAS patterns in two genes within heterosis offspring, highlighting their potential value in breeding. These results underscore the broad applicability of iFLAS in plant full-length transcriptome-based AS research.
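The abstract does not specify which PU learning algorithm iFLAS uses. As a generic illustration only, the sketch below implements the classic Elkan and Noto (2008) recipe with a one-dimensional logistic regression. The recipe has three steps:

1. Train a classifier to separate labeled positives from unlabeled examples.
2. Estimate the label frequency `c`.
3. Rescale the classifier's scores by 1/`c`.

The feature values are hypothetical. For brevity, `c` is estimated on the training positives rather than on a held-out set as the original recipe recommends.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_1d(xs, ys, lr=0.5, epochs=2000):
    """Fit g(x) = sigmoid(a*x + b) by batch gradient descent on log-loss."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y
            ga += err * x
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return lambda x: sigmoid(a * x + b)


def elkan_noto_pu(labeled_pos, unlabeled):
    """Elkan & Noto (2008) PU recipe:
    1. train g to separate labeled positives (s=1) from unlabeled (s=0);
    2. estimate c = P(s=1 | y=1) as the mean of g over labeled positives
       (simplification: the training positives are reused here);
    3. return P(y=1 | x) ~ g(x) / c for each unlabeled example, capped at 1."""
    g = train_logistic_1d(labeled_pos + unlabeled,
                          [1] * len(labeled_pos) + [0] * len(unlabeled))
    c = sum(g(x) for x in labeled_pos) / len(labeled_pos)
    return [min(1.0, g(x) / c) for x in unlabeled], c


# Hypothetical 1-D isoform feature (e.g. a read-support score); higher = more reliable.
known_isoforms = [0.80, 0.90, 0.85, 0.95]
candidates = [0.90, 0.10, 0.20, 0.88, 0.15, 0.05]
probs, c = elkan_noto_pu(known_isoforms, candidates)
```

The key design point is that the unlabeled set is not treated as a set of true negatives. Dividing by `c` corrects for the fact that only a fraction of the real positives carry labels.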
Affiliation(s)
- Feng Xu
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Songyu Liu
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Anwen Zhao
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Meiqi Shang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Qian Wang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Shuqin Jiang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Qian Cheng
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Xingming Chen
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Xiaoguang Zhai
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Jianan Zhang
  - Molbreeding Biotechnology Co., Ltd, Shijiazhuang, Hebei Province, 051430, China
- Xiangfeng Wang
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
- Jun Yan
  - State Key Laboratory of Maize Bio-Breeding, National Maize Improvement Center, Frontiers Science Center for Molecular Design Breeding, College of Agronomy and Biotechnology, China Agricultural University, Beijing, 100094, China
6
Rinaldi S, Moroni E, Rozza R, Magistrato A. Frontiers and Challenges of Computing ncRNAs Biogenesis, Function and Modulation. J Chem Theory Comput 2024; 20:993-1018. [PMID: 38287883 DOI: 10.1021/acs.jctc.3c01239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2024]
Abstract
Non-coding RNAs (ncRNAs) are transcribed from non-protein-coding DNA sequences, which constitute 98-99% of the human genome. Non-coding RNAs encompass diverse functional classes, including microRNAs, small interfering RNAs, PIWI-interacting RNAs, small nuclear RNAs, small nucleolar RNAs, and long non-coding RNAs. Because they are critically involved in gene expression and regulation across various biological and physiopathological contexts, such as neuronal disorders, immune responses, cardiovascular diseases, and cancer, non-coding RNAs are emerging as disease biomarkers and therapeutic targets. In this review, after providing an overview of the role of non-coding RNAs in cell homeostasis, we illustrate the potential and the challenges of state-of-the-art computational methods used to study ncRNA biogenesis, function, and modulation. Modulation can be achieved either by directly targeting ncRNAs with small molecules or by altering their expression through targeting the cellular machinery underlying their biosynthesis. Drawing on applications, including some from our own work, we showcase the significance and role of computer simulations in uncovering fundamental facets of ncRNA mechanisms and modulation. This information may lay the groundwork for advancing gene modulation tools and therapeutic strategies that address unmet medical needs.
Affiliation(s)
- Silvia Rinaldi
  - National Research Council of Italy (CNR) - Institute of Chemistry of OrganoMetallic Compounds (ICCOM), c/o Area di Ricerca CNR di Firenze Via Madonna del Piano 10, 50019 Sesto Fiorentino, Florence, Italy
- Elisabetta Moroni
  - National Research Council of Italy (CNR) - Institute of Chemical Sciences and Technologies (SCITEC), via Mario Bianco 9, 20131 Milano, Italy
- Riccardo Rozza
  - National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Alessandra Magistrato
  - National Research Council of Italy (CNR) - Institute of Material Foundry (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
7
Shakyawar SK, Sajja BR, Patel JC, Guda C. iCluF: an unsupervised iterative cluster-fusion method for patient stratification using multiomics data. Bioinformatics Advances 2024; 4:vbae015. [PMID: 38698887 PMCID: PMC11063539 DOI: 10.1093/bioadv/vbae015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 12/10/2023] [Accepted: 01/26/2024] [Indexed: 05/05/2024]
Abstract
Motivation: Patient stratification is crucial for the effective treatment or management of heterogeneous diseases, including cancers. Multiomic technologies facilitate the molecular characterization of human diseases; however, the complexity of the data calls for robust data integration tools for patient stratification using machine-learning approaches. Results: iCluF iteratively integrates three types of multiomic data (mRNA, miRNA, and DNA methylation) using pairwise patient similarity matrices built from each omic data type. Intermediate omic-specific neighborhood matrices implement iterative matrix fusion and message passing among the similarity matrices. This process derives a final integrated matrix representing all the omics profiles of a patient, which is then used to cluster patients into subtypes. iCluF outperforms other methods, with significant differences in the survival profiles of 8581 patients belonging to 30 different cancers in TCGA. iCluF also predicted the four intrinsic subtypes of Breast Invasive Carcinoma with adjusted Rand index and Fowlkes-Mallows scores of 0.72 and 0.83, respectively. The Gini importance scores showed that methylation features were the most decisive for identifying disease subtypes, followed by mRNA and miRNA features. iCluF can be applied to stratify patients with any disease for which multiomic datasets are available. Availability and implementation: Source code and datasets are available at https://github.com/GudaLab/iCluF_core.
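The adjusted Rand index and the Fowlkes-Mallows score quoted above are both pair-counting measures of agreement between predicted clusters and reference subtypes. A minimal standard-library sketch of the two formulas follows, using hypothetical labels. In practice, scikit-learn's `adjusted_rand_score` and `fowlkes_mallows_score` compute the same quantities.

```python
from collections import Counter
from math import comb, sqrt


def _pair_counts(labels_true, labels_pred):
    """Pair-counting sums used by both indices:
    sum_nij: pairs placed together in both labelings (true positives);
    sum_a / sum_b: pairs placed together in the true / predicted labeling."""
    sum_nij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    return sum_nij, sum_a, sum_b


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance agreement between the two labelings."""
    sum_nij, sum_a, sum_b = _pair_counts(labels_true, labels_pred)
    total_pairs = comb(len(labels_true), 2)
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings a single cluster
        return 1.0
    return (sum_nij - expected) / (max_index - expected)


def fowlkes_mallows(labels_true, labels_pred):
    """FMI = TP / sqrt((TP + FP) * (TP + FN)) over pairs of samples."""
    sum_nij, sum_a, sum_b = _pair_counts(labels_true, labels_pred)
    if sum_a == 0 or sum_b == 0:
        return 0.0
    return sum_nij / sqrt(sum_a * sum_b)


# Hypothetical example: 6 patients, 2 reference subtypes, 3 predicted clusters.
true_subtypes = [0, 0, 0, 1, 1, 1]
pred_clusters = [0, 0, 1, 1, 2, 2]
```

Both scores ignore the cluster IDs themselves, so relabeling the predicted clusters does not change the result.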
Affiliation(s)
- Sushil K Shakyawar
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Balasrinivasa R Sajja
  - Department of Radiology, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Jai Chand Patel
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States
- Chittibabu Guda
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE 68198, United States
  - Department of Genetics, Cell Biology and Anatomy, Center for Biomedical Informatics Research and Innovation, University of Nebraska Medical Center, Omaha, NE 68198-5805, United States
8
Yao D, Li B, Zhan X, Zhan X, Yu L. GCNFORMER: graph convolutional network and transformer for predicting lncRNA-disease associations. BMC Bioinformatics 2024; 25:5. [PMID: 38166659 PMCID: PMC10763317 DOI: 10.1186/s12859-023-05625-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2023] [Accepted: 12/18/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND A growing body of research indicates that disrupted expression of long non-coding RNAs (lncRNAs) is linked to a range of human disorders. Therefore, effective prediction of lncRNA-disease associations (LDAs) can not only suggest diagnostic solutions but also save significant time and labor costs. METHOD In this work, we proposed a novel LDA prediction algorithm based on a graph convolutional network and a transformer, named GCNFORMER. First, we integrated the intraclass similarities and interclass connections among miRNAs, lncRNAs, and diseases, and built a graph adjacency matrix. Second, to fully capture the features of the various nodes, we employed a graph convolutional network for feature extraction. Finally, to capture the global dependencies between inputs and outputs, we used a transformer encoder with a multi-head attention mechanism to predict lncRNA-disease associations. RESULTS In fivefold cross-validation on the public dataset, GCNFORMER achieved an AUC of 0.9739 and an AUPR of 0.9812. We compared GCNFORMER with six advanced LDA prediction models, and the results indicated its superiority over all six. Furthermore, case studies on breast cancer, colon cancer, and lung cancer underscore GCNFORMER's effectiveness in predicting potential LDAs. CONCLUSIONS The combination of a graph convolutional network and a transformer can effectively improve the performance of LDA prediction models and promote the in-depth development of this research field.
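To make the reported metrics concrete, the sketch below computes two quantities on hypothetical scores. The first is AUC, taken as the probability that a random positive pair outscores a random negative pair. The second is AUPR, estimated as average precision. This is illustrative only: the abstract does not state which AUPR estimator GCNFORMER used, and under fivefold cross-validation these values are typically computed on each held-out fold.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half); O(P * N), fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPR estimated as average precision: the mean of the precision values
    at the rank of each positive, scanning scores from high to low."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical scores for 4 lncRNA-disease pairs (1 = known association).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
```

On this toy input, one negative (0.8) outranks one positive (0.7), so AUC is 0.75 and average precision is (1 + 2/3) / 2.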
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Bailin Li
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
- Xiaojuan Zhan
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, 150050, China
- Xiaorong Zhan
  - Department of Endocrinology and Metabolism, Hospital of Southern University of Science and Technology, Shenzhen, 518055, China
- Liyang Yu
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China
9
Lu Q, Liang Y, Meng X, Zhao Y, Fan H, Hou S. The Role of Long Noncoding RNAs in Intestinal Health and Diseases: A Focus on the Intestinal Barrier. Biomolecules 2023; 13:1674. [PMID: 38002356 PMCID: PMC10669616 DOI: 10.3390/biom13111674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 11/04/2023] [Accepted: 11/16/2023] [Indexed: 11/26/2023]
Abstract
The gut is the body's largest immune organ, and the intestinal barrier prevents harmful substances such as bacteria and toxins from passing through the gastrointestinal mucosa. Intestinal barrier dysfunction is closely associated with various diseases. However, there are currently no FDA-approved therapies targeting the intestinal epithelial barrier. Long noncoding RNAs (lncRNAs), a class of RNA transcripts longer than 200 nucleotides with no protein-coding capacity, are essential for the development and regulation of a variety of biological processes and diseases. lncRNAs are involved in intestinal barrier function and the maintenance of homeostasis. By reviewing the literature on cells, animal models, and clinical patients, this article examines the emerging role of lncRNAs in the intestinal barrier and highlights their potential applications in the treatment of various intestinal diseases. The aim is to explore potential lncRNAs involved in the intestinal barrier and to provide new ideas for the clinical diagnosis and treatment of diseases associated with intestinal barrier damage.
Affiliation(s)
- Qianying Lu
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
- Yangfan Liang
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
- Xiangyan Meng
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
- Yanmei Zhao
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
- Haojun Fan
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
- Shike Hou
  - Institute of Disaster and Emergency Medicine, Tianjin University, Tianjin 300072, China
  - Tianjin Key Laboratory of Disaster Medicine Technology, Tianjin 300072, China
10
Wang S, Hui C, Zhang T, Wu P, Nakaguchi T, Xuan P. Graph Reasoning Method Based on Affinity Identification and Representation Decoupling for Predicting lncRNA-Disease Associations. J Chem Inf Model 2023; 63:6947-6958. [PMID: 37906529 DOI: 10.1021/acs.jcim.3c01214] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 11/02/2023]
Abstract
An increasing number of studies have shown that dysregulation of lncRNAs is related to the occurrence of various diseases. Most previous methods, however, are designed on the homogeneity assumption that the representation of a target lncRNA (or disease) node should be updated by aggregating the attributes of its neighbor nodes. This assumption ignores affinity nodes that are far from the target node. We present a novel prediction method, GAIRD, to fully leverage the heterogeneous information in the network and the decoupled node features. The first major innovation is a random walk strategy based on breadth-first search and depth-first search. Unlike previous methods that focus only on homogeneous information, our new strategy learns both the homogeneous information within local neighborhoods and the heterogeneous information within higher-order neighborhoods. The second innovation is a representation decoupling module that extracts purer attributes and purer topologies. Third, a module based on group convolution and depthwise separable convolution is developed to promote pairwise intrachannel and interchannel feature learning. The experimental results show that GAIRD outperforms the compared state-of-the-art methods, and the ablation studies confirm the contributions of the major innovations. We also performed case studies on three diseases to further demonstrate the effectiveness of the GAIRD model in applications.
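GAIRD's walk strategy is described only at a high level. As a stand-in, the sketch below uses the well-known node2vec biased random walk (Grover and Leskovec, 2016). Its `q` parameter interpolates between breadth-first and depth-first exploration. This is not GAIRD's actual strategy, and the toy graph and parameter values are hypothetical.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, seed=0):
    """node2vec-style second-order random walk.

    graph: dict mapping each node to a set of neighbours (undirected).
    p: return parameter; q: in-out parameter. q > 1 keeps the walk near the
    previous node (BFS-like, local neighbourhood); q < 1 pushes it outward
    (DFS-like, higher-order neighbourhood)."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [1.0 / p if x == prev          # step back to the previous node
                   else 1.0 if x in graph[prev]  # stay at distance 1 from prev
                   else 1.0 / q                  # move to distance 2 from prev
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Toy heterogeneous graph: lncRNAs (l*), diseases (d*), a miRNA (m1).
graph = {
    "l1": {"d1", "m1"}, "l2": {"d1", "d2"}, "m1": {"l1", "d2"},
    "d1": {"l1", "l2"}, "d2": {"l2", "m1"},
}
local_walk = biased_walk(graph, "l1", 10, p=1.0, q=4.0)     # BFS-like
outward_walk = biased_walk(graph, "l1", 10, p=1.0, q=0.25)  # DFS-like
```

In node2vec, sequences like these are fed to a skip-gram model to learn node embeddings. That step is omitted here.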
Affiliation(s)
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Cui Hui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Peiliang Wu
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
  - Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Ping Xuan
  - Department of Computer Science, School of Engineering, Shantou University, Shantou 515063, China
11
Cao Y, Xiao J, Sheng N, Qu Y, Wang Z, Sun C, Mu X, Huang Z, Li X. X-LDA: An interpretable and knowledge-informed heterogeneous graph learning framework for LncRNA-disease association prediction. Comput Biol Med 2023; 167:107634. [PMID: 39491920 DOI: 10.1016/j.compbiomed.2023.107634] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/29/2023] [Revised: 10/06/2023] [Accepted: 10/23/2023] [Indexed: 11/05/2024]
Abstract
The identification of disease-related long noncoding RNAs (lncRNAs) helps unravel the intricacies of gene expression regulation and epigenetic signatures. Computational methods provide a cost-effective means to explore lncRNA-disease associations (LDAs). However, these methods often lack interpretability, leaving their predictions less convincing to biological and medical researchers. We propose X-LDA, an interpretable and knowledge-informed heterogeneous graph learning framework based on graph patch convolution and integrated gradients, which predicts LDAs and provides intuitive explanations for its predictions. Because the heterogeneous graph is the foundation of LDA prediction, we construct a knowledge-informed heterogeneous graph with three components: LDAs drawn from biological experiments, lncRNA similarities rooted in gene sequences, and disease similarities constructed from disease categorizations. To integrate diverse biological premises and facilitate interpretability, we define nine distinct graph patch types, which encapsulate essential topological relationships within lncRNA-disease node pairs. X-LDA employs parameter sharing and multiple convolution kernels to capture the common and the multiple perspectives of the graph patches, respectively, and fuses the resulting semantic information into context embeddings. Post-hoc explanations based on graph patch features and integrated gradients shed light on the underlying factors driving the predictions. Cross-validation on a dataset curated from databases and the literature demonstrates the superior performance of X-LDA compared with nine state-of-the-art methods from three categories. X-LDA achieves a higher average area under the receiver operating characteristic curve (0.9891, by at least 6.68%) and a higher average area under the precision-recall curve (0.7907, by at least 23.2%) than competing methods.
The results of our well-designed ablation and interpretability experiments and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis demonstrate X-LDA's robustness, learnability, predictability, and interpretability. The applicability of X-LDA is also demonstrated through a case study involving the investigation of associated lncRNAs in prostate cancer, colorectal cancer, and breast cancer.
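The integrated-gradients attribution that X-LDA's explanations rely on can be illustrated with a minimal sketch. It assumes a hypothetical logistic scorer over a small feature vector (standing in for graph patch features) so that the gradient is analytic; it is not X-LDA's model, and the weights and inputs are made up.

```python
import numpy as np

def logistic_score(x, w, b):
    """Hypothetical stand-in for a trained predictor: sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Attribution_i = (x_i - baseline_i) * average over alpha of df/dx_i
    evaluated at baseline + alpha * (x - baseline).
    """
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints of [0, 1]
    path = baseline + alphas[:, None] * (x - baseline)   # (steps, n_features)
    s = logistic_score(path, w, b)                       # (steps,)
    grads = (s * (1.0 - s))[:, None] * w                 # analytic gradient of the sigmoid score
    return (x - baseline) * grads.mean(axis=0)

# Toy usage with made-up feature values and weights.
x = np.array([1.0, 2.0, -1.0])
baseline = np.zeros(3)
w = np.array([0.5, -0.3, 0.8])
b = 0.1
attributions = integrated_gradients(x, baseline, w, b)
```

A useful sanity check is completeness: the attributions should sum (approximately) to the score difference between the input and the baseline.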
Affiliation(s)
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Jun Xiao
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Nan Sheng
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Yinwei Qu
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Zhihang Wang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China
- Chang Sun
- College of Computer Science, Nankai University, Tianjin, 300071, China
- Xuechen Mu
- School of Mathematics, Jilin University, Changchun, 130012, China
- Zhenyu Huang
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
- Xuan Li
- College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
12
Sheng QJ, Tan Y, Zhang L, Wu ZP, Wang B, He XY. Heterogeneous graph framework for predicting the association between lncRNA and disease and case on uterine fibroid. Comput Biol Med 2023; 165:107331. [PMID: 37619322 DOI: 10.1016/j.compbiomed.2023.107331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 07/24/2023] [Accepted: 08/07/2023] [Indexed: 08/26/2023]
Abstract
Long non-coding RNAs (lncRNAs) play crucial regulatory roles in various cellular processes, including gene expression, chromatin remodeling, and protein localization. Dysregulation of lncRNAs has been linked to several diseases, making it essential to understand their functions in disease mechanisms and therapeutic strategies. However, traditional experimental methods for studying lncRNA function are time-consuming, expensive, and offer limited insights. In recent years, computational methods have emerged as valuable tools for predicting lncRNA functions and their associations with diseases. However, many existing methods construct separate networks for lncRNA similarity and disease similarity, which causes information loss and limits their ability to handle isolated nodes. To address this, we developed 'RGLD', which combines Random Walk with Restart (RWR), a Graph Neural Network (GNN), and Graph Attention Networks (GAT) to predict lncRNA-disease associations in a heterogeneous network. RGLD achieved an AUC of 0.88, outperforming other methods, and can also predict novel associations between lncRNAs and diseases. RGLD identified HOTAIR, MEG3, and PVT1 as lncRNAs associated with uterine fibroids. Biological experiments directly or indirectly verified the involvement of these three lncRNAs in uterine fibroids, validating the accuracy of RGLD's predictions. Furthermore, we discussed in detail the functions of the target genes regulated by these lncRNAs in uterine fibroids, providing evidence for their role in the development and progression of the disease.
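The random walk with restart (RWR) component mentioned for RGLD can be sketched as the textbook iteration p ← (1 − r)·P·p + r·e on a column-normalized similarity matrix. This is a minimal illustration under that standard formulation; the restart probability, tolerance and toy matrix are assumptions, not RGLD's settings.

```python
import numpy as np

def random_walk_with_restart(W, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * P @ p + r * e until the L1 change is below tol,
    where P is the column-normalised version of the similarity matrix W."""
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    P = W / col_sums                       # column-stochastic transition matrix
    e = np.zeros(W.shape[0])
    e[seed] = 1.0                          # restart vector concentrated on the seed node
    p = e.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * P @ p + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Toy symmetric similarity matrix for four nodes (values are made up).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = random_walk_with_restart(W, seed=0)
```

On a connected graph the scores form a probability distribution, and higher values mark nodes that are more strongly connected to the seed.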
Affiliation(s)
- Qing-Jing Sheng
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China; Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Yuan Tan
- Department of Integrated Traditional Chinese Medicine (TCM) & Western Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai, China; Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Liyuan Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Zhi-Ping Wu
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China; Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Beiying Wang
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China; Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China
- Xiao-Ying He
- Department of Gynecology, Shanghai First Maternity and Infant Hospital, School of Medicine, Tong Ji University, Shanghai, China; Shanghai Key Laboratory of Maternal and Fetal Medicine, Shanghai First Maternity and Infant Hospital, Shanghai, China.
13
Sheng N, Wang Y, Huang L, Gao L, Cao Y, Xie X, Fu Y. Multi-task prediction-based graph contrastive learning for inferring the relationship among lncRNAs, miRNAs and diseases. Brief Bioinform 2023; 24:bbad276. [PMID: 37529914 DOI: 10.1093/bib/bbad276] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Revised: 07/09/2023] [Accepted: 07/11/2023] [Indexed: 08/03/2023]
Abstract
MOTIVATION Identifying the relationships among long non-coding RNAs (lncRNAs), microRNAs (miRNAs) and diseases is highly valuable for diagnosing, preventing, treating and prognosing diseases. Effective computational prediction methods can reduce experimental costs. Although numerous methods have been proposed, they often treat the prediction of lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs) and lncRNA-miRNA interactions (LMIs) as separate tasks. Models capable of predicting all three relationships simultaneously remain relatively scarce. Our aim is to perform multi-task prediction, which not only provides a unified framework but also lets information about lncRNAs, miRNAs and diseases complement one another. RESULTS In this work, we propose a novel unsupervised embedding method called graph contrastive learning for multi-task prediction (GCLMTP). Our approach predicts LDAs, MDAs and LMIs by simultaneously extracting embedding representations of lncRNAs, miRNAs and diseases. To achieve this, we first construct a triple-layer lncRNA-miRNA-disease heterogeneous graph (LMDHG) that integrates the complex relationships among these entities based on their similarities and correlations. Next, we employ an unsupervised embedding model based on graph contrastive learning to extract latent topological features of lncRNAs, miRNAs and diseases from the LMDHG. The graph contrastive learning uses graph convolutional network architectures to maximize the mutual information between patch representations and the corresponding high-level summaries of the LMDHG. Subsequently, for the three prediction tasks, multiple classifiers are explored to predict LDA, MDA and LMI scores. Comprehensive experiments are conducted on two datasets (from older and newer versions of the database, respectively). The results show that GCLMTP outperforms other state-of-the-art methods for the disease-related lncRNA and miRNA prediction tasks.
Additionally, case studies on two datasets further demonstrate the ability of GCLMTP to accurately discover new associations. To ensure reproducibility of this work, we have made the datasets and source code publicly available at https://github.com/sheng-n/GCLMTP.
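Maximizing mutual information between patch (node) representations and a graph-level summary is the Deep Graph Infomax (DGI) objective. The sketch below assumes GCLMTP's contrastive loss follows that DGI form: a one-layer GCN encoder, a row-shuffled feature matrix as the corruption, a mean-pooled sigmoid readout and a bilinear discriminator. Weights are random here; in practice they would be trained with an autodiff library, and the toy graph is hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric GCN normalisation D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def gcn_encode(A_norm, X, Wg):
    return np.maximum(A_norm @ X @ Wg, 0.0)        # one GCN layer with ReLU

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dgi_loss(A, X, Wg, Wd, rng):
    """Binary cross-entropy between real (node, summary) pairs and corrupted ones."""
    A_norm = normalized_adjacency(A)
    H = gcn_encode(A_norm, X, Wg)                  # patch representations
    X_corrupt = X[rng.permutation(X.shape[0])]     # corruption: shuffle feature rows
    H_neg = gcn_encode(A_norm, X_corrupt, Wg)
    s = sigmoid(H.mean(axis=0))                    # graph-level summary (readout)
    pos = sigmoid(H @ Wd @ s)                      # discriminator scores, real pairs
    neg = sigmoid(H_neg @ Wd @ s)                  # discriminator scores, corrupted pairs
    eps = 1e-12
    return -(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean()) / 2.0

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)          # toy graph (hypothetical)
X = rng.normal(size=(4, 5))                        # toy node features
Wg = rng.normal(size=(5, 8)) * 0.1                 # GCN weights
Wd = rng.normal(size=(8, 8)) * 0.1                 # bilinear discriminator weights
loss = dgi_loss(A, X, Wg, Wd, rng)
```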
Affiliation(s)
- Nan Sheng
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yan Wang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- School of Artificial Intelligence, Jilin University, 130012 Changchun, China
- Lan Huang
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Ling Gao
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yangkun Cao
- School of Artificial Intelligence, Jilin University, 130012 Changchun, China
- Xuping Xie
- Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, 130012 Changchun, China
- Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, UK
14
Xuan P, Bai H, Cui H, Zhang X, Nakaguchi T, Zhang T. Specific topology and topological connection sensitivity enhanced graph learning for lncRNA-disease association prediction. Comput Biol Med 2023; 164:107265. [PMID: 37531860 DOI: 10.1016/j.compbiomed.2023.107265] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/12/2023] [Revised: 06/26/2023] [Accepted: 07/16/2023] [Indexed: 08/04/2023]
Abstract
Predicting disease-related candidate long noncoding RNAs (lncRNAs) is beneficial for exploring disease pathogenesis because lncRNAs are closely related to the occurrence and development of human diseases. Adequately extracting the specific and local topologies of the individual lncRNA and disease networks, and integrating information about their connection relationships, remains a long-standing and challenging task. We propose a new graph learning-based prediction method, NCPred, that encodes specific and local topologies from each individual network, neighbor topologies with different connection relationships, and pairwise attributes. First, we construct a lncRNA network composed of all lncRNA nodes and their similarities, and a disease network containing all disease nodes and their similarities. A network-aware graph convolutional autoencoder is then constructed to encode the specific and local topologies of each network. Second, a heterogeneous network is established to embed all lncRNA, disease, and miRNA nodes and their various connections. A connection-sensitive graph neural network is then designed to deeply integrate neighbor node attributes and connection characteristics in the heterogeneous network and to learn neighbor topological representations. We also construct connection-level and topology representation-level attention mechanisms to extract informative connections and topological representations. Finally, we build a multi-layer convolutional neural network with weighted residuals to adaptively supplement the pairwise attribute encoding with detailed features. Comprehensive experiments and comparisons demonstrated that NCPred outperforms seven state-of-the-art prediction methods. Ablation studies demonstrated the importance of local topology learning, neighbor topology learning, and pairwise attribute encoding.
Case studies on prostate, lung, and breast cancers further revealed NCPred's capacity to screen potential candidate disease-related lncRNAs.
Affiliation(s)
- Ping Xuan
- Department of Computer Science, School of Engineering, Shantou University, Shantou, China
- Honglei Bai
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, Australia
- Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China
- Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba, Japan
- Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin, China; School of Mathematical Science, Heilongjiang University, Harbin, China.
15
Biyu H, GuangWen T, Ming Z, Lixin G, Mengshan L. A lncRNA-disease association prediction model based on the two-step PU learning and fully connected neural networks. Heliyon 2023; 9:e17726. [PMID: 37539215 PMCID: PMC10395133 DOI: 10.1016/j.heliyon.2023.e17726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 06/13/2023] [Accepted: 06/26/2023] [Indexed: 08/05/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been shown to play a regulatory role in various processes of human diseases. However, lncRNA experiments are inefficient, time-consuming and highly subjective, so the number of experimentally verified associations between lncRNAs and diseases is limited. In the era of big data, numerous machine learning methods have been proposed to predict potential associations between lncRNAs and diseases, but the characteristics of the association data have seldom been explored. In these methods, negative samples are selected at random for model training, so some of them may be unrecognized positive associations; the model is then prone to learning these errors, which reduces prediction accuracy. In this paper, we propose a cyclic optimization model for predicting lncRNA-disease associations (COPTLDA). COPTLDA adopts a two-step training strategy: it searches the unlabeled samples for those most likely to be negative, treats the selected samples as negatives, and combines them with the known positive samples to train the model. The searching and training steps are repeated until the best model is obtained, which is used as the final prediction model. To evaluate the model, 30% of the known positive samples are used to calculate its accuracy and 10% of the positive samples are used to calculate its recall. The sampling strategy used in this paper improves accuracy, and the AUC reaches 0.9348. Case studies showed that the model can predict potential associations between lncRNAs and malignant tumors such as colorectal cancer, gastric cancer, and breast cancer. The predicted top 20 associated lncRNAs included 10 colorectal cancer lncRNAs, 2 gastric cancer lncRNAs, and 8 breast cancer lncRNAs.
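The two-step PU (positive-unlabeled) loop described for COPTLDA can be sketched as follows: score the unlabeled samples, keep the lowest-scoring ones as reliable negatives, retrain, and repeat. This is a minimal illustration, assuming a plain numpy logistic regression in place of the paper's fully connected network and a fixed number of rounds in place of its "until the best model is obtained" criterion; the toy data are synthetic.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """Plain batch gradient-descent logistic regression (stand-in classifier)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

def two_step_pu(X_pos, X_unl, n_rounds=3, neg_frac=0.3):
    n_neg = max(1, int(neg_frac * len(X_unl)))
    # Round 0: provisionally treat every unlabeled sample as negative.
    X = np.vstack([X_pos, X_unl])
    y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_unl))]
    w, b = train_logreg(X, y)
    for _ in range(n_rounds):
        # Step 1: the lowest-scoring unlabeled samples become reliable negatives.
        neg_idx = np.argsort(predict(X_unl, w, b))[:n_neg]
        # Step 2: retrain on known positives plus reliable negatives only.
        X = np.vstack([X_pos, X_unl[neg_idx]])
        y = np.r_[np.ones(len(X_pos)), np.zeros(n_neg)]
        w, b = train_logreg(X, y)
    final_neg_idx = np.argsort(predict(X_unl, w, b))[:n_neg]
    return w, b, final_neg_idx

# Synthetic data: the unlabeled pool hides 10 positives among 20 true negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(2.0, 0.5, size=(30, 2))
X_unl = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),   # indices 0-19: true negatives
                   rng.normal(2.0, 0.5, size=(10, 2))])   # indices 20-29: hidden positives
w, b, neg_idx = two_step_pu(X_pos, X_unl)
```

Selecting negatives from the least-positive end of the unlabeled pool is what keeps hidden positives out of the negative set.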
Affiliation(s)
- Li Mengshan
- Corresponding author. Gannan Normal University, China.
16
Zhong H, Luo J, Tang L, Liao S, Lu Z, Lin G, Murphy RW, Liu L. Association filtering and generative adversarial networks for predicting lncRNA-associated disease. BMC Bioinformatics 2023; 24:234. [PMID: 37277721 DOI: 10.1186/s12859-023-05368-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 05/29/2023] [Indexed: 06/07/2023]
Abstract
BACKGROUND Long non-coding RNA (lncRNA) is closely associated with numerous biological processes and with many diseases. Therefore, lncRNA-disease association prediction helps obtain relevant biological information and understand pathogenesis, and thus better diagnose preventable diseases. RESULTS Herein, we present LDAF_GAN, a method for predicting lncRNA-associated diseases based on association filtering and generative adversarial networks. Experiments used two types of data: lncRNA-disease association data without lncRNA sequence features, and data fused with lncRNA sequence features. LDAF_GAN uses a generator and a discriminator, and differs from the original GAN by adding a filtering operation and negative sampling. Filtering removes unassociated diseases from the generator output before it is fed into the discriminator, so the results generated by the model focus only on lncRNAs associated with diseases. Negative sampling takes a portion of the disease entries with value 0 in the association matrix as negative samples, which are assumed to be unassociated with the lncRNA. A regularization term is added to the loss function to prevent the generator from producing a vector of all 1s, which could fool the discriminator. The model therefore requires generated positive samples to be close to 1 and negative samples to be close to 0. The model achieved a superior fit; under fivefold cross-validation, LDAF_GAN achieved AUC values of 0.9265 and 0.9278 on the two datasets, respectively. In the case study, LDAF_GAN predicted disease associations for six lncRNAs (H19, MALAT1, XIST, ZFAS1, UCA1, and ZEB1-AS1), and 100%, 80%, 90%, 90%, 100%, and 90% of their respective top ten predictions had been reported by previous studies. CONCLUSION LDAF_GAN efficiently predicts potential disease associations for both existing lncRNAs and new lncRNAs.
The results of fivefold cross-validation, tenfold cross-validation, and case studies suggest that the model has great predictive potential for lncRNA-disease association prediction.
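The filtering and negative-sampling steps described for LDAF_GAN can be sketched for a single lncRNA row. This assumes a CFGAN-style reading of the abstract: filtering is an element-wise mask of the generator output by the known association row, and the regularization term is a squared penalty on generator scores at sampled 0-entries. Both choices, the function name and the toy values are hypothetical, not taken from the paper.

```python
import numpy as np

def filter_and_penalize(gen_row, real_row, n_neg, alpha, rng):
    """Hypothetical sketch for one lncRNA:
    - filtering: keep generator scores only where the real association row is 1,
      so the discriminator sees only entries for associated diseases;
    - negative sampling: draw some 0-entries of the real row as assumed
      non-associations and penalise generator scores on them toward 0."""
    filtered = gen_row * real_row                        # element-wise mask
    zero_idx = np.flatnonzero(real_row == 0)
    neg_idx = rng.choice(zero_idx, size=min(n_neg, len(zero_idx)), replace=False)
    penalty = alpha * np.sum(gen_row[neg_idx] ** 2)      # pushes sampled negatives toward 0
    return filtered, neg_idx, penalty

rng = np.random.default_rng(0)
real = np.array([1, 0, 1, 0, 0])              # known associations of one lncRNA (toy)
gen = np.array([0.9, 0.8, 0.7, 0.2, 0.6])     # hypothetical generator scores
filtered, neg_idx, penalty = filter_and_penalize(gen, real, n_neg=2, alpha=0.1, rng=rng)
```

Without the penalty, a generator that outputs all 1s would pass the filter unchanged, which is the failure mode the abstract's regularization term is meant to prevent.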
Affiliation(s)
- Hua Zhong
- School of Information Science, Yunnan Normal University, Kunming, China
- Jing Luo
- State Key Laboratory for Conservation and Utilization of Bio-resource, School of Ecology and Environment and School of Life Sciences, Yunnan University, Kunming, China
- Lin Tang
- Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan University, Kunming, China
- Shicheng Liao
- School of Information Science, Yunnan Normal University, Kunming, China
- Zhonghao Lu
- School of Information Science, Yunnan Normal University, Kunming, China
- Guoliang Lin
- School of Medicine, Yunnan University, Kunming, China
- Robert W Murphy
- Reptilia Zoo and Education Centre, 2501 Rutherford Rd., Vaughan, ON, L4K 2N6, Canada
- Lin Liu
- School of Information Science, Yunnan Normal University, Kunming, China.
17
Zhang GZ, Gao YL. BRWMC: Predicting lncRNA-disease associations based on bi-random walk and matrix completion on disease and lncRNA networks. Comput Biol Chem 2023; 103:107833. [PMID: 36812824 DOI: 10.1016/j.compbiolchem.2023.107833] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/21/2022] [Revised: 12/29/2022] [Accepted: 02/15/2023] [Indexed: 02/19/2023]
Abstract
Many experiments have shown that long non-coding RNAs (lncRNAs) are implicated in human disease development. Predicting lncRNA-disease associations is essential for advancing disease treatment and drug development. Exploring the relationship between lncRNAs and diseases in the laboratory is time-consuming and laborious, so computation-based approaches have clear advantages and have become a promising research direction. This paper proposes a new lncRNA-disease association prediction algorithm, BRWMC. First, BRWMC constructs several lncRNA (disease) similarity networks from different measurement perspectives and fuses them into an integrated similarity network by similarity network fusion (SNF). Next, a random walk is used to preprocess the known lncRNA-disease association matrix and calculate estimated scores for potential lncRNA-disease associations. Finally, matrix completion is used to accurately predict the potential lncRNA-disease associations. Under leave-one-out cross-validation and 5-fold cross-validation, BRWMC obtains AUC values of 0.9610 and 0.9739, respectively. In addition, case studies of three common diseases show that BRWMC is a reliable prediction method.
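The similarity network fusion (SNF) step that BRWMC uses can be sketched along the lines of the standard SNF procedure: each view gets a full normalized kernel P and a sparse k-nearest-neighbour kernel S, and each P is repeatedly diffused through its own S using the average of the other views. Simplifications here are assumptions: a node's own entry is excluded from its neighbourhood, matrices are re-symmetrized after every update, and k, t and the toy inputs are illustrative only.

```python
import numpy as np

def _full_kernel(W):
    """P(i, j) = W(i, j) / (2 * sum_{k != i} W(i, k)) off the diagonal, P(i, i) = 1/2."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P

def _knn_kernel(W, k):
    """Keep each row's k largest off-diagonal similarities, row-normalised."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(W[i])[::-1][:k]
        S[i, nbrs] = W[i, nbrs] / W[i, nbrs].sum()
    return S

def snf(Ws, k=3, t=10):
    """Fuse a list of (at least two) similarity matrices over the same nodes."""
    m = len(Ws)
    Ps = [_full_kernel(W) for W in Ws]
    Ss = [_knn_kernel(W, k) for W in Ws]
    for _ in range(t):
        new_Ps = []
        for v in range(m):
            others = sum(Ps[u] for u in range(m) if u != v) / (m - 1)
            P = Ss[v] @ others @ Ss[v].T             # diffuse the other views through view v
            new_Ps.append((P + P.T) / 2.0)           # keep each fused matrix symmetric
        Ps = new_Ps
    return sum(Ps) / m

# Two toy symmetric similarity views over six lncRNAs (random values).
rng = np.random.default_rng(1)
A1, A2 = rng.random((6, 6)), rng.random((6, 6))
fused = snf([(A1 + A1.T) / 2, (A2 + A2.T) / 2])
```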
Affiliation(s)
- Guo-Zheng Zhang
- School of Computer Science, Qufu Normal University, Rizhao, China
- Ying-Lian Gao
- Qufu Normal University Library, Qufu Normal University, Rizhao, China.
18
Xuan P, Zhao Y, Cui H, Zhan L, Jin Q, Zhang T, Nakaguchi T. Semantic Meta-Path Enhanced Global and Local Topology Learning for lncRNA-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1480-1491. [PMID: 36173783 DOI: 10.1109/tcbb.2022.3209571] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/04/2023]
Abstract
Since abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, identifying disease-related lncRNAs helps reveal the pathogenesis of diseases. Existing methods for lncRNA-disease association prediction mainly focus on multi-sourced data related to lncRNAs and diseases. The rich semantic information of meta-paths, composed of multiple kinds of connections between lncRNA and disease nodes, is neglected. We propose a new prediction method, MGLDA, to encode and integrate the semantics of multiple meta-paths, the global topology of heterogeneous graph, and pairwise attributes of lncRNA and disease nodes. First, a tri-layer heterogeneous graph is constructed to associate multi-sourced data across the lncRNA, disease, and miRNA nodes. Afterwards, we establish multiple meta-paths connecting the lncRNA and disease nodes to derive and denote various semantics. Each meta-path contains its specific semantics formulated by an embedding strategy, and each embedding covers local topology formed by the diverse semantic connections among the lncRNA, disease, and miRNA nodes. We construct multiple graph convolutional autoencoders (GCA) with topology-level attention to learn global and multiple local topologies from the tri-layer graph and each meta-path, respectively. The topology-level attention mechanism can learn the importance of various global and local topologies for adaptive pairwise topology fusion. Finally, a convolutional autoencoder learns the attribute representations of lncRNA-disease pairs, which integrates the learnt detailed and representative pairwise features. Experimental results show that MGLDA outperforms other state-of-the-art prediction methods in comparison and retrieves more real lncRNA-disease associations in the top-ranked candidates. The ablation study also demonstrates the important contributions of the local and global topology learning, and pairwise attribute learning. 
Case studies on three diseases further demonstrate MGLDA's ability to identify potential disease-related lncRNAs.
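A meta-path such as lncRNA-miRNA-disease can be made concrete by chaining adjacency matrices: entry (i, j) of the product counts the path instances linking lncRNA i to disease j. This is a generic sketch of meta-path connectivity, not MGLDA's embedding strategy; the function name and toy matrices are hypothetical.

```python
import numpy as np

def meta_path_counts(*adjacencies):
    """Multiply adjacency matrices along a meta-path; entry (i, j) counts the
    path instances from node i of the first type to node j of the last type."""
    result = adjacencies[0]
    for A in adjacencies[1:]:
        result = result @ A
    return result

# Toy adjacency matrices (hypothetical): 2 lncRNAs, 3 miRNAs, 3 diseases.
A_ld = np.array([[1, 0, 0],
                 [0, 1, 0]])          # lncRNA x disease
A_lm = np.array([[1, 1, 0],
                 [0, 0, 1]])          # lncRNA x miRNA
A_md = np.array([[0, 1, 1],
                 [1, 0, 0],
                 [0, 1, 0]])          # miRNA x disease

lmd = meta_path_counts(A_lm, A_md)              # lncRNA-miRNA-disease
ldld = meta_path_counts(A_ld, A_ld.T, A_ld)     # lncRNA-disease-lncRNA-disease
```

Here lncRNA 0 reaches every disease through its two miRNAs, even though it has only one directly known disease association, which is the kind of semantic connection meta-paths expose.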
19
Noncoding RNA Regulation of Hormonal and Metabolic Systems in the Fruit Fly Drosophila. Metabolites 2023; 13:metabo13020152. [PMID: 36837772 PMCID: PMC9967906 DOI: 10.3390/metabo13020152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 01/12/2023] [Accepted: 01/16/2023] [Indexed: 01/22/2023]
Abstract
The importance of RNAs is commonly recognised thanks to protein-coding RNAs, whereas non-coding RNAs (ncRNAs) were conventionally regarded as 'junk'. Over the last decade, the significance and roles of ncRNAs have become noticeable in various biological activities, including hormonal and metabolic regulation. Among ncRNAs, microRNA (miRNA) is a small RNA transcript about 20 nucleotides long; long non-coding RNA (lncRNA) is an RNA transcript longer than 200 nucleotides; and circular RNA (circRNA) is derived from back-splicing of pre-mRNA. In insects, these ncRNAs can regulate gene expression at the epigenetic, transcriptional, and post-transcriptional levels through various mechanisms. A better understanding of these crucial regulators is essential to both basic and applied entomology. In this review, we summarise and discuss current knowledge of miRNA, lncRNA, and circRNA in the best-studied insect model, the fruit fly Drosophila.
20
Yao D, Nong L, Qin M, Wu S, Yao S. Identifying circRNA-miRNA interaction based on multi-biological interaction fusion. Front Microbiol 2022; 13:987930. [PMID: 36620017 PMCID: PMC9815023 DOI: 10.3389/fmicb.2022.987930] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/06/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022]
Abstract
CircRNA is a new type of non-coding RNA with a closed-loop structure. A growing number of biological experiments show that circRNA plays important roles in many diseases by regulating the target genes of miRNAs. Therefore, correctly identifying potential circRNA-miRNA interactions not only helps to understand disease mechanisms, but also contributes to the diagnosis, treatment, and prognosis of disease. In this study, we propose a model (IIMCCMA) that uses network embedding and matrix completion to predict potential circRNA-miRNA interactions. First, the corresponding adjacency matrices are constructed from experimentally verified circRNA-miRNA interactions, circRNA-cancer interactions, and miRNA-cancer interactions. Then, the Gaussian kernel function and the cosine function are used to calculate circRNA Gaussian interaction profile kernel similarity, circRNA functional similarity, miRNA Gaussian interaction profile kernel similarity, and miRNA functional similarity. To reduce the influence of noise and redundant information in known interactions, the model uses network embedding to extract latent feature vectors of circRNAs and miRNAs, respectively. Finally, an improved inductive matrix completion algorithm based on these feature vectors is used to identify potential interactions between circRNAs and miRNAs. 10-fold cross-validation is used to evaluate the predictive ability of IIMCCMA. The experimental results show that the AUC and AUPR values of IIMCCMA are higher than those of other state-of-the-art algorithms. In addition, case studies show that IIMCCMA can correctly identify potential interactions between circRNAs and miRNAs.
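The Gaussian interaction profile (GIP) kernel and cosine similarity used for IIMCCMA's similarity matrices can be sketched over the rows of an association matrix. The GIP form below is the commonly used definition K(i, j) = exp(−γ‖A_i − A_j‖²) with γ = γ′ / mean‖A_i‖²; applying cosine similarity to the same interaction profiles, and γ′ = 1, are assumptions for illustration.

```python
import numpy as np

def gip_kernel(A, gamma_prime=1.0):
    """GIP kernel over the rows (interaction profiles) of association matrix A."""
    A = np.asarray(A, dtype=float)
    sq_norms = (A ** 2).sum(axis=1)
    gamma = gamma_prime / sq_norms.mean()                # bandwidth normalised by mean profile norm
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))    # clip tiny negative round-off

def cosine_similarity(A):
    """Cosine similarity between rows of A."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    norms[norms == 0] = 1.0      # all-zero rows get similarity 0 to every row, including themselves
    U = A / norms[:, None]
    return U @ U.T

# Toy circRNA x miRNA interaction matrix (hypothetical).
A = np.array([[1, 0, 1],
              [1, 0, 1],
              [0, 1, 0]])
K = gip_kernel(A)
C = cosine_similarity(A)
```

For this toy matrix, rows 0 and 1 have identical profiles, so both measures give them similarity 1, while rows 0 and 2 share no interactions and get a cosine similarity of 0.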
Affiliation(s)
- Dunwei Yao
- Department of Gastroenterology, The People’s Hospital of Baise, Baise, China; The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China
- Lidan Nong
- Department of Child Healthcare, Baise Maternal and Child Hospital, Baise, China
- Minzhen Qin
- Department of Gastroenterology, The People’s Hospital of Baise, Baise, China; The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China
- Shengbin Wu
- The Southwest Affiliated Hospital of Youjiang Medical University for Nationalities, Baise, China; Department of Pulmonary and Critical Care Medicine, The People's Hospital of Baise, Baise, China
- Shunhan Yao
- Medical College of Guangxi University, Nanning, China. Correspondence: Shunhan Yao
21
Dai L, Zhu R, Liu J, Li F, Wang J, Shang J. MSF-UBRW: An Improved Unbalanced Bi-Random Walk Method to Infer Human lncRNA-Disease Associations. Genes (Basel) 2022; 13:2032. [PMID: 36360269 PMCID: PMC9690797 DOI: 10.3390/genes13112032] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2022] [Revised: 10/24/2022] [Accepted: 10/28/2022] [Indexed: 09/08/2024]
Abstract
Long non-coding RNA (lncRNA) is a transcription product that exerts its biological functions through a variety of mechanisms. The occurrence and development of a series of human diseases are closely related to abnormal lncRNA expression levels. Scientists have developed many computational models to identify lncRNA-disease associations (LDAs); however, many potential LDAs remain unknown. In this paper, a novel method, MSF-UBRW (multiple similarities fusion based on unbalanced bi-random walk), is designed to explore new LDAs. First, two similarities of lncRNAs (functional similarity and Gaussian Interaction Profile kernel similarity) are calculated and fused linearly, and the same is done for diseases. Then, the known association matrix is preprocessed. Next, the linear neighbor similarities of lncRNAs and diseases are calculated. After that, potential associations are predicted by an unbalanced bi-random walk. Fusing multiple similarities substantially improves the prediction performance of MSF-UBRW. Finally, the prediction ability of MSF-UBRW is assessed by two validation methods, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). AUCs of 0.9391 in LOOCV and 0.9183 (±0.0054) in 5-fold CV confirm the reliable prediction ability of MSF-UBRW. Case studies of three common diseases also show that MSF-UBRW can infer new LDAs effectively.
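An unbalanced bi-random walk can be sketched as two coupled walks on the association matrix: a left walk over the lncRNA similarity graph for l steps and a right walk over the disease similarity graph for r steps, averaged at each step while both are active. This follows a common bi-random-walk formulation and is an assumption, not MSF-UBRW's exact update; the symmetric normalization, step counts, alpha and toy data are illustrative.

```python
import numpy as np

def sym_normalize(S):
    """D^-1/2 S D^-1/2 normalisation of a similarity matrix."""
    d = S.sum(axis=1)
    d[d == 0] = 1.0
    inv = 1.0 / np.sqrt(d)
    return inv[:, None] * S * inv[None, :]

def unbalanced_bi_random_walk(A, S_l, S_d, left_steps=4, right_steps=2, alpha=0.5):
    """A: known lncRNA x disease associations; S_l, S_d: lncRNA and disease similarities."""
    M_l, M_d = sym_normalize(S_l), sym_normalize(S_d)
    A = np.asarray(A, dtype=float)
    R = A.copy()
    for t in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if t <= left_steps:
            walks.append(alpha * M_l @ R + (1 - alpha) * A)   # step on the lncRNA side
        if t <= right_steps:
            walks.append(alpha * R @ M_d + (1 - alpha) * A)   # step on the disease side
        R = sum(walks) / len(walks)                           # average the active walks
    return R

rng = np.random.default_rng(0)
A = np.array([[1, 0, 0],
              [0, 1, 1],
              [0, 0, 0],
              [1, 0, 0]], dtype=float)      # 4 lncRNAs x 3 diseases (toy)
S_l = rng.random((4, 4)); S_l = (S_l + S_l.T) / 2
S_d = rng.random((3, 3)); S_d = (S_d + S_d.T) / 2
R = unbalanced_bi_random_walk(A, S_l, S_d)
```

The "unbalanced" part is simply that the two sides may take different numbers of steps; lncRNA 2, which has no known association, still receives nonzero scores through the left walk.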
Affiliation(s)
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao 276826, China
22
Shi H, Zhang X, Tang L, Liu L. Heterogeneous graph neural network for lncRNA-disease association prediction. Sci Rep 2022; 12:17519. [PMID: 36266433 PMCID: PMC9585029 DOI: 10.1038/s41598-022-22447-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Accepted: 10/14/2022] [Indexed: 01/12/2023]
Abstract
Identifying lncRNA-disease associations is conducive to the diagnosis, treatment and prevention of diseases. Because verification by biological experiments is expensive and time-consuming, prediction methods based on computational models have gradually become an important means of discovering lncRNA-disease associations. However, existing methods still struggle to make full use of network topology information to identify potential lncRNA-disease associations in multi-source data. In this study, we propose a novel method called HGNNLDA for lncRNA-disease association prediction. First, HGNNLDA constructs a heterogeneous network composed of a lncRNA similarity network, a lncRNA-disease association network and a lncRNA-miRNA association network. Then, on this heterogeneous network, a fixed number of strongly correlated neighbors of each type are sampled for each node by random walk with restart. Next, the embeddings of the lncRNA and the disease in each lncRNA-disease pair are obtained through type-based neighbor aggregation and the combination of all types in a heterogeneous graph neural network, in which an attention mechanism is introduced because different types of neighbors contribute differently to lncRNA-disease association prediction. As a result, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) under fivefold cross-validation (5FCV) are 0.9786 and 0.8891, respectively. Compared with five state-of-the-art prediction models, HGNNLDA has better prediction performance. In addition, two types of case studies further verify that our method can effectively predict potential lncRNA-disease associations and can predict associations for new diseases without any known lncRNAs.
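The fixed-size, type-aware neighbor sampling described for HGNNLDA can be sketched with restart random walks: walk from a node, restart with some probability, count visits, and keep the most frequently visited nodes of each type up to a per-type quota. This is a minimal illustration assuming visit frequency as the "strong correlation" criterion; the walk length, restart probability, quotas and toy graph are hypothetical.

```python
import random
from collections import Counter

def sample_typed_neighbors(adj, node_type, start, k_per_type,
                           walk_len=500, restart=0.5, seed=0):
    """adj: dict node -> list of neighbours; node_type: dict node -> type label;
    k_per_type: dict type label -> number of neighbours to keep."""
    rng = random.Random(seed)
    visits = Counter()
    current = start
    for _ in range(walk_len):
        if rng.random() < restart or not adj.get(current):
            current = start                            # restart at the start node
            continue
        current = rng.choice(adj[current])
        if current != start:
            visits[current] += 1
    neighbors = {t: [] for t in k_per_type}
    for node, _ in visits.most_common():               # most frequently visited first
        t = node_type[node]
        if t in neighbors and len(neighbors[t]) < k_per_type[t]:
            neighbors[t].append(node)
    return neighbors

# Toy heterogeneous graph (hypothetical node names).
adj = {"l1": ["d1", "d2", "m1"], "l2": ["d1"], "d1": ["l1", "l2"],
       "d2": ["l1"], "m1": ["l1"]}
node_type = {"l1": "lncRNA", "l2": "lncRNA", "d1": "disease",
             "d2": "disease", "m1": "miRNA"}
result = sample_typed_neighbors(adj, node_type, "l1",
                                {"lncRNA": 1, "disease": 2, "miRNA": 1})
```

Fixing the quota per type gives every node the same-sized, type-grouped neighbor set, which is what makes type-based aggregation with attention straightforward afterwards.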
Affiliation(s)
- Hong Shi
- School of Information, Yunnan Normal University, Kunming, 650092 China
- Xiaomeng Zhang
- School of Information, Yunnan Normal University, Kunming, 650092 China
- Lin Tang
- Key Laboratory of Educational Informatization for Nationalities, Ministry of Education, Yunnan Normal University, Kunming, 650092 China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, 650092 China
23
Yao D, Zhang T, Zhan X, Zhang S, Zhan X, Zhang C. Geometric complement heterogeneous information and random forest for predicting lncRNA-disease associations. Front Genet 2022; 13:995532. [PMID: 36092871 PMCID: PMC9448985 DOI: 10.3389/fgene.2022.995532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2022] [Accepted: 08/01/2022] [Indexed: 11/20/2022]
Abstract
Growing evidence has shown that the abnormal expression of long non-coding RNAs (lncRNAs) is relevant to a variety of human diseases. Therefore, accurate identification of disease-related lncRNAs can help to understand lncRNA expression at the molecular level and to explore more effective treatments for diseases. Many lncRNA-disease association prediction models have been proposed, but recognizing unknown lncRNA-disease associations is still a challenge. In this work, we propose a computational model for predicting lncRNA-disease associations based on geometric complement heterogeneous information and random forest. First, geometric complement heterogeneous information was used to integrate experimentally verified lncRNA-miRNA interactions and miRNA-disease associations. Second, lncRNA and disease features consisting of their respective similarity coefficients were fused into the input feature space. Third, an autoencoder was adopted to project the raw high-dimensional features into a low-dimensional space to learn representations of lncRNAs and diseases. Finally, the low-dimensional lncRNA and disease features were fused into the input feature space to train a random forest classifier for lncRNA-disease association prediction. Under five-fold cross-validation, the AUC (area under the receiver operating characteristic curve) is 0.9897 and the AUPR (area under the precision-recall curve) is 0.7040, indicating that our model performs better than several state-of-the-art lncRNA-disease association prediction models. In addition, case studies on colon and stomach cancer indicate that our model has a good ability to predict disease-related lncRNAs.
Affiliation(s)
- Dengju Yao
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- *Correspondence: Dengju Yao,
- Tao Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaojuan Zhan
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Shuli Zhang
- School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaorong Zhan
- Department of Endocrinology and Metabolism, Hospital of South University of Science and Technology, Shenzhen, China
- Chao Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, China
24
Xie G, Zhu Y, Lin Z, Sun Y, Gu G, Li J, Wang W. HBRWRLDA: predicting potential lncRNA-disease associations based on hypergraph bi-random walk with restart. Mol Genet Genomics 2022; 297:1215-1228. [PMID: 35752742 DOI: 10.1007/s00438-022-01909-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/20/2021] [Accepted: 05/20/2022] [Indexed: 10/17/2022]
Abstract
Accumulating evidence indicates that the regulation of long non-coding RNAs (lncRNAs) is closely related to a variety of diseases. Identifying meaningful lncRNA-disease associations will contribute to the understanding of the molecular mechanisms underlying these diseases. However, only a limited number of lncRNA-disease associations have been inferred from traditional biological experiments, because such experiments are costly and highly specialized. Therefore, computational methods are increasingly used to reduce the time spent on biological experiments and to complement biological research. In this paper, a computational method called HBRWRLDA is proposed to predict lncRNA-disease associations. First, HBRWRLDA models the relationships among multiple nodes using hypergraphs, which allows it to integrate the expression similarity of lncRNAs and the semantic similarity of diseases when constructing the hypergraphs. Then, a bi-random walk on the hypergraphs is used to predict potential lncRNA-disease associations. Compared with five other advanced methods, HBRWRLDA achieves higher area under the curve values of 0.9551 and [Formula: see text] under leave-one-out cross-validation (LOOCV) and five-fold cross-validation (5-fold CV), respectively. In addition, the predictive effect of HBRWRLDA was confirmed by case studies of three diseases: renal cell carcinoma, gastric cancer, and hepatocellular carcinoma. The case studies demonstrate the capacity of HBRWRLDA to identify potentially disease-associated lncRNAs. Overall, HBRWRLDA is excellent at predicting potential lncRNA-disease associations and could be useful for further biological experiments by helping researchers identify candidate lncRNA-disease associations.
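A random walk on a hypergraph is commonly defined through its incidence matrix H (nodes by hyperedges): with hyperedge weights W, node degrees D_v and hyperedge degrees D_e, the transition matrix is P = D_v^-1 H W D_e^-1 H^T, so a walker picks a hyperedge containing its current node and then a node inside that hyperedge. The sketch below builds that matrix in plain Python. Forming one hyperedge per node from its k most similar nodes is an assumption for illustration; the abstract does not say how HBRWRLDA constructs its hyperedges, and the bi-random walk itself is not shown.

```python
def knn_hyperedges(sim, k):
    """One hyperedge per node: the node plus its k most similar other nodes."""
    n = len(sim)
    edges = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        edges.append([i] + others[:k])
    return edges


def hypergraph_transition(n, edges, weights=None):
    """Row-stochastic P = Dv^-1 H W De^-1 H^T for a hypergraph on n nodes."""
    if weights is None:
        weights = [1.0] * len(edges)
    dv = [0.0] * n  # node degree: total weight of hyperedges containing it
    for edge, w in zip(edges, weights):
        for v in edge:
            dv[v] += w
    P = [[0.0] * n for _ in range(n)]
    for edge, w in zip(edges, weights):
        de = len(edge)  # hyperedge degree: number of nodes it contains
        for u in edge:
            for v in edge:
                # Choose hyperedge `edge` from u (prob w / dv[u]),
                # then node v uniformly inside it (prob 1 / de).
                P[u][v] += w / (dv[u] * de)
    return P


# Hypothetical similarity matrix for three lncRNAs.
sim = [[1.0, 0.9, 0.1],
       [0.9, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
edges = knn_hyperedges(sim, k=1)
P = hypergraph_transition(3, edges)
for row in P:
    print([round(x, 3) for x in row])
```

Each row of P sums to 1, so P can be plugged into any random-walk-with-restart update in place of an ordinary normalized adjacency matrix.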
Affiliation(s)
- Guobo Xie
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Yinting Zhu
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Zhiyi Lin
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Yuping Sun
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Guosheng Gu
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Jianming Li
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
- Weiming Wang
- School of Computing, Guangdong University of Technology, Guangzhou, 510000, China
25
Liu Y, Yu Y, Zhao S. Dual Attention Mechanisms and Feature Fusion Networks Based Method for Predicting LncRNA-Disease Associations. Interdiscip Sci 2022; 14:358-371. [PMID: 35067893 DOI: 10.1007/s12539-021-00492-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/19/2021] [Revised: 11/02/2021] [Accepted: 11/07/2021] [Indexed: 11/30/2022]
Abstract
LncRNAs play a part in numerous important biological processes and are relevant to disease diagnosis, prevention and treatment. Studying the associations between lncRNAs and various diseases is one of the crucial approaches to learning the role and status of lncRNAs in human diseases. With growing research on lncRNAs and diseases, multiple methods based on neural networks have been employed to predict these associations. However, these methods failed to extract deep and complex feature representations of lncRNA-disease associations, and they ignored the discriminative contributions of the interactions, correlations, and similarities among miRNAs, diseases, and lncRNAs to association prediction. In this paper, based on the multi-biology premise of lncRNAs, miRNAs, and diseases, a dual attention network is proposed to predict lncRNA-disease associations from the lncRNA, miRNA, and disease characteristic matrices. Through two attention modules, the model learns nonlinear, more complex and useful features of the lncRNA, miRNA, and disease characteristic matrices. For the lncRNA-disease feature embedding matrix, its connection with the lncRNA, miRNA, and disease characteristic matrices is enhanced through deconvolution and a feature fusion layer. Compared with several of the latest methods, the proposed method produces better performance. Case studies of osteosarcoma, lung cancer, and gastric cancer confirm its effectiveness in recognizing potential lncRNA-disease associations.
Affiliation(s)
- Yu Liu
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
- Universiti Putra Malaysia, 43400 UPM Serdang, Selangor Darul Ehsan, Malaysia
- Yingying Yu
- Dalian Key Lab of Digital Technology for National Culture, Dalian Minzu University, Dalian, 116600, China
- Shimin Zhao
- Guangxi Vocational and Technical College, Nanning, 530000, Guangxi, China
26
Zhong J, Zhou W, Kang J, Fang Z, Xie M, Xiao Q, Peng W. DNRLCNN: A CNN Framework for Identifying MiRNA-Disease Associations Using Latent Feature Matrix Extraction with Positive Samples. Interdiscip Sci 2022; 14:607-622. [PMID: 35428965 DOI: 10.1007/s12539-022-00509-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Revised: 02/24/2022] [Accepted: 03/01/2022] [Indexed: 06/14/2023]
Abstract
Emerging evidence indicates that miRNAs have strong relationships with many human diseases. Investigating these associations will help to elucidate the activities of miRNAs and the mechanisms of pathogenesis, and will provide new opportunities for disease diagnosis and drug discovery. Therefore, it is significant to identify potential associations between miRNAs and diseases. Existing databases of miRNA-disease associations (MDAs) only provide known MDAs, which can be regarded as positive samples. However, the unknown MDAs cannot be reliably regarded as negative samples. To deal with this uncertainty, we proposed a convolutional neural network (CNN) framework, named DNRLCNN, that predicts MDAs based on a latent feature matrix extracted from positive samples only. First, by including only the positive samples in the calculation, we captured a latent feature matrix for the complex interactions between miRNAs and diseases in a low-dimensional space. Then, we constructed a feature vector for each miRNA-disease pair based on this feature representation. Finally, we applied a modified CNN to the feature vectors to predict MDAs. As a result, our model achieves better performance than other state-of-the-art CNN-based methods in fivefold cross-validation on both the miRNA-disease association prediction task (average AUC of 0.9030) and the miRNA-phenotype association prediction task (average AUC of 0.9442). In addition, we carried out case studies on two human diseases: all of the top-50 predicted miRNAs for lung neoplasms are confirmed by the HMDD v3.2 and dbDEMC 2.0 databases, and 98% of the top-50 predicted miRNAs for heart failure are confirmed. The experimental results show that our model is capable of inferring potential disease-related miRNAs.
Affiliation(s)
- Jiancheng Zhong
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Wubin Zhou
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Jiedong Kang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Zhuo Fang
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Qiu Xiao
- College of Information Science and Engineering, Hunan Normal University, Changsha, 410083, China
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
27
Liang Y, Zhang ZQ, Liu NN, Wu YN, Gu CL, Wang YL. MAGCNSE: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinformatics 2022; 23:189. [PMID: 35590258 PMCID: PMC9118755 DOI: 10.1186/s12859-022-04715-w] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 03/15/2022] [Accepted: 05/05/2022] [Indexed: 01/02/2023]
Abstract
BACKGROUND Many long non-coding RNAs (lncRNAs) have key roles in different human biological processes and, according to cumulative evidence, are closely linked to numerous human diseases. Predicting potential lncRNA-disease associations can help to detect disease biomarkers and to perform disease analysis and prevention. Establishing effective computational methods for lncRNA-disease association prediction is critical. RESULTS In this paper, we propose a novel model named MAGCNSE to predict underlying lncRNA-disease associations. We first obtain multiple feature matrices from the multi-view similarity graphs of lncRNAs and diseases using a graph convolutional network. Then, weights are adaptively assigned to the different feature matrices of lncRNAs and diseases using an attention mechanism. Next, the final representations of lncRNAs and diseases are acquired by further extracting features from their multi-channel feature matrices with a convolutional neural network. Finally, we employ a stacking ensemble classifier, consisting of multiple traditional machine learning classifiers, to make the final prediction. Ablation studies of both the representation learning and the classification methods demonstrate the validity of each module. Furthermore, we compare the overall performance of MAGCNSE with that of six other state-of-the-art models, and the results show that it outperforms the other methods. Moreover, we verify the effectiveness of using multi-view data of lncRNAs and diseases. Case studies further reveal the outstanding ability of MAGCNSE to identify potential lncRNA-disease associations. CONCLUSIONS The experimental results indicate that MAGCNSE is a useful approach for predicting potential lncRNA-disease associations.
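The per-view graph convolution can be sketched with the standard GCN propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 X W), where A is a similarity graph, X the node features and W a weight matrix. This is a generic single layer in plain Python under assumed toy inputs, not MAGCNSE's architecture; the attention, CNN and stacking stages are omitted, and in practice W would be learned rather than fixed.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by inverse square-root degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    out = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in out]  # ReLU


# Hypothetical lncRNA similarity graph (3 nodes), 2-d features, 2x2 weights.
adj = [[0.0, 0.8, 0.0],
       [0.8, 0.0, 0.5],
       [0.0, 0.5, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -0.2], [0.1, 0.4]]
hidden = gcn_layer(adj, features, weight)
print(hidden)
```

Running one such layer per similarity view yields the multiple feature matrices that an attention step could then weight.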
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Ze-Qun Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Nian-Nian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Ya-Nan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Chang-Long Gu
- College of Information Science and Engineering, Hunan University, Changsha, China
- Ying-Long Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
28
Lan W, Lai D, Chen Q, Wu X, Chen B, Liu J, Wang J, Chen YPP. LDICDL: LncRNA-Disease Association Identification Based on Collaborative Deep Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1715-1723. [PMID: 33125333 DOI: 10.1109/tcbb.2020.3034910] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Indexed: 06/11/2023]
Abstract
It has been shown that long noncoding RNAs (lncRNAs) play critical roles in many human diseases. Therefore, inferring associations between lncRNAs and diseases can contribute to disease diagnosis, prognosis and treatment. To overcome the limitations of traditional experimental methods, which are expensive and time-consuming, several computational methods have been proposed to predict lncRNA-disease associations by fusing different biological data. However, the performance of lncRNA-disease association prediction still needs to be improved. In this study, we propose a computational model (named LDICDL) to identify lncRNA-disease associations based on collaborative deep learning. It uses an autoencoder to denoise multiple types of lncRNA feature information and multiple types of disease feature information, respectively. Then, a matrix decomposition algorithm is employed to predict potential lncRNA-disease associations. In addition, to overcome the limitation of matrix decomposition, a hybrid model is developed to predict associations between new lncRNAs (or diseases) and diseases (or lncRNAs). Ten-fold cross-validation and a de novo test are applied to evaluate the performance of the method. The experimental results show that LDICDL outperforms other state-of-the-art methods in prediction performance.
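The matrix-decomposition step can be illustrated with plain matrix factorization: approximate the lncRNA-by-disease association matrix Y by U V^T using regularized stochastic gradient descent, then read the entries of U V^T as association scores. This is a generic sketch, not LDICDL's collaborative deep learning model; it omits the autoencoder-denoised features, and the rank, learning rate, regularization weight, epoch count and toy matrix are assumed values.

```python
import random


def factorize(Y, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Approximate Y (lncRNA rows, disease columns) by U V^T with SGD.

    Every entry, including unobserved zeros, is fitted; an L2 penalty `reg`
    keeps the latent factors small.
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    U = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(n)]
    V = [[rng.uniform(0.0, 0.1) for _ in range(rank)] for _ in range(m)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                err = Y[i][j] - sum(U[i][f] * V[j][f] for f in range(rank))
                for f in range(rank):
                    u, v = U[i][f], V[j][f]
                    U[i][f] += lr * (err * v - reg * u)
                    V[j][f] += lr * (err * u - reg * v)
    return U, V


def scores(U, V):
    """Association scores: the reconstructed matrix U V^T."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


# Hypothetical 3 lncRNA x 3 disease association matrix.
Y = [[1, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]
U, V = factorize(Y)
S = scores(U, V)
for row in S:
    print([round(x, 2) for x in row])
```

Treating every unobserved pair as a zero target is a simplification; association-prediction models often down-weight unknown pairs instead.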
29
Xie G, Jiang J, Sun Y. LDA-LNSUBRW: lncRNA-Disease Association Prediction Based on Linear Neighborhood Similarity and Unbalanced bi-Random Walk. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:989-997. [PMID: 32870798 DOI: 10.1109/tcbb.2020.3020595] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 06/11/2023]
Abstract
An increasing number of experiments show that lncRNAs are involved in many biological processes, and their mutations and dysregulation are associated with many diseases. However, verifying the relationships between lncRNAs and diseases is time-consuming and laborious. Effective computational methods will contribute to our understanding of the underlying mechanisms of disease and to identifying disease biomarkers. Therefore, we proposed a method called lncRNA-disease association prediction based on linear neighborhood similarity and unbalanced bi-random walk (LDA-LNSUBRW). Because known lncRNA-disease associations are rare, a preprocessing step is performed to estimate the interaction likelihood of unknown cases, which helps to predict potential associations. Under leave-one-out cross-validation (LOOCV) and fivefold cross-validation (5-fold CV), LDA-LNSUBRW achieved effective performance, with AUCs of 0.8874 and 0.8632 ± 0.0051, respectively. The experimental results show that the proposed method is superior to five other state-of-the-art methods. In addition, case studies of three diseases (lung cancer, breast cancer, and osteosarcoma) were carried out to illustrate that LDA-LNSUBRW can predict the relevant lncRNAs.
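An unbalanced bi-random walk runs l steps on the lncRNA similarity network (left walk, M_L R) and r steps on the disease similarity network (right walk, R M_D), restarting toward the known associations with weight 1 - α and averaging only the walks still active at each step. The sketch below follows this commonly used bi-random-walk scheme with symmetrically normalized similarity matrices; the normalization, α, l, r and the toy inputs are assumptions, and LDA-LNSUBRW's linear neighborhood similarity and preprocessing step are not reproduced.

```python
import math


def sym_normalize(S):
    """Symmetric normalization D^-1/2 S D^-1/2 of a similarity matrix."""
    d = [sum(row) for row in S]
    n = len(S)
    return [[S[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def restart_blend(M, A0, alpha):
    """Element-wise alpha * M + (1 - alpha) * A0."""
    return [[alpha * m + (1 - alpha) * a for m, a in zip(mrow, arow)]
            for mrow, arow in zip(M, A0)]


def unbalanced_birw(A, SL, SD, l=2, r=3, alpha=0.8):
    """Score lncRNA x disease pairs with an unbalanced bi-random walk.

    A: known associations (lncRNA rows, disease columns), at least one 1.
    SL, SD: lncRNA and disease similarity matrices.
    l, r: number of steps on the lncRNA (left) and disease (right) networks.
    """
    ML, MD = sym_normalize(SL), sym_normalize(SD)
    total = sum(map(sum, A))
    A0 = [[x / total for x in row] for row in A]  # restart distribution
    R = A0
    for t in range(1, max(l, r) + 1):
        walks = []
        if t <= l:  # left walk over the lncRNA network
            walks.append(restart_blend(matmul(ML, R), A0, alpha))
        if t <= r:  # right walk over the disease network
            walks.append(restart_blend(matmul(R, MD), A0, alpha))
        # Average only the walks that are still active at step t.
        R = [[sum(w[i][j] for w in walks) / len(walks)
              for j in range(len(A[0]))] for i in range(len(A))]
    return R


A = [[1, 0], [0, 0]]
SL = [[1.0, 0.9], [0.9, 1.0]]
SD = [[1.0, 0.0], [0.0, 1.0]]
R = unbalanced_birw(A, SL, SD)
print(R)
```

With these toy inputs, lncRNA 1 inherits a positive score for disease 0 through its similarity to lncRNA 0, while disease 1, which has no similar disease and no known association, keeps a score of zero.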
30
Chen M, Deng Y, Li A, Tan Y. Inferring Latent Disease-lncRNA Associations by Label-Propagation Algorithm and Random Projection on a Heterogeneous Network. Front Genet 2022; 13:798632. [PMID: 35186029 PMCID: PMC8854791 DOI: 10.3389/fgene.2022.798632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 01/18/2022] [Indexed: 11/13/2022]
Abstract
Long noncoding RNA (lncRNA), a class of non-coding RNA longer than 200 nucleotides, is related to various complex diseases. Precisely identifying potential lncRNA–disease associations is important for understanding disease pathogenesis, developing new drugs, and designing individualized diagnosis and treatment methods for different human diseases. Compared with complex and costly biological experiments, computational methods can quickly and effectively predict potential lncRNA–disease associations. Thus, developing computational methods for lncRNA–disease prediction is a promising avenue. However, owing to the low prediction accuracy of state-of-the-art methods, accurately and effectively identifying lncRNA–disease associations remains highly challenging. This article proposes an integrated method called LPARP, based on a label-propagation algorithm and random projection, to address this issue. Specifically, the label-propagation algorithm is first used to obtain estimated scores of lncRNA–disease associations, and random projections are then used to accurately predict disease-related lncRNAs. Empirical experiments showed that LPARP achieved good predictions on three gold-standard datasets and is superior to existing state-of-the-art prediction methods. It can also be used to predict associations for isolated diseases and new lncRNAs. Case studies of bladder cancer, esophageal squamous-cell carcinoma, and colorectal cancer further demonstrate the reliability of the method. The proposed LPARP algorithm can predict potential lncRNA–disease interactions stably and effectively with less data. LPARP can be used as an effective and reliable tool for biomedical research.
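The label-propagation stage can be sketched with the standard iteration F <- αWF + (1 - α)Y, where W is the symmetrically normalized similarity matrix and Y holds the known associations. This is a generic sketch under assumed inputs (α = 0.5 and a toy disease similarity matrix), not LPARP's code; the random-projection stage is omitted.

```python
import math


def label_propagation(S, Y, alpha=0.5, max_iter=1000, tol=1e-10):
    """Propagate known labels Y over a similarity network S.

    Iterates F <- alpha * W F + (1 - alpha) * Y with W = D^-1/2 S D^-1/2
    until the largest change falls below tol.
    S: n x n similarity (e.g. between diseases); Y: n x m known associations.
    """
    n, m = len(S), len(Y[0])
    d = [sum(row) for row in S]
    W = [[S[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(max_iter):
        new_F = [[alpha * sum(W[i][k] * F[k][j] for k in range(n))
                  + (1 - alpha) * Y[i][j] for j in range(m)] for i in range(n)]
        delta = max(abs(new_F[i][j] - F[i][j]) for i in range(n) for j in range(m))
        F = new_F
        if delta < tol:
            break
    return F


# Hypothetical disease similarity (3 diseases) and one lncRNA column.
S = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
Y = [[1], [0], [0]]
F = label_propagation(S, Y)
print(F)  # approximately [[0.765], [0.235], [0.0]]
```

For these inputs the iteration converges to 13/17 and 4/17 for the first two diseases, while the third disease, which is unconnected, stays at 0; the estimated scores are what a second stage would refine.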
31
Li J, Kong M, Wang D, Yang Z, Hao X. Prediction of lncRNA-Disease Associations via Closest Node Weight Graphs of the Spatial Neighborhood Based on the Edge Attention Graph Convolutional Network. Front Genet 2022; 12:808962. [PMID: 35058974 PMCID: PMC8763691 DOI: 10.3389/fgene.2021.808962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Accepted: 11/29/2021] [Indexed: 11/24/2022]
Abstract
Accumulated evidence from biological experiments and clinical trials has shown that long non-coding RNAs (lncRNAs) are closely related to the occurrence and development of various complex human diseases. Research on lncRNA–disease relations will help to further understand the pathogenesis of complex human diseases at the molecular level, but only a small proportion of lncRNA–disease associations have been confirmed. Considering the high cost of biological experiments, exploring potential lncRNA–disease associations with computational approaches has become very urgent. In this study, a model based on the closest node weight graph of the spatial neighborhood (CNWGSN) and the edge attention graph convolutional network (EAGCN), LDA-EAGCN, was developed to uncover potential lncRNA–disease associations by integrating disease semantic similarity, lncRNA functional similarity, and known lncRNA–disease associations. Inspired by the great success of EAGCN on the chemical molecule property recognition problem, the prediction of lncRNA–disease associations was treated as a component recognition problem on lncRNA–disease characteristic graphs. The CNWGSN features of lncRNA–disease associations, combined with known lncRNA–disease associations, were used to train EAGCN, and EAGCN then predicted correlation scores for the input data to judge whether the input lncRNAs are associated with the input diseases. LDA-EAGCN achieved a reliable AUC value of 0.9853 in ten-fold cross-validation experiments, the highest among five state-of-the-art models. Furthermore, case studies of renal cancer, laryngeal carcinoma, and liver cancer were performed, and most of the top-ranking lncRNA–disease associations have been confirmed by recently published experimental literature. LDA-EAGCN is thus an effective model for predicting potential lncRNA–disease associations. Its source code and experimental data are available at https://github.com/HGDKMF/LDA-EAGCN.
Affiliation(s)
- Jianwei Li
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin, China
- Mengfan Kong
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Duanyang Wang
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Zhenwu Yang
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
- Xiaoke Hao
- Institute of Computational Medicine, School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
32
Li J, Yang Z, Wang D, Li Z. WAFNRLTG: A Novel Model for Predicting LncRNA Target Genes Based on Weighted Average Fusion Network Representation Learning Method. Front Cell Dev Biol 2022; 9:820342. [PMID: 35127729 PMCID: PMC8807548 DOI: 10.3389/fcell.2021.820342] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2021] [Accepted: 12/14/2021] [Indexed: 11/29/2022]
Abstract
Long non-coding RNAs (lncRNAs) do not encode proteins, yet they are well established to be involved in complex regulatory functions, and lncRNA regulatory dysfunction can lead to a variety of complex human diseases. LncRNAs mostly exert their functions by regulating the expression of target genes, and accurate prediction of potential lncRNA target genes would help to further the functional annotation of lncRNAs. Considering the limitations of traditional computational methods for predicting lncRNA target genes, a novel model named Weighted Average Fusion Network Representation learning for predicting LncRNA Target Genes (WAFNRLTG) was proposed. First, a novel heterogeneous network was constructed by integrating an lncRNA sequence similarity network, an mRNA sequence similarity network, an lncRNA-mRNA interaction network, an lncRNA-miRNA interaction network and an mRNA-miRNA interaction network. Next, four popular network representation learning methods were used to obtain representation vectors for the lncRNA and mRNA nodes. Then, the representations of lncRNAs and target genes in the heterogeneous network were obtained with the weighted average fusion network representation learning method. Finally, we merged the representations of lncRNAs and related target genes to form lncRNA-gene pairs, trained an XGBoost classifier and predicted potential lncRNA target genes. In five-fold cross-validation on the training and independent datasets, WAFNRLTG obtained better AUC scores (0.9410, 0.9350) and AUPR scores (0.9391, 0.9350). Moreover, case studies of three common lncRNAs were performed to predict their potential target genes, and the results confirmed the effectiveness of WAFNRLTG. The source code and all data of WAFNRLTG can be freely downloaded at https://github.com/HGDYZW/WAFNRLTG.
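The weighted average fusion step can be sketched as a weighted mean of the embedding vectors that different representation learning methods produce for the same node, followed by concatenation of the fused lncRNA and gene vectors into a pair feature. How WAFNRLTG chooses its weights is not stated in the abstract, so the weights, method names and two-dimensional vectors below are hypothetical, and only two of the four methods are shown for brevity; the XGBoost stage is omitted.

```python
def weighted_average_fusion(embeddings, weights):
    """Fuse one node's embeddings from several methods by weighted average.

    embeddings: dict method_name -> vector (all the same length).
    weights: dict method_name -> non-negative weight (normalized here).
    """
    total = sum(weights[name] for name in embeddings)
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for name, vec in embeddings.items():
        if len(vec) != dim:
            raise ValueError(f"embedding from {name} has length {len(vec)}, expected {dim}")
        share = weights[name] / total
        for i, x in enumerate(vec):
            fused[i] += share * x
    return fused


def pair_feature(lncrna_vec, gene_vec):
    """Concatenate fused lncRNA and gene vectors into one pair feature."""
    return list(lncrna_vec) + list(gene_vec)


# Hypothetical embeddings from two methods, "method_a" weighted 3:1.
weights = {"method_a": 3.0, "method_b": 1.0}
lnc = weighted_average_fusion({"method_a": [1.0, 0.0], "method_b": [0.0, 1.0]}, weights)
gene = weighted_average_fusion({"method_a": [0.2, 0.4], "method_b": [0.6, 0.0]}, weights)
x = pair_feature(lnc, gene)
print(lnc)  # [0.75, 0.25]
```

Normalizing the weights inside the function keeps the fused vector on the same scale as the inputs regardless of how the raw weights were chosen.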
Affiliation(s)
- Jianwei Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Hebei Province Key Laboratory of Big Data Calculation, Hebei University of Technology, Tianjin, China
- *Correspondence: Jianwei Li,
- Zhenwu Yang
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Duanyang Wang
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
- Zhiguang Li
- School of Artificial Intelligence, Institute of Computational Medicine, Hebei University of Technology, Tianjin, China
33
Yan C, Duan G, Zhang Y, Wu FX, Pan Y, Wang J. Predicting Drug-Drug Interactions Based on Integrated Similarity and Semi-Supervised Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:168-179. [PMID: 32310779 DOI: 10.1109/tcbb.2020.2988018] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Indexed: 06/11/2023]
Abstract
A drug-drug interaction (DDI) is defined as an association between two drugs in which the pharmacological effects of one drug are influenced by the other. Positive DDIs can usually improve patients' therapeutic outcomes, but negative DDIs are a major cause of adverse drug reactions and can even result in drug withdrawal from the market and patient death. Therefore, identifying DDIs has become a key component of drug development and disease treatment. In this study, we propose a novel method to predict DDIs based on integrated similarity and semi-supervised learning (DDI-IS-SL). DDI-IS-SL integrates drug chemical, biological and phenotype data to calculate the feature similarity of drugs with the cosine similarity method. The Gaussian Interaction Profile kernel similarity of drugs is also calculated based on known DDIs. A semi-supervised learning method (the Regularized Least Squares classifier) is used to calculate the interaction possibility scores of drug-drug pairs. In 5-fold cross-validation, 10-fold cross-validation and de novo drug validation, DDI-IS-SL achieves better prediction performance than the other comparative methods. In addition, the average computation time of DDI-IS-SL is shorter than that of the other comparative methods. Finally, case studies further demonstrate the performance of DDI-IS-SL in practical applications.
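The two similarity measures named above can be sketched in plain Python: cosine similarity between drug feature vectors, and the Gaussian interaction profile (GIP) kernel exp(-γ ||y_i - y_j||^2) over rows of the known-DDI matrix, with γ set to 1 divided by the mean squared profile norm, as is common for GIP kernels. Averaging the two matrices is an assumed integration rule for illustration, not necessarily DDI-IS-SL's; the toy data are hypothetical and the Regularized Least Squares step is omitted.

```python
import math


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between rows of a feature matrix."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in features]
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:  # zero vectors get similarity 0
                dot = sum(a * b for a, b in zip(features[i], features[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel exp(-gamma * ||y_i - y_j||^2).

    gamma = gamma_prime / mean squared norm of the interaction profiles.
    """
    n = len(profiles)
    mean_sq = sum(sum(x * x for x in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(profiles[i], profiles[j])))
             for j in range(n)] for i in range(n)]


# Hypothetical data for 3 drugs: feature vectors and known DDIs (0-1 only).
features = [[1.0, 0.0, 1.0], [2.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
ddi = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
cos = cosine_similarity_matrix(features)
gip = gip_kernel(ddi)
# Assumed integration rule: element-wise average of the two similarities.
integrated = [[(c + g) / 2 for c, g in zip(crow, grow)]
              for crow, grow in zip(cos, gip)]
for row in integrated:
    print([round(x, 3) for x in row])
```

The GIP kernel needs only the known interaction matrix, which is why it can complement feature-based similarity when chemical or phenotype data are sparse.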
34
Gong Y, Zhu W, Sun M, Shi L. Bioinformatics Analysis of Long Non-coding RNA and Related Diseases: An Overview. Front Genet 2021; 12:813873. [PMID: 34956340 PMCID: PMC8692768 DOI: 10.3389/fgene.2021.813873] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/12/2021] [Accepted: 11/26/2021] [Indexed: 12/30/2022]
Abstract
Long non-coding RNAs (lncRNAs) are usually located in the nucleus and cytoplasm of cells. Their transcripts are >200 nucleotides in length and do not encode proteins. Compared with small RNAs, lncRNAs have longer sequences, more complex spatial structures, and more diverse and complex mechanisms of gene expression regulation. LncRNAs are widely involved in cellular biological processes and in the occurrence and development of many human diseases. Many studies have shown that lncRNAs can induce diseases, and some lncRNAs undergo specific changes in tumor cells. Research into the roles of lncRNAs has covered the diagnosis of, for example, cardiovascular, cerebrovascular, and central nervous system diseases. LncRNA bioinformatics has gradually become a research hotspot, leading to the discovery of a large number of lncRNAs and their biological functions and to the development of lncRNA databases and recognition models. In this review, research progress on lncRNAs is discussed, and lncRNA-related databases and the mechanisms and modes of action of lncRNAs are described. In addition, methods for identifying disease-related lncRNAs and the relationships between lncRNAs and human lung adenocarcinoma, rectal cancer, colon cancer, heart disease, and diabetes are discussed. Finally, the significance and existing problems of lncRNA research are considered.
Collapse
Affiliation(s)
- Yuxin Gong
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China
- Wen Zhu
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Meili Sun
- Beidahuang Industry Group General Hospital, Harbin, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
35
Peng L, Yuan R, Shen L, Gao P, Zhou L. LPI-EnEDT: an ensemble framework with extra tree and decision tree classifiers for imbalanced lncRNA-protein interaction data classification. BioData Min 2021; 14:50. [PMID: 34861891 PMCID: PMC8642957 DOI: 10.1186/s13040-021-00277-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/29/2021] [Accepted: 08/22/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are densely linked with various biological processes. Identifying interacting lncRNA-protein pairs helps in understanding the functions and mechanisms of lncRNAs. Wet experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduced prediction bias. RESULTS In this study, we develop an Ensemble framework (LPI-EnEDT) with Extra tree and Decision Tree classifiers to classify imbalanced LPI data. First, five LPI datasets are arranged. Second, lncRNAs and proteins are separately characterized with Pyfeat and BioTriangle, and the resulting features are concatenated into a vector representing each lncRNA-protein pair. Finally, an ensemble framework with Extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. Comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively, and the average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest that there are underlying associations between HOTTIP and Q9Y6M1, and between NRON and Q15717. CONCLUSIONS By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with Extra tree and decision tree classifiers, this work addresses imbalanced LPI data classification as well as interaction inference for a new lncRNA (or protein).
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ruya Yuan
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Ling Shen
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Pengfei Gao
- College of Life Sciences and Chemistry, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, No.88, Taishan West Road, Tianyuan District, Zhuzhou, China
36
Zhang Y, Chen M, Huang L, Xie X, Li X, Jin H, Wang X, Wei H. Fusion of KATZ measure and space projection to fast probe potential lncRNA-disease associations in bipartite graphs. PLoS One 2021; 16:e0260329. [PMID: 34807960 PMCID: PMC8608294 DOI: 10.1371/journal.pone.0260329] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/12/2021] [Accepted: 11/06/2021] [Indexed: 11/19/2022]
Abstract
It is well known that numerous long noncoding RNAs (lncRNAs) are closely related to the physiological and pathological processes of human diseases and can serve as potential biomarkers. Computationally identifying lncRNA-disease associations as targeted candidates for further in-depth study therefore reduces the cost of biological experiments. However, inherent problems such as inaccurately constructed similarity networks and the small number of known lncRNA-disease associations still limit many mature computational methods that have been developed over the years. This motivated us to develop a new computational method, KATZSP, that fuses the KATZ measure and space projection to quickly probe potential lncRNA-disease associations. KATZSP comprises the following key steps: combining all global information to convert the Boolean network of known lncRNA-disease associations into weighted networks; recasting the similarity calculation as counting the number of walks that connect lncRNA nodes and disease nodes in bipartite graphs; and computing space projection scores to refine the primary prediction scores. The fusion of the KATZ measure and space projection is simple, requiring only one attenuation factor. Leave-one-out cross validation (LOOCV) showed that, compared with other state-of-the-art methods (NCPLDA, LDAI-ISPS and IIRWR), KATZSP achieved higher predictive accuracy, as measured by the area under the curve (AUC), on the three constructed datasets, and that it also worked well for inferring potential associations related to new lncRNAs (or isolated diseases). Case studies (e.g., pancreatic cancer, lung cancer and colorectal cancer) further confirmed that KATZSP has superior predictive ability and can be applied as a guide for traditional biological experiments.
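The KATZ measure scores a node pair by counting the walks that connect it, with each walk of length k down-weighted by β^k for a single attenuation factor β. In closed form this is (I - βA)^{-1} - I, provided β is smaller than the reciprocal of the spectral radius of A. The stdlib-only sketch below shows that walk-counting idea on a bipartite lncRNA-disease graph. It is an illustration under stated assumptions, not the KATZSP authors' code: it uses the plain 0/1 association matrix, it omits KATZSP's global-information weighting step and its space-projection refinement, and it truncates walks at `max_len`. The function names and `max_len` are hypothetical.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def katz_bipartite(assoc, beta=0.1, max_len=4):
    """Truncated KATZ scores between lncRNAs (rows) and diseases (columns).

    assoc   -- n_l x n_d 0/1 matrix of known lncRNA-disease associations
    beta    -- attenuation factor applied as beta**k to walks of length k
    max_len -- longest walk length counted (hypothetical truncation)
    """
    n_l, n_d = len(assoc), len(assoc[0])
    n = n_l + n_d
    # Bipartite adjacency A = [[0, Y], [Y^T, 0]]: lncRNA nodes first, then diseases.
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n_l):
        for j in range(n_d):
            adj[i][n_l + j] = adj[n_l + j][i] = float(assoc[i][j])

    scores = [[0.0] * n_d for _ in range(n_l)]
    power = [row[:] for row in adj]  # holds A^k, starting at k = 1
    for k in range(1, max_len + 1):
        weight = beta ** k
        # Read off the lncRNA-disease block of A^k: the number of length-k walks.
        for i in range(n_l):
            for j in range(n_d):
                scores[i][j] += weight * power[i][n_l + j]
        power = matmul(power, adj)  # advance to A^(k+1)
    return scores
```

The graph is bipartite, so only odd-length walks connect an lncRNA to a disease. An unobserved pair therefore receives a non-zero score only through walks of length 3 or more, for example `katz_bipartite([[1, 0], [1, 1]])[0][1]` equals β^3.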
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology, Guilin, China
- Min Chen
- School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Li Huang
- Academy of Arts and Design, Tsinghua University, Beijing, China
- The Future Laboratory, Tsinghua University, Beijing, China
- Xiaolan Xie
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xin Li
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Hong Jin
- School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Xiaohua Wang
- Pharmacy School, Guilin Medical University, Guilin, China
- Hanyan Wei
- Pharmacy School, Guilin Medical University, Guilin, China
37
Zeng M, Lu C, Fei Z, Wu FX, Li Y, Wang J, Li M. DMFLDA: A Deep Learning Framework for Predicting lncRNA-Disease Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2353-2363. [PMID: 32248123 DOI: 10.1109/tcbb.2020.2983958] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 06/11/2023]
Abstract
A growing amount of evidence suggests that long non-coding RNAs (lncRNAs) play important roles in the regulation of biological processes in many human diseases. However, the number of experimentally verified lncRNA-disease associations is very limited. Thus, various computational approaches have been proposed to predict lncRNA-disease associations. Current matrix factorization-based methods cannot capture the complex non-linear relationships between lncRNAs and diseases, and traditional machine learning-based methods are not powerful enough to learn representations of lncRNAs and diseases. To address these limitations, we propose a deep matrix factorization model for predicting lncRNA-disease associations (DMFLDA). DMFLDA uses a cascade of non-linear hidden layers to learn latent representations of lncRNAs and diseases. Through these non-linear hidden layers, DMFLDA captures more complex non-linear relationships between lncRNAs and diseases than traditional matrix factorization-based methods. In addition, DMFLDA learns features directly from the lncRNA-disease interaction matrix and can therefore learn more accurate representations of lncRNAs and diseases than traditional machine learning methods. The low-dimensional representations of lncRNAs and diseases are fused to estimate new interaction values. To evaluate DMFLDA, we perform leave-one-out cross-validation and 5-fold cross-validation on known experimentally verified lncRNA-disease associations. The experimental results show that DMFLDA performs better than existing methods. Case studies show that many predicted interactions for colorectal cancer, prostate cancer, and renal cancer have been verified by recent biomedical literature. The source code and datasets can be obtained from https://github.com/CSUBioGroup/DMFLDA.
38
Wang H, Zhao S, Zhao J, Feng Z. A model for predicting drug-disease associations based on dense convolutional attention network. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021; 18:7419-7439. [PMID: 34814256 DOI: 10.3934/mbe.2021367] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
The development of new drugs is a time-consuming and labor-intensive process. Researchers therefore use computational methods to explore other therapeutic effects of existing drugs, and drug-disease association prediction is an important branch of this work. Existing drug-disease association prediction methods ignore the prior knowledge contained in drug-disease association data, which provides a strong basis for such research. Moreover, previous methods attended only to high-level network features during feature extraction and fused or concatenated them directly, which loses information. We therefore propose a novel deep learning model for drug-disease association prediction, called DCNN. The model introduces Gaussian interaction profile kernel similarity for drugs and diseases and combines it with the structural similarity of drugs and the semantic similarity of diseases to jointly construct the feature space. A dense convolutional neural network (DenseCNN) then captures the feature information of drugs and diseases, and a convolutional block attention module (CBAM) weights features at the channel and spatial levels to optimize them adaptively. Ten-fold cross-validation and case studies show that DCNN outperforms existing drug-disease association predictors and effectively predicts drug-disease associations.
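The Gaussian interaction profile (GIP) kernel mentioned above compares two entities through their rows of the known association matrix. Its commonly used form is K(i, j) = exp(-γ‖y_i - y_j‖²), with the bandwidth normalized as γ = γ' / ((1/n) Σ‖y_i‖²). The stdlib-only sketch below follows that common formulation, with γ' = 1 as an assumed default. How DCNN combines the result with drug structural similarity and disease semantic similarity is not reproduced here. Passing the transposed association matrix yields disease-side similarities.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel between the rows (interaction profiles) of a 0/1 matrix."""
    n = len(profiles)
    # Normalise the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel
```

Rows with identical profiles get similarity 1, and similarity decays with the number of differing associations. This is why GIP similarity is informative only for entities that already have some known links.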
Affiliation(s)
- Huiqing Wang
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Sen Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Jing Zhao
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
- Zhipeng Feng
- College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
39
Baltoumas FA, Zafeiropoulou S, Karatzas E, Koutrouli M, Thanati F, Voutsadaki K, Gkonta M, Hotova J, Kasionis I, Hatzis P, Pavlopoulos GA. Biomolecule and Bioentity Interaction Databases in Systems Biology: A Comprehensive Review. Biomolecules 2021; 11:1245. [PMID: 34439912 PMCID: PMC8391349 DOI: 10.3390/biom11081245] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/28/2021] [Revised: 08/16/2021] [Accepted: 08/18/2021] [Indexed: 02/06/2023]
Abstract
Technological advances in high-throughput techniques have resulted in tremendous growth of complex biological datasets providing evidence regarding various biomolecular interactions. To cope with this data flood, computational approaches, web services, and databases have been implemented to deal with issues such as data integration, visualization, exploration, organization, scalability, and complexity. Nevertheless, as the number of such sets increases, it is becoming more and more difficult for an end user to know what the scope and focus of each repository is and how redundant the information between them is. Several repositories have a more general scope, while others focus on specialized aspects, such as specific organisms or biological systems. Unfortunately, many of these databases are self-contained or poorly documented and maintained. For a clearer view, in this article we provide a comprehensive categorization, comparison and evaluation of such repositories for different bioentity interaction types. We discuss most of the publicly available services based on their content, sources of information, data representation methods, user-friendliness, scope and interconnectivity, and we comment on their strengths and weaknesses. We aim for this review to reach a broad readership varying from biomedical beginners to experts and serve as a reference article in the field of Network Biology.
Affiliation(s)
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Sofia Zafeiropoulou
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Evangelos Karatzas
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Mikaela Koutrouli
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
- Foteini Thanati
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Kleanthi Voutsadaki
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Maria Gkonta
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Joana Hotova
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Ioannis Kasionis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Pantelis Hatzis
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, Biomedical Sciences Research Center “Alexander Fleming”, 16672 Vari, Greece
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, 11527 Athens, Greece
40
Deng L, Li W, Zhang J. LDAH2V: Exploring Meta-Paths Across Multiple Networks for lncRNA-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1572-1581. [PMID: 31725386 DOI: 10.1109/tcbb.2019.2946257] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Indexed: 06/10/2023]
Abstract
Accumulating evidence has demonstrated that dysfunctions of long non-coding RNAs (lncRNAs) are involved in various complex human diseases. However, even today, the relationships between lncRNAs and diseases remain unknown in most cases. Developing effective computational approaches to identify potential lncRNA-disease associations has become a hot topic. Existing network-based approaches usually focus on the intrinsic features of lncRNAs and diseases but ignore the heterogeneous information of biological networks. To address these limitations, we propose LDAH2V, an efficient computational framework for predicting potential lncRNA-disease associations. LDAH2V uses HIN2Vec to calculate the meta-paths and the feature vector for each lncRNA-disease pair in a heterogeneous information network (HIN) consisting of an lncRNA similarity network, a disease similarity network, a miRNA similarity network, and the associations between them. A Gradient Boosting Tree (GBT) classifier is then built on these feature vectors to predict lncRNA-disease associations. The results show that LDAH2V performs significantly better than four existing state-of-the-art methods and achieves an AUC of 0.97 in the 10-fold cross-validation test. Furthermore, lncRNAs predicted in case studies of colon cancer and ovarian cancer have been confirmed in related databases and the medical literature.
41
BiGAN: LncRNA-disease association prediction based on bidirectional generative adversarial network. BMC Bioinformatics 2021; 22:357. [PMID: 34193046 PMCID: PMC8247109 DOI: 10.1186/s12859-021-04273-7] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/05/2021] [Accepted: 06/15/2021] [Indexed: 12/11/2022]
Abstract
Background An increasing number of studies have shown that lncRNAs are crucial for the control of hormones and the regulation of various physiological processes in the human body, and deletion mutations in RNA are related to many human diseases. LncRNA-disease association prediction is very useful for understanding the pathogenesis, diagnosis, and prevention of diseases, and it helps label relevant biological information. Results In this manuscript, we propose a computational model named bidirectional generative adversarial network (BiGAN), which consists of an encoder, a generator, and a discriminator, to predict new lncRNA-disease associations. We construct features for lncRNA-disease pairs using disease semantic similarity, lncRNA sequence similarity, and the Gaussian interaction profile kernel similarities of lncRNAs and diseases. BiGAN learns latent features from these similarity features to predict unverified associations between lncRNAs and diseases. The computational results show that BiGAN performs significantly better than other state-of-the-art approaches in cross-validation. We applied the model to predict candidate lncRNAs for renal cancer and colon cancer, with promising results: case studies show that almost 70% of lncRNAs in the top-10 prediction lists are verified by recent biological research. Conclusion The experimental results indicate that the proposed model accurately predicts lncRNA-disease associations.
42
Yuan L, Zhao J, Sun T, Shen Z. A machine learning framework that integrates multi-omics data predicts cancer-related LncRNAs. BMC Bioinformatics 2021; 22:332. [PMID: 34134612 PMCID: PMC8210375 DOI: 10.1186/s12859-021-04256-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/31/2021] [Accepted: 06/07/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND LncRNAs (long non-coding RNAs) are non-coding RNA molecules with transcripts longer than 200 nucleotides. LncRNAs have emerged as novel candidate biomarkers for cancer diagnosis and prognosis. However, it is difficult to uncover the true mechanisms linking lncRNAs and complex diseases. The unprecedented abundance of multi-omics data and the rapid development of machine learning provide an opportunity to design machine learning frameworks that study the relationship between lncRNAs and complex diseases. RESULTS In this article, we propose a new machine learning approach, LGDLDA (LncRNA-Gene-Disease association networks based LncRNA-Disease Association prediction), which predicts disease-related lncRNAs from multi-omics data using machine learning methods and neural-network neighborhood information aggregation. First, LGDLDA calculates similarity matrices for lncRNAs, genes and diseases. LncRNA similarity is calculated from the lncRNA expression profile matrix, the lncRNA-miRNA interaction matrix and the lncRNA-protein interaction matrix; gene similarity from the lncRNA-gene association matrix and the gene-disease association matrix; and disease similarity from the disease ontology, the disease-miRNA association matrix and Gaussian interaction profile kernel similarity. Second, LGDLDA integrates the neighborhood information in the similarity matrices through nonlinear feature learning with a neural network. Third, LGDLDA uses the embedded node representations to approximate the observed matrices. Finally, LGDLDA ranks candidate lncRNA-disease pairs and selects potential disease-related lncRNAs. CONCLUSIONS Compared with other lncRNA-disease prediction methods, our method takes more critical information into account and improves the performance of cancer-related lncRNA prediction.
Experiments on randomly split data show that LGDLDA is more stable than IDHI-MIRW, NCPLDA, LncDisAP and NCPHLDA. Results on different simulated datasets show that LGDLDA can accurately and effectively predict disease-related lncRNAs. Furthermore, we applied the method to three real cancer datasets (gastric cancer, colorectal cancer and breast cancer) to predict potential cancer-related lncRNAs.
Affiliation(s)
- Lin Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Jing Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Tao Sun
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Daxue Road 3501, Jinan, 250353, Shandong, China
- Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Changjiang Road 80, Nanyang, 473004, Henan, China
43
Du B, Tang L, Liu L, Zhou W. Predicting LncRNA-Disease Association Based on Generative Adversarial Network. Curr Gene Ther 2021; 22:144-151. [PMID: 33998988 DOI: 10.2174/1566523221666210506131055] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/30/2020] [Revised: 02/19/2021] [Accepted: 02/24/2021] [Indexed: 11/22/2022]
Abstract
BACKGROUND Increasing research reveals that long non-coding RNAs (lncRNAs) play an important role in various biological processes of human diseases. Nonetheless, only a handful of lncRNA-disease associations have been experimentally verified. Computational prediction of lncRNA-disease associations provides a preliminary basis for biological experiments and thereby greatly reduces the cost of wet-lab experiments. OBJECTIVE This study aims to learn the real distribution of lncRNA-disease associations from a limited number of known associations. This paper proposes a new lncRNA-disease association prediction model, LDA-GAN, based on a generative adversarial network (GAN). METHOD To address the slow convergence, training instability, and inability to handle discrete data of traditional GANs, LDA-GAN uses the Gumbel-softmax technique to construct a differentiable process that simulates discrete sampling. Meanwhile, the generator and the discriminator of LDA-GAN are integrated into an overall optimization objective based on a pairwise loss function. RESULTS Experiments on standard datasets demonstrate that LDA-GAN not only achieves high stability and efficiency in adversarial learning but also fully exploits the semi-supervised advantage of the generative adversarial framework on unlabeled data, which further improves the accuracy of lncRNA-disease association prediction. In addition, case studies show that LDA-GAN can accurately generate potential diseases for several lncRNAs.
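Gumbel-softmax makes sampling from a discrete distribution differentiable. A relaxed sample is softmax((log π_i + g_i) / τ), where g_i = -log(-log u_i) and u_i ~ Uniform(0, 1). As the temperature τ approaches 0, the output approaches a one-hot vector. The stdlib-only sketch below shows only this sampling step. In LDA-GAN the step sits inside a trained generator, which is not shown here. The function name and the default τ are assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau=0.5, rng=random):
    """Return one relaxed (soft one-hot) sample from a categorical distribution.

    logits -- unnormalised log-probabilities
    tau    -- temperature (> 0); smaller values give sharper samples
    """
    # Gumbel(0, 1) noise by inverse CDF; clamp u away from 0 so log() is defined.
    noise = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    scaled = [(l + g) / tau for l, g in zip(logits, noise)]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

The argmax of each relaxed sample follows the original categorical distribution exactly (the Gumbel-max trick). The relaxation therefore keeps the sampling behaviour while making the output a smooth function of the logits, which is what gradient-based GAN training needs.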
Affiliation(s)
- Biao Du
- School of Information, Yunnan Normal University, Kunming, China
- Lin Tang
- Key Laboratory of Educational Informatization for Nationalities Ministry of Education, Yunnan Normal University, Kunming, China
- Lin Liu
- School of Information, Yunnan Normal University, Kunming, China
- Wei Zhou
- School of Software, Yunnan University, Kunming, China
44
Chen Q, Lai D, Lan W, Wu X, Chen B, Liu J, Chen YPP, Wang J. ILDMSF: Inferring Associations Between Long Non-Coding RNA and Disease Based on Multi-Similarity Fusion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1106-1112. [PMID: 31443046 DOI: 10.1109/tcbb.2019.2936476] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 06/10/2023]
Abstract
The dysregulation and mutation of long non-coding RNAs (lncRNAs) have been shown to result in a variety of human diseases. Identifying potential disease-related lncRNAs may benefit disease diagnosis, treatment and prognosis. A number of methods have been proposed to predict potential lncRNA-disease relationships. However, most of them may produce incorrect results because they rely on a single similarity measure. This article proposes a novel framework (ILDMSF) that fuses lncRNA similarities and disease similarities. The lncRNA similarities are measured from lncRNA-related genes and known lncRNA-disease interactions, and the disease similarities from disease semantic interactions and known lncRNA-disease interactions. A support vector machine is then employed to identify potential lncRNA-disease associations based on the integrated similarity. Leave-one-out cross validation is performed to compare ILDMSF with other state-of-the-art methods. The experimental results demonstrate that our method is promising for exploring potential correlations between lncRNAs and diseases.
45
A representation learning model based on variational inference and graph autoencoder for predicting lncRNA-disease associations. BMC Bioinformatics 2021; 22:136. [PMID: 33745450 PMCID: PMC7983260 DOI: 10.1186/s12859-021-04073-z] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 01/28/2021] [Accepted: 03/11/2021] [Indexed: 01/01/2023]
Abstract
Background Numerous studies have demonstrated that long non-coding RNAs are related to many human diseases. Therefore, it is crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been applied to this problem, yet it is still challenging to learn efficient low-dimensional representations from the high-dimensional features of lncRNAs and diseases to predict unknown lncRNA-disease associations accurately. Results We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from the features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately using a variational expectation-maximization algorithm. Integrating the VGAE for graph representation learning with alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features. This in turn improves the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis shows that the co-training framework of lncRNAs and diseases in VGAELDA solves a geometric matrix completion problem for capturing efficient low-dimensional representations via a deep learning approach. Conclusion Cross validations and numerical experiments illustrate that VGAELDA outperforms current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04073-z.
46
Yan C, Duan G, Wu FX, Pan Y, Wang J. MCHMDA:Predicting Microbe-Disease Associations Based on Similarities and Low-Rank Matrix Completion. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:611-620. [PMID: 31295117 DOI: 10.1109/tcbb.2019.2926716] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/09/2023]
Abstract
With the development of high-throughput sequencing technology and microbiology, many studies have shown that microbes are associated with human diseases such as obesity and liver cancer. Therefore, identifying associations between microbes and diseases has become an important topic in current bioinformatics. The emergence of microbe-disease association databases has provided an unprecedented opportunity to develop computational methods for predicting microbe-disease associations. In this study, we propose a low-rank matrix completion method (called MCHMDA) to predict microbe-disease associations by integrating the similarities of microbes and diseases and known microbe-disease associations into a heterogeneous network. The microbe similarity is computed from the Gaussian Interaction Profile (GIP) kernel similarity based on known microbe-disease associations. We then further improve it by taking into account the organs of the human body that these microbes inhabit. The disease similarity is computed as the average of disease GIP similarity, disease symptom-based similarity, and disease functional similarity. Then, we construct a heterogeneous microbe-disease association network by integrating the microbe similarity network, the disease similarity network, and the known microbe-disease association network. Finally, a matrix completion method computes the association scores of unknown microbe-disease pairs using the fast Singular Value Thresholding (SVT) algorithm. Using 5-fold Cross Validation (5CV) and Leave-One-Out Cross Validation (LOOCV), we evaluate the prediction performance of MCHMDA and other state-of-the-art methods, including BRWMDA, NGRHMDA, LRLSHMDA, and KATZHMDA. On the benchmark dataset HMDAD, the experimental results show that MCHMDA outperforms the other methods in terms of the area under the receiver operating characteristic curve (AUC).
MCHMDA achieves the AUC values of 0.9251 and 0.9495 in 5CV and LOOCV, respectively, which are the highest values among the competing methods. In addition, we also further indicate the prediction generality of MCHMDA on an expanded microbe-disease associations dataset (HMDAD-SUP). Finally, case studies prove the prediction ability in practical applications.
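The two building blocks named in this abstract, GIP kernel similarity and SVT-based matrix completion, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' MCHMDA code: the function names are hypothetical, the bandwidth normalization follows the common GIP convention, the organ-based refinement of microbe similarity and the symptom-based and functional disease similarities are omitted, and treating unknown microbe-disease pairs as the missing entries of the heterogeneous matrix is an assumption.

```python
import numpy as np


def gip_similarity(A: np.ndarray, gamma_prime: float = 1.0) -> np.ndarray:
    """GIP kernel similarity between the rows of a binary association matrix A."""
    sq_norms = np.sum(A * A, axis=1)
    mean_sq = sq_norms.mean()
    # Bandwidth normalised by the mean squared profile norm (common GIP convention).
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    # ||a_i - a_j||^2 = ||a_i||^2 + ||a_j||^2 - 2 a_i . a_j
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A @ A.T)
    dist2 = np.maximum(dist2, 0.0)  # clip tiny negative rounding errors
    return np.exp(-gamma * dist2)


def svt_complete(M, observed, tau, step=1.2, max_iter=500, tol=1e-4):
    """Singular Value Thresholding (Cai, Candes and Shen) matrix completion.

    Iterates X = shrink(Y, tau), Y += step * P_obs(M - X), where shrink
    soft-thresholds the singular values and P_obs keeps only observed entries.
    """
    Y = np.zeros_like(M, dtype=float)
    X = np.zeros_like(M, dtype=float)
    obs_norm = np.linalg.norm(M[observed])
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        residual = np.where(observed, M - X, 0.0)
        if np.linalg.norm(residual) <= tol * obs_norm:
            break
        Y += step * residual
    return X


def heterogeneous_scores(A: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Hypothetical MCHMDA-style pipeline: complete [[S_m, A], [A^T, S_d]]."""
    n_m = A.shape[0]
    S_m = gip_similarity(A)    # microbe-microbe similarity
    S_d = gip_similarity(A.T)  # disease-disease similarity (GIP part only)
    H = np.block([[S_m, A], [A.T, S_d]])
    observed = np.ones(H.shape, dtype=bool)
    # Assumption: unknown microbe-disease pairs (A == 0) are the missing entries.
    observed[:n_m, n_m:] = A > 0
    observed[n_m:, :n_m] = A.T > 0
    return svt_complete(H, observed, tau)[:n_m, n_m:]
```

The returned block scores every microbe-disease pair; entries where `A == 0` could then be ranked to prioritize candidates. In practice `tau` and `step` would need tuning for the matrix size.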
47
Lin Y, Ma X. Predicting lincRNA-Disease Association in Heterogeneous Networks Using Co-regularized Non-negative Matrix Factorization. Front Genet 2021; 11:622234. [PMID: 33510774 PMCID: PMC7835800 DOI: 10.3389/fgene.2020.622234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/28/2020] [Accepted: 12/03/2020] [Indexed: 02/02/2023]
Abstract
Long intergenic non-coding ribonucleic acids (lincRNAs) are critical regulators in many complex diseases, and identifying disease-lincRNA associations is both costly and time-consuming. Therefore, computational approaches are needed to predict disease-lincRNA associations, which can shed light on the mechanisms of diseases. In this study, we develop a co-regularized non-negative matrix factorization (Cr-NMF) method to identify potential disease-lincRNA associations by integrating the gene expression of lincRNAs, the genetic interaction network of mRNA genes, gene-lincRNA associations, and disease-gene associations. The Cr-NMF algorithm factorizes the disease-lincRNA association matrix, while the other associations and interactions are integrated through regularization. Furthermore, the regularization not only preserves the topological structure of the lincRNA co-expression network but also maintains the “lincRNA → gene → disease” links. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in accuracy when predicting disease-lincRNA associations. The model and algorithm provide an effective way to explore disease-lncRNA associations.
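The abstract does not give Cr-NMF's exact objective. As a hedged illustration of the general idea of a graph regularizer that preserves lincRNA co-expression topology, the sketch below implements the standard graph-regularized NMF (GNMF) multiplicative updates of Cai et al., a related but simpler technique and not Cr-NMF itself. It omits the gene-mediated "lincRNA → gene → disease" terms, and the function names and parameters are hypothetical.

```python
import numpy as np


def gnmf_objective(X, U, V, W, lam):
    """||X - U V^T||_F^2 + lam * Tr(V^T L V), with graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.norm(X - U @ V.T) ** 2 + lam * np.trace(V.T @ L @ V)


def gnmf(X, W, rank, lam=0.1, n_iter=300, seed=0, eps=1e-10):
    """Graph-regularised NMF: X (diseases x lincRNAs) ~ U @ V.T with U, V >= 0.

    W is a symmetric, non-negative similarity graph over the columns of X
    (e.g. lincRNA co-expression); the penalty pulls the factor rows of
    similar lincRNAs towards each other.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, rank))
    V = rng.random((m, rank))
    D = np.diag(W.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V
```

The reconstruction `U @ V.T` assigns a score to every disease-lincRNA pair, including pairs with no known association.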
Affiliation(s)
- Yong Lin
- School of Physics and Electronic Information Engineering, Ningxia Normal University, Guyuan, China
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
48
Ning L, Cui T, Zheng B, Wang N, Luo J, Yang B, Du M, Cheng J, Dou Y, Wang D. MNDR v3.0: mammal ncRNA-disease repository with increased coverage and annotation. Nucleic Acids Res 2021; 49:D160-D164. [PMID: 32833025 PMCID: PMC7779040 DOI: 10.1093/nar/gkaa707] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Received: 07/27/2020] [Revised: 08/12/2020] [Accepted: 08/14/2020] [Indexed: 02/07/2023]
Abstract
Many studies have indicated that non-coding RNA (ncRNA) dysfunction is closely related to numerous diseases. Recently, the rapid accumulation of ncRNA-disease associations has left related databases unable to meet the demands of biomedical research, so constant updating of ncRNA-disease resources has become essential. Here, we have updated the mammal ncRNA-disease repository (MNDR, http://www.rna-society.org/mndr/) to version 3.0, which contains more than one million entries, a four-fold increase in data compared with the previous version. Experimental and predicted circRNA-disease associations have been integrated, increasing the number of ncRNA categories to five and the number of mammalian species to 11. Moreover, ncRNA-disease related drug annotations and associations, as well as ncRNA subcellular localizations and interactions, have been added. In addition, three ncRNA-disease (miRNA/lncRNA/circRNA) prediction tools are provided, and the website has been optimized to make it more practical and user-friendly. In summary, MNDR v3.0 will be a valuable resource for investigating disease mechanisms and clinical treatment strategies.
Affiliation(s)
- Lin Ning
- Dermatology Hospital, Southern Medical University, Guangzhou 510091, China
- Tianyu Cui
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Boyang Zheng
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Nuo Wang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Jiaxin Luo
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Beilei Yang
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Mengze Du
- Qingyuan People's Hospital, The Sixth Affiliated Hospital of Guangzhou Medical University, B24 Yinquan South Road, Qingyuan 511518, Guangdong Province, People's Republic of China
- Jun Cheng
- Affiliated Foshan Maternity & Child Healthcare Hospital, Southern Medical University (Foshan Maternity & Child Healthcare Hospital)
- Yiying Dou
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Dong Wang
- Dermatology Hospital, Southern Medical University, Guangzhou 510091, China
- Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou 510515, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 611731, China
49
Alam T, Al-Absi HRH, Schmeier S. Deep Learning in LncRNAome: Contribution, Challenges, and Perspectives. Noncoding RNA 2020; 6:E47. [PMID: 33266128 PMCID: PMC7711891 DOI: 10.3390/ncrna6040047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/25/2020] [Revised: 10/27/2020] [Accepted: 11/06/2020] [Indexed: 12/11/2022]
Abstract
Long non-coding RNAs (lncRNAs), the pervasively transcribed part of the mammalian genome, have played a significant role in changing our protein-centric view of genomes. The abundance of lncRNAs and their diverse roles across cell types have opened numerous avenues of research into the lncRNAome. Many sophisticated computational techniques have been leveraged to discover and understand the lncRNAome. Recently, deep learning (DL)-based modeling techniques have been used successfully in genomics because they can handle large amounts of data and tend to produce better results than traditional machine learning (ML) models. DL-based techniques have now become a popular choice for many modeling tasks in the lncRNAome field as well. In this review article, we summarize the contribution of DL-based methods in nine different lncRNAome research areas. We also outline the DL-based techniques leveraged in lncRNAome research and highlight the challenges computational scientists face while developing DL-based models for the lncRNAome. To the best of our knowledge, this is the first review article that summarizes the role of DL-based techniques in multiple areas of lncRNAome research.
Affiliation(s)
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Hamada R. H. Al-Absi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
- Sebastian Schmeier
- School of Natural and Computational Sciences, Massey University, Auckland 0632, New Zealand
50
Wang J, Wang L. Prediction and prioritization of autism-associated long non-coding RNAs using gene expression and sequence features. BMC Bioinformatics 2020; 21:505. [PMID: 33160303 PMCID: PMC7648398 DOI: 10.1186/s12859-020-03843-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/15/2020] [Accepted: 10/27/2020] [Indexed: 01/04/2023]
Abstract
Background: Autism spectrum disorders (ASD) are a range of neurodevelopmental conditions that are genetically complex and heterogeneous, with most of the genetic risk factors also found in the unaffected general population. Although all currently known ASD risk genes code for proteins, long non-coding RNAs (lncRNAs), as essential regulators of gene expression, have also been implicated in ASD. Some lncRNAs show altered expression levels in autistic brains, but their roles in ASD pathogenesis remain unclear.
Results: In this study, we have developed a new machine learning approach to predict candidate lncRNAs associated with ASD. In particular, the knowledge learned from protein-coding ASD risk genes was transferred to the prediction and prioritization of ASD-associated lncRNAs. Both developmental brain gene expression data and transcript sequences were found to contain information relevant to ASD risk gene prediction. During the pre-training phase of model construction, an autoencoder network was used for representation learning of the gene expression data, and random-forest-based feature selection was applied to k-mers derived from the transcript sequences. Our models, including logistic regression, support vector machine and random forest, showed robust performance in tenfold cross-validation as well as in candidate prioritization with hypothetical loci. We then used the models to predict and prioritize a list of candidate lncRNAs, including some reported to be cis-regulators of known ASD risk genes, for further investigation.
Conclusions: Our results suggest that ASD risk genes can be accurately predicted using developmental brain gene expression data and transcript sequence features, and the models may provide useful information for the functional characterization of the candidate lncRNAs associated with ASD.
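The transcript-sequence features in this abstract are k-mer counts. A minimal sketch of turning transcripts into fixed-length k-mer frequency vectors follows; the choice of k, the normalization by the number of valid windows, and the function names are assumptions for illustration, and the random-forest feature selection step applied afterwards is not shown.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 3) -> dict[str, float]:
    """Normalised counts of all 4**k DNA k-mers in a transcript sequence."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def feature_matrix(seqs: list[str], k: int = 3) -> list[list[float]]:
    """One row per transcript, columns in a fixed lexicographic (ACGT) k-mer order."""
    return [list(kmer_frequencies(s, k).values()) for s in seqs]
```

Fixing the column order through `itertools.product` keeps every transcript's vector aligned, so the rows can be fed directly to a feature selector or classifier.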
Affiliation(s)
- Jun Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, 29634, USA
| | - Liangjiang Wang
- Department of Genetics and Biochemistry, Clemson University, Clemson, SC, 29634, USA
- Center for Human Genetics, Clemson University, Clemson, SC, 29634, USA