1
|
Zhao X, Wu J, Zhao X, Yin M. Multi-view contrastive heterogeneous graph attention network for lncRNA-disease association prediction. Brief Bioinform 2023; 24:6931723. [PMID: 36528809 DOI: 10.1093/bib/bbac548] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 11/11/2022] [Indexed: 12/23/2022] Open
Abstract
MOTIVATION Exploring the potential long noncoding RNA (lncRNA)-disease associations (LDAs) plays a critical role for understanding disease etiology and pathogenesis. Given the high cost of biological experiments, developing a computational method is a practical necessity to effectively accelerate experimental screening process of candidate LDAs. However, under the high sparsity of LDA dataset, many computational models hardly exploit enough knowledge to learn comprehensive patterns of node representations. Moreover, although the metapath-based GNN has been recently introduced into LDA prediction, it discards intermediate nodes along the meta-path and results in information loss. RESULTS This paper presents a new multi-view contrastive heterogeneous graph attention network (GAT) for lncRNA-disease association prediction, MCHNLDA for brevity. Specifically, MCHNLDA firstly leverages rich biological data sources of lncRNA, gene and disease to construct two-view graphs, feature structural graph of feature schema view and lncRNA-gene-disease heterogeneous graph of network topology view. Then, we design a cross-contrastive learning task to collaboratively guide graph embeddings of the two views without relying on any labels. In this way, we can pull closer the nodes of similar features and network topology, and push other nodes away. Furthermore, we propose a heterogeneous contextual GAT, where long short-term memory network is incorporated into attention mechanism to effectively capture sequential structure information along the meta-path. Extensive experimental comparisons against several state-of-the-art methods show the effectiveness of proposed framework.The code and data of proposed framework is freely available at https://github.com/zhaoxs686/MCHNLDA.
Collapse
Affiliation(s)
- Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Jun Wu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Xiaowei Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Minghao Yin
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| |
Collapse
|
2
|
MSF-UBRW: An Improved Unbalanced Bi-Random Walk Method to Infer Human lncRNA-Disease Associations. Genes (Basel) 2022; 13:genes13112032. [DOI: 10.3390/genes13112032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2022] [Revised: 10/24/2022] [Accepted: 10/28/2022] [Indexed: 11/06/2022] Open
Abstract
Long-non-coding RNA (lncRNA) is a transcription product that exerts its biological functions through a variety of mechanisms. The occurrence and development of a series of human diseases are closely related to abnormal expression levels of lncRNAs. Scientists have developed many computational models to identify the lncRNA-disease associations (LDAs). However, many potential LDAs are still unknown. In this paper, a novel method, namely MSF-UBRW (multiple similarities fusion based on unbalanced bi-random walk), is designed to explore new LDAs. First, two similarities (functional similarity and Gaussian Interaction Profile kernel similarity) of lncRNAs are calculated and fused linearly, also for disease data. Then, the known association matrix is preprocessed. Next, the linear neighbor similarities of lncRNAs and diseases are calculated, respectively. After that, the potential associations are predicted based on unbalanced bi-random walk. The fusion of multiple similarities improves the prediction performance of MSF-UBRW to a large extent. Finally, the prediction ability of the MSF-UBRW algorithm is measured by two statistical methods, leave-one-out cross-validation (LOOCV) and 5-fold cross-validation (5-fold CV). The AUCs of 0.9391 in LOOCV and 0.9183 (±0.0054) in 5-fold CV confirmed the reliable prediction ability of the MSF-UBRW method. Case studies of three common diseases also show that the MSF-UBRW method can infer new LDAs effectively.
Collapse
|
3
|
Zhao X, Zhao X, Yin M. Heterogeneous graph attention network based on meta-paths for lncRNA-disease association prediction. Brief Bioinform 2021; 23:6377515. [PMID: 34585231 DOI: 10.1093/bib/bbab407] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2021] [Revised: 08/26/2021] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION Discovering long noncoding RNA (lncRNA)-disease associations is a fundamental and critical part in understanding disease etiology and pathogenesis. However, only a few lncRNA-disease associations have been identified because of the time-consuming and expensive biological experiments. As a result, an efficient computational method is of great importance and urgently needed for identifying potential lncRNA-disease associations. With the ability of exploiting node features and relationships in network, graph-based learning models have been commonly utilized by these biomolecular association predictions. However, the capability of these methods in comprehensively fusing node features, heterogeneous topological structures and semantic information is distant from optimal or even satisfactory. Moreover, there are still limitations in modeling complex associations between lncRNAs and diseases. RESULTS In this paper, we develop a novel heterogeneous graph attention network framework based on meta-paths for predicting lncRNA-disease associations, denoted as HGATLDA. At first, we conduct a heterogeneous network by incorporating lncRNA and disease feature structural graphs, and lncRNA-disease topological structural graph. Then, for the heterogeneous graph, we conduct multiple metapath-based subgraphs and then utilize graph attention network to learn node embeddings from neighbors of these homogeneous and heterogeneous subgraphs. Next, we implement attention mechanism to adaptively assign weights to multiple metapath-based subgraphs and get more semantic information. In addition, we combine neural inductive matrix completion to reconstruct lncRNA-disease associations, which is applied for capturing complicated associations between lncRNAs and diseases. Moreover, we incorporate cost-sensitive neural network into the loss function to tackle the commonly imbalance problem in lncRNA-disease association prediction. Finally, extensive experimental results demonstrate the effectiveness of our proposed framework.
Collapse
Affiliation(s)
- Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Xiaowei Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Minghao Yin
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| |
Collapse
|
4
|
Lin Y, Ma X. Predicting lincRNA-Disease Association in Heterogeneous Networks Using Co-regularized Non-negative Matrix Factorization. Front Genet 2021; 11:622234. [PMID: 33510774 PMCID: PMC7835800 DOI: 10.3389/fgene.2020.622234] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2020] [Accepted: 12/03/2020] [Indexed: 02/02/2023] Open
Abstract
Long intergenic non-coding ribonucleic acids (lincRNAs) are critical regulators for many complex diseases, and identification of disease-lincRNA association is both costly and time-consuming. Therefore, it is necessary to design computational approaches to predict the disease-lincRNA associations that shed light on the mechanisms of diseases. In this study, we develop a co-regularized non-negative matrix factorization (aka Cr-NMF) to identify potential disease-lincRNA associations by integrating the gene expression of lincRNAs, genetic interaction network for mRNA genes, gene-lincRNA associations, and disease-gene associations. The Cr-NMF algorithm factorizes the disease-lincRNA associations, while the other associations/interactions are integrated using regularization. Furthermore, the regularization does not only preserve the topological structure of the lincRNA co-expression network, but also maintains the links “lincRNA → gene → disease.” Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods in terms of accuracy on predicting the disease-lincRNA associations. The model and algorithm provide an effective way to explore disease-lncRNA associations.
Collapse
Affiliation(s)
- Yong Lin
- School of Physics and Electronic Information Engineering, Ningxia Normal University, Guyuan, China
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
Collapse
|
5
|
Zhang Y, Chen M, Li A, Cheng X, Jin H, Liu Y. LDAI-ISPS: LncRNA-Disease Associations Inference Based on Integrated Space Projection Scores. Int J Mol Sci 2020; 21:E1508. [PMID: 32098405 PMCID: PMC7073162 DOI: 10.3390/ijms21041508] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/31/2019] [Revised: 02/18/2020] [Accepted: 02/19/2020] [Indexed: 12/14/2022] Open
Abstract
Long non-coding RNAs (long ncRNAs, lncRNAs) of all kinds have been implicated in a range of cell developmental processes and diseases, while they are not translated into proteins. Inferring diseases associated lncRNAs by computational methods can be helpful to understand the pathogenesis of diseases, but those current computational methods still have not achieved remarkable predictive performance: such as the inaccurate construction of similarity networks and inadequate numbers of known lncRNA-disease associations. In this research, we proposed a lncRNA-disease associations inference based on integrated space projection scores (LDAI-ISPS) composed of the following key steps: changing the Boolean network of known lncRNA-disease associations into the weighted networks via combining all the global information (e.g., disease semantic similarities, lncRNA functional similarities, and known lncRNA-disease associations); obtaining the space projection scores via vector projections of the weighted networks to form the final prediction scores without biases. The leave-one-out cross validation (LOOCV) results showed that, compared with other methods, LDAI-ISPS had a higher accuracy with area-under-the-curve (AUC) value of 0.9154 for inferring diseases, with AUC value of 0.8865 for inferring new lncRNAs (whose associations related to diseases are unknown), with AUC value of 0.7518 for inferring isolated diseases (whose associations related to lncRNAs are unknown). A case study also confirmed the predictive performance of LDAI-ISPS as a helper for traditional biological experiments in inferring the potential LncRNA-disease associations and isolated diseases.
Collapse
Affiliation(s)
- Yi Zhang
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
| | - Min Chen
- Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
| | - Ang Li
- Hunan Institute of Technology, School of Computer Science and Technology, Hengyang 421002, China
| | - Xiaohui Cheng
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
| | - Hong Jin
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
| | - Yarong Liu
- School of Information Science and Engineering, Guilin University of Technology, Guilin 541004, China
| |
Collapse
|
6
|
Biswas AK, Kim DC, Kang M, Gao JX. Robust Inductive Matrix Completion Strategy to Explore Associations Between LincRNAs and Human Disease Phenotypes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:2066-2077. [PMID: 29994224 DOI: 10.1109/tcbb.2018.2844816] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Over the past few years, it has been established that a number of long intergenic non-coding RNAs (lincRNAs) are linked to a wide variety of human diseases. The relationship among many other lincRNAs still remains as puzzle. Validation of such link between the two entities through biological experiments is expensive. However, piles of information about the two are becoming available, thanks to the High Throughput Sequencing (HTS) platforms, Genome Wide Association Studies (GWAS), etc., thereby opening opportunity for cutting-edge machine learning and data mining approaches. However, there are only a few in silico lincRNA-disease association inference tools available to date, and none of these utilizes side information of both the entities. The recently developed Inductive Matrix Completion (IMC) technique provides a recommendation platform among two entities considering respective side information. But, the formulation of IMC is incapable of handling noise and outliers that may present in the dataset, while data sparsity consideration is another issue with the standard IMC method. Thus, a robust version of IMC is needed that can solve these two issues. As a remedy, in this paper, we propose Robust Inductive Matrix Completion (RIMC) using l2,1 norm loss function as well as l2,1 norm based regularization. We applied RIMC to the available association data between human lincRNAs and OMIM disease phenotypes as well as a diverse set of side information about the lincRNAs and the diseases. Our method performs better than the state-of-the-art methods in terms of precision@k and recall@k at the top- k disease prioritization to the subject lincRNAs. We also demonstrate that RIMC is equally effective for querying about novel lincRNAs, as well as predicting rank of a newly known disease for a set of well-characterized lincRNAs. Availability: All the supporting datasets are available at the publicly accessible URL located at http://biomecis.uta.edu/~ashis/res/RIMC/.
Collapse
|
7
|
Zhang J, Zhang Z, Chen Z, Deng L. Integrating Multiple Heterogeneous Networks for Novel LncRNA-Disease Association Inference. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:396-406. [PMID: 28489543 DOI: 10.1109/tcbb.2017.2701379] [Citation(s) in RCA: 85] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Accumulating experimental evidence has indicated that long non-coding RNAs (lncRNAs) are critical for the regulation of cellular biological processes implicated in many human diseases. However, only relatively few experimentally supported lncRNA-disease associations have been reported. Developing effective computational methods to infer lncRNA-disease associations is becoming increasingly important. Current network-based algorithms typically use a network representation to identify novel associations between lncRNAs and diseases. But these methods are concentrated on specific entities of interest (lncRNAs and diseases) and they do not allow to consider networks with more than two types of entities. Considering the limitations in previous computational methods, we develop a new global network-based framework, LncRDNetFlow, to prioritize disease-related lncRNAs. LncRDNetFlow utilizes a flow propagation algorithm to integrate multiple networks based on a variety of biological information including lncRNA similarity, protein-protein interactions, disease similarity, and the associations between them to infer lncRNA-disease associations. We show that LncRDNetFlow performs significantly better than the existing state-of-the-art approaches in cross-validation. To further validate the reproducibility of the performance, we use the proposed method to identify the related lncRNAs for ovarian cancer, glioma, and cervical cancer. The results are encouraging. Many predicted lncRNAs in the top list have been verified by the biological studies.
Collapse
|
8
|
Disease genes prioritizing mechanisms: a comprehensive and systematic literature review. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s13721-017-0154-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
|
9
|
Gligorijević V, Malod-Dognin N, Pržulj N. Integrative methods for analyzing big data in precision medicine. Proteomics 2016; 16:741-58. [PMID: 26677817 DOI: 10.1002/pmic.201500396] [Citation(s) in RCA: 98] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2015] [Revised: 11/16/2015] [Accepted: 12/09/2015] [Indexed: 12/19/2022]
Abstract
We provide an overview of recent developments in big data analyses in the context of precision medicine and health informatics. With the advance in technologies capturing molecular and medical data, we entered the area of "Big Data" in biology and medicine. These data offer many opportunities to advance precision medicine. We outline key challenges in precision medicine and present recent advances in data integration-based methods to uncover personalized information from big data produced by various omics studies. We survey recent integrative methods for disease subtyping, biomarkers discovery, and drug repurposing, and list the tools that are available to domain scientists. Given the ever-growing nature of these big data, we highlight key issues that big data integration methods will face.
Collapse
Affiliation(s)
| | | | - Nataša Pržulj
- Department of Computing, Imperial College London, London, UK
| |
Collapse
|