1
Wu J, Lu P, Zhang W. Predicting associations between CircRNA and diseases through structure-aware graph transformer and path-integral convolution. Anal Biochem 2024; 692:115554. [PMID: 38710353 DOI: 10.1016/j.ab.2024.115554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 04/27/2024] [Accepted: 04/30/2024] [Indexed: 05/08/2024]
Abstract
A series of biological experiments has demonstrated that circular RNAs play a crucial regulatory role in cellular processes and may be associated with diseases. Uncovering these connections helps in understanding potential disease mechanisms and advancing the development of treatment strategies. However, traditional biological experiments are limited in efficiency and cost, especially when enumerating possible associations. To address these limitations, several computational methods have been proposed, but existing methods measure similarity only from a node perspective and cannot capture structural similarities between edges. In this study, we introduce an advanced computational method called SATPIC2CD for analyzing potential associations between circular RNAs and diseases. Specifically, we first employ a Structure-Aware Graph Transformer (SAT), which extracts five predefined metapath representations before calculating attention. This adaptive network integrates structural information into the original self-attention by aggregating information within and between paths. Subsequently, we use Path Integral Convolutional Networks (PACN) to integrate feature information for all path weights between two nodes. Afterward, we complement the network node features with feature loss and feature smoothing using Gated Recurrent Units (GRU) and node centrality. Finally, a Multi-Layer Perceptron (MLP) is employed to obtain the final prediction score for each circular RNA-disease pair. SATPIC2CD achieves an Area Under the Curve (AUC) of up to 0.9715 in 5-fold cross-validation, surpassing the comparison models. Case studies further emphasize the high precision of our method in identifying circular RNA-disease associations, laying a solid foundation for guiding future biological research efforts.
Affiliation(s)
- Jinkai Wu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- PengLi Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
- Wenqi Zhang
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050, Gansu, PR China
2
Xuan P, Lu S, Cui H, Wang S, Nakaguchi T, Zhang T. Learning Association Characteristics by Dynamic Hypergraph and Gated Convolution Enhanced Pairwise Attributes for Prediction of Disease-Related lncRNAs. J Chem Inf Model 2024; 64:3569-3578. [PMID: 38523267 DOI: 10.1021/acs.jcim.4c00245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/26/2024]
Abstract
As long non-coding RNAs (lncRNAs) play important roles in the occurrence and development of various human diseases, identifying disease-related lncRNAs can help clarify disease pathogenesis. Most recent lncRNA-disease association prediction methods use multi-source data about lncRNAs and diseases. A single lncRNA may participate in multiple disease processes, and multiple lncRNAs are usually involved in the same disease process synergistically. However, previous methods did not fully exploit these biological characteristics to construct informative prediction models. We construct a prediction model based on adaptive hypergraph and gated convolution for lncRNA-disease association prediction (AGLDA), to embed and encode the biological characteristics of lncRNA-disease associations, the topological features from the perspective of the entire heterogeneous graph, and the gated enhanced pairwise features. First, the strategy for constructing hyperedges is designed to reflect the biological characteristic that multiple lncRNAs are involved in multiple disease processes. Furthermore, each hyperedge has its own biological perspective, and multiple hyperedges are beneficial for revealing the diverse relationships among multiple lncRNAs and diseases. Second, we encode the biological features of each lncRNA (disease) node using a strategy based on dynamic hypergraph convolutional networks. The strategy may adaptively learn the features of the hyperedges and formulate the dynamically evolved hypergraph topological structure. Third, a group convolutional network is established to integrate the entire heterogeneous topological structure and multiple types of node attributes within an lncRNA-disease-miRNA graph. Finally, a gated convolutional strategy is proposed to enhance the informative features of the lncRNA-disease node pairs. The comparison experiments indicate that AGLDA outperforms seven advanced prediction methods. The ablation studies confirm the effectiveness of the major innovations, and the case studies validate AGLDA's ability to discover potential disease-related lncRNA candidates.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Department of Computer Science, Shantou University, Shantou 515063, China
- Siyuan Lu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Shuai Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
3
He J, Li M, Qiu J, Pu X, Guo Y. HOPEXGB: A Consensual Model for Predicting miRNA/lncRNA-Disease Associations Using a Heterogeneous Disease-miRNA-lncRNA Information Network. J Chem Inf Model 2024; 64:2863-2877. [PMID: 37604142 DOI: 10.1021/acs.jcim.3c00856] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 08/23/2023]
Abstract
Predicting disease-related microRNAs (miRNAs) and long noncoding RNAs (lncRNAs) is crucial to find new biomarkers for the prevention, diagnosis, and treatment of complex human diseases. Computational predictions for miRNA/lncRNA-disease associations are of great practical significance, since traditional experimental detection is expensive and time-consuming. In this paper, we proposed a consensual machine-learning technique-based prediction approach to identify disease-related miRNAs and lncRNAs by high-order proximity preserved embedding (HOPE) and eXtreme Gradient Boosting (XGB), named HOPEXGB. By connecting lncRNA, miRNA, and disease nodes based on their correlations and relationships, we first created a heterogeneous disease-miRNA-lncRNA (DML) information network to achieve an effective fusion of information on similarities, correlations, and interactions among miRNAs, lncRNAs, and diseases. In addition, a more rational negative data set was generated based on the similarities of unknown associations with the known ones, so as to effectively reduce the false negative rate in the data set for model construction. By 10-fold cross-validation, HOPE shows better performance than other graph embedding methods. The final consensual HOPEXGB model yields robust performance with a mean prediction accuracy of 0.9569 and also demonstrates high sensitivity and specificity advantages compared to lncRNA/miRNA-specific predictions. Moreover, it is superior to other existing methods and gives promising performance on the external testing data, indicating that integrating the information on lncRNA-miRNA interactions and the similarities of lncRNAs/miRNAs is beneficial for improving the prediction performance of the model. Finally, case studies on lung, stomach, and breast cancers indicate that HOPEXGB could be a powerful tool for preclinical biomarker detection and bioexperiment preliminary screening for the diagnosis and prognosis of cancers. 
HOPEXGB is publicly available at https://github.com/airpamper/HOPEXGB.
Affiliation(s)
- Jian He
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Jiangguo Qiu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, China
4
Hu X, Zhang P, Liu D, Zhang J, Zhang Y, Dong Y, Fan Y, Deng L. IGCNSDA: unraveling disease-associated snoRNAs with an interpretable graph convolutional network. Brief Bioinform 2024; 25:bbae179. [PMID: 38647155 PMCID: PMC11033953 DOI: 10.1093/bib/bbae179] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 12/15/2023] [Accepted: 03/27/2024] [Indexed: 04/25/2024] Open
Abstract
Accurately delineating the connection between short nucleolar RNA (snoRNA) and disease is crucial for advancing disease detection and treatment. While traditional biological experimental methods are effective, they are labor-intensive, costly and lack scalability. With the ongoing progress in computer technology, an increasing number of deep learning techniques are being employed to predict snoRNA-disease associations. Nevertheless, the majority of these methods are black-box models, lacking interpretability and the capability to elucidate the snoRNA-disease association mechanism. In this study, we introduce IGCNSDA, an innovative and interpretable graph convolutional network (GCN) approach tailored for the efficient inference of snoRNA-disease associations. IGCNSDA leverages the GCN framework to extract node feature representations of snoRNAs and diseases from the bipartite snoRNA-disease graph. SnoRNAs with high similarity are more likely to be linked to analogous diseases, and vice versa. To facilitate this process, we introduce a subgraph generation algorithm that effectively groups similar snoRNAs and their associated diseases into cohesive subgraphs. Subsequently, we aggregate information from neighboring nodes within these subgraphs, iteratively updating the embeddings of snoRNAs and diseases. The experimental results demonstrate that IGCNSDA outperforms the most recent, highly relevant methods. Additionally, our interpretability analysis provides compelling evidence that IGCNSDA adeptly captures the underlying similarity between snoRNAs and diseases, thus affording researchers enhanced insights into the snoRNA-disease association mechanism. Furthermore, we present illustrative case studies that demonstrate the utility of IGCNSDA as a valuable tool for efficiently predicting potential snoRNA-disease associations. The dataset and source code for IGCNSDA are openly accessible at: https://github.com/altriavin/IGCNSDA.
Affiliation(s)
- Xiaowen Hu
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Pan Zhang
  - Hunan Provincial Key Laboratory of Clinical Epidemiology, Xiangya School of Public Health, Central South University, 410078, Changsha, China
- Dayun Liu
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Jiaxuan Zhang
  - Department of Electrical and Computer Engineering, University of California, San Diego, 92093, CA, United States
- Yuanpeng Zhang
  - School of Software, Xinjiang University, 830046, Urumqi, China
- Yihan Dong
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Yanhao Fan
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, 410075, Changsha, China
5
Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. [PMID: 38454139 PMCID: PMC10920681 DOI: 10.1038/s41598-024-56208-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024] Open
Abstract
Identification of potential human-virus protein-protein interactions (PPIs) contributes to the understanding of the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have many hyperparameters that need to be adjusted manually, which limits their computational efficiency and generalization ability. To address this, this study proposes a kernel Bayesian logistic matrix decomposition model with automatic rank determination, VKBNMF, for the prediction of human-virus PPIs. VKBNMF introduces auxiliary information into the logistic matrix decomposition and sets the prior probabilities of the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct the variational inference framework of VKBNMF to ensure the solution efficiency. The experimental results show that for the scenarios of paired PPIs, VKBNMF achieves average AUPRs of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for the scenarios of new human (viral) proteins, VKBNMF still achieves a higher hit rate. The case study further demonstrates that VKBNMF can be used as an effective tool for the prediction of human-virus PPIs.
Affiliation(s)
- Yingjun Ma
  - School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
- Yongbiao Zhao
  - School of Computer, Central China Normal University, Wuhan, China
- Yuanyuan Ma
  - School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
  - Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China
6
Zhou L, Peng X, Zeng L, Peng L. Finding potential lncRNA-disease associations using a boosting-based ensemble learning model. Front Genet 2024; 15:1356205. [PMID: 38495672 PMCID: PMC10940470 DOI: 10.3389/fgene.2024.1356205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/01/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers of various types of cancer. Identifying associations between lncRNAs and diseases helps capture potential biomarkers and design efficient therapeutic options for diseases. Wet experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into the convolutional neural network. Results: The performance of LDA-SABC was evaluated under five-fold cross-validations (CVs) on lncRNAs, diseases, and LDPs. It clearly outperformed four other classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in terms of precision, recall, accuracy, F1 score, AUC, and AUPR. Based on the accurate LDA prediction performance of LDA-SABC, we used it to find potential lncRNA biomarkers for lung cancer. The results suggested that 7SK and HULC could be related to non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that our proposed LDA-SABC method can help improve LDA identification.
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Xinhuai Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Lijun Zeng
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
7
Peng L, Huang L, Su Q, Tian G, Chen M, Han G. LDA-VGHB: identifying potential lncRNA-disease associations with singular value decomposition, variational graph auto-encoder and heterogeneous Newton boosting machine. Brief Bioinform 2023; 25:bbad466. [PMID: 38127089 PMCID: PMC10734633 DOI: 10.1093/bib/bbad466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2023] [Revised: 10/05/2023] [Accepted: 11/25/2023] [Indexed: 12/23/2023] Open
Abstract
Long noncoding RNAs (lncRNAs) participate in various biological processes and have close linkages with diseases. In vivo and in vitro experiments have validated many associations between lncRNAs and diseases. However, biological experiments are time-consuming and expensive. Here, we introduce LDA-VGHB, an lncRNA-disease association (LDA) identification framework that combines feature extraction based on singular value decomposition and a variational graph autoencoder with LDA classification based on a heterogeneous Newton boosting machine. LDA-VGHB was compared with four classical LDA prediction methods (i.e. SDLDA, LDNFSGB, IPCARF and LDASR) and four popular boosting models (XGBoost, AdaBoost, CatBoost and LightGBM) under 5-fold cross-validations on lncRNAs, diseases, lncRNA-disease pairs and independent lncRNAs and independent diseases, respectively. It greatly outperformed the other methods under four different cross-validations on the lncRNADisease and MNDR databases. We further investigated potential lncRNAs for lung cancer, breast cancer, colorectal cancer and kidney neoplasms and inferred the top 20 lncRNAs associated with them among all their unobserved lncRNAs. The results showed that most of the predicted top 20 lncRNAs have been verified by biomedical experiments provided by the Lnc2Cancer 3.0, lncRNADisease v2.0 and RNADisease databases as well as publications. We found that HAR1A, KCNQ1DN, ZFAT-AS1 and HAR1B may be associated with lung cancer, breast cancer, colorectal cancer and kidney neoplasms, respectively. The results need further biological experimental validation. We anticipate that LDA-VGHB is capable of identifying possible lncRNAs for complex diseases. LDA-VGHB is publicly available at https://github.com/plhhnu/LDA-VGHB.
Affiliation(s)
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, 412007, Hunan, China
  - College of Life Sciences and Chemistry, Hunan University of Technology, 412007, Hunan, China
- Liangliang Huang
  - School of Computer Science, Hunan University of Technology, 412007, Hunan, China
- Qiongli Su
  - Department of Pharmacy, the Affiliated Zhuzhou Hospital Xiangya Medical College CSU, 412007, Hunan, China
- Geng Tian
  - Geneis (Beijing) Co. Ltd, China, 100102, Beijing, China
- Min Chen
  - School of Computer Science, Hunan Institute of Technology, 421002, No. 18 Henghua Road, Zhuhui District, Hengyang, Hunan, China
- Guosheng Han
  - School of Mathematics and Computational Science, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
  - Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, 411105, Yuhu District, Xiangtan, Hunan, China
8
Wang MN, Xie XJ, You ZH, Wong L, Li LP, Chen ZH. Combining K Nearest Neighbor With Nonnegative Matrix Factorization for Predicting CircRNA-Disease Associations. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:2610-2618. [PMID: 35675235 DOI: 10.1109/tcbb.2022.3180903] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Accumulating evidence shows that circular RNAs (circRNAs) play an important role in regulating gene expression and are involved in many complex human diseases. Identifying circRNA-disease associations helps to understand the pathogenesis, treatment and diagnosis of complex diseases. Since inferring circRNA-disease associations by biological experiments is costly and time-consuming, there is an urgent need to develop a computational model to identify them. In this paper, we proposed a novel method named KNN-NMF, which combines K nearest neighbors with nonnegative matrix factorization to infer associations between circRNAs and diseases. First, we compute the Gaussian Interaction Profile (GIP) kernel similarity of circRNAs and diseases and the semantic similarity of diseases. Then, new circRNA-disease interaction profiles are established using weighted K nearest neighbors to reduce the impact of false negative associations on prediction performance. Finally, nonnegative matrix factorization is applied to predict circRNA-disease associations. The experimental results indicate that KNN-NMF outperforms the competing methods under five-fold cross-validation. Moreover, case studies of two common diseases further show that KNN-NMF can identify potential circRNA-disease associations effectively.
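The GIP kernel similarity mentioned in this abstract can be illustrated with a minimal sketch. It uses the commonly cited form of the kernel, in which the bandwidth is set to the reciprocal of the mean squared norm of the interaction profiles; that normalisation, the function name `gip_kernel`, and the toy matrix are assumptions for illustration and are not taken from the paper.

```python
import math

def gip_kernel(assoc):
    """Gaussian interaction profile kernel between the rows of a 0/1 association matrix."""
    n = len(assoc)
    # Squared Euclidean norm of each interaction profile (row).
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("association matrix has no known associations")
    gamma = 1.0 / mean_sq  # assumed bandwidth: 1 / mean squared profile norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim

# Toy circRNA x disease matrix (rows: circRNAs, columns: diseases); values are made up.
A = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
circ_sim = gip_kernel(A)
# Transposing the matrix gives the disease-side similarity.
disease_sim = gip_kernel([list(col) for col in zip(*A)])
```

Rows with identical profiles get similarity 1, and the similarity decays as the profiles diverge.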
9
Su Z, Lu H, Wu Y, Li Z, Duan L. Predicting potential lncRNA biomarkers for lung cancer and neuroblastoma based on an ensemble of a deep neural network and LightGBM. Front Genet 2023; 14:1238095. [PMID: 37655066 PMCID: PMC10466784 DOI: 10.3389/fgene.2023.1238095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 07/19/2023] [Indexed: 09/02/2023] Open
Abstract
Introduction: Lung cancer is one of the most frequent neoplasms worldwide with approximately 2.2 million new cases and 1.8 million deaths each year. The expression levels of programmed death ligand-1 (PDL1) demonstrate a complex association with lung cancer. Neuroblastoma is a high-risk malignant tumor that mainly affects children. Identification of new biomarkers for these two diseases can significantly promote their diagnosis and therapy. However, in vivo experiments to discover potential biomarkers are costly and laborious. Consequently, artificial intelligence technologies, especially machine learning methods, provide a powerful avenue to find new biomarkers for various diseases. Methods: We developed a machine learning-based method named LDAenDL to detect potential long noncoding RNA (lncRNA) biomarkers for lung cancer and neuroblastoma using an ensemble of a deep neural network and LightGBM. LDAenDL first computes the Gaussian kernel similarity and functional similarity of lncRNAs and the Gaussian kernel similarity and semantic similarity of diseases to obtain their similarity networks. Next, LDAenDL combines a graph convolutional network, graph attention network, and convolutional neural network to learn the biological features of the lncRNAs and diseases based on their similarity networks. Third, these features are concatenated and fed to an ensemble model composed of a deep neural network and LightGBM to find new lncRNA-disease associations (LDAs). Finally, the proposed LDAenDL method is applied to identify possible lncRNA biomarkers associated with lung cancer and neuroblastoma. Results: The experimental results show that LDAenDL computed the best AUCs of 0.8701, 0.8953, and 0.9110 under cross-validation on lncRNAs, diseases, and lncRNA-disease pairs on Dataset 1, respectively, and 0.9490, 0.9157, and 0.9708 on Dataset 2, respectively. Furthermore, AUPRs of 0.8903, 0.9061, and 0.9166 under three cross-validations were obtained on Dataset 1, and 0.9582, 0.9122, and 0.9743 on Dataset 2. The results demonstrate that LDAenDL significantly outperformed the other four classical LDA prediction methods (i.e., SDLDA, LDNFSGB, IPCAF, and LDASR). Case studies demonstrate that CCDC26 and IFNG-AS1 may be new biomarkers of lung cancer, SNHG3 may associate with PDL1 for lung cancer, and HOTAIR and BDNF-AS may be potential biomarkers of neuroblastoma. Conclusion: We hope that the proposed LDAenDL method can help the development of targeted therapies for these two diseases.
Affiliation(s)
- Zhenguo Su
  - Clinical Lab, Yantai Affiliated Hospital of Binzhou Medical University, Yantai, China
- Huihui Lu
  - Department of Thoracic Cardiovascular Surgery, Hunan Province Directly Affiliated TCM Hospital, Zhuzhou, China
- Yan Wu
  - Geneis (Beijing) Co., Ltd., Beijing, China
- Zejun Li
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lian Duan
  - Faculty of Pediatrics, The Chinese PLA General Hospital, Beijing, China
  - Department of Pediatric Surgery, The Seventh Medical Center of PLA General Hospital, Beijing, China
  - National Engineering Laboratory for Birth Defects Prevention and Control of Key Technology, Beijing, China
  - Beijing Key Laboratory of Pediatric Organ Failure, Beijing, China
10
Li Y, Zhang M, Shang J, Li F, Ren Q, Liu JX. iLncDA-RSN: identification of lncRNA-disease associations based on reliable similarity networks. Front Genet 2023; 14:1249171. [PMID: 37614816 PMCID: PMC10442839 DOI: 10.3389/fgene.2023.1249171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 07/27/2023] [Indexed: 08/25/2023] Open
Abstract
Identification of disease-associated long non-coding RNAs (lncRNAs) is crucial for unveiling the underlying genetic mechanisms of complex diseases. Multiple types of similarity networks of lncRNAs (or diseases) can characterize their similarities in a complementary and comprehensive way. Hence, in this study, we presented a computational model, iLncDA-RSN, based on reliable similarity networks for identifying potential lncRNA-disease associations (LDAs). Specifically, to construct reliable similarity networks of lncRNAs and diseases, miRNA heuristic information on lncRNAs and diseases is first introduced to construct their respective Jaccard similarity networks; then Gaussian interaction profile (GIP) kernel similarity networks and Jaccard similarity networks of lncRNAs and diseases are built from the lncRNA-disease association network; finally, a random walk with restart strategy is applied to the Jaccard similarity networks, the GIP kernel similarity networks, the lncRNA functional similarity network and the disease semantic similarity network to construct reliable similarity networks. Based on the lncRNA-disease association network and the reliable similarity networks, feature vectors of lncRNA-disease pairs are integrated from the lncRNA and disease perspectives respectively, and their dimensionality is then reduced by the elastic net. Finally, two random forests are used together on different lncRNA-disease association feature sets to identify potential LDAs. iLncDA-RSN is evaluated by five-fold cross-validation to analyse its prediction performance, and the results show that it outperforms the compared models. Furthermore, case studies of different complex diseases demonstrate the effectiveness of iLncDA-RSN in identifying potential LDAs.
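Two of the building blocks named in this abstract, Jaccard similarity and random walk with restart (RWR), can be sketched in a few lines. This is a generic illustration, not the authors' implementation: the restart probability of 0.5, the column normalisation, the function names, and the placeholder miRNA identifiers `m1` to `m3` are all assumptions.

```python
def jaccard_similarity(profiles):
    """Pairwise Jaccard similarity between sets of associated items (e.g. miRNAs)."""
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            union = profiles[i] | profiles[j]
            sim[i][j] = len(profiles[i] & profiles[j]) / len(union) if union else 0.0
    return sim

def random_walk_with_restart(sim, restart=0.5, tol=1e-10, max_iter=1000):
    """Run RWR from every node; row i holds the steady-state distribution for seed i."""
    n = len(sim)
    # Column-normalise the similarity matrix into a transition matrix.
    col_sums = [sum(sim[r][c] for r in range(n)) for c in range(n)]
    trans = [[sim[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
             for r in range(n)]
    result = []
    for seed in range(n):
        p0 = [1.0 if k == seed else 0.0 for k in range(n)]
        p = p0[:]
        for _ in range(max_iter):
            nxt = [(1 - restart) * sum(trans[r][c] * p[c] for c in range(n))
                   + restart * p0[r] for r in range(n)]
            delta = sum(abs(a - b) for a, b in zip(nxt, p))
            p = nxt
            if delta < tol:
                break
        result.append(p)
    return result

# Hypothetical miRNA sets associated with three lncRNAs (placeholder identifiers).
profiles = [{"m1", "m2"}, {"m1"}, {"m3"}]
jac = jaccard_similarity(profiles)
rwr = random_walk_with_restart(jac)
```

Each row of `rwr` is a probability distribution over nodes, so nodes that share no neighbours with the seed receive zero mass.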
Affiliation(s)
- Junliang Shang
  - School of Computer Science, Qufu Normal University, Rizhao, China
11
Wong L, Wang L, You ZH, Yuan CA, Huang YA, Cao MY. GKLOMLI: a link prediction model for inferring miRNA-lncRNA interactions by using Gaussian kernel-based method on network profile and linear optimization algorithm. BMC Bioinformatics 2023; 24:188. [PMID: 37158823 PMCID: PMC10169329 DOI: 10.1186/s12859-023-05309-w] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Received: 08/30/2022] [Accepted: 04/27/2023] [Indexed: 05/10/2023] Open
Abstract
BACKGROUND The limited knowledge of miRNA-lncRNA interactions is considered an obstacle to revealing the regulatory mechanism. Accumulating evidence on human diseases indicates that the modulation of gene expression is closely related to the interactions between miRNAs and lncRNAs. However, validating such interactions via crosslinking-immunoprecipitation and high-throughput sequencing (CLIP-seq) experiments inevitably costs considerable money and time, often with unsatisfactory results. Therefore, more and more computational prediction tools have been developed to offer reliable candidates for a better design of further bio-experiments. METHODS In this work, we proposed a novel link prediction model based on a Gaussian kernel-based method and a linear optimization algorithm for inferring miRNA-lncRNA interactions (GKLOMLI). Given an observed miRNA-lncRNA interaction network, the Gaussian kernel-based method was employed to output two similarity matrices, one for miRNAs and one for lncRNAs. Based on the integrated matrix combining the similarity matrices and the observed interaction network, a linear optimization-based link prediction model was trained for inferring miRNA-lncRNA interactions. RESULTS To evaluate the performance of our proposed method, k-fold cross-validation (CV) and leave-one-out CV were implemented, in which each CV experiment was carried out 100 times on a training set generated randomly. The high areas under the curve (AUCs) of 0.8623 ± 0.0027 (2-fold CV), 0.9053 ± 0.0017 (5-fold CV), 0.9151 ± 0.0013 (10-fold CV), and 0.9236 (LOO-CV) illustrate the precision and reliability of our proposed method. CONCLUSION GKLOMLI, with its high performance, is anticipated to help reveal underlying interactions between miRNAs and their target lncRNAs and decipher the potential mechanisms of complex diseases.
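The evaluation protocol described here (k-fold cross-validation scored by AUC) can be sketched as follows. The AUC is computed with the Mann-Whitney rank statistic, which equals the area under the ROC curve; the helper names, the fold-splitting scheme, and the toy scores are assumptions for illustration and do not reproduce the paper's pipeline.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Toy prediction scores and ground-truth labels (made up).
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auc(scores, labels))  # 3 of 4 positive-negative pairs are ranked correctly: 0.75

folds = k_fold_indices(10, 5)
```

Repeating the split with different seeds and averaging the per-run AUC gives mean ± standard deviation figures of the kind reported in the abstract.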
Affiliation(s)
- Leon Wong
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
  - Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, 200092, Shanghai, China
- Lei Wang
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277160, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710139, China
- Chang-An Yuan
  - Guangxi Key Lab of Human-machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China
- Yu-An Huang
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, 710139, China
- Mei-Yuan Cao
  - School of Electrical and Electronic Engineering, Guangdong Technology College, Zhaoqing, 526100, China
  - Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM, 43600, Bangi, Selangor, Malaysia
|
12
|
Wang MN, Li Y, Lei LL, Ding DW, Xie XJ. Combining non-negative matrix factorization with graph Laplacian regularization for predicting drug-miRNA associations based on multi-source information fusion. Front Pharmacol 2023; 14:1132012. [PMID: 36817132 PMCID: PMC9931722 DOI: 10.3389/fphar.2023.1132012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 01/16/2023] [Indexed: 02/05/2023]
Abstract
Increasing evidence suggests that miRNAs play a key role in the occurrence and progression of many complex human diseases. Therefore, targeting dysregulated miRNAs with small-molecule drugs has become a new clinical treatment strategy. Nevertheless, identifying drug-targeted miRNAs through biological experiments is costly and time-consuming. Thus, more reliable computational methods for identifying associations of drugs with miRNAs urgently need to be developed. In this study, we proposed an efficient method, called GNMFDMA, to predict potential associations of drugs with miRNAs by combining graph Laplacian regularization with non-negative matrix factorization. We first calculated the overall similarity matrices of drugs and miRNAs from the different kinds of biological information collected. Subsequently, the drug-miRNA association adjacency matrix was reformulated based on the K nearest neighbor profiles so as to correct false negative associations. Finally, graph Laplacian regularized collaborative non-negative matrix factorization was used to calculate the association scores of drugs with miRNAs. In cross-validation, GNMFDMA obtained an AUC of 0.9193, outperforming the existing methods. In addition, in case studies on three common drugs (5-Aza-CdR, 5-FU and Gemcitabine), 30, 31 and 34 of the top-50 associations inferred by GNMFDMA were verified, respectively. These results reveal that GNMFDMA is a reliable and efficient computational approach for identifying potential drug-miRNA associations.
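The factorization step can be illustrated with a generic graph-regularized NMF. This sketch uses the standard multiplicative updates for a Frobenius-norm objective with graph Laplacian and Tikhonov penalties; it is not claimed to be GNMFDMA's exact formulation, and the function name, hyperparameter values, and toy matrices are all assumptions.

```python
import numpy as np

def graph_regularized_nmf(Y, S_row, S_col, rank=2, lam=0.1, beta=0.01,
                          n_iter=200, seed=0, eps=1e-12):
    """Factor Y ~ U @ V.T with non-negative U and V.

    S_row and S_col are non-negative similarity matrices for the row and
    column entities; lam weights the graph penalty, beta the norm penalty.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = Y.shape
    U = rng.random((n_rows, rank))
    V = rng.random((n_cols, rank))
    D_row = np.diag(S_row.sum(axis=1))  # degree matrices of the similarity graphs
    D_col = np.diag(S_col.sum(axis=1))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        U *= (Y @ V + lam * S_row @ U) / (U @ (V.T @ V) + beta * U + lam * D_row @ U + eps)
        V *= (Y.T @ U + lam * S_col @ V) / (V @ (U.T @ U) + beta * V + lam * D_col @ V + eps)
    return U, V

# Hypothetical toy data: 4 drugs x 5 miRNAs with two association blocks.
Y = np.array([[1, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 0, 1]], dtype=float)
drug_sim = np.array([[1.0, 0.8, 0.1, 0.1],
                     [0.8, 1.0, 0.1, 0.1],
                     [0.1, 0.1, 1.0, 0.8],
                     [0.1, 0.1, 0.8, 1.0]])
mirna_sim = np.full((5, 5), 0.1)
mirna_sim[:2, :2] = 0.8
mirna_sim[2:, 2:] = 0.8
np.fill_diagonal(mirna_sim, 1.0)

U, V = graph_regularized_nmf(Y, drug_sim, mirna_sim)
scores = U @ V.T  # predicted association scores for every drug-miRNA pair
```

Entries of `scores` at positions where `Y` is zero are the candidate associations that a method of this kind would rank.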
Affiliation(s)
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Yu Li (corresponding author)
  - School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, China
- Li-Lan Lei
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- De-Wu Ding
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
- Xue-Jun Xie
  - School of Mathematics and Computer Science, Yichun University, Yichun, China
|
13
|
Sheng N, Huang L, Lu Y, Wang H, Yang L, Gao L, Xie X, Fu Y, Wang Y. Data resources and computational methods for lncRNA-disease association prediction. Comput Biol Med 2023; 153:106527. [PMID: 36610216 DOI: 10.1016/j.compbiomed.2022.106527] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/20/2022] [Revised: 12/08/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Interest in deciphering potential disease pathogenesis through lncRNA-disease association (LDA) prediction has been increasing, given the diverse functional roles of lncRNAs in genome regulation. Meanwhile, computational models and algorithms benefit systems biology research and even facilitate classical biological experimental procedures. In this review, we introduce representative diseases associated with lncRNAs, such as cancers, cardiovascular diseases, and neurological diseases. Current publicly available resources related to lncRNAs and diseases are also included. Furthermore, 64 computational methods for LDA prediction are divided into 5 groups: machine learning-based methods, network propagation-based methods, matrix factorization- and completion-based methods, deep learning-based methods, and graph neural network-based methods. The common evaluation methods and metrics in LDA prediction are also discussed. Finally, the challenges and future trends in LDA prediction are discussed. Recent advances in LDA prediction approaches are summarized in the GitHub repository at https://github.com/sheng-n/lncRNA-disease-methods.
Affiliation(s)
- Nan Sheng
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Lan Huang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yuting Lu
  - School of Artificial Intelligence, Jilin University, Changchun, China
- Hao Wang
  - Department of Hepatopancreatobiliary Surgery, Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Lili Yang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
- Ling Gao
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Xuping Xie
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yuan Fu
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
- Yan Wang
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
  - School of Artificial Intelligence, Jilin University, Changchun, China
|
14
|
Li S, Chang M, Tong L, Wang Y, Wang M, Wang F. Screening potential lncRNA biomarkers for breast cancer and colorectal cancer combining random walk and logistic matrix factorization. Front Genet 2023; 13:1023615. [PMID: 36744179 PMCID: PMC9895102 DOI: 10.3389/fgene.2022.1023615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2022] [Accepted: 10/10/2022] [Indexed: 01/21/2023]
Abstract
Breast cancer and colorectal cancer are two of the most common malignant tumors worldwide and are among the leading causes of cancer mortality. Many studies have demonstrated that long noncoding RNAs (lncRNAs) are closely linked with the occurrence and development of the two cancers. Therefore, it is essential to design an effective way to identify potential lncRNA biomarkers for them. In this study, we developed a computational method (LDA-RWLMF) that integrates random walk with restart and logistic matrix factorization to investigate the roles of lncRNA biomarkers in the prognosis and diagnosis of the two cancers. First, we fuse disease semantic and Gaussian association profile similarities, and lncRNA functional and Gaussian association profile similarities. Second, we design a negative selection algorithm to extract negative lncRNA-disease associations (LDAs) based on random walk. Third, we develop a logistic matrix factorization model to predict possible LDAs. We compare our proposed LDA-RWLMF method with four classical LDA prediction methods: LNCSIM1, LNCSIM2, ILNCSIM, and IDSSIM. The results from 5-fold cross-validation on the MNDR dataset show that LDA-RWLMF achieves the best AUC value of 0.9312, outperforming the above four LDA prediction methods. Finally, after determining the performance of LDA-RWLMF, we rank all lncRNA biomarkers for each of the two cancers. We find that 48 and 50 lncRNAs have the highest association scores with breast cancer and colorectal cancer, respectively, among all lncRNAs known to associate with them on the MNDR dataset. We predict that lncRNAs HULC and HAR1A could be potential biomarkers for breast cancer and colorectal cancer, respectively, and require biomedical experimental validation.
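Random walk with restart, which the method above builds on, can be sketched on a generic graph as follows. How LDA-RWLMF uses the resulting scores to select negative samples is not specified in the abstract, so this only shows the walk itself; the function name, restart probability, and toy graph are assumptions.

```python
import numpy as np

def random_walk_with_restart(adjacency, seed_idx, restart_prob=0.3,
                             tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walker that restarts at seed_idx."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    W = A / col_sums                       # column-normalised transition matrix
    p0 = np.zeros(A.shape[0])
    p0[seed_idx] = 1.0
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart_prob) * (W @ p) + restart_prob * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Path graph 0 - 1 - 2 - 3 with the walk restarting at node 0.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
proximity = random_walk_with_restart(path, seed_idx=0)
```

Nodes far from the seed receive low scores, which is why such scores can plausibly serve as a proximity measure on a similarity or association network.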
|
15
|
Zhao X, Wu J, Zhao X, Yin M. Multi-view contrastive heterogeneous graph attention network for lncRNA-disease association prediction. Brief Bioinform 2023; 24:6931723. [PMID: 36528809 DOI: 10.1093/bib/bbac548] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 11/11/2022] [Indexed: 12/23/2022]
Abstract
MOTIVATION Exploring potential long noncoding RNA (lncRNA)-disease associations (LDAs) plays a critical role in understanding disease etiology and pathogenesis. Given the high cost of biological experiments, developing a computational method is a practical necessity to effectively accelerate the experimental screening of candidate LDAs. However, given the high sparsity of LDA datasets, many computational models can hardly exploit enough knowledge to learn comprehensive patterns of node representations. Moreover, although the metapath-based GNN has recently been introduced into LDA prediction, it discards intermediate nodes along the meta-path, which results in information loss. RESULTS This paper presents a new multi-view contrastive heterogeneous graph attention network (GAT) for lncRNA-disease association prediction, MCHNLDA for brevity. Specifically, MCHNLDA first leverages rich biological data sources on lncRNAs, genes and diseases to construct two view graphs: a feature structural graph for the feature schema view and a lncRNA-gene-disease heterogeneous graph for the network topology view. Then, we design a cross-contrastive learning task to collaboratively guide the graph embeddings of the two views without relying on any labels. In this way, we can pull nodes with similar features and network topology closer together and push other nodes away. Furthermore, we propose a heterogeneous contextual GAT, in which a long short-term memory network is incorporated into the attention mechanism to effectively capture sequential structure information along the meta-path. Extensive experimental comparisons against several state-of-the-art methods show the effectiveness of the proposed framework. The code and data of the proposed framework are freely available at https://github.com/zhaoxs686/MCHNLDA.
Affiliation(s)
- Xiaosa Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jun Wu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Xiaowei Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Minghao Yin
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
|
16
|
Lin L, Chen R, Zhu Y, Xie W, Jing H, Chen L, Zou M. SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA-disease associations. Front Microbiol 2023; 13:1093615. [PMID: 36713213 PMCID: PMC9874942 DOI: 10.3389/fmicb.2022.1093615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 11/30/2022] [Indexed: 01/13/2023]
Abstract
Accumulating evidence has demonstrated various associations of long non-coding RNAs (lncRNAs) with human diseases, such as abnormal expression due to microbial influences that cause disease. Gaining a deeper understanding of lncRNA-disease associations is essential for disease diagnosis, treatment, and prevention. In recent years, many matrix decomposition methods have been used to predict potential lncRNA-disease associations. However, these methods do not use microbe-disease association information to enrich disease similarity, nor do they make full use of similarity information in the decomposition process. To address these issues, we propose a correction-based similarity-constrained probability matrix decomposition method (SCCPMD) to predict lncRNA-disease associations. The microbe-disease associations are first used to enrich the disease semantic similarity matrix. The logistic function is then used to correct the lncRNA and disease similarity matrices, and these two corrected similarity matrices are added to the probability matrix decomposition as constraints to predict the potential lncRNA-disease associations. The experimental results show that SCCPMD outperforms the five advanced comparison algorithms. In addition, SCCPMD demonstrated excellent prediction performance in case studies of breast cancer, lung cancer, and renal cell carcinoma, with prediction accuracy reaching 80%, 100%, and 100%, respectively. Therefore, SCCPMD shows excellent predictive performance in identifying unknown lncRNA-disease associations.
Affiliation(s)
- Lieqing Lin
  - Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- Ruibin Chen
  - School of Computer, Guangdong University of Technology, Guangzhou, China
- Yinting Zhu
  - School of Computer, Guangdong University of Technology, Guangzhou, China
- Weijie Xie
  - School of Computer, Guangdong University of Technology, Guangzhou, China
- Huaiguo Jing (corresponding author)
  - Sports Department, Guangdong University of Technology, Guangzhou, China
- Langcheng Chen (corresponding author)
  - Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- Minqing Zou
  - Department of Experiment Teaching, Guangdong University of Technology, Guangzhou, China
|
17
|
Wang MN, Xie XJ, You ZH, Ding DW, Wong L. A weighted non-negative matrix factorization approach to predict potential associations between drug and disease. J Transl Med 2022; 20:552. [PMID: 36463215 PMCID: PMC9719187 DOI: 10.1186/s12967-022-03757-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Accepted: 11/06/2022] [Indexed: 12/04/2022]
Abstract
BACKGROUND Associations of drugs with diseases provide important information for expediting drug development. The number of known drug-disease associations is still insufficient, and inferring associations between them through traditional in vitro experiments is time-consuming and costly. Therefore, more accurate and reliable computational methods urgently need to be developed to predict potential associations of drugs with diseases. METHODS In this study, we present a model called weighted graph regularized collaborative non-negative matrix factorization for drug-disease association prediction (WNMFDDA). More specifically, we first calculated the drug similarity and disease similarity based on the chemical structures of drugs and the medical description information of diseases, respectively. Then, to extend the model to work for new drugs and diseases, weighted [Formula: see text] nearest neighbor was used as a preprocessing step to reconstruct the interaction score profiles of drugs with diseases. Finally, a graph regularized non-negative matrix factorization model was used to identify potential associations between drugs and diseases. RESULTS Under ten-fold cross-validation, WNMFDDA achieved AUC values of 0.939 and 0.952 on Fdataset and Cdataset, respectively, outperforming other competing prediction methods. Moreover, case studies for several drugs and diseases were carried out to further verify the predictive performance of WNMFDDA. As a result, 13 (Doxorubicin), 13 (Amiodarone), 12 (Obesity) and 12 (Asthma) of the top 15 corresponding candidate diseases or drugs were confirmed by existing databases. CONCLUSIONS The experimental results demonstrated that WNMFDDA is a very effective method for drug-disease association prediction. We believe that WNMFDDA will be helpful to biomedical researchers in follow-up studies.
Affiliation(s)
- Mei-Neng Wang
  - School of Mathematics and Computer Science, Yichun University, Yichun, 336000 Jiangxi, China
- Xue-Jun Xie
  - School of Mathematics and Computer Science, Yichun University, Yichun, 336000 Jiangxi, China
- Zhu-Hong You
  - School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, China
- De-Wu Ding
  - School of Mathematics and Computer Science, Yichun University, Yichun, 336000 Jiangxi, China
- Leon Wong
  - Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
|
18
|
Zheng K, Zhang XL, Wang L, You ZH, Zhan ZH, Li HY. Line graph attention networks for predicting disease-associated Piwi-interacting RNAs. Brief Bioinform 2022; 23:6748487. [PMID: 36198846 DOI: 10.1093/bib/bbac393] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/06/2022] [Revised: 08/08/2022] [Accepted: 08/12/2022] [Indexed: 12/14/2022]
Abstract
PIWI proteins and Piwi-interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the combinatorial explosion of possible ncRNA-disease pairs becomes increasingly apparent, new bioinformatics methods for large-scale identification and prioritization of potential associations are of interest. However, in the real world, the network of interactions between molecules is enormously intricate and noisy, which poses a problem for efficient graph mining. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT), and apply it to predict piRNA-disease associations (GAPDA). In the experiment, GAPDA performs excellently in 5-fold cross-validation with an AUC of 0.9038. Moreover, it outperforms methods based on collaborative filtering and attribute features. The experimental results show that GAPDA demonstrates the promise of graph neural networks for such problems and can be an excellent supplement for future biomedical research.
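A line graph turns each edge of the original network into a node and connects two such nodes whenever the original edges share an endpoint. The sketch below builds one from a small bipartite edge list. The piRNA and disease identifiers are hypothetical, and nothing here reproduces the attention mechanism of LGAT.

```python
from itertools import combinations

def line_graph(edges):
    """Return the line graph of `edges` as a dict: edge -> set of adjacent edges."""
    edges = list(edges)
    adjacency = {edge: set() for edge in edges}
    for e1, e2 in combinations(edges, 2):
        if set(e1) & set(e2):          # the two edges share an endpoint
            adjacency[e1].add(e2)
            adjacency[e2].add(e1)
    return adjacency

# Hypothetical piRNA-disease associations.
associations = [("piR-1", "D1"), ("piR-1", "D2"), ("piR-2", "D2")]
lg = line_graph(associations)
```

In this representation each candidate association is itself a node, so predicting an association becomes a node-level task rather than a link-level one.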
Affiliation(s)
- Kai Zheng
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhao-Hui Zhan
  - Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Hao-Yuan Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
|
19
|
Wang MN, Lei LL, He W, Ding DW. SPCMLMI: A structural perturbation-based matrix completion method to predict lncRNA–miRNA interactions. Front Genet 2022; 13:1032428. [DOI: 10.3389/fgene.2022.1032428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 10/28/2022] [Indexed: 11/17/2022]
Abstract
Accumulating evidence indicates that the interaction between lncRNAs and miRNAs is crucial for gene regulation: it can regulate gene transcription and thereby affect the occurrence and development of many complex diseases. Accurate identification of interactions between lncRNAs and miRNAs is helpful for the diagnosis and treatment of complex diseases. However, the number of known lncRNA–miRNA interactions is still very limited, and identifying them through biological experiments is time-consuming and expensive. There is an urgent need to develop more accurate and efficient computational methods to infer lncRNA–miRNA interactions. In this work, we developed a matrix completion approach based on structural perturbation to infer lncRNA–miRNA interactions (SPCMLMI). Specifically, we first calculated the similarities of lncRNAs and miRNAs, including the lncRNA expression profile similarity, miRNA expression profile similarity, lncRNA sequence similarity, and miRNA sequence similarity. Second, a bilayer network was constructed by integrating the known interaction network, lncRNA similarity network, and miRNA similarity network. Finally, a structural perturbation-based matrix completion method was used to predict potential interactions of lncRNAs with miRNAs. To evaluate the prediction performance of SPCMLMI, five-fold cross-validation and a series of comparison experiments were implemented. SPCMLMI achieved AUCs of 0.8984 and 0.9891 on two different datasets, which is superior to the other compared methods. Case studies for lncRNA XIST and miRNA hsa-miR-195-5p further confirmed the effectiveness of our method in inferring lncRNA–miRNA interactions. Furthermore, we found that the structural consistency of the bilayer network was higher than that of other related networks. The results suggest that SPCMLMI can be used as a useful tool to predict interactions between lncRNAs and miRNAs.
|
20
|
Ma J, Zhang L, Li S, Liu H. BRPCA: Bounded Robust Principal Component Analysis to Incorporate Similarity Network for N7-Methylguanosine (m7G) Site-Disease Association Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3295-3306. [PMID: 34469307 DOI: 10.1109/tcbb.2021.3109055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]
Abstract
Recent studies have revealed that N7-methylguanosine (m7G) plays a pivotal role in various biological processes and disease pathogenesis. To date, transcriptome-wide m7G modification sites have been identified by high-throughput sequencing approaches, and some related information has been recorded in a few biological databases. However, the mechanism by which these sites act in disease remains uncharted. Wet-lab experiments can help identify true m7G sites with high confidence, but finding them among such a large number of sites is time-consuming and costly. Thus, computational methods are urgently needed to predict the associations between m7G sites and various diseases, thereby helping to uncover potential active sites for specific diseases. In this article, we proposed a bounded robust principal component analysis (BRPCA) method to predict unknown m7G-disease associations based on similarity information. Importantly, BRPCA tolerates the noise and redundancy existing in association and similarity information. Moreover, a suitable bounded constraint is incorporated into BRPCA to ensure that the predicted association scores lie in a meaningful interval. Extensive experiments demonstrate the superiority and robustness of BRPCA.
|
21
|
Duan T, Kuang Z, Deng L. SVMMDR: Prediction of miRNAs-drug resistance using support vector machines based on heterogeneous network. Front Oncol 2022; 12:987609. [PMID: 36338674 PMCID: PMC9632662 DOI: 10.3389/fonc.2022.987609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 09/14/2022] [Indexed: 11/21/2022]
Abstract
In recent years, miRNAs have come to be considered potential high-value therapeutic targets because of their complex and delicate mechanisms of gene regulation. The abnormal expression of miRNAs can cause drug resistance, reducing the therapeutic effect on the disease. Revealing miRNA-drug resistance associations can help in the design of effective drugs or possible drug combinations. However, current conventional experiments for identifying miRNA-drug resistance associations are time-consuming and costly. Therefore, it is of practical value to develop an accurate and efficient computational method for predicting miRNA-drug resistance associations. In this paper, a method based on Support Vector Machines (SVM) to predict the association between miRNA and drug resistance (SVMMDR) is proposed. SVMMDR integrates miRNA-drug resistance associations, miRNA sequence similarity, drug chemical structure similarity and other similarities, extracts path-based HeteSim features, and obtains an inclined diffusion feature through random walk with restart. By combining these multiple features, the prediction score between miRNAs and drug resistance is obtained with the SVM. The innovation of SVMMDR is that the inclined diffusion feature is obtained by inclined random walk with restart, the node information and path information in the heterogeneous network are integrated, and the SVM is used to predict potential miRNA-drug resistance associations. The average AUC obtained by SVMMDR is 0.978 in 10-fold cross-validation.
|
22
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022]
Abstract
Proteins are the basic organic substances that constitute the cell; they are the material basis of life activity and underpin biological function. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important type of protein interaction, self-interacting proteins (SIPs) have a critical role. The fast growth of high-throughput experimental techniques has led to a massive influx of available SIP data. How to conduct scientific research using this massive amount of SIP data has become a new challenge in related research fields such as biology and medicine. In this work, we design an SIP prediction method, SIPGCN, that uses a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe evolutionary information. Then, their hidden features are extracted by the GCN, and, finally, a random forest is used to predict whether proteins self-interact. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity on the human data set, and 90.69% and 99.08% on these two indicators on the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable tool for predicting SIPs and a reliable source of candidates for future wet-lab experiments.
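The feature extraction step relies on graph convolution. As a minimal sketch, the widely used GCN propagation rule applies a ReLU to the product of the symmetrically normalised adjacency matrix (with self-loops), the node features, and a weight matrix. The abstract does not say how SIPGCN builds its graph or sizes its layers, so the function name, shapes, and toy inputs below are assumptions.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = np.asarray(adjacency, dtype=float) + np.eye(len(adjacency))  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]           # symmetric normalisation
    return np.maximum(A_norm @ features @ weights, 0.0)                   # ReLU activation

# Hypothetical toy graph: 3 nodes, 2 input features, 4 hidden units.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
X = rng.normal(size=(3, 2))
W = rng.normal(size=(2, 4))
hidden = gcn_layer(A, X, W)
```

The resulting `hidden` matrix has one row per node and could be passed to any downstream classifier, such as the random forest named in the abstract.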
Affiliation(s)
- Ying Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Lin-Lin Wang (corresponding author)
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Leon Wong
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang Li
  - School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
- Lei Wang (corresponding author)
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
  - School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
|
23
|
Ma Y, Liu Q. Generalized matrix factorization based on weighted hypergraph learning for microbe-drug association prediction. Comput Biol Med 2022; 145:105503. [DOI: 10.1016/j.compbiomed.2022.105503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 03/28/2022] [Accepted: 04/04/2022] [Indexed: 11/03/2022]
|
24
|
Peng L, Yang C, Huang L, Chen X, Fu X, Liu W. RNMFLP: Predicting circRNA-disease associations based on robust nonnegative matrix factorization and label propagation. Brief Bioinform 2022; 23:6582881. [PMID: 35534179 DOI: 10.1093/bib/bbac155] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/21/2022] [Revised: 03/09/2022] [Accepted: 04/06/2022] [Indexed: 12/22/2022]
Abstract
Circular RNAs (circRNAs) are a class of structurally stable endogenous noncoding RNA molecules. A growing number of studies indicate that circRNAs play vital roles in human diseases. However, validating disease-related circRNAs in vivo is costly and time-consuming. A reliable and effective computational method to identify circRNA-disease associations deserves further study. In this study, we propose a computational method called RNMFLP that combines robust nonnegative matrix factorization (RNMF) and a label propagation (LP) algorithm to predict circRNA-disease associations. First, to reduce the impact of false negative data, the original circRNA-disease adjacency matrix is updated by matrix multiplication using the integrated circRNA similarity and disease similarity information. Subsequently, the RNMF algorithm is used to obtain the restricted latent space to capture potential circRNA-disease pairs from the association matrix. Finally, the LP algorithm is used to predict more accurate circRNA-disease associations from the integrated circRNA similarity network and the integrated disease similarity network, respectively. Fivefold cross-validation on four datasets shows that RNMFLP is superior to the state-of-the-art methods. In addition, case studies on lung cancer, hepatocellular carcinoma and colorectal cancer further demonstrate the reliability of our method in discovering disease-related circRNAs.
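The label propagation step can be illustrated with the common iterative scheme F ← αSF + (1 − α)Y, where S is a symmetrically normalised similarity matrix and Y holds the initial association labels. This is a generic sketch rather than RNMFLP's exact update; the function name, the value of α, and the toy data are assumptions.

```python
import numpy as np

def label_propagation(similarity, labels, alpha=0.5, n_iter=100):
    """Propagate `labels` over a similarity network for n_iter steps."""
    S = np.asarray(similarity, dtype=float)
    degree = S.sum(axis=1)
    degree[degree == 0] = 1.0                       # avoid division by zero
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    S_norm = S * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    Y = np.asarray(labels, dtype=float)
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * (S_norm @ F) + (1.0 - alpha) * Y
    return F

# Hypothetical toy data: 3 circRNAs, 2 diseases.
circ_sim = np.array([[1.0, 0.9, 0.1],
                     [0.9, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
known = np.array([[1, 0],
                  [0, 0],
                  [0, 1]])
propagated = label_propagation(circ_sim, known)
```

Here the second circRNA has no known association, yet it receives a higher score for the first disease because it closely resembles the first circRNA.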
Affiliation(s)
- Li Peng
  - School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China; Hunan Key Laboratory for Service computing and Novel Software Technology
- Cheng Yang
  - School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
- Li Huang
  - Academy of Arts and Design, Tsinghua University, 10084, Beijing, China; The Future Laboratory, Tsinghua University, 10084, Beijing, China
- Xiang Chen
  - School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
- Xiangzheng Fu
  - College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Wei Liu
  - College of Information Engineering, Xiangtan University, Xiangtan, 411105, Hunan, China
|
25
|
Deep Link-Prediction Based on the Local Structure of Bipartite Networks. ENTROPY 2022; 24:e24050610. [PMID: 35626496 PMCID: PMC9140406 DOI: 10.3390/e24050610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2022] [Revised: 04/21/2022] [Accepted: 04/26/2022] [Indexed: 01/04/2023]
Abstract
Link prediction based on bipartite networks can not only mine hidden relationships between different types of nodes, but also reveal the inherent law of network evolution. Existing bipartite network link prediction is mainly based on the global structure that cannot analyze the role of the local structure in link prediction. To tackle this problem, this paper proposes a deep link-prediction (DLP) method by leveraging the local structure of bipartite networks. The method first extracts the local structure between target nodes and observes structural information between nodes from a local perspective. Then, representation learning of the local structure is performed on the basis of the graph neural network to extract latent features between target nodes. Lastly, a deep-link prediction model is trained on the basis of latent features between target nodes to achieve link prediction. Experimental results on five datasets showed that DLP achieved significant improvement over existing state-of-the-art link prediction methods. In addition, this paper analyzes the relationship between local structure and link prediction, confirming the effectiveness of a local structure in link prediction.
|
26
|
Zhang HY, Wang L, You ZH, Hu L, Zhao BW, Li ZW, Li YM. iGRLCDA: identifying circRNA-disease association based on graph representation learning. Brief Bioinform 2022; 23:6552271. [PMID: 35323894 DOI: 10.1093/bib/bbac083] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/19/2022] [Revised: 02/16/2022] [Accepted: 02/17/2022] [Indexed: 12/18/2022]
Abstract
While the technologies of ribonucleic acid sequencing (RNA-seq) and transcript assembly analysis have continued to improve, a novel topology of RNA transcript, called circular RNA (circRNA), was uncovered in the last decade. Recently, researchers have revealed that circRNAs compete with messenger RNA (mRNA) and long noncoding RNA for binding to microRNA in gene regulation. Therefore, circRNA was assumed to be associated with complex diseases, and discovering the relationship between them would contribute to medical research. However, identifying associations between circRNA and disease in vitro takes a long time and usually lacks direction. In recent years, more and more associations have been verified by experiments. Hence, we proposed a computational method named identifying circRNA-disease association based on graph representation learning (iGRLCDA) for the prediction of potential associations between circRNA and disease, which utilizes a deep learning model combining graph convolution network (GCN) and graph factorization (GF). In detail, iGRLCDA first derives the hidden features of known associations between circRNA and disease using the Gaussian interaction profile (GIP) kernel combined with disease semantic information to form a numeric descriptor. After that, it uses the deep learning model of GCN and GF to extract hidden features from the descriptor. Finally, a random forest classifier is introduced to identify potential circRNA-disease associations. In five-fold cross-validation on the gold standard data, iGRLCDA was strongly competitive with other prediction models, achieving an average area under the receiver operating characteristic curve of 0.9289 and an area under the precision-recall curve of 0.9377. On reviewing the relevant literature, 22 of the top 30 predicted circRNA-disease associations were reported in recently published papers. These results suggest that iGRLCDA can provide reliable circRNA-disease associations for medical research and reduce the blindness of wet-lab experiments.
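The Gaussian interaction profile (GIP) kernel mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes the usual form K(i, j) = exp(−γ‖A_i − A_j‖²), with the bandwidth γ set to 1 divided by the mean squared norm of the interaction profiles; the function name `gip_kernel` and the toy association matrix are hypothetical, and the exact bandwidth used by iGRLCDA is not stated here.

```python
import math


def gip_kernel(assoc):
    """GIP kernel similarity between the rows of a binary association matrix.

    assoc : n x m matrix (list of lists); row i is the interaction profile
            of entity i (e.g. one circRNA against all diseases).
    """
    n = len(assoc)
    sq_norms = [sum(v * v for v in row) for row in assoc]
    mean_sq = sum(sq_norms) / n
    # Bandwidth normalised by the mean squared profile norm (assumed choice).
    gamma = 1.0 / mean_sq if mean_sq > 0 else 1.0
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(assoc[i], assoc[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim


# Toy example: rows 0 and 1 share the same profile, row 2 is disjoint.
assoc = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
sim = gip_kernel(assoc)
```

Identical profiles receive similarity 1, and the similarity decays as the profiles diverge, which is what lets the kernel stand in for missing biological similarity data.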
Affiliation(s)
- Han-Yuan Zhang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Lei Wang
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China; College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Zhu-Hong You
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Lun Hu
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Bo-Wei Zhao
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China
- Zheng-Wei Li
  - Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Yang-Ming Li
  - College of Engineering Technology, Rochester Institute of Technology, Rochester, NY 14623, USA
|
27
|
Ma Y. DeepMNE: Deep Multi-network Embedding for lncRNA-Disease Association prediction. IEEE J Biomed Health Inform 2022; 26:3539-3549. [PMID: 35180094 DOI: 10.1109/jbhi.2022.3152619] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
Long non-coding RNA (lncRNA) participates in various biological processes, hence its mutations and disorders play an important role in the pathogenesis of multiple human diseases. Identifying disease-related lncRNAs is crucial for the diagnosis, prevention, and treatment of diseases. Although a large number of computational approaches have been developed, effectively integrating multi-omics data and accurately predicting potential lncRNA-disease associations remains a challenge, especially regarding new lncRNAs and new diseases. In this work, we propose a new method with deep multi-network embedding, called DeepMNE, to discover potential lncRNA-disease associations, especially for novel diseases and lncRNAs. DeepMNE extracts multi-omics data to describe diseases and lncRNAs, and proposes a network fusion method based on deep learning to integrate multi-source information. Moreover, DeepMNE complements the sparse association network and uses kernel neighborhood similarity to construct disease similarity and lncRNA similarity networks. Furthermore, a graph embedding method is adopted to predict potential associations. Experimental results demonstrate that, compared to other state-of-the-art methods, DeepMNE has a higher predictive performance on new associations, new lncRNAs and new diseases. DeepMNE also achieves considerable predictive performance on perturbed datasets. Additionally, the results of two different types of case studies indicate that DeepMNE can be used as an effective tool for disease-related lncRNA prediction. The code of DeepMNE is freely available at https://github.com/Mayingjun20179/DeepMNE.
|
28
|
Sheng N, Huang L, Wang Y, Zhao J, Xuan P, Gao L, Cao Y. Multi-channel graph attention autoencoders for disease-related lncRNAs prediction. Brief Bioinform 2022; 23:6519791. [PMID: 35108355 DOI: 10.1093/bib/bbab604] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 10/05/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Disease-related long non-coding RNAs (lncRNAs) can be used as biomarkers for disease diagnosis and treatment. The development of effective computational approaches to predict lncRNA-disease associations (LDAs) can provide insights into the pathogenesis of complex human diseases and reduce experimental costs. However, few of the existing methods use microRNA (miRNA) information or consider the complex relationship between inter-graph and intra-graph in a complex-graph to assist prediction. RESULTS: In this paper, the relationships between the same types of nodes and different types of nodes in a complex-graph are introduced. We propose a multi-channel graph attention autoencoder model to predict LDAs, called MGATE. First, an lncRNA-miRNA-disease complex-graph is established based on the similarity and correlation among lncRNAs, miRNAs and diseases to integrate the complex associations among them. Secondly, in order to fully extract the comprehensive information of the nodes, we use graph autoencoder networks to learn multiple representations from the complex-graph, inter-graph and intra-graph. Thirdly, a graph-level attention mechanism integration module is adopted to adaptively merge the three representations, and a combined training strategy is performed to optimize the whole model to ensure complementarity and consistency among the multi-graph embedding representations. Finally, multiple classifiers are explored, and Random Forest is used to predict the association score between lncRNA and disease. Experimental results on the public dataset show that the area under the receiver operating characteristic curve and the area under the precision-recall curve of MGATE are 0.964 and 0.413, respectively. MGATE significantly outperformed seven state-of-the-art methods. Furthermore, case studies of three cancers demonstrate the ability of MGATE to identify potential disease-correlated candidate lncRNAs.
The source code and supplementary data are available at https://github.com/sheng-n/MGATE. CONTACT: huanglan@jlu.edu.cn, wy6868@jlu.edu.cn.
Affiliation(s)
- Nan Sheng
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Lan Huang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yan Wang
  - Key laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; School of Artificial Intelligence, Jilin University, Changchun 130012, China
- Jing Zhao
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus OH 43210, USA
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Ling Gao
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Yangkun Cao
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
|
29
|
Ma Y, Ma Y. Hypergraph-based logistic matrix factorization for metabolite-disease interaction prediction. Bioinformatics 2022; 38:435-443. [PMID: 34499104 DOI: 10.1093/bioinformatics/btab652] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/08/2021] [Revised: 08/08/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION: Function-related metabolites, the terminal products of cell regulation, show a close association with complex diseases. The identification of disease-related metabolites is critical to the diagnosis, prevention and treatment of diseases. However, most existing computational approaches build networks by calculating pairwise relationships, which is inappropriate for mining higher-order relationships. RESULTS: In this study, we presented a novel approach with hypergraph-based logistic matrix factorization, HGLMF, to predict the potential interactions between metabolites and diseases. First, the molecular structures and gene associations of metabolites and the hierarchical structures and GO functional annotations of diseases were extracted to build various similarity measures of metabolites and diseases. Next, the kernel neighborhood similarity of metabolites (or diseases) was calculated according to the completed interaction network. Then, multiple networks of metabolites and diseases were fused, respectively, and the hypergraph structures of metabolites and diseases were built. Finally, a logistic matrix factorization based on the hypergraph was proposed to predict potential metabolite-disease interactions. In computational experiments, HGLMF accurately predicted metabolite-disease interactions and performed better than other state-of-the-art methods. Moreover, HGLMF could be used to predict new metabolites (or diseases). As the case studies suggest, the proposed method could discover novel disease-related metabolites, which have been confirmed in existing studies. AVAILABILITY AND IMPLEMENTATION: The codes and dataset are available at: https://github.com/Mayingjun20179/HGLMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingjun Ma
  - School of Applied Mathematics, Xiamen University of Technology, Xiamen 361024, China
- Yuanyuan Ma
  - School of Computer & Information Engineering, Anyang Normal University, Anyang 455000, China
|
30
|
Duan T, Kuang Z, Wang J, Ma Z. GBDTLRL2D Predicts LncRNA-Disease Associations Using MetaGraph2Vec and K-Means Based on Heterogeneous Network. Front Cell Dev Biol 2021; 9:753027. [PMID: 34977011 PMCID: PMC8718797 DOI: 10.3389/fcell.2021.753027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2021] [Accepted: 11/22/2021] [Indexed: 12/16/2022]
Abstract
In recent years, the long noncoding RNA (lncRNA) has been shown to be involved in many disease processes. The prediction of the lncRNA-disease association is helpful to clarify the mechanism of disease occurrence and bring some new methods of disease prevention and treatment. The current methods for predicting the potential lncRNA-disease association seldom consider the heterogeneous networks with complex node paths, and these methods have the problem of unbalanced positive and negative samples. To solve this problem, a method based on the Gradient Boosting Decision Tree (GBDT) and logistic regression (LR) to predict the lncRNA-disease association (GBDTLRL2D) is proposed in this paper. MetaGraph2Vec is used for feature learning, and negative sample sets are selected by using K-means clustering. The innovation of the GBDTLRL2D is that the clustering algorithm is used to select a representative negative sample set, and the use of MetaGraph2Vec can better retain the semantic and structural features in heterogeneous networks. The average area under the receiver operating characteristic curve (AUC) values of GBDTLRL2D obtained on the three datasets are 0.98, 0.98, and 0.96 in 10-fold cross-validation.
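The idea of choosing a representative negative sample set by K-means clustering can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how samples are picked from each cluster, so the sketch takes the points closest to each centroid, and it seeds the centroids with the first k points to stay deterministic. The names `kmeans` and `select_negatives` and the toy feature vectors are hypothetical.

```python
def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; centroids start at the first k points (assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        assign = [
            min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def select_negatives(unlabeled, k, per_cluster):
    """Pick the per_cluster samples nearest to each centroid as negatives."""
    centroids, assign = kmeans(unlabeled, k)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        idx.sort(key=lambda i: _dist2(unlabeled[i], centroids[c]))
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)


# Toy feature vectors for unlabeled lncRNA-disease pairs (two obvious groups).
pairs = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
negatives = select_negatives(pairs, k=2, per_cluster=1)
```

Drawing the same number of negatives from every cluster keeps the negative set spread across the feature space, which is one way to address the class imbalance the abstract describes.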
Affiliation(s)
- Zhufang Kuang
  - School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha, China
|
31
|
Liu Y, Li D, Wan S, Wang F, Dou W, Xu X, Li S, Ma R, Qi L. A long short‐term memory‐based model for greenhouse climate prediction. INT J INTELL SYST 2021. [DOI: 10.1002/int.22620] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Indexed: 12/14/2022]
Affiliation(s)
- Yuwen Liu
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Dejuan Li
  - Weifang Key Laboratory of Blockchain on Agricultural Vegetables, Weifang University of Science and Technology, Shouguang, China
- Shaohua Wan
  - School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan, China
- Fan Wang
  - School of Computer Science, Qufu Normal University, Rizhao, China
- Wanchun Dou
  - State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing, China
- Xiaolong Xu
  - School of Computer and Software, Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science and Technology, Nanjing, China
- Shancang Li
  - Department of Computer Science and Creative Technologies, University of the West of England, Bristol, UK
- Rui Ma
  - General Education Department, Shandong First Medical University (Shandong Academy of Medical Sciences), Taian, China
- Lianyong Qi
  - School of Computer Science, Qufu Normal University, Rizhao, China
|
32
|
Wang L, Yan X, You ZH, Zhou X, Li HY, Huang YA. SGANRDA: semi-supervised generative adversarial networks for predicting circRNA-disease associations. Brief Bioinform 2021; 22:6175330. [PMID: 33734296 DOI: 10.1093/bib/bbab028] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/09/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 12/31/2022]
Abstract
Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, the computational method of fusing multi-source biological data to identify the association between circRNA and disease can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose a semi-supervised generative adversarial network (GAN) model SGANRDA for predicting circRNA-disease association. This model first fused the natural language features of the circRNA sequence and the features of disease semantics, circRNA and disease Gaussian interaction profile kernel, and then used all circRNA-disease pairs to pre-train the GAN network, and fine-tune the network parameters through labeled samples. Finally, the extreme learning machine classifier is employed to obtain the prediction result. Compared with the previous supervision model, SGANRDA innovatively introduced circRNA sequences and utilized all the information of circRNA-disease pairs during the pre-training process. This step can increase the information content of the feature to some extent and reduce the impact of too few known associations on the model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA-disease pairs with the highest scores of SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model to predict the circRNA-disease association and can provide reliable candidates for biological experiments.
Affiliation(s)
- Lei Wang
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xin Yan
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Xi Zhou
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China
- Hao-Yuan Li
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
- Yu-An Huang
  - Department of Computing, Hong Kong Polytechnic University, Hong Kong, China
|
33
|
Ding Y, Lei X, Liao B, Wu FX. Machine learning approaches for predicting biomolecule-disease associations. Brief Funct Genomics 2021; 20:273-287. [PMID: 33554238 DOI: 10.1093/bfgp/elab002] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/12/2022]
Abstract
Biomolecules, such as microRNAs, circRNAs, lncRNAs and genes, are functionally interdependent in human cells, and all play critical roles in diverse fundamental and vital biological processes. The dysregulations of such biomolecules can cause diseases. Identifying the associations between biomolecules and diseases can uncover the mechanisms of complex diseases, which is conducive to their diagnosis, treatment, prognosis and prevention. Due to the time consumption and cost of biologically experimental methods, many computational association prediction methods have been proposed in the past few years. In this study, we provide a comprehensive review of machine learning-based approaches for predicting disease-biomolecule associations with multi-view data sources. Firstly, we introduce some databases and general strategies for integrating multi-view data sources in the prediction models. Then we discuss several feature representation methods for machine learning-based prediction models. Thirdly, we comprehensively review machine learning-based prediction approaches in three categories: basic machine learning methods, matrix completion-based methods and deep learning-based methods, while discussing their advantages and disadvantages. Finally, we provide some perspectives for further improving biomolecule-disease prediction methods.
Affiliation(s)
- Yulian Ding
  - Division of Biomedical Engineering at the University of Saskatchewan
- Xiujuan Lei
  - School of Computer Science at Shaanxi Normal University
- Bo Liao
  - School of Mathematics and Statistics at Hainan Normal University, Haikou, China
- Fang-Xiang Wu
  - College of Engineering and the Department of Computer Science at University of Saskatchewan
|
34
|
Jia LN, Yan X, You ZH, Zhou X, Li LP, Wang L, Song KJ. NLPEI: A Novel Self-Interacting Protein Prediction Model Based on Natural Language Processing and Evolutionary Information. Evol Bioinform Online 2020; 16:1176934320984171. [PMID: 33488064 PMCID: PMC7768313 DOI: 10.1177/1176934320984171] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/09/2020] [Accepted: 12/01/2020] [Indexed: 12/13/2022]
Abstract
The study of protein self-interactions (SIPs) can not only reveal the function of proteins at the molecular level, but is also crucial to understand activities such as growth, development, differentiation, and apoptosis, providing an important theoretical basis for exploring the mechanism of major diseases. With the rapid advances in biotechnology, a large number of SIPs have been discovered. However, due to the long period and high cost inherent to biological experiments, the gap between the identification of SIPs and the accumulation of data is growing. Therefore, fast and accurate computational methods are needed to effectively predict SIPs. In this study, we designed a new method, NLPEI, for predicting SIPs based on natural language understanding theory and evolutionary information. Specifically, we first understand the protein sequence as natural language and use natural language processing algorithms to extract its features. Then, we use the Position-Specific Scoring Matrix (PSSM) to represent the evolutionary information of the protein and extract its features through the Stacked Auto-Encoder (SAE) algorithm of deep learning. Finally, we fuse the natural language features of proteins with evolutionary features and make accurate predictions by Extreme Learning Machine (ELM) classifier. In the SIPs gold standard data sets of human and yeast, NLPEI achieved 94.19% and 91.29% prediction accuracy. Compared with different classifier models, different feature models, and other existing methods, NLPEI obtained the best results. These experimental results indicated that NLPEI is an effective tool for predicting SIPs and can provide reliable candidates for biological experiments.
Affiliation(s)
- Li-Na Jia
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
- Xin Yan
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, China
- Zhu-Hong You
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Xi Zhou
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Li-Ping Li
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
  - Lei Wang, Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China.
- Ke-Jian Song
  - School of information engineering, Jiangxi University of Science and Technology, Ganzhou, China
|