1
Ma Y, Ma Y. Kernel Bayesian logistic tensor decomposition with automatic rank determination for predicting multiple types of miRNA-disease associations. PLoS Comput Biol 2024; 20:e1012287. [PMID: 38976761 PMCID: PMC11257412 DOI: 10.1371/journal.pcbi.1012287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Revised: 07/18/2024] [Accepted: 06/27/2024] [Indexed: 07/10/2024]
Abstract
Identifying miRNA-disease associations and their types is crucial for studying the molecular mechanisms of disease-related miRNAs. Compared with traditional biological experiments, computational models not only save time and reduce costs but can also discover potential associations on a large scale. Although some computational models based on tensor decomposition have been proposed, they usually require manual specification of numerous hyperparameters, which reduces computational efficiency and generalization ability. Moreover, these linear models struggle to capture complex, higher-order nonlinear relationships. To address these issues, we propose a novel framework, KBLTDARD, to identify potential multiple types of miRNA-disease associations. First, KBLTDARD extracts information from biological networks and a high-order association network and fuses them to obtain more precise similarities of miRNAs (and diseases). Second, we combine logistic tensor decomposition with Bayesian methods to achieve automatic hyperparameter search by introducing sparsity-inducing priors on multiple latent variables, and we incorporate auxiliary information to improve prediction capability. Finally, an efficient deterministic Bayesian inference algorithm is developed to ensure computational efficiency. Experimental results on two benchmark datasets show that, compared with other state-of-the-art methods, KBLTDARD achieves better Top-1 precision, Top-1 recall, and Top-1 F1 for new type predictions, and higher AUPR, AUC, and F1 values for new triplet predictions. Furthermore, case studies demonstrate the effectiveness of KBLTDARD in predicting multiple types of miRNA-disease associations.
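The abstract pairs logistic tensor decomposition with sparsity-inducing priors for automatic rank determination. As a minimal sketch, assuming a rank-R CP (CANDECOMP/PARAFAC) factorization with a logistic link (the kernel terms, priors, and Bayesian inference of KBLTDARD are not reproduced), the code below scores a (miRNA, disease, association type) triplet and drops components that a sparsity prior has shrunk to zero. The names `cp_logistic_score` and `prune_components` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cp_logistic_score(A: Matrix, B: Matrix, C: Matrix, i: int, j: int, k: int) -> float:
    """Probability of triplet (miRNA i, disease j, type k) under a CP model
    with a logistic link: sigmoid(sum_r A[i][r] * B[j][r] * C[k][r])."""
    rank = len(A[0])
    logit = sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))
    return 1.0 / (1.0 + math.exp(-logit))


def column_norm(M: Matrix, r: int) -> float:
    """Euclidean norm of column r of factor matrix M."""
    return math.sqrt(sum(row[r] ** 2 for row in M))


def prune_components(A: Matrix, B: Matrix, C: Matrix, tol: float = 1e-6):
    """Drop rank-1 components whose column has shrunk to ~0 in any factor.
    Such a component adds nothing to any logit, so removing it leaves every
    score unchanged while lowering the effective rank."""
    keep = [r for r in range(len(A[0]))
            if min(column_norm(M, r) for M in (A, B, C)) > tol]

    def select(M: Matrix) -> Matrix:
        return [[row[r] for r in keep] for row in M]

    return select(A), select(B), select(C)


if __name__ == "__main__":
    # Toy factors: 2 miRNAs, 1 disease, 1 type, rank 2; component 1 has
    # collapsed to zero in the miRNA factor.
    A = [[1.0, 0.0], [0.5, 0.0]]
    B = [[2.0, 1.0]]
    C = [[1.0, -1.0]]
    A2, B2, C2 = prune_components(A, B, C)
    print(len(A2[0]))                           # effective rank after pruning
    print(cp_logistic_score(A, B, C, 0, 0, 0))  # sigmoid of the summed logit
```

Pruning is shown as a post-hoc step only to make the effect of automatic rank determination visible; in a Bayesian treatment the shrinkage happens during inference.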
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
- Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
2
Ma Y, Zhao Y, Ma Y. Kernel Bayesian nonlinear matrix factorization based on variational inference for human-virus protein-protein interaction prediction. Sci Rep 2024; 14:5693. [PMID: 38454139 PMCID: PMC10920681 DOI: 10.1038/s41598-024-56208-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 03/04/2024] [Indexed: 03/09/2024]
Abstract
Identifying potential human-virus protein-protein interactions (PPIs) contributes to understanding the mechanisms of viral infection and to the development of antiviral drugs. Existing computational models often have many hyperparameters that must be tuned manually, which limits their computational efficiency and generalization ability. To address this, this study proposes VKBNMF, a kernel Bayesian logistic matrix decomposition model with automatic rank determination, for predicting human-virus PPIs. VKBNMF introduces auxiliary information into logistic matrix decomposition and places prior distributions on the latent variables to build a Bayesian framework for automatic parameter search. In addition, we construct a variational inference framework for VKBNMF to ensure efficient solution. The experimental results show that, for paired PPIs, VKBNMF achieves average AUPRs of 0.9101, 0.9316, 0.8727, and 0.9517 on the four benchmark datasets, respectively, and for new human (viral) proteins, VKBNMF still achieves a higher hit rate. The case study further demonstrates that VKBNMF can be used as an effective tool for predicting human-virus PPIs.
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen, China
- Yongbiao Zhao
- School of Computer, Central China Normal University, Wuhan, China
- Yuanyuan Ma
- School of Computer Engineering, Hubei University of Arts and Science, Xiangyang, China
- Hubei Key Laboratory of Power System Design and Test for Electrical Vehicle, Hubei University of Arts and Science, Xiangyang, China
3
Ma Y, Zhong J, Zhu N. Weighted hypergraph learning and adaptive inductive matrix completion for SARS-CoV-2 drug repositioning. Methods 2023; 219:102-110. [PMID: 37804962 DOI: 10.1016/j.ymeth.2023.10.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2022] [Revised: 09/14/2023] [Accepted: 10/03/2023] [Indexed: 10/09/2023]
Abstract
MOTIVATION The outbreak of the human coronavirus (SARS-CoV-2) has placed a huge burden on public health and the world economy. Compared with de novo drug discovery, drug repurposing is a promising therapeutic strategy that facilitates rapid clinical treatment decisions, shortens the development process, and reduces costs. RESULTS In this study, we propose a weighted hypergraph learning and adaptive inductive matrix completion method, WHAIMC, for predicting potential virus-drug associations. First, we integrate multi-source data to describe viruses and drugs from multiple perspectives, including drug chemical structures, drug targets, complete virus genome sequences, and virus-drug associations. Then, WHAIMC establishes an adaptive inductive matrix completion model that improves performance by adaptively learning similarity relations. Finally, WHAIMC introduces weighted hypergraph learning into adaptive inductive matrix completion to capture higher-order relationships among viruses (or drugs). The results show that WHAIMC has strong predictive performance for new virus-drug associations, new viruses, and new drugs. The case study further demonstrates that WHAIMC is highly effective for repositioning antiviral drugs against SARS-CoV-2 and provides a new perspective on virus-drug association prediction. The code and data for this study are freely available at https://github.com/Mayingjun20179/WHAIMC.
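WHAIMC builds on inductive matrix completion, which scores a pair from side features rather than from identities alone. A minimal sketch of the standard bilinear score, assuming a drug feature vector x, a virus feature vector y, and learned projection matrices W and H (so the association matrix is approximated by X W H^T Y^T); the adaptive similarity learning and weighted hypergraph terms of WHAIMC are not shown, and `imc_score` is a hypothetical name.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(x: Vector, M: Matrix) -> Vector:
    """Compute the row vector x^T M (len(x) must equal the number of rows of M)."""
    return [sum(x[a] * M[a][r] for a in range(len(x))) for r in range(len(M[0]))]


def imc_score(x: Vector, W: Matrix, y: Vector, H: Matrix) -> float:
    """Inductive matrix completion score x^T W H^T y for one (drug, virus) pair.
    W maps drug features and H maps virus features into a shared rank-r space."""
    u = project(x, W)  # drug embedding, length r
    v = project(y, H)  # virus embedding, length r
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    x = [1.0, 0.0]                 # toy drug features
    W = [[1.0, 2.0], [3.0, 4.0]]
    y = [0.0, 1.0]                 # toy virus features
    H = [[1.0, 0.0], [0.0, 1.0]]
    print(imc_score(x, W, y, H))
```

Because the score depends only on features, a drug or virus with no known associations can still be scored, which is what makes inductive formulations suitable for the new-virus and new-drug settings.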
Affiliation(s)
- Yingjun Ma
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China
- Junjiang Zhong
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China
- Nenghui Zhu
- School of Mathematics and Statistics, Xiamen University of Technology, Xiamen 361024, China
4
Gu C, Li X. Prediction of disease-related miRNAs by voting with multiple classifiers. BMC Bioinformatics 2023; 24:177. [PMID: 37122001 PMCID: PMC10150488 DOI: 10.1186/s12859-023-05308-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Accepted: 04/26/2023] [Indexed: 05/02/2023]
Abstract
There is strong evidence that mutations and dysregulation of miRNAs are associated with a variety of diseases, including cancer. However, the experimental methods used to identify disease-related miRNAs are expensive and time-consuming. Effective computational approaches to identify disease-related miRNAs are in high demand and would aid in the detection of miRNA biomarkers for disease diagnosis, treatment, and prevention. In this study, we develop an ensemble learning framework to reveal potential associations between miRNAs and diseases (ELMDA). The ELMDA framework does not rely on known associations when calculating miRNA and disease similarities, and it uses voting over multiple classifiers to predict disease-related miRNAs. The ELMDA framework achieved an average AUC of 0.9229 on the HMDD v2.0 database in fivefold cross-validation. All potential associations in the HMDD v2.0 database were predicted, and 90% of the top 50 results were verified with the updated HMDD v3.2 database. The ELMDA framework was applied to gastric neoplasms, prostate neoplasms, and colon neoplasms, and 100%, 94%, and 90%, respectively, of the top 50 potential miRNAs were validated by the HMDD v3.2 database. Moreover, the ELMDA framework can predict miRNAs for isolated diseases. In conclusion, ELMDA appears to be a reliable method for uncovering disease-associated miRNAs.
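ELMDA combines several classifiers by voting. The abstract does not specify the voting rule, so the sketch below assumes simple hard (majority) voting over binary labels, with ties resolved as "not associated"; `majority_vote` is a hypothetical helper, not part of ELMDA's code.

```python
from typing import List


def majority_vote(predictions: List[List[int]]) -> List[int]:
    """Combine binary predictions by hard majority voting.

    predictions[m][s] is classifier m's label (1 = associated, 0 = not) for
    candidate pair s. A pair is labelled 1 only if strictly more than half of
    the classifiers vote 1, so ties fall back to 0.
    """
    n_models = len(predictions)
    n_samples = len(predictions[0])
    result = []
    for s in range(n_samples):
        votes = sum(predictions[m][s] for m in range(n_models))
        result.append(1 if 2 * votes > n_models else 0)
    return result


if __name__ == "__main__":
    # Three classifiers, three candidate miRNA-disease pairs (toy labels).
    preds = [[1, 0, 0],
             [1, 1, 0],
             [0, 1, 0]]
    print(majority_vote(preds))
```

Using an odd number of classifiers avoids ties for binary labels, so the tie rule only matters for even-sized ensembles.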
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, 410082, Hunan, China
5
Li Y, Sun H, Fang W, Ma Q, Han S, Wang-Sattler R, Du W, Yu Q. SURE: Screening Unlabeled Samples for Reliable Negative Samples Based on Reinforcement Learning. Inf Sci (N Y) 2023. [DOI: 10.1016/j.ins.2023.01.112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
6
Hyperbolic matrix factorization improves prediction of drug-target associations. Sci Rep 2023; 13:959. [PMID: 36653463 PMCID: PMC9849222 DOI: 10.1038/s41598-023-27995-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/24/2022] [Accepted: 01/11/2023] [Indexed: 01/19/2023]
Abstract
Past research in computational systems biology has focused more on developing and applying advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low-dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume that the biological space is flat. In contrast, recent theoretical studies suggest that biological systems exhibit a tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space distorts the distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering the embedding dimension by an order of magnitude. We see this as additional evidence that hyperbolic geometry underpins large biological networks.
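The key geometric fact is that hyperbolic distances grow rapidly toward the boundary of the space, which suits tree-like, clustered networks. As an illustration only, the sketch below computes the geodesic distance in the Poincaré ball model, d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2))). The abstract does not say which hyperbolic model or scoring function the paper uses, so the Poincaré ball here is an assumption and `poincare_distance` is a hypothetical name.

```python
import math
from typing import Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


if __name__ == "__main__":
    # Two pairs with the same Euclidean separation (0.1): one near the centre,
    # one near the boundary. The boundary pair is much farther apart.
    print(poincare_distance((0.0, 0.0), (0.1, 0.0)))
    print(poincare_distance((0.85, 0.0), (0.95, 0.0)))
```

The growth of distance near the boundary leaves room to place many well-separated leaves of a hierarchy in few dimensions, which is one intuition for why a hyperbolic latent space can use a much smaller embedding dimension.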
7
Gao M, Liu S, Qi Y, Guo X, Shang X. GAE-LGA: integration of multi-omics data with graph autoencoders to identify lncRNA-PCG associations. Brief Bioinform 2022; 23:6775590. [PMID: 36305456 DOI: 10.1093/bib/bbac452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 09/20/2022] [Accepted: 09/22/2022] [Indexed: 12/14/2022]
Abstract
Long non-coding RNAs (lncRNAs) can disrupt the biological functions of protein-coding genes (PCGs) to cause cancer. However, the relationship between lncRNAs and PCGs remains unclear and difficult to predict. Machine learning has achieved satisfactory performance in association prediction but, to our knowledge, is currently little used in lncRNA-PCG association prediction. We therefore introduce GAE-LGA, a deep learning model built from graph autoencoders, to recognize potential lncRNA-PCG associations. GAE-LGA jointly performs lncRNA-PCG learning and cross-omics correlation learning for effective lncRNA-PCG association identification. The functional similarity and multi-omics similarity of lncRNAs and PCGs were aggregated and encoded by graph autoencoders to extract feature representations of lncRNAs and PCGs, which were subsequently decoded to obtain candidate lncRNA-PCG pairs. Comprehensive evaluation demonstrated that GAE-LGA captures lncRNA-PCG associations with strong robustness and outperforms other machine learning-based identification methods. Furthermore, multi-omics features were shown to improve the performance of lncRNA-PCG association identification. In conclusion, GAE-LGA can serve as an efficient tool for lncRNA-PCG association prediction, with the following advantages: it fuses multi-omics information into the similarity network, making the feature representation more accurate, and it can predict lncRNA-PCG associations for new lncRNAs and identify potential lncRNA-PCG associations with high accuracy.
Affiliation(s)
- Meihong Gao
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Shuhui Liu
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Yang Qi
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Xinpeng Guo
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
- Xuequn Shang
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China
8
Ma Y, Liu Q. Generalized matrix factorization based on weighted hypergraph learning for microbe-drug association prediction. Comput Biol Med 2022; 145:105503. [DOI: 10.1016/j.compbiomed.2022.105503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 03/28/2022] [Accepted: 04/04/2022] [Indexed: 11/03/2022]
9
Ma Y, He T, Tan Y, Jiang X. Seq-BEL: Sequence-Based Ensemble Learning for Predicting Virus-Human Protein-Protein Interaction. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1322-1333. [PMID: 32750886 DOI: 10.1109/tcbb.2020.3008157] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Infectious diseases are currently among the most important and widespread health problems, and identifying viral infection mechanisms is critical for controlling diseases caused by highly infectious viruses. Because non-interacting protein pairs are scarce and positive and negative samples are severely imbalanced, supervised learning algorithms are not well suited to this prediction task. At the same time, owing to the lack of information on viral proteins and their significant sequence dissimilarity, some ensemble learning models generalize poorly. In this paper, we propose a Sequence-Based Ensemble Learning (Seq-BEL) method to predict potential virus-human PPIs. Specifically, based on the amino acid sequences of proteins and the currently known virus-human PPI network, Seq-BEL calculates various features and similarities of human and viral proteins and then combines them to score potential virus-human PPIs. The computational results show that Seq-BEL successfully predicts potential virus-human PPIs and outperforms other state-of-the-art methods. More importantly, Seq-BEL also performs well for new human proteins and new viral proteins. In addition, the model is robust and generalizes well, and it can be used as an effective tool for virus-human PPI prediction.
10
Peng L, Tan J, Tian X, Zhou L. EnANNDeep: An Ensemble-based lncRNA-protein Interaction Prediction Framework with Adaptive k-Nearest Neighbor Classifier and Deep Models. Interdiscip Sci 2022; 14:209-232. [PMID: 35006529 DOI: 10.1007/s12539-021-00483-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 07/06/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 01/08/2023]
Abstract
Predicting lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may bias predictions. More importantly, they were validated only under cross-validation on lncRNA-protein pairs and did not consider performance under cross-validation on lncRNAs and on proteins, and thus cannot find related proteins (lncRNAs) for a new lncRNA (protein). Under an ensemble learning framework (EnANNDeep) composed of an adaptive k-nearest neighbor classifier and deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are assembled. Second, features from multiple sources are integrated to describe each lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are designed to score unknown lncRNA-protein pairs. Finally, the interaction probabilities from the three predictors are integrated by soft voting. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross-validation on lncRNAs, proteins, and LPIs, EnANNDeep achieves the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating superior LPI prediction ability. Case study analyses indicate that SNHG10 may be closely linked with Q15717. In the ensemble framework, the adaptive k-nearest neighbor classifier can pick the most appropriate k for each query lncRNA-protein pair. More importantly, the deep models (the deep neural network and deep forest) can effectively learn representative features of lncRNAs and proteins.
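The final step fuses the three predictors' interaction probabilities by soft voting. A minimal sketch, assuming equal weights by default (the abstract does not state whether EnANNDeep weights its models); `soft_vote` and the toy probabilities are hypothetical.

```python
from typing import List, Optional


def soft_vote(prob_lists: List[List[float]],
              weights: Optional[List[float]] = None) -> List[float]:
    """Weighted average of per-model interaction probabilities.

    prob_lists[m][s] is model m's probability that pair s interacts.
    With weights=None every model counts equally (plain soft voting).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_pairs = len(prob_lists[0])
    return [sum(w * probs[s] for w, probs in zip(weights, prob_lists)) / total
            for s in range(n_pairs)]


if __name__ == "__main__":
    knn = [0.9, 0.2]     # adaptive kNN probabilities (toy values)
    dnn = [0.7, 0.4]     # deep neural network
    forest = [0.8, 0.0]  # deep forest
    fused = soft_vote([knn, dnn, forest])
    print([round(p, 3) for p in fused])
```

Unlike hard voting, soft voting keeps each model's confidence, so one very confident predictor can outweigh two lukewarm ones.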
Affiliation(s)
- Lihong Peng
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China
- Jingwei Tan
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xiongfei Tian
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Liqian Zhou
- School of Computer Science, Hunan University of Technology, Zhuzhou, China
11
Ma Y, Ma Y. Hypergraph-based logistic matrix factorization for metabolite-disease interaction prediction. Bioinformatics 2022; 38:435-443. [PMID: 34499104 DOI: 10.1093/bioinformatics/btab652] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/08/2021] [Revised: 08/08/2021] [Accepted: 09/06/2021] [Indexed: 02/03/2023]
Abstract
MOTIVATION Function-related metabolites, the terminal products of cellular regulation, are closely associated with complex diseases. Identifying disease-related metabolites is critical to the diagnosis, prevention and treatment of diseases. However, most existing computational approaches build networks by calculating pairwise relationships, which is inappropriate for mining higher-order relationships. RESULTS In this study, we present a novel approach based on hypergraph-based logistic matrix factorization, HGLMF, to predict potential interactions between metabolites and diseases. First, the molecular structures and gene associations of metabolites and the hierarchical structures and GO functional annotations of diseases were extracted to build various similarity measures for metabolites and diseases. Next, the kernel neighborhood similarity of metabolites (or diseases) was calculated from the completed interaction network. Then, the multiple networks of metabolites and of diseases were fused, respectively, and hypergraph structures of metabolites and diseases were built. Finally, a hypergraph-based logistic matrix factorization was proposed to predict potential metabolite-disease interactions. In computational experiments, HGLMF accurately predicted metabolite-disease interactions and performed better than other state-of-the-art methods. Moreover, HGLMF can be used to predict interactions for new metabolites (or diseases). The case studies suggest that the proposed method can discover novel disease-related metabolites, which have been confirmed in existing studies. AVAILABILITY AND IMPLEMENTATION The code and dataset are available at: https://github.com/Mayingjun20179/HGLMF. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
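HGLMF builds hypergraph structures so that one hyperedge can link several metabolites (or diseases) at once. A widely used way to turn a hypergraph into a smoothness regularizer is the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf (2006), L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}, where H is the vertex-by-hyperedge incidence matrix, W holds hyperedge weights, and Dv and De are the vertex and hyperedge degree matrices. Whether HGLMF uses exactly this normalization, and how its hyperedges are formed, are assumptions here; the sketch only shows how such a Laplacian is computed, and `hypergraph_laplacian` is a hypothetical name.

```python
import math
from typing import List


def hypergraph_laplacian(H: List[List[float]], w: List[float]) -> List[List[float]]:
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2.

    H[v][e] is 1 if vertex v belongs to hyperedge e (0 otherwise) and w[e] is
    the weight of hyperedge e. Every vertex must lie in at least one hyperedge
    with positive weight, and every hyperedge must be non-empty.
    """
    n, m = len(H), len(H[0])
    dv = [sum(w[e] * H[v][e] for e in range(m)) for v in range(n)]  # vertex degrees
    de = [sum(H[v][e] for v in range(n)) for e in range(m)]         # hyperedge sizes
    L = []
    for a in range(n):
        row = []
        for b in range(n):
            theta = sum(H[a][e] * w[e] * H[b][e] / de[e] for e in range(m))
            theta /= math.sqrt(dv[a] * dv[b])
            row.append((1.0 if a == b else 0.0) - theta)
        L.append(row)
    return L


if __name__ == "__main__":
    # 4 metabolites; hyperedge 0 = {0, 1, 2}, hyperedge 1 = {2, 3} (toy groups).
    H = [[1, 0],
         [1, 0],
         [1, 1],
         [0, 1]]
    w = [1.0, 0.5]
    for row in hypergraph_laplacian(H, w):
        print([round(x, 3) for x in row])
```

In a factorization model, a penalty such as trace(U^T L U) on the metabolite factor matrix U pulls together the latent vectors of metabolites that share hyperedges.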
Affiliation(s)
- Yingjun Ma
- School of Applied Mathematics, Xiamen University of Technology, Xiamen 361024, China
- Yuanyuan Ma
- School of Computer & Information Engineering, Anyang Normal University, Anyang 455000, China
12
Yu H, Shen ZA, Du PF. NPI-RGCNAE: Fast predicting ncRNA-protein interactions using the Relational Graph Convolutional Network Auto-Encoder. IEEE J Biomed Health Inform 2021; 26:1861-1871. [PMID: 34699377 DOI: 10.1109/jbhi.2021.3122527] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important for understanding the biological functions of ncRNAs. Since experimental methods for determining ncRNA-protein interactions are costly and time-consuming, computational methods have been proposed as alternatives. We developed a novel method, NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied a Relational Graph Convolutional Network encoder and a DistMult decoder to predict ncRNA-protein interactions accurately and efficiently. Using 5-fold cross-validation, we found that our method achieved performance comparable to all state-of-the-art methods while requiring less than 10% of their training time, making it a more efficient choice for large datasets in practice. All datasets and source code of NPI-RGCNAE have been deposited in a public GitHub repository (https://github.com/Angelia0hh/NPI-RGCNAE).
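NPI-RGCNAE decodes node embeddings with DistMult, which scores a (head, relation, tail) triple as the trilinear product of the two node embeddings and a diagonal relation vector. A minimal sketch, assuming embeddings already produced by the R-GCN encoder and a logistic sigmoid to turn the score into a probability; the function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence


def distmult_score(head: Sequence[float], relation: Sequence[float],
                   tail: Sequence[float]) -> float:
    """DistMult score: sum_k head[k] * relation[k] * tail[k]."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def interaction_probability(ncrna: Sequence[float], relation: Sequence[float],
                            protein: Sequence[float]) -> float:
    """Map the DistMult score to (0, 1) with the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-distmult_score(ncrna, relation, protein)))


if __name__ == "__main__":
    ncrna = [1.0, 2.0]        # toy ncRNA embedding
    interacts = [0.5, 1.0]    # toy relation vector for "interacts with"
    protein = [2.0, -1.0]     # toy protein embedding
    print(distmult_score(ncrna, interacts, protein))
    print(interaction_probability(ncrna, interacts, protein))
```

The product is symmetric in head and tail, so DistMult cannot encode relation direction; that is harmless for an undirected ncRNA-protein interaction.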
13
Zheng X, Gao Y, Yu C, Fan G, Li P, Zhang M, Yu J, Xu M. Identification of immune-related subtypes of colorectal cancer to improve antitumor immunotherapy. Sci Rep 2021; 11:19432. [PMID: 34593914 PMCID: PMC8484460 DOI: 10.1038/s41598-021-98966-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/06/2021] [Accepted: 09/15/2021] [Indexed: 02/07/2023]
Abstract
Immunotherapy with immune checkpoint inhibitors (ICIs), which enhance immune system activation, is promising for tumor management. However, patients' responses to ICIs vary. Here, we applied a non-negative matrix factorization algorithm to establish a robust immune molecular classification system for colorectal cancer (CRC). We obtained data from 1503 CRC patients (training cohort: 488 from The Cancer Genome Atlas; validation cohort: 1015 from the Gene Expression Omnibus). In the training cohort, 42.8% of patients, who exhibited significantly higher immunocyte infiltration and enrichment of immune response-associated signatures, were assigned to the immune class. Within the immune class, 53.1% of patients belonged to the immune-suppressed subclass, which was associated with a worse overall prognosis and characterized by activation of stroma-related signatures, genes, immune-suppressive cells, and signaling. The remaining immune-class patients belonged to the immune-activated subclass, which was associated with a better prognosis and response to anti-PD-1 therapy. Immune-related subtypes were associated with different copy number alterations, tumor-infiltrating lymphocyte enrichment, PD-1/PD-L1 expression, mutation landscapes, and cancer stemness. These results were validated in patients with microsatellite-unstable CRC. We describe a novel immune-related class of CRC, which may be used to select candidate patients with CRC for immunotherapy and to tailor optimal immunotherapeutic treatment.
Affiliation(s)
- Xiaobo Zheng
- Department of Liver Surgery, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Yong Gao
- Department of Gastroenterology, Second Affiliated Hospital, Army Medical University, Chongqing, 400037, China
- Chune Yu
- Laboratory of Tumor Targeted and Immune Therapy, State Key Laboratory of Biotherapy, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Guiquan Fan
- Department of General Surgery, First People's Hospital of Liangshan Yi Autonomous Prefecture, Liangshan, 615000, Sichuan, China
- Pengwu Li
- Department of Hepatobiliary Surgery, Chongzhou People's Hospital, Chengdu, 611200, Sichuan, China
- Ming Zhang
- Department of Liver Surgery, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Department of General Surgery, Mianzhu Hospital of West China Hospital, Sichuan University, Mianzhu, 618200, Sichuan, China
- Jing Yu
- Laboratory of Tumor Targeted and Immune Therapy, State Key Laboratory of Biotherapy, Clinical Research Center for Breast, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Mingqing Xu
- Department of Liver Surgery, West China Hospital, Sichuan University, Chengdu, 610041, Sichuan, China
- Department of Hepatopancreatobiliary Surgery, Meishan City People's Hospital, Meishan Hospital of West China Hospital, Sichuan University, Meishan, 610041, Sichuan, China
14
Yu H, Shen ZA, Zhou YK, Du PF. Recent advances in predicting protein-lncRNA interactions using machine learning methods. Curr Gene Ther 2021; 22:228-244. [PMID: 34254917 DOI: 10.2174/1566523221666210712190718] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/18/2021] [Revised: 05/01/2021] [Accepted: 05/31/2021] [Indexed: 11/22/2022]
Abstract
Long non-coding RNAs (lncRNAs) are RNAs longer than 200 nucleotides with little or no protein-coding ability. A large number of studies have indicated that lncRNAs play significant roles in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform many of their functions through interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter can be further classified into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focus on supervised learning methods. We summarize the state-of-the-art methods for predicting lncRNA-protein interactions and compare their performance and characteristics. Considering the limitations of existing models, we analyze open problems and discuss future research directions.
Affiliation(s)
- Han Yu
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Zi-Ang Shen
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Yuan-Ke Zhou
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin 300350, China
15
Philip M, Chen T, Tyagi S. A Survey of Current Resources to Study lncRNA-Protein Interactions. Noncoding RNA 2021; 7:ncrna7020033. [PMID: 34201302 PMCID: PMC8293367 DOI: 10.3390/ncrna7020033] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/03/2021] [Revised: 05/28/2021] [Accepted: 06/07/2021] [Indexed: 12/15/2022]
Abstract
Phenotypes are driven by regulated gene expression, which in turn is mediated by complex interactions between diverse biological molecules. Protein-DNA interactions, such as histone and transcription factor binding, are well studied, as are RNA-RNA interactions in short-RNA gene silencing. In contrast, lncRNA-protein interaction (LPI) mechanisms are comparatively unknown, likely because LPI are difficult to study. However, LPI are emerging as key interactions in epigenetic mechanisms, playing a role in development and disease. Their importance is further highlighted by their conservation across kingdoms. Hence, interest in LPI research is increasing. We therefore review the current state of the art in lncRNA-protein interactions, specifically surveying recent computational methods and databases that researchers can exploit for LPI investigation. We found that algorithm development relies heavily on a few generic databases containing curated LPI information. Additionally, these databases store gene-level rather than transcript-level annotations. We show that early methods, which predict LPI using molecular docking, have limited scope and are slow, creating a data processing bottleneck. Recently, machine learning has become the strategy of choice for LPI prediction, likely due to the rapid growth of machine learning infrastructure and expertise. Although many of these methods have notable limitations, machine learning is expected to be the basis of modern LPI prediction algorithms.
Affiliation(s)
- Melcy Philip
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Tyrone Chen
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Sonika Tyagi
- School of Biological Sciences, Monash University, 25 Rainforest Walk, Clayton, VIC 3800, Australia
- Monash eResearch Centre, Monash University, Clayton, VIC 3800, Australia
- Department of Infectious Disease, Monash University (Alfred Campus), 85 Commercial Road, Melbourne, VIC 3004, Australia
16
Zhou YK, Hu J, Shen ZA, Zhang WY, Du PF. LPI-SKF: Predicting lncRNA-Protein Interactions Using Similarity Kernel Fusions. Front Genet 2020; 11:615144. [PMID: 33362868 PMCID: PMC7758075 DOI: 10.3389/fgene.2020.615144] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 10/08/2020] [Accepted: 11/16/2020] [Indexed: 01/24/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles in several biological activities, including transcription, splicing, translation, and other cellular regulatory processes. lncRNAs perform their biological functions by interacting with various proteins, so studies of lncRNA-protein interactions are of great value for understanding lncRNA functional mechanisms. In this paper, we propose a novel model, LPI-SKF, to predict potential lncRNA-protein interactions using the SKF (similarity kernel fusion) and LapRLS (Laplacian regularized least squares) algorithms. LPI-SKF integrates various similarities of both lncRNAs and proteins and can be applied to predict potential interactions involving novel proteins or lncRNAs. We obtained an AUROC (area under the receiver operating characteristic curve) of 0.909 in 5-fold cross-validation, which outperforms other state-of-the-art methods. In total, 19 of the top 20 ranked interaction predictions were verified by existing data, suggesting that LPI-SKF has great potential for accurately discovering unknown lncRNA-protein interactions. All data and code for this work can be downloaded from a GitHub repository (https://github.com/zyk2118216069/LPI-SKF).
Affiliation(s)
- Pu-Feng Du
- College of Intelligence and Computing, Tianjin University, Tianjin, China
17
Ma Y, He T, Jiang X. Multi-network logistic matrix factorization for metabolite-disease interaction prediction. FEBS Lett 2020; 594:1675-1684. [PMID: 32246474 DOI: 10.1002/1873-3468.13782] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/14/2019] [Revised: 03/03/2020] [Accepted: 03/20/2020] [Indexed: 11/11/2022]
Abstract
Identifying disease-related metabolites is of great significance for the diagnosis, prevention, and treatment of disease. In this study, we propose a novel computational model, multiple-network logistic matrix factorization (MN-LMF), for predicting metabolite-disease interactions, which is especially relevant for new diseases and new metabolites. First, MN-LMF builds disease (and metabolite) similarity networks by integrating heterogeneous omics data. Second, it combines these similarities with known metabolite-disease interaction networks, using a modified logistic matrix factorization to predict potential metabolite-disease interactions. Experimental results show that MN-LMF accurately predicts metabolite-disease interactions and outperforms other state-of-the-art methods. Moreover, case studies demonstrate that the model can infer unknown metabolite-disease interactions for novel diseases without any known associations.
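MN-LMF relies on a modified logistic matrix factorization. The abstract does not give the modification, so the sketch below shows only a common base model: the interaction probability sigmoid(u_i . v_j) and a log-likelihood in which observed interactions receive an extra confidence weight c, in the style of NRLMF (Liu et al., 2016). The similarity-network regularization of MN-LMF is omitted, the default c = 5.0 is an arbitrary illustrative value, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def interaction_prob(U: Matrix, V: Matrix, i: int, j: int) -> float:
    """P(metabolite i interacts with disease j) = sigmoid(u_i . v_j)."""
    return 1.0 / (1.0 + math.exp(-dot(U[i], V[j])))


def weighted_log_likelihood(U: Matrix, V: Matrix, Y: Matrix, c: float = 5.0) -> float:
    """Sum over all pairs of c*y*s - (1 + c*y - y) * log(1 + exp(s)), s = u_i . v_j.

    With c = 1 this is the ordinary Bernoulli log-likelihood; c > 1 up-weights
    the (rare) observed interactions relative to unobserved pairs.
    """
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            s = dot(U[i], V[j])
            total += c * y * s - (1.0 + c * y - y) * softplus(s)
    return total


if __name__ == "__main__":
    U = [[1.0, 2.0], [0.0, -1.0]]  # toy metabolite factors
    V = [[0.5, 0.5]]               # toy disease factors
    Y = [[1], [0]]                 # metabolite 0 interacts with disease 0
    print(interaction_prob(U, V, 0, 0))
    print(weighted_log_likelihood(U, V, Y, c=5.0))
```

Training would maximize this likelihood (minus regularizers) with respect to U and V, for example by gradient ascent; the optimizer is left out to keep the sketch short.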
Affiliation(s)
- Yingjun Ma
- School of Computer, Central China Normal University, Wuhan, China
- School of Mathematics & Statistics, Central China Normal University, Wuhan, China
- Tingting He
- School of Computer, Central China Normal University, Wuhan, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China
- Xingpeng Jiang
- School of Computer, Central China Normal University, Wuhan, China
- Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, Central China Normal University, Wuhan, China